Pointers from an assembly point of view

Preface

Pointers are used throughout development to access variables and modify their values. How does the compiler translate pointers, and what does a pointer look like at the assembly level?

mov and lea instructions

Any analysis of pointers depends on these two instructions. Let's start with a quick look at the mov instruction.

mov instruction

When the GNU assembler emits AT&T assembly, the mov mnemonic carries a one-letter suffix that declares the size of the data element being moved.
The instruction therefore takes the following form:
movx
where x is one of the following characters:
1. q for a 64-bit quad word value
2. l for a 32-bit long word value
3. w for a 16-bit word value
4. b for an 8-bit byte value
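
As a rough illustration, the suffix list can be written as a small C helper. The name `mov_suffix_for` is hypothetical and chosen here only for this sketch; it maps an operand size in bytes to the suffix from the list above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map an operand size in bytes to the AT&T mov suffix.
   Returns '?' for sizes that have no suffix in the list above. */
char mov_suffix_for(size_t bytes) {
    switch (bytes) {
    case 8: return 'q'; /* quad word, 64 bits */
    case 4: return 'l'; /* long word, 32 bits */
    case 2: return 'w'; /* word, 16 bits */
    case 1: return 'b'; /* byte, 8 bits */
    default: return '?';
    }
}
```

For example, `mov_suffix_for(sizeof(int32_t))` gives 'l' and `mov_suffix_for(sizeof(uint64_t))` gives 'q'.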

Source code

- (void)asm_point {
    int a = 6;
}

AT&T assembly

AT&T syntax, as emitted by the GNU assembler, orders operands the opposite way from Intel syntax: AT&T writes the source first and the destination second, while Intel writes the destination first.

YangASM`-[ViewController asm_point]:
    0x10ed88ee0 <+0>:  pushq  %rbp
    0x10ed88ee1 <+1>:  movq   %rsp, %rbp
    0x10ed88ee4 <+4>:  movq   %rdi, -0x8(%rbp)
    0x10ed88ee8 <+8>:  movq   %rsi, -0x10(%rbp)
    0x10ed88eec <+12>: movl   $0x6, -0x14(%rbp)
->  0x10ed88ef3 <+19>: popq   %rbp
    0x10ed88ef4 <+20>: retq   

Converting the assembly offsets to memory addresses

(lldb) p &a
(int *) $4 = 0x00007ffee0e7503c
(lldb) register read rbp
     rbp = 0x00007ffee0e75050
(lldb) p/x 0x00007ffee0e75050-0x8
(long) $5 = 0x00007ffee0e75048
(lldb) p/x 0x00007ffee0e75050-0x10
(long) $6 = 0x00007ffee0e75040
(lldb) p/x 0x00007ffee0e75050-0x14
(long) $7 = 0x00007ffee0e7503c
(lldb)   
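
The subtraction done by hand in lldb above can be sketched in C. `slot_address` is a hypothetical helper name used only here; it adds a signed displacement to the value read from rbp.

```c
#include <stdint.h>

/* Hypothetical helper: address of the stack slot written as disp(%rbp). */
uint64_t slot_address(uint64_t rbp, int64_t disp) {
    /* Unsigned wraparound makes a negative displacement subtract from rbp. */
    return rbp + (uint64_t)disp;
}
```

With rbp = 0x00007ffee0e75050, `slot_address(rbp, -0x14)` gives 0x00007ffee0e7503c, the same address that `p &a` printed.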

Analysis

```c
YangASM`-[ViewController asm_point]:
    // rbp = 0x00007ffee0e75050
    // Prologue: set up the stack frame of the asm_point function
    0x10ed88ee0 <+0>:  pushq  %rbp
    0x10ed88ee1 <+1>:  movq   %rsp, %rbp
    // -0x8(%rbp) is the address computed above: 0x00007ffee0e75050 - 0x8 = 0x00007ffee0e75048
    // The q suffix of movq means the store occupies 8 bytes of memory
    // So this instruction stores the value of rdi in the 8 bytes starting at 0x00007ffee0e75048
    0x10ed88ee4 <+4>:  movq   %rdi, -0x8(%rbp)

    // Store the value of rsi in the 8 bytes starting at 0x00007ffee0e75040
    0x10ed88ee8 <+8>:  movq   %rsi, -0x10(%rbp)

    // int a = 6
    // Store 6 in the 4 bytes starting at 0x00007ffee0e7503c
    // The l suffix of movl means a 4-byte store, which matches the 4-byte int type
    0x10ed88eec <+12>: movl   $0x6, -0x14(%rbp)

    // Epilogue: restore the caller's rbp and return
->  0x10ed88ef3 <+19>: popq   %rbp
    0x10ed88ef4 <+20>: retq
```

The listing above contains both movl and movq.

In other words, in AT&T assembly the mov instruction appears in the form movx.

Indirect addressing

In the assembly above, movq %rdi, -0x8(%rbp) contains parentheses and a minus sign.

They mean the following:

  1. movl %ebx, %edi
    Copy the value in the ebx register into the edi register

  2. movl %ebx, (%edi)
    The parentheses around edi mean: store the value in ebx at the memory address held in the edi register

  3. movl %ebx, 4(%edi)
    Store the value in the ebx register at the memory location 4 bytes after the address held in the edi register

  4. A negative displacement goes in the opposite direction
    movl %ebx, -4(%edi)
    Store the value in the ebx register at the memory location 4 bytes before the address held in the edi register
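
The addressing forms in this list have a close analogue in C pointer arithmetic. The following is a minimal sketch, assuming a 3-element array stands in for memory; `store32_at` and `demo_indirect` are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue of  movl %ebx, disp(%edi):
   write the 4-byte value `ebx` at the address `edi + disp` (disp in bytes). */
void store32_at(void *edi, ptrdiff_t disp, uint32_t ebx) {
    memcpy((unsigned char *)edi + disp, &ebx, sizeof ebx);
}

/* Apply forms 2, 3 and 4 from the list, with edi pointing at the middle element. */
void demo_indirect(uint32_t mem[3]) {
    uint32_t *edi = &mem[1];   /* edi holds the address of mem[1] */
    store32_at(edi, 0, 0x11);  /* movl %ebx, (%edi)   writes mem[1] */
    store32_at(edi, 4, 0x22);  /* movl %ebx, 4(%edi)  writes mem[2] */
    store32_at(edi, -4, 0x33); /* movl %ebx, -4(%edi) writes mem[0] */
}
```

After `demo_indirect(mem)`, the three elements hold 0x33, 0x11 and 0x22 in that order: the displacement selects a neighbour of the location that edi points to.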

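Memory layout:

The assembly code above produces the following memory layout (addresses taken from the lldb session):

0x00007ffee0e75048  -0x8(%rbp)   value of rdi, 8 bytes
0x00007ffee0e75040  -0x10(%rbp)  value of rsi, 8 bytes
0x00007ffee0e7503c  -0x14(%rbp)  a = 6, 4 bytes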

lea instruction

lea computes the address of its memory operand and places that address itself in the register.
mov with the same memory operand reads the data stored at that address and places the data in the register.
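
The difference can be sketched in C on a toy memory. Everything here is an assumption made for illustration: the `ToyMemory` type, the function names, and the small indexes that stand in for real addresses.

```c
#include <stdint.h>
#include <string.h>

/* Toy memory indexed by "address"; sizes and names are illustrative only. */
typedef struct { uint8_t bytes[64]; } ToyMemory;

/* Like leaq disp(%base), %reg: only computes base + disp, never touches memory. */
uint64_t toy_lea(uint64_t base, int64_t disp) {
    return base + (uint64_t)disp;
}

/* Like movl disp(%base), %reg: computes the same address, then reads 4 bytes from it. */
uint32_t toy_mov_load32(const ToyMemory *m, uint64_t base, int64_t disp) {
    uint32_t value;
    memcpy(&value, &m->bytes[toy_lea(base, disp)], sizeof value);
    return value;
}

/* Like movl $imm, disp(%base): writes 4 bytes at the computed address. */
void toy_mov_store32(ToyMemory *m, uint64_t base, int64_t disp, uint32_t imm) {
    memcpy(&m->bytes[toy_lea(base, disp)], &imm, sizeof imm);
}
```

With base = 48 and 6 stored at displacement -0x14, `toy_lea(48, -0x14)` returns the location 28, while `toy_mov_load32(&m, 48, -0x14)` returns the stored value 6.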

Source code

- (void)asm_point {
    int a = 6;
    int *p = &a;
}

Assembly

YangASM`-[ViewController asm_point]:
    0x10da42ee0 <+0>:  pushq  %rbp
    0x10da42ee1 <+1>:  movq   %rsp, %rbp

    // Save rdi and rsi on the stack
    0x10da42ee4 <+4>:  movq   %rdi, -0x8(%rbp)
    0x10da42ee8 <+8>:  movq   %rsi, -0x10(%rbp)

    // int a = 6, stored at -0x14(%rbp)
    0x10da42eec <+12>: movl   $0x6, -0x14(%rbp)

    // Put the address -0x14(%rbp) itself (the address of a) into rax
    0x10da42ef3 <+19>: leaq   -0x14(%rbp), %rax

    // Store the value in rax at -0x20(%rbp)
    // That value is the address -0x14(%rbp), so the slot at -0x20(%rbp) is the pointer p
    0x10da42ef7 <+23>: movq   %rax, -0x20(%rbp)
    
->  0x10da42efb <+27>: popq   %rbp
    0x10da42efc <+28>: retq  

Modifying a variable through a pointer

Now that we have covered lea and mov, let's look at how a value is modified through a pointer.

- (void)asm_point {
    int a = 6;
    int *p = &a;
    *p = 12;
}
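
To observe the effect of these three statements outside the debugger, the same body can be written as a plain C function. The name `asm_point_c` and the return value are additions made for this sketch; the original method returns nothing.

```c
/* Plain C version of the method body; returning `a` is an addition for illustration. */
int asm_point_c(void) {
    int a = 6;
    int *p = &a; /* p holds the address of a */
    *p = 12;     /* write through p: a itself becomes 12 */
    return a;
}
```

Calling `asm_point_c()` returns 12, because the write through p lands in the memory that a occupies.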

Assembly

YangASM`-[ViewController asm_point]:
    0x10f5e4ed0 <+0>:  pushq  %rbp
    0x10f5e4ed1 <+1>:  movq   %rsp, %rbp
    0x10f5e4ed4 <+4>:  movq   %rdi, -0x8(%rbp)
    0x10f5e4ed8 <+8>:  movq   %rsi, -0x10(%rbp)
    0x10f5e4edc <+12>: movl   $0x6, -0x14(%rbp)
    0x10f5e4ee3 <+19>: leaq   -0x14(%rbp), %rax
    0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)
    0x10f5e4eeb <+27>: movq   -0x20(%rbp), %rax
    0x10f5e4eef <+31>: movl   $0xc, (%rax)
->  0x10f5e4ef5 <+37>: popq   %rbp
    0x10f5e4ef6 <+38>: retq   

Memory addresses

(lldb) register read rbp
     rbp = 0x00007ffee0619050
(lldb) p/x 0x00007ffee0619050-0x8
(long) $5 = 0x00007ffee0619048
(lldb) p/x 0x00007ffee0619050-0x10
(long) $6 = 0x00007ffee0619040
(lldb) p/x 0x00007ffee0619050-0x14
(long) $7 = 0x00007ffee061903c
(lldb) p/x 0x00007ffee0619050-0x20
(long) $8 = 0x00007ffee0619030
(lldb) 

Memory model

Figure I

 // Save rdi and rsi on the stack
 0x10f5e4ed4 <+4>:  movq   %rdi, -0x8(%rbp)
 0x10f5e4ed8 <+8>:  movq   %rsi, -0x10(%rbp)

Figure II

 // int a = 6, stored at -0x14(%rbp)
 0x10f5e4edc <+12>: movl   $0x6, -0x14(%rbp)

Figure III

0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)   

Figure IV

 0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)
 0x10f5e4eef <+31>: movl   $0xc, (%rax)  

Complete assembly analysis

YangASM`-[ViewController asm_point]:
    0x10f5e4ed0 <+0>:  pushq  %rbp
    0x10f5e4ed1 <+1>:  movq   %rsp, %rbp
    // Save rdi and rsi on the stack
    0x10f5e4ed4 <+4>:  movq   %rdi, -0x8(%rbp)
    0x10f5e4ed8 <+8>:  movq   %rsi, -0x10(%rbp)

    // int a = 6, stored at -0x14(%rbp)
    0x10f5e4edc <+12>: movl   $0x6, -0x14(%rbp)

    // Put the address -0x14(%rbp) itself (the address of a) into rax
    0x10f5e4ee3 <+19>: leaq   -0x14(%rbp), %rax

    // Store the value in rax at -0x20(%rbp); this slot is the pointer p
    0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)

    // Load the value stored at -0x20(%rbp) back into rax. rax now holds the address of a, -0x14(%rbp)
    // A natural question: why reload rax from memory when it held the same address a moment ago?
    // The store at <+23> completes the source line int *p = &a: p lives in memory and holds an address
    // The reload belongs to the next source line, *p = 12
    // *p = 12 becomes two instructions: the first reads p, the second stores 12 through it
    // Reading p into rax makes the address available to that store
    0x10f5e4eeb <+27>: movq   -0x20(%rbp), %rax

    // (%rax): the parentheses mean the operand is the memory that rax points to
    // The l suffix of movl means a 4-byte store
    // Change the value in that memory to 12 (0xc)
    0x10f5e4eef <+31>: movl   $0xc, (%rax)


->  0x10f5e4ef5 <+37>: popq   %rbp
    0x10f5e4ef6 <+38>: retq  
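
The walk-through above can be replayed in C on a toy stack, one statement per instruction. This is only a sketch under stated assumptions: a 64-byte array stands in for the stack, the index 48 stands in for rbp, and all helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Toy stack: a byte array with rbp at a fixed index. Indexes stand in for real addresses. */
enum { STACK_SIZE = 64, RBP = 48 };

static void store32(uint8_t *mem, uint64_t addr, uint32_t v) { memcpy(mem + addr, &v, sizeof v); }
static void store64(uint8_t *mem, uint64_t addr, uint64_t v) { memcpy(mem + addr, &v, sizeof v); }
static uint32_t load32(const uint8_t *mem, uint64_t addr) { uint32_t v; memcpy(&v, mem + addr, sizeof v); return v; }
static uint64_t load64(const uint8_t *mem, uint64_t addr) { uint64_t v; memcpy(&v, mem + addr, sizeof v); return v; }

/* Replays <+12> through <+31> and returns the final value of a. */
uint32_t replay_asm_point(void) {
    uint8_t mem[STACK_SIZE] = {0};
    uint64_t rax;

    store32(mem, RBP - 0x14, 0x6);  /* movl $0x6, -0x14(%rbp) : a = 6    */
    rax = RBP - 0x14;               /* leaq -0x14(%rbp), %rax : rax = &a */
    store64(mem, RBP - 0x20, rax);  /* movq %rax, -0x20(%rbp) : p = &a   */
    rax = load64(mem, RBP - 0x20);  /* movq -0x20(%rbp), %rax : rax = p  */
    store32(mem, rax, 0xc);         /* movl $0xc, (%rax)      : *p = 12  */

    return load32(mem, RBP - 0x14); /* read a back from its own slot */
}
```

`replay_asm_point()` returns 12: the store through rax lands in the slot at -0x14(%rbp), which is where a lives.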

Tags: Assembly Language pointer

Posted on Fri, 19 Nov 2021 08:35:41 -0500 by bguzel