Preface
Pointers are used throughout development to access variables and modify their values. How does the compiler translate pointers, and what is a pointer at the assembly level?
mov and lea instructions
Analyzing pointers at the assembly level depends on these two instructions. We start with a brief look at the mov instruction.
mov instruction
In the AT&T syntax used by the GNU assembler, the mov mnemonic carries a size suffix that declares the width of the data element being moved.
The instruction therefore takes the following form:
movx
Where x is one of the following characters:
1. q: quad word, a 64-bit value
2. l: long word, a 32-bit value
3. w: word, a 16-bit value
4. b: byte, an 8-bit value
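The suffix list above can be summarized as a mapping from operand width to suffix letter. The following is a minimal sketch; `mov_suffix_for_size` is a hypothetical helper introduced only for illustration and is not part of any toolchain.

```c
#include <stddef.h>

/* Hypothetical helper: returns the AT&T size suffix that matches an
   operand width in bytes, or '?' for a width the list does not cover. */
char mov_suffix_for_size(size_t bytes) {
    switch (bytes) {
        case 1:  return 'b';  /* byte,      8 bits  */
        case 2:  return 'w';  /* word,      16 bits */
        case 4:  return 'l';  /* long word, 32 bits */
        case 8:  return 'q';  /* quad word, 64 bits */
        default: return '?';
    }
}
```

For example, `mov_suffix_for_size(sizeof(int))` yields 'l' on platforms where int is 4 bytes, which is why an int store appears as movl.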
Source code
- (void)asm_point { int a = 6; }
AT&T assembly
AT&T syntax, as used by the GNU assembler, orders operands opposite to Intel syntax: the source comes first and the destination last.
    YangASM`-[ViewController asm_point]:
        0x10ed88ee0 <+0>:  pushq  %rbp
        0x10ed88ee1 <+1>:  movq   %rsp, %rbp
        0x10ed88ee4 <+4>:  movq   %rdi, -0x8(%rbp)
        0x10ed88ee8 <+8>:  movq   %rsi, -0x10(%rbp)
        0x10ed88eec <+12>: movl   $0x6, -0x14(%rbp)
    ->  0x10ed88ef3 <+19>: popq   %rbp
        0x10ed88ef4 <+20>: retq
Converting the assembly operands to memory addresses
    (lldb) p &a
    (int *) $4 = 0x00007ffee0e7503c
    (lldb) register read rbp
         rbp = 0x00007ffee0e75050
    (lldb) p/x 0x00007ffee0e75050-0x8
    (long) $5 = 0x00007ffee0e75048
    (lldb) p/x 0x00007ffee0e75050-0x10
    (long) $6 = 0x00007ffee0e75040
    (lldb) p/x 0x00007ffee0e75050-0x14
    (long) $7 = 0x00007ffee0e7503c
    (lldb)
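The lldb arithmetic above is simply base plus displacement. As a rough sketch, the same calculation in C looks like this; `effective_address` is a hypothetical name chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of a disp(%base) operand,
   that is, base + disp, where disp may be negative. */
uint64_t effective_address(uint64_t base, int64_t disp) {
    /* Converting a negative disp to uint64_t wraps modulo 2^64,
       so the addition subtracts its magnitude from base. */
    return base + (uint64_t)disp;
}
```

With the rbp value read above, `effective_address(0x00007ffee0e75050, -0x14)` gives 0x00007ffee0e7503c, the same address lldb reports for `&a`.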
Analysis
```c
YangASM`-[ViewController asm_point]:
    // rbp = 0x00007ffee0e75050
    // Set up the stack frame of asm_point
    0x10ed88ee0 <+0>:  pushq  %rbp
    0x10ed88ee1 <+1>:  movq   %rsp, %rbp
    // -0x8(%rbp) is 0x00007ffee0e75050 - 0x8 = 0x00007ffee0e75048
    // The q suffix of movq means 8 bytes of memory are written,
    // so the value of rdi is stored in the 8 bytes starting at 0x00007ffee0e75048
    0x10ed88ee4 <+4>:  movq   %rdi, -0x8(%rbp)
    // The value of rsi is stored in the 8 bytes starting at 0x00007ffee0e75040
    0x10ed88ee8 <+8>:  movq   %rsi, -0x10(%rbp)
    // int a = 6
    // Store 6 in the 4 bytes starting at 0x00007ffee0e7503c
    // The l suffix of movl means 4 bytes are written, matching the 4-byte int type
    0x10ed88eec <+12>: movl   $0x6, -0x14(%rbp)
    // Tear down the stack frame of asm_point
->  0x10ed88ef3 <+19>: popq   %rbp
    0x10ed88ef4 <+20>: retq
```
The assembly above contains both movl and movq.
In other words, in AT&T assembly the mov instruction always appears in the movx form.
Indirect addressing
In the assembly above, movq %rdi, -0x8(%rbp) uses parentheses and a minus sign.
The meanings are as follows:
- movl %ebx, %edi
  Copies the value in the ebx register into the edi register.
- movl %ebx, (%edi)
  The parentheses around edi mean the value in the ebx register is stored at the memory address held in the edi register.
- movl %ebx, 4(%edi)
  Stores the value in the ebx register at the memory location 4 bytes after the address held in the edi register.
- movl %ebx, -4(%edi)
  The displacement can also be negative: this stores the value in the ebx register at the memory location 4 bytes before the address held in the edi register.
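The three memory forms above can be mimicked in C with a byte pointer and a signed displacement. This is only a sketch: `movl_store` and `movl_load` are hypothetical helpers named after the instruction, and the parameters `ebx` and `edi` merely mirror the registers in the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of "movl %ebx, disp(%edi)": write the 4-byte value ebx to the
   memory located disp bytes away from the address held in edi. */
void movl_store(uint32_t ebx, uint8_t *edi, ptrdiff_t disp) {
    memcpy(edi + disp, &ebx, sizeof ebx);  /* movl always writes 4 bytes */
}

/* Sketch of the reverse direction, "movl disp(%edi), %ebx":
   read 4 bytes from the memory disp bytes away from edi. */
uint32_t movl_load(const uint8_t *edi, ptrdiff_t disp) {
    uint32_t value;
    memcpy(&value, edi + disp, sizeof value);
    return value;
}
```

Calling `movl_store(v, edi, 0)`, `movl_store(v, edi, 4)` and `movl_store(v, edi, -4)` corresponds to the operands `(%edi)`, `4(%edi)` and `-4(%edi)`; the caller has to make sure all three locations lie inside valid memory.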
Memory layout
[Figure: memory layout drawn from the assembly code above]
lea instruction
lea takes a memory operand, computes its address, and puts that address directly into the register without reading memory.
mov takes the same kind of memory operand but reads the data stored at that address and puts the data into the register.
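In C terms, the difference between the two instructions is the difference between taking an address and dereferencing it. A minimal sketch, where `LeaMovResult` and `lea_vs_mov` are hypothetical names used only for this illustration:

```c
/* Holds what each instruction would leave in its destination register. */
typedef struct {
    int *address;  /* result of lea: the address itself */
    int value;     /* result of mov: the data stored at that address */
} LeaMovResult;

/* Given a memory slot, show what lea and mov would each produce for it. */
LeaMovResult lea_vs_mov(int *slot) {
    LeaMovResult result;
    result.address = slot;   /* like lea: compute the address, no memory read */
    result.value = *slot;    /* like mov: read the data at that address */
    return result;
}
```

For a local `int a = 6;`, calling `lea_vs_mov(&a)` returns the address of a in `address` and 6 in `value`.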
Source code
- (void)asm_point { int a = 6; int *p = &a; }
Assembly
    YangASM`-[ViewController asm_point]:
        0x10da42ee0 <+0>:  pushq  %rbp
        0x10da42ee1 <+1>:  movq   %rsp, %rbp
        // Save rdi and rsi on the stack
        0x10da42ee4 <+4>:  movq   %rdi, -0x8(%rbp)
        0x10da42ee8 <+8>:  movq   %rsi, -0x10(%rbp)
        // int a = 6, stored at -0x14(%rbp)
        0x10da42eec <+12>: movl   $0x6, -0x14(%rbp)
        // Load the address -0x14(%rbp) itself into rax
        0x10da42ef3 <+19>: leaq   -0x14(%rbp), %rax
        // Store the value in rax at -0x20(%rbp)
        // The value in rax is an address: that of -0x14(%rbp)
        0x10da42ef7 <+23>: movq   %rax, -0x20(%rbp)
    ->  0x10da42efb <+27>: popq   %rbp
        0x10da42efc <+28>: retq
Modifying a variable's value through a pointer
With lea and mov covered, let's look at how a value is modified through a pointer.
- (void)asm_point { int a = 6; int *p = &a; *p = 12; }
Assembly
    YangASM`-[ViewController asm_point]:
        0x10f5e4ed0 <+0>:  pushq  %rbp
        0x10f5e4ed1 <+1>:  movq   %rsp, %rbp
        0x10f5e4ed4 <+4>:  movq   %rdi, -0x8(%rbp)
        0x10f5e4ed8 <+8>:  movq   %rsi, -0x10(%rbp)
        0x10f5e4edc <+12>: movl   $0x6, -0x14(%rbp)
        0x10f5e4ee3 <+19>: leaq   -0x14(%rbp), %rax
        0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)
        0x10f5e4eeb <+27>: movq   -0x20(%rbp), %rax
        0x10f5e4eef <+31>: movl   $0xc, (%rax)
    ->  0x10f5e4ef5 <+37>: popq   %rbp
        0x10f5e4ef6 <+38>: retq
Memory addresses
    (lldb) register read rbp
         rbp = 0x00007ffee0619050
    (lldb) p/x 0x00007ffee0619050-0x8
    (long) $5 = 0x00007ffee0619048
    (lldb) p/x 0x00007ffee0619050-0x10
    (long) $6 = 0x00007ffee0619040
    (lldb) p/x 0x00007ffee0619050-0x14
    (long) $7 = 0x00007ffee061903c
    (lldb) p/x 0x00007ffee0619050-0x20
    (long) $8 = 0x00007ffee0619030
    (lldb)
Memory model
Figure I
    // Save rdi and rsi
    0x10f5e4ed4 <+4>:  movq   %rdi, -0x8(%rbp)
    0x10f5e4ed8 <+8>:  movq   %rsi, -0x10(%rbp)
Figure II
    // int a = 6, stored at -0x14(%rbp)
    0x10f5e4edc <+12>: movl   $0x6, -0x14(%rbp)
Figure III
    0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)
Figure IV
    0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)
    0x10f5e4eef <+31>: movl   $0xc, (%rax)
Complete assembly analysis
    YangASM`-[ViewController asm_point]:
        0x10f5e4ed0 <+0>:  pushq  %rbp
        0x10f5e4ed1 <+1>:  movq   %rsp, %rbp
        // Save rdi and rsi on the stack
        0x10f5e4ed4 <+4>:  movq   %rdi, -0x8(%rbp)
        0x10f5e4ed8 <+8>:  movq   %rsi, -0x10(%rbp)
        // int a = 6, stored at -0x14(%rbp)
        0x10f5e4edc <+12>: movl   $0x6, -0x14(%rbp)
        // Load the address -0x14(%rbp) itself into rax
        0x10f5e4ee3 <+19>: leaq   -0x14(%rbp), %rax
        // Store the value in rax at -0x20(%rbp)
        0x10f5e4ee7 <+23>: movq   %rax, -0x20(%rbp)
        // Load the value stored at -0x20(%rbp) back into rax.
        // rax now holds the address of a, that is, -0x14(%rbp)
        // Why reload rax from memory when it was stored only one instruction earlier?
        // The store above completes the source line int *p = &a: p lives in memory and holds an address
        // The reload is there because *p = 12 appears in the code
        // *p = 12 becomes two instructions: first fetch p, then assign 12 through it
        // Fetching p into rax makes the following store convenient
        0x10f5e4eeb <+27>: movq   -0x20(%rbp), %rax
        // (%rax): the parentheses mean the instruction operates on the memory that rax points to
        // The l suffix of movl means 4 bytes are written
        // Change the value in that memory to 12 (0xc)
        0x10f5e4eef <+31>: movl   $0xc, (%rax)
    ->  0x10f5e4ef5 <+37>: popq   %rbp
        0x10f5e4ef6 <+38>: retq
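To tie the walkthrough together, the five instructions that touch a and p can be replayed in C against a byte array standing in for the stack frame. This is a sketch under stated assumptions: `simulate_asm_point` and `FRAME_SIZE` are hypothetical names, the local `rbp` is only a stand-in for the register, and pointers are assumed to be 8 bytes wide so that each pointer copy matches movq.

```c
#include <stdint.h>
#include <string.h>

enum { FRAME_SIZE = 0x20 };  /* enough room for offsets -0x1 .. -0x20 */

/* Replays the stores and loads of asm_point on a simulated frame
   and returns the final value of the variable a. */
int32_t simulate_asm_point(void) {
    uint8_t frame[FRAME_SIZE];
    uint8_t *rbp = frame + FRAME_SIZE;       /* stand-in for %rbp */
    const int32_t six = 6, twelve = 12;

    memcpy(rbp - 0x14, &six, sizeof six);    /* movl $0x6, -0x14(%rbp) */

    uint8_t *rax = rbp - 0x14;               /* leaq -0x14(%rbp), %rax */
    memcpy(rbp - 0x20, &rax, sizeof rax);    /* movq %rax, -0x20(%rbp) */

    memcpy(&rax, rbp - 0x20, sizeof rax);    /* movq -0x20(%rbp), %rax */
    memcpy(rax, &twelve, sizeof twelve);     /* movl $0xc, (%rax)      */

    int32_t a;
    memcpy(&a, rbp - 0x14, sizeof a);        /* read a back from its slot */
    return a;
}
```

The function returns 12: the store through rax lands in the same 4 bytes that originally received 6, which is exactly what `*p = 12` does to a.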