Linux memory management: ARM memory layout and MMU configuration

Before looking at page initialization and MMU configuration in the kernel, you need to know the whole memory map.

1. ARM Memory Layout

  1. Start address of kernel space

  2. lowmem
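    Kernel direct-mapped RAM region (linear mapping: virtual = physical + fixed offset)
    Maximum 896M (the classic figure; the actual limit depends on the vmalloc size, see arm_lowmem_limit below)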

  3. End address of lowmem

  4. pkmap
    Used to persistently map HIGHMEM pages into kernel space
    2MB (this size is different for each platform)
    kmap() / kunmap()

  5. Page gap
    Guard gap that catches out-of-bounds accesses

  6. vmalloc
    vmalloc() / ioremap() space

  7. DMA
    DMA memory mapping region

  8. Fixmap
    kmap() may sleep, so it cannot be used in interrupt context, etc.
    Fixmap is therefore used to map HIGHMEM pages into kernel space in interrupt context.
    Maps HIGHMEM pages atomically
    kmap_atomic(): uses fixmap slots, so it can be called in interrupt context

  9. Vector
    CPU vectors are mapped here

  10. Modules
    Kernel modules inserted via insmod are placed here
    16MB (14MB, if HIGHMEM is enabled)

During kernel initialization, some reserved memory has to be removed from the low memory described above. This memory is reserved for peripherals. Let's look at how the kernel reads the reserved memory and how it is removed.
(Memory allocators such as slab or the buddy system are not covered here.)

2. After the bootloader determines the physical address range available to the kernel, it updates the corresponding device tree node.

Taking Qualcomm as an example, the following function in the bootloader is responsible for updating the memory nodes in the device tree:

int update_device_tree() {
    /* ... */
    ret = fdt_path_offset(fdt, "/memory");   /* locate the /memory node */
    offset = ret;
    ret = target_dev_tree_mem(fdt, offset);  /* update the /memory node */
    /* ... */
}

"/memory" is generally defined in skeleton.dtsi, which is why skeleton.dtsi needs to be included even though its nodes are empty.

/ {
    #address-cells = <2>;
    #size-cells = <2>;
    cpus { };
    soc { };
    chosen { };
    aliases { };
    memory { device_type = "memory"; reg = <0 0 0 0>; };
};

The kernel then calls the following functions to read the memory base and size and add them to the memblock variable:

    of_scan_flat_dt(early_init_dt_scan_memory, NULL);


int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
                     int depth, void *data)
{
    const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
    const __be32 *reg, *endp;
    int l;

    /* We are scanning "memory" nodes only */
    if (type == NULL) {
        /*
         * The longtrail doesn't have a device_type on the
         * /memory node, so look for the node called /memory@0.
         */
        if (!IS_ENABLED(CONFIG_PPC32) || depth != 1 || strcmp(uname, "memory@0") != 0)
            return 0;
    } else if (strcmp(type, "memory") != 0)
        return 0;

    /* Prefer "linux,usable-memory"; fall back to the plain "reg" property */
    reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
    if (reg == NULL)
        reg = of_get_flat_dt_prop(node, "reg", &l);
    if (reg == NULL)
        return 0;

    endp = reg + (l / sizeof(__be32));

    pr_debug("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
        uname, l, reg[0], reg[1], reg[2], reg[3]);

    /* Each entry is one (base, size) pair of address and size cells */
    while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
        u64 base, size;

        base = dt_mem_next_cell(dt_root_addr_cells, &reg);
        size = dt_mem_next_cell(dt_root_size_cells, &reg);

        if (size == 0)
            continue;
        pr_debug(" - %llx ,  %llx\n", (unsigned long long)base,
            (unsigned long long)size);

        /* Register the range with memblock */
        early_init_dt_add_memory_arch(base, size);
    }

    return 0;
}
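
The cell decoding done by the loop above can be modelled in user space. The following is a minimal sketch, not kernel code: read_be32(), next_cells(), parse_reg() and struct mem_range are hypothetical names, and ADDR_CELLS/SIZE_CELLS are fixed at 2 to match the #address-cells and #size-cells values shown in the dts snippet earlier. It decodes big-endian cells into (base, size) pairs and skips zero-sized entries, as the kernel loop does.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for dt_root_addr_cells / dt_root_size_cells. */
enum { ADDR_CELLS = 2, SIZE_CELLS = 2 };

struct mem_range { uint64_t base, size; };

/* Read one big-endian 32-bit cell from raw device tree bytes. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine `cells` consecutive cells into one value and advance the
 * cursor, in the spirit of dt_mem_next_cell(). */
static uint64_t next_cells(int cells, const uint8_t **cursor)
{
    uint64_t value = 0;

    while (cells-- > 0) {
        value = (value << 32) | read_be32(*cursor);
        *cursor += 4;
    }
    return value;
}

/* Decode up to `max` (base, size) pairs from a "reg" property of `len`
 * bytes, skipping zero-sized entries. Returns the number stored. */
static size_t parse_reg(const uint8_t *reg, size_t len,
                        struct mem_range *out, size_t max)
{
    const uint8_t *end = reg + len;
    const size_t stride = 4 * (ADDR_CELLS + SIZE_CELLS);
    size_t n = 0;

    while ((size_t)(end - reg) >= stride && n < max) {
        uint64_t base = next_cells(ADDR_CELLS, &reg);
        uint64_t size = next_cells(SIZE_CELLS, &reg);

        if (size == 0)
            continue;
        out[n].base = base;
        out[n].size = size;
        n++;
    }
    return n;
}
```

Feeding it the bytes of reg = <0x0 0x80000000 0x0 0x2fd00000> yields one range with base 0x80000000 and size 0x2fd00000, while the placeholder reg = <0 0 0 0> from skeleton.dtsi yields nothing.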

3. When the kernel reads the device tree node, it records all memory areas in memblock. It then removes all pre-reserved memory (for example, the memory that the Qualcomm msm platform reserves for the modem) and divides the remaining memory into low memory and high memory.

After the kernel starts, the contents of memblock print as follows:

<6>[0.000000]  [0:wapper:0] sanity_check_meminfo memblock.memory.cnt=2
<6>[0.000000]  [0:wapper:0] pys_addr vmalloc_limit = 0xa9c00000
<6>[0.000000]  [0:wapper:0] count = 1 , reg->base =0x80000000 , reg->size =0x2fd00000
<6>[0.000000]  [0:wapper:0] count = 2 , reg->base =0xb0000000 , reg->size =0x30000000
<6>[0.000000]  [0:wapper:0] arm_lowmem_limit =0xa9c00000

Memory is split across two chip selects (CS):
CS1 has base address 0x80000000 and size 0x30000000.
CS2 has base address 0xb0000000 and size 0x30000000.
So physical memory starts at 0x80000000 and totals 1.5GB.
However, memblock reports the first region as only 0x2fd00000, so the 3MB between offsets 0x2fd00000 and 0x30000000 of CS1 is missing. Where did it go? (Presumably the bootloader shrinks the size to this value to reserve sec_debug-related memory.)

This 3MB range contains sec_dbg, but sec_dbg takes only 0x80000 (512KB) of it. What is the rest used for?
<0>[0.000000] [0:swapper:0] sec_dbg_setup: str=@0xaff00008
<0>[0.000000] [0:swapper:0] sec_dbg_setup: secdbg_paddr = 0xaff00008
<0>[0.000000] [0:swapper:0] sec_dbg_setup: secdbg_size = 0x80000

The following functions are then called to read the memory-related device tree content and reserve memory for the modem, audio, etc.


At this point, the log shows:

<6>[0.000000]  [0:swapper:0] arm_lowmem_limit =0xa9c00000
<6>[0.000000]  [0:swapper:0] cma: Found external_image__region@0, memory base 0x85500000, size 19 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found modem_adsp_region@0, memory base 0x86800000, size 88 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found pheripheral_region@0, memory base 0x8c000000, size 6 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found venus_region@0, memory base 0x8c600000, size 5 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found secure_region@0, memory base 0x00000000, size 109 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found qseecom_region@0, memory base 0x00000000, size 13 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found audio_region@0, memory base 0x00000000, size 3 MiB, limit 0xffffffff
<6>[0.000000]  [0:swapper:0] cma: Found splash_region@8E000000, memory base 0x8e000000, size 20 MiB, limit 0xffffffff

The corresponding nodes in the dts file are as follows:

    memory {
        #address-cells = <2>;
        #size-cells = <2>;

        /* Additionally Reserved 6MB for TIMA and Increased the TZ app size
         * by 2MB [total 8 MB ]
         */
        external_image_mem: external_image__region@0 {
            reg = <0x0 0x85500000 0x0 0x01300000>;
            label = "external_image_mem";
        };

        modem_adsp_mem: modem_adsp_region@0 {
            reg = <0x0 0x86800000 0x0 0x05800000>;
            label = "modem_adsp_mem";
        };

        peripheral_mem: pheripheral_region@0 {
            reg = <0x0 0x8C000000 0x0 0x0600000>;
            label = "peripheral_mem";
        };

        venus_mem: venus_region@0 {
            reg = <0x0 0x8C600000 0x0 0x0500000>;
            label = "venus_mem";
        };

        secure_mem: secure_region@0 {
            reg = <0 0 0 0x6D00000>;
            label = "secure_mem";
        };

        qseecom_mem: qseecom_region@0 {
            reg = <0 0 0 0xD00000>;
            label = "qseecom_mem";
        };

        audio_mem: audio_region@0 {
            reg = <0 0 0 0x314000>;
            label = "audio_mem";
        };

        cont_splash_mem: splash_region@8E000000 {
            reg = <0x0 0x8E000000 0x0 0x1400000>;
            label = "cont_splash_mem";
        };
    };

After that, setup_arch() -> arm_memblock_init() -> dma_contiguous_reserve() -> dma_contiguous_early_removal_fixup() calls the sanity_check_meminfo() function once more.

Then the printed content becomes

<6>[0.000000]  [0:swapper:0] pys_addr vmalloc_limit = 0xa9c00000
<6>[0.000000]  [0:swapper:0] count = 1 , reg->base =0x80000000 , reg->size =0x5500000
<6>[0.000000]  [0:swapper:0] count = 2 , reg->base =0x8cb00000 , reg->size =0x23200000
<6>[0.000000]  [0:swapper:0] count = 3 , reg->base =0xb0000000 , reg->size =0x30000000
<6>[0.000000]  [0:swapper:0] arm_lowmem_limit =0xb1200000

Comparing the logs printed by the two calls to sanity_check_meminfo() shows which memory ranges were deducted. Only external_image_mem, modem_adsp_mem, peripheral_mem and venus_mem are deducted.
Where are secure_region, qseecom_region, audio_region and splash_region? (These are reserved through ion memory.)
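
To check that removing those four ranges really turns the two-region memblock into the three regions in the second log, the removal can be replayed in user space. This is only a sketch of the idea behind memblock removal, not the kernel implementation: struct region, struct region_list, MAX_REGIONS and remove_range() are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

struct region { uint64_t base, size; };

struct region_list {
    struct region r[MAX_REGIONS];
    size_t cnt;
};

/* Remove [base, base + size) from every region it overlaps. A region
 * that contains the removed range in its interior is split in two.
 * Returns 0 on success, -1 if a split would exceed MAX_REGIONS. */
static int remove_range(struct region_list *l, uint64_t base, uint64_t size)
{
    struct region out[MAX_REGIONS];
    const uint64_t end = base + size;
    size_t n = 0;

    for (size_t i = 0; i < l->cnt; i++) {
        const uint64_t rb = l->r[i].base;
        const uint64_t re = rb + l->r[i].size;

        /* Keep the part below the removed range, if any. */
        if (rb < base) {
            if (n == MAX_REGIONS)
                return -1;
            out[n].base = rb;
            out[n].size = (re < base ? re : base) - rb;
            n++;
        }
        /* Keep the part above the removed range, if any. */
        if (re > end) {
            if (n == MAX_REGIONS)
                return -1;
            out[n].base = rb > end ? rb : end;
            out[n].size = re - out[n].base;
            n++;
        }
    }
    for (size_t i = 0; i < n; i++)
        l->r[i] = out[i];
    l->cnt = n;
    return 0;
}
```

Starting from {0x80000000, 0x2fd00000} and {0xb0000000, 0x30000000} and removing the external_image, modem_adsp, peripheral and venus ranges from the dts leaves {0x80000000, 0x5500000}, {0x8cb00000, 0x23200000} and {0xb0000000, 0x30000000}, which matches the second sanity_check_meminfo() log.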

Here are the deductions:

external_image_mem: 0x85500000 ~ 0x86800000, size 19MB
modem_adsp_mem: 0x86800000 ~ 0x8C000000, size 88MB
peripheral_mem: 0x8C000000 ~ 0x8C600000, size 6MB
venus_mem: 0x8c600000 ~ 0x8cb00000, size 5MB
secure_mem: 0xd9000000 ~ 0xe0000000, size 112MB // adjusted from the 109MB above. Why?
qseecom_mem: 0xd8000000 ~ 0xd9000000, size 16MB // also adjusted, from the 13MB above. Why?
audio_mem: starts at 0xd7c00000, size adjusted to 4MB
splash_region: 0x8E000000 ~ 0x8F400000, size 20MB
default region: 0xa9400000 ~ 0xa9c00000, size 8MB
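
One possible answer to the "Why?" above, which the logs do not confirm: contiguous regions are typically rounded up to an alignment boundary, and a 4MB alignment would account for all three adjusted sizes. The sketch below tests only that arithmetic; ASSUMED_CMA_ALIGN and align_up() are names made up for this example, and the 4MB value is an assumption.

```c
#include <stdint.h>

/* Assumption for illustration: region sizes are rounded up to 4 MiB. */
#define ASSUMED_CMA_ALIGN 0x400000ULL

/* Round size up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Under this assumption, 0x6D00000 (109MB) becomes 0x7000000 (112MB), 0xD00000 (13MB) becomes 0x1000000 (16MB), and 0x314000 becomes 0x400000 (4MB), while the 20MB splash region is already aligned and stays unchanged.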

Before and after the deduction of external_image_mem, modem_adsp_mem, peripheral_mem and venus_mem, the contents of memblock are as follows:

<6>[0.000000]  [0:swapper:0] count = 1 , reg->base =0x80000000 , reg->size =0x2fd00000
<6>[0.000000]  [0:swapper:0] count = 2 , reg->base =0xb0000000 , reg->size =0x30000000
//The first print is shown above, the second print below

<6>[0.000000]  [0:swapper:0] count = 1 , reg->base =0x80000000 , reg->size =0x5500000
<6>[0.000000]  [0:swapper:0] count = 2 , reg->base =0x8cb00000 , reg->size =0x23200000
<6>[0.000000]  [0:swapper:0] count = 3 , reg->base =0xb0000000 , reg->size =0x30000000
<6>[0.000000]  [0:swapper:0] arm_lowmem_limit =0xb1200000
//vmalloc is set to 340MB on the command line, so the vmalloc area starts at
//0xff000000 - 0x15400000 (340MB) = 0xe9c00000, i.e. the vmalloc size subtracted from 0xff000000.
//Converted to a physical address, this gives vmalloc_limit = 0xa9c00000, and arm_lowmem_limit is first set to the same value.
//But the second time sanity_check_meminfo() printed it, it had been adjusted to 0xb1200000. How?
//arm_lowmem_limit is the boundary that finally separates lowmem from the vmalloc region and everything above it.
//As you can see below, the highest lowmem range is 0xf0000000~0xf1200000. Its end, 0xf1200000, is the virtual address corresponding to arm_lowmem_limit.
//The starting address of high memory is the value of high_memory, which is computed as follows:
//high_memory = __va(arm_lowmem_limit - 1) + 1; 
//This value plus VMALLOC_OFFSET is the starting address of vmalloc.
//#define VMALLOC_START ((unsigned long)high_memory + VMALLOC_OFFSET)
//VMALLOC_OFFSET is generally 8MB
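
The numbers in these notes can be checked with a few lines of arithmetic. The sketch below assumes a linear lowmem mapping with PAGE_OFFSET 0xC0000000 and PHYS_OFFSET 0x80000000 (both read off the logs) and a vmalloc end of 0xFF000000; model_pa(), model_va() and vmalloc_limit_phys() are hypothetical helpers, not kernel functions. On the open "How?": the difference 0xb1200000 - 0xa9c00000 is 0x7600000 (118MB), exactly the size of the removed range 0x85500000 ~ 0x8cb00000, so the limit appears to have been raised by the size of that hole. That is an observation from the logs, not something the code shown here confirms.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000U  /* start of kernel space, from the layout log */
#define PHYS_OFFSET 0x80000000U  /* start of RAM, from the memblock log */
#define VMALLOC_END 0xFF000000U  /* end of the vmalloc area, from the layout log */

/* Linear lowmem conversion, in the spirit of __pa(). */
static uint32_t model_pa(uint32_t va)
{
    return va - PAGE_OFFSET + PHYS_OFFSET;
}

/* Linear lowmem conversion, in the spirit of __va(). */
static uint32_t model_va(uint32_t pa)
{
    return pa - PHYS_OFFSET + PAGE_OFFSET;
}

/* Physical lowmem boundary implied by a vmalloc= size in bytes. */
static uint32_t vmalloc_limit_phys(uint32_t vmalloc_bytes)
{
    return model_pa(VMALLOC_END - vmalloc_bytes);
}
```

With vmalloc=340M this gives vmalloc_limit_phys(340U << 20) = 0xa9c00000, the first arm_lowmem_limit. Adding the 0x7600000 hole gives 0xb1200000, and model_va(0xb1200000) = 0xf1200000, the end of the highest lowmem range in the layout log.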

A schematic view of the entire memory

<6>[0.000000]  [0:swapper:    0] Memory: 1243908K/1448960K available (10539K kernel code, 1363K rwdata, 4472K rodata, 1417K init, 5844K bss, 205052K reserved, 632832K highmem)
<6>[0.000000]  [0:swapper:    0] Virtual kernel memory layout:
<6>[0.000000]  [0:swapper:    0]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
<6>[0.000000]  [0:swapper:    0]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
<6>[0.000000]  [0:swapper:    0]     arm_lowmem_limit = 0xf1200000 
<6>[0.000000]  [0:swapper:    0]  
<6>[0.000000]  [0:swapper:    0]       start_phys  : 0xf0000000    end_phys :  0x20000000  
<6>[0.000000]  [0:swapper:    0]       vmalloc : 0xf1200000 - 0xff000000   ( 222 MB)
<6>[0.000000]  [0:swapper:    0]       lowmem  : 0xf0000000 - 0xf1200000   (  18 MB)
<6>[0.000000]  [0:swapper:    0]       start_phys  : 0xccb00000    end_phys :  0xefd00000  
<6>[0.000000]  [0:swapper:    0]       vmalloc : 0xefd00000 - 0xf0000000   (   3 MB)
<6>[0.000000]  [0:swapper:    0]       lowmem  : 0xccb00000 - 0xefd00000   ( 562 MB)
<6>[0.000000]  [0:swapper:    0]       start_phys  : 0xc0000000    end_phys :  0xc5500000  
<6>[0.000000]  [0:swapper:    0]       vmalloc : 0xc5500000 - 0xccb00000   ( 118 MB)
<6>[0.000000]  [0:swapper:    0]       lowmem  : 0xc0000000 - 0xc5500000   (  85 MB)
<6>[0.000000]  [0:swapper:    0]       pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
<6>[0.000000]  [0:swapper:    0]       modules : 0xbf000000 - 0xbfe00000   (  14 MB)
<6>[0.000000]  [0:swapper:    0]       .text : 0xc0008000 - 0xc0fa8ec4   (16004 kB)
<6>[0.000000]  [0:swapper:    0]       .init : 0xc1000000 - 0xc1162480   (1418 kB)
<6>[0.000000]  [0:swapper:    0]       .data : 0xc1164000 - 0xc12b8de4   (1364 kB)
<6>[0.000000]  [0:swapper:    0]        .bss : 0xc12c1b3c - 0xc1876b78   (5845 kB)

The zone_start_pfn and spanned_pages of the Normal and HighMem zones in contig_page_data.node_zones correspond to the addresses above.

    Normal zone:
        zone_start_pfn corresponds to address 0x80000000.
        zone_start_pfn plus spanned_pages corresponds to arm_lowmem_limit.
    HighMem zone:
        zone_start_pfn corresponds exactly to arm_lowmem_limit.
        zone_start_pfn plus spanned_pages corresponds exactly to 0xE0000000.

4. The processing above yields the size and range of memory available to the kernel. Paging is then set up through the MMU configuration.

Whether on x86 or ARM, most CPUs today access memory through an MMU, which translates virtual addresses to physical addresses.
Here is a simple sketch. (A detailed analysis depends on the number of MMU levels, the configured page size and so on; refer to the ARM architecture books.)

On ARM, either a two-level or a three-level page table can be selected. I have not come across the three-level page table so far, so I skip it here and only look at the two-level page table.

//Inside the file kernel/arch/arm/include/asm/pgtable.h
#ifdef CONFIG_ARM_LPAE
#include <asm/pgtable-3level.h>
#else
#include <asm/pgtable-2level.h>
#endif

Next, the page size. Let's skip the register settings and the page size types; the ARM developer's guide covers them.
Let's first see where the page size is defined in Linux.

//Inside the file kernel/include/asm-generic/page.h
#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
//A shift of 12 gives the most common page size, 4K.

Taking the ARM two-level page table as an example, there is a first-level page table and a second-level page table.

//With 4K pages, the tables can be organized in either of the following ways to cover the full 4G address space.
1. 4096 first-level entries, 256 second-level entries
2. 2048 first-level entries, 512 second-level entries
//ARM Linux defines PTRS_PER_PGD, PTRS_PER_PMD and PTRS_PER_PTE to describe the generic three-level page table. With a two-level page table, the three values are defined as follows:
#define PTRS_PER_PTE        512
#define PTRS_PER_PMD        1
#define PTRS_PER_PGD        2048
//These values correspond to the layout with 2048 first-level entries and 512 second-level entries. With a two-level page table, PUD and PMD are not used.

//The layout with 4096 first-level entries and 256 second-level entries could be defined as follows:
#define PTRS_PER_PTE        256
#define PTRS_PER_PMD        1
#define PTRS_PER_PGD        4096
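
Both layouts cover the same 4G address space; they differ only in how much memory one first-level entry maps. The short sketch below checks this with the PAGE_SHIFT of 12 quoted earlier. covered_bytes() and pgd_entry_span() are illustrative helper names, not kernel functions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Bytes of virtual address space covered by a two-level layout. */
static uint64_t covered_bytes(uint32_t ptrs_per_pgd, uint32_t ptrs_per_pte)
{
    return ((uint64_t)ptrs_per_pgd * ptrs_per_pte) << PAGE_SHIFT;
}

/* Bytes mapped by a single first-level entry. */
static uint64_t pgd_entry_span(uint32_t ptrs_per_pte)
{
    return (uint64_t)ptrs_per_pte << PAGE_SHIFT;
}
```

covered_bytes(4096, 256) and covered_bytes(2048, 512) both equal 4G, while one first-level entry spans 1MB in the 4096/256 layout and 2MB in the 2048/512 layout.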

The schematic diagram of the page table is as follows:

The create_mapping() function is specifically responsible for the generation of page tables.

//create_mapping() has several call paths
1. devicemaps_init()->create_mapping()
2. map_lowmem()->create_mapping()
3. iotable_init()->create_mapping()
4. debug_ll_io_init()->create_mapping()

You can see how the create_mapping() function builds a page table based on physical and corresponding virtual memory.

Let's take an example to see how a virtual address accessed by a task is turned into a physical address step by step.

For kernel threads, the accessed addresses lie in kernel space, where (for lowmem) a simple offset converts between physical and virtual addresses, so there is nothing more to say.
For user processes, the address of the page table is stored in the pgd of the task_struct's mm or active_mm. Starting from this address, the translation can be worked out according to the page table layout.

The pgd address can be read from the task_struct of the user process. The page table layouts were described above; this example uses the 4096/256 layout. Suppose the process accesses the virtual address 0x01206000. It translates to the physical address 0x578DB000 as follows.

Following the translation diagram in the ARM Developer's Guide, let's work it out step by step.

  1. The virtual address is 0x01206000.
  2. The translation table base address is the address of the pgd (held in coprocessor CP15 register c2). From task_struct->active_mm->pgd above, it is 0xD7E3380.
  3. 0x01206000 & 0xFFF00000 keeps the top 12 bits of the virtual address; shifting right by 20 bits gives 0x12, i.e. 18. This index is multiplied by 4 and added to the pgd address, because the first-level table has 4096 entries of 4 bytes each. The address to read is therefore 0xD318048, and the value stored there is 0x53C6381. That value & 0xFFFFFF00 is the base address of the second-level page table, 0x53C6300.
  4. Take the middle 8 bits of the virtual address 0x01206000 (bits 19 to 12), shift right by 12 bits to get 0x06, multiply by 4, and add the result to the second-level base address 0x53C6300 from the previous step. This gives 0x53C6318, and the value stored there is 0x578DBC7F.
  5. 0x578DBC7F & 0xFFFFF000 plus the virtual address & 0x00000FFF gives 0x578DB000. This is the physical address that is finally accessed.
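
The index and offset arithmetic of the steps above can be written down compactly. This is a sketch of the 4096/256 split only, under the assumption of 4-byte descriptors and 4K pages as stated in the text; l1_index(), l2_index(), entry_addr() and page_phys() are hypothetical helpers, and the actual table reads (which need the real memory contents) are left out.

```c
#include <stdint.h>

/* First-level index: the top 12 bits of the virtual address. */
static uint32_t l1_index(uint32_t va)
{
    return va >> 20;
}

/* Second-level index: bits 19..12 of the virtual address. */
static uint32_t l2_index(uint32_t va)
{
    return (va >> 12) & 0xFFU;
}

/* Address of a 4-byte descriptor inside a table. */
static uint32_t entry_addr(uint32_t table_base, uint32_t index)
{
    return table_base + index * 4U;
}

/* Physical address from a second-level (4K page) descriptor:
 * page frame from the descriptor, byte offset from the address. */
static uint32_t page_phys(uint32_t pte, uint32_t va)
{
    return (pte & 0xFFFFF000U) | (va & 0x00000FFFU);
}
```

For 0x01206000 the first-level index is 0x12 (byte offset 0x48 into the pgd), the second-level index is 0x06 (byte offset 0x18), and page_phys(0x578DBC7F, 0x01206000) is 0x578DB000, which agrees with the result of the walk.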

Memory management of user processes

1. Process data structure: task_struct
2. Process memory management data structure: mm_struct
mmap: head of the list of all memory areas allocated to the process
pgd: address of the page global directory
3. Memory allocated to a process is managed through vm_area_struct
vm_start and vm_end: start and end addresses of a virtual memory area

The following figure illustrates how virtual addresses accessed by a user process are converted into physical addresses through the pgd, as described in detail above:


Posted on Sat, 08 Dec 2018 10:27:05 -0500 by phpion