head occupies 25KB+184B and is followed directly by the main function.
head builds the kernel's paging structures (page directory and page tables) together with the buffer, GDT, and IDT, overwriting the memory occupied by code that has already executed.
0x0000~0x4FFF (20KB) will hold the page directory (4KB) followed by four page tables (16KB).
First set the registers ds,es,fs,gs:
    .text
    .globl _idt,_gdt,_pg_dir,_tmp_floppy_area
    _pg_dir:
    startup_32:
        movl $0x10,%eax
        mov %ax,%ds
        mov %ax,%es
        mov %ax,%fs
        mov %ax,%gs
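eax = 0x10 = 0001 0000b, i.e. index = 2, TI = 0 (GDT), RPL = 0 (kernel privilege level). The selector therefore picks entry 2 of the GDT (the second descriptor after the null entry), which is the kernel data segment. At this point the GDT is still the one built by setup: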
    .word 0x07FF    ! 8Mb - limit=2047 (2048*4096=8Mb)
    .word 0x0000    ! base address=0
    .word 0x9200    ! data read/write
    .word 0x00C0    ! granularity=4096, 386
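To make the selector arithmetic concrete, here is a minimal sketch in C that splits a selector such as 0x10 into its fields. The helper names are hypothetical and are not part of the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for an x86 segment selector (helper names are hypothetical). */
unsigned selector_index(uint16_t sel) { return sel >> 3; }       /* bits 15..3: table index */
unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1; } /* bit 2: 0 = GDT, 1 = LDT */
unsigned selector_rpl(uint16_t sel)   { return sel & 3; }        /* bits 1..0: requested privilege level */

/* Byte offset of the selected descriptor inside the GDT (8 bytes per entry). */
unsigned gdt_byte_offset(uint16_t sel)
{
    assert(selector_ti(sel) == 0);   /* this sketch only handles GDT selectors */
    return selector_index(sel) * 8;
}
```

For 0x10 this gives index 2, TI 0, RPL 0, and a byte offset of 16 into the GDT; 0x08 (the code segment selector) gives index 1.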
Then set the stack top pointer esp:
lss _stack_start,%esp
kernel/sched.c:
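    long user_stack [ PAGE_SIZE>>2 ] ;

    struct {
        long * a;
        short b;
    } stack_start = { & user_stack [PAGE_SIZE>>2] , 0x10 };
lss loads esp with &user_stack[PAGE_SIZE>>2] (the end of the array, since the stack grows downward) and ss with 0x10. The stack is therefore one page (4KB) in size.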
After calculation, its starting position is 0x1E25C
Next, head sets up the IDT:
call setup_idt
    /*
     *  setup_idt
     *
     *  sets up a idt with 256 entries pointing to
     *  ignore_int, interrupt gates. It then loads
     *  idt. Everything that wants to install itself
     *  in the idt-table may do so themselves. Interrupts
     *  are enabled elsewhere, when we can be relatively
     *  sure everything is ok. This routine will be over-
     *  written by the page tables.
     */
    setup_idt:
        lea ignore_int,%edx
        movl $0x00080000,%eax
        movw %dx,%ax        /* selector = 0x0008 = cs */
        movw $0x8E00,%dx    /* interrupt gate - dpl=0, present */

        lea _idt,%edi
        mov $256,%ecx
    rp_sidt:
        movl %eax,(%edi)
        movl %edx,4(%edi)
        addl $8,%edi
        dec %ecx
        jne rp_sidt
        lidt idt_descr
        ret
After the four instructions at the top of setup_idt (|| denotes concatenation of two 16-bit halves):
edx: (ignore_int bits 31..16) || 0x8E00
eax: 0x0008 || (ignore_int bits 15..0)
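IDT entry (gate descriptor) layout, from the high bits down:
bits 63..48: interrupt service routine offset, bits 31..16
bits 47..32: P | DPL | 0 | 1110 | 000 | ----- = 1 | 00 | 0 | 1110 | 0000 0000 (0x8E00)
bits 31..16: segment selector, 0000 0000 0000 1000 (0x0008)
bits 15..0: interrupt service routine offset, bits 15..0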
That is, each IDT entry is set as follows:
P = 1: segment-present flag (the handler is present in memory)
DPL = 00: descriptor privilege level (kernel)
bits 43..40 = 1110: type 0xE, a 32-bit interrupt gate
All 256 IDT entries are filled with this same gate, pointing to ignore_int
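The way setup_idt packs eax and edx can be mirrored in C. The following is a sketch only: `struct gate`, `make_intr_gate`, `fill_idt`, `idt_sim`, and the decoding helpers are hypothetical names, and any handler address passed in is a placeholder rather than the real address of ignore_int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as desc_struct in head.h: a = low dword, b = high dword. */
struct gate { uint32_t a, b; };

struct gate idt_sim[256];   /* stand-in for _idt */

/* Pack a ring-0 32-bit interrupt gate the way setup_idt does. */
struct gate make_intr_gate(uint32_t handler)
{
    struct gate g;
    g.a = 0x00080000u | (handler & 0xFFFFu);   /* selector 0x0008 || offset 15..0 */
    g.b = (handler & 0xFFFF0000u) | 0x8E00u;   /* offset 31..16 || P=1, DPL=0, type=0xE */
    return g;
}

/* Mirror of the rp_sidt loop: every entry receives the same gate. */
void fill_idt(struct gate *idt, size_t entries, uint32_t handler)
{
    assert(idt != NULL);
    struct gate g = make_intr_gate(handler);
    for (size_t i = 0; i < entries; i++)
        idt[i] = g;
}

/* Decoders for the fields discussed above. */
unsigned gate_present(struct gate g)  { return (g.b >> 15) & 1; }
unsigned gate_dpl(struct gate g)      { return (g.b >> 13) & 3; }
unsigned gate_type(struct gate g)     { return (g.b >> 8) & 0xF; }
unsigned gate_selector(struct gate g) { return g.a >> 16; }
uint32_t gate_offset(struct gate g)   { return (g.b & 0xFFFF0000u) | (g.a & 0xFFFFu); }
```

Decoding any gate produced this way yields P = 1, DPL = 0, type = 0xE, and selector = 0x0008, which matches the field values listed above.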
    .align 2
    .word 0
    idt_descr:
        .word 256*8-1       # idt contains 256 entries
        .long _idt
head.h:
    typedef struct desc_struct {
        unsigned long a,b;
    } desc_table[256];

    extern unsigned long pg_dir[1024];
    extern desc_table idt,gdt;
ignore_int (prints a one-line message and returns):
    /* This is the default interrupt "handler" :-) */
    int_msg:
        .asciz "Unknown interrupt\n\r"
    .align 2
    ignore_int:
        pushl %eax
        pushl %ecx
        pushl %edx
        push %ds
        push %es
        push %fs
        movl $0x10,%eax
        mov %ax,%ds
        mov %ax,%es
        mov %ax,%fs
        pushl $int_msg
        call _printk
        popl %eax
        pop %fs
        pop %es
        pop %ds
        popl %edx
        popl %ecx
        popl %eax
        iret
TODO: unread
Next, discard the GDT built by setup and install a new GDT at its new location inside the kernel:
call setup_gdt
    /*
     *  setup_gdt
     *
     *  This routines sets up a new gdt and loads it.
     *  Only two entries are currently built, the same
     *  ones that were built in init.s. The routine
     *  is VERY complicated at two whole lines, so this
     *  rather long comment is certainly needed :-).
     *  This routine will beoverwritten by the page tables.
     */
    setup_gdt:
        lgdt gdt_descr
        ret

    .align 2
    .word 0
    gdt_descr:
        .word 256*8-1       # so does gdt (not that that's any
        .long _gdt          # magic number, but it works for me :^)

    _idt:   .fill 256,8,0                   # idt is uninitialized

    _gdt:   .quad 0x0000000000000000        /* NULL descriptor */
            .quad 0x00c09a0000000fff        /* 16Mb */
            .quad 0x00c0920000000fff        /* 16Mb */
            .quad 0x0000000000000000        /* TEMPORARY - don't use */
            .fill 252,8,0                   /* space for LDT's and TSS's etc */
Segment limit: 0x0FFF with 4KB granularity, i.e. (0xFFF + 1) = 4K pages * 4KB = 16MB
Since the segment limit has changed (from 8MB to 16MB), the segment registers are reloaded so that they pick up the new descriptors:
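The 8MB and 16MB figures can be checked by decoding the descriptors directly. The sketch below assumes the standard x86 segment descriptor layout; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a descriptor from the four 16-bit words used in setup (lowest word first). */
uint64_t words_to_desc(uint16_t w0, uint16_t w1, uint16_t w2, uint16_t w3)
{
    return ((uint64_t)w3 << 48) | ((uint64_t)w2 << 32) | ((uint64_t)w1 << 16) | w0;
}

/* 20-bit limit: descriptor bits 15..0 plus bits 51..48. */
uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFFu) | (uint32_t)((d >> 32) & 0xF0000u);
}

/* 32-bit base: descriptor bits 39..16 plus bits 63..56. */
uint32_t desc_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu) | (uint32_t)((d >> 32) & 0xFF000000u);
}

unsigned desc_granular(uint64_t d) { return (unsigned)((d >> 55) & 1); }    /* G bit */
unsigned desc_access(uint64_t d)   { return (unsigned)((d >> 40) & 0xFF); } /* 0x9a code, 0x92 data */

/* Segment size in bytes: (limit + 1) units of 4KB when G = 1, else of 1 byte. */
uint64_t desc_size_bytes(uint64_t d)
{
    uint64_t units = (uint64_t)desc_limit(d) + 1;
    assert(units <= 0x100000u);   /* the limit field is only 20 bits wide */
    return desc_granular(d) ? units * 4096u : units;
}
```

Fed the data descriptor from setup (words 0x07FF, 0x0000, 0x9200, 0x00C0) this yields 0x800000 bytes (8MB); fed 0x00c0920000000fff from head it yields 0x1000000 bytes (16MB), both with base 0.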
        movl $0x10,%eax     # reload all the segment registers
        mov %ax,%ds         # after changing gdt. CS was already
        mov %ax,%es         # reloaded in 'setup_gdt'
        mov %ax,%fs
        mov %ax,%gs
        lss _stack_start,%esp
Then check whether A20 is on:
        xorl %eax,%eax
    1:  incl %eax           # check that A20 really IS enabled
        movl %eax,0x000000  # loop forever if it isn't
        cmpl %eax,0x100000
        je 1b
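The idea behind this loop is that with A20 disabled, address bit 20 is forced to 0, so 0x100000 aliases 0x000000 and the comparison always succeeds. A small simulation in C illustrates this; it is only a model under that assumption, with a hypothetical 2MB memory array and a try limit added so the "A20 off" case terminates instead of looping forever.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_BYTES 0x200000u                /* 2MB of simulated physical memory */
static uint32_t sim_mem[SIM_BYTES / 4];
static uint32_t addr_mask = 0xFFFFFFFFu;   /* bit 20 is cleared while A20 is "off" */

void sim_set_a20(int on) { addr_mask = on ? 0xFFFFFFFFu : ~(1u << 20); }

static void sim_store(uint32_t addr, uint32_t value)
{
    addr &= addr_mask;
    assert(addr < SIM_BYTES);
    sim_mem[addr / 4] = value;
}

static uint32_t sim_load(uint32_t addr)
{
    addr &= addr_mask;
    assert(addr < SIM_BYTES);
    return sim_mem[addr / 4];
}

/* Replays the head.s loop. Returns the iteration on which the two addresses
 * were seen to differ, or 0 if they still matched after max_tries. */
unsigned a20_probe(unsigned max_tries)
{
    uint32_t eax = 0;
    for (unsigned i = 1; i <= max_tries; i++) {
        eax++;                            /* incl %eax */
        sim_store(0x000000, eax);         /* movl %eax,0x000000 */
        if (sim_load(0x100000) != eax)    /* cmpl %eax,0x100000 ; je 1b */
            return i;
    }
    return 0;
}
```

With `sim_set_a20(1)` the probe succeeds on the first iteration; with `sim_set_a20(0)` it never does, which corresponds to the real code spinning forever.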
Detect the math coprocessor:
    /*
     * NOTE! 486 should set bit 16, to check for write-protect in supervisor
     * mode. Then it would be unnecessary with the "verify_area()"-calls.
     * 486 users probably want to set the NE (#5) bit also, so as to use
     * int 16 for math errors.
     */
        movl %cr0,%eax          # check math chip
        andl $0x80000011,%eax   # Save PG,PE,ET
    /* "orl $0x10020,%eax" here for 486 might be good */
        orl $2,%eax             # set MP
        movl %eax,%cr0
        call check_x87
        jmp after_page_tables

    /*
     * We depend on ET to be correct. This checks for 287/387.
     */
    check_x87:
        fninit
        fstsw %ax
        cmpb $0,%al
        je 1f                   /* no coprocessor: have to set bits */
        movl %cr0,%eax
        xorl $6,%eax            /* reset MP, set EM */
        movl %eax,%cr0
        ret
    .align 2
    1:  .byte 0xDB,0xE4         /* fsetpm for 287, ignored by 387 */
        ret
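The cr0 bit manipulation in this snippet can be followed in C. This is a sketch assuming the standard cr0 bit positions (PE = bit 0, MP = bit 1, EM = bit 2, ET = bit 4, PG = bit 31); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE 0x00000001u   /* protected mode enable */
#define CR0_MP 0x00000002u   /* monitor coprocessor */
#define CR0_EM 0x00000004u   /* emulate coprocessor */
#define CR0_ET 0x00000010u   /* extension type */
#define CR0_PG 0x80000000u   /* paging */

/* andl $0x80000011,%eax ; orl $2,%eax : keep only PG, PE, ET and set MP. */
uint32_t cr0_before_probe(uint32_t cr0)
{
    return (cr0 & (CR0_PG | CR0_PE | CR0_ET)) | CR0_MP;
}

/* xorl $6,%eax : taken when no coprocessor answers; clears MP and sets EM. */
uint32_t cr0_without_fpu(uint32_t cr0)
{
    assert((cr0 & CR0_MP) != 0);   /* MP was set by cr0_before_probe */
    assert((cr0 & CR0_EM) == 0);   /* EM was cleared by the andl mask */
    return cr0 ^ (CR0_MP | CR0_EM);
}
```

For example, a cr0 value of 0x1D becomes 0x13 after the mask and MP step, and 0x15 if the coprocessor turns out to be absent. The xor only behaves as "reset MP, set EM" because MP is known to be 1 and EM known to be 0 at that point.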
TODO: not studied in detail
The head program makes final preparations for calling the main function:
    after_page_tables:
        pushl $0            # These are the parameters to main :-)
        pushl $0
        pushl $0
        pushl $L6           # return address for main, if it decides to.
        pushl $_main
        jmp setup_paging
    L6:
        jmp L6              # main should never return here, but
                            # just in case, we know what happens.
This pushes the parameters of main, the return address for main (L6, which in theory is never reached), and finally the address of main itself.
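The effect of these pushes is easier to see with a toy stack. The sketch below replays them in C; `struct sim_stack`, `head_stack`, and the helper names are hypothetical, and the addresses passed for L6 and main are placeholders supplied by the caller, not the real link addresses.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_SLOTS = 8 };

struct sim_stack {
    uint32_t slot[STACK_SLOTS];
    unsigned sp;              /* index of the current top; the stack grows toward index 0 */
};

struct sim_stack head_stack;  /* stand-in for the kernel stack used by head */

void sim_push(struct sim_stack *s, uint32_t value)
{
    assert(s->sp > 0);
    s->slot[--s->sp] = value;
}

uint32_t sim_pop(struct sim_stack *s)
{
    assert(s->sp < STACK_SLOTS);
    return s->slot[s->sp++];
}

/* Replays after_page_tables, then the ret at the end of setup_paging.
 * Returns the address that ret jumps to. */
uint32_t replay_after_page_tables(struct sim_stack *s, uint32_t addr_L6, uint32_t addr_main)
{
    s->sp = STACK_SLOTS;      /* start from an empty stack */
    sim_push(s, 0);           /* three parameters for main */
    sim_push(s, 0);
    sim_push(s, 0);
    sim_push(s, addr_L6);     /* return address seen by main */
    sim_push(s, addr_main);   /* consumed by the ret in setup_paging */
    return sim_pop(s);
}
```

After the replay, the value returned is the address of main, and the stack looks exactly as if main had been reached by a call: the top slot holds L6 as the return address, followed by the three zero arguments.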
Set up paging:
    /*
     * Setup_paging
     *
     * This routine sets up paging by setting the page bit
     * in cr0. The page tables are set up, identity-mapping
     * the first 16MB. The pager assumes that no illegal
     * addresses are produced (ie >4Mb on a 4Mb machine).
     *
     * NOTE! Although all physical memory should be identity
     * mapped by this routine, only the kernel page functions
     * use the >1Mb addresses directly. All "normal" functions
     * use just the lower 1Mb, or the local data space, which
     * will be mapped to some other place - mm keeps track of
     * that.
     *
     * For those with more memory than 16 Mb - tough luck. I've
     * not got it, why should you :-) The source is here. Change
     * it. (Seriously - it shouldn't be too difficult. Mostly
     * change some constants etc. I left it at 16Mb, as my machine
     * even cannot be extended past that (ok, but it was cheap :-)
     * I've tried to show which constants to change by having
     * some kind of marker at them (search for "16Mb"), but I
     * won't guarantee that's all :-( )
     */
    .align 2
    setup_paging:
        movl $1024*5,%ecx           /* 5 pages - pg_dir+4 page tables */
        xorl %eax,%eax
        xorl %edi,%edi              /* pg_dir is at 0x000 */
        cld;rep;stosl
        movl $pg0+7,_pg_dir         /* set present bit/user r/w */
        movl $pg1+7,_pg_dir+4       /*  --------- " " --------- */
        movl $pg2+7,_pg_dir+8       /*  --------- " " --------- */
        movl $pg3+7,_pg_dir+12      /*  --------- " " --------- */
        movl $pg3+4092,%edi
        movl $0xfff007,%eax         /*  16Mb - 4096 + 7 (r/w user,p) */
        std
    1:  stosl                       /* fill pages backwards - more efficient :-) */
        subl $0x1000,%eax
        jge 1b
        xorl %eax,%eax              /* pg_dir is at 0x0000 */
        movl %eax,%cr3              /* cr3 - page directory start */
        movl %cr0,%eax
        orl $0x80000000,%eax
        movl %eax,%cr0              /* set paging (PG) bit */
        ret                         /* this also flushes prefetch-queue */
The STOSL instruction stores the value in EAX at the address ES:EDI. If the direction flag (DF) in EFLAGS is set (i.e. STD was executed before STOSL), EDI is then decremented by 4; if it is clear (CLD), EDI is incremented by 4.
cld clears DF so that edi and esi auto-increment, and rep repeats the following string instruction %ecx times.
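To see what setup_paging leaves in memory, the routine can be replayed in C on an array that stands in for physical addresses 0x0000~0x4FFF. This is a simulation sketch, not kernel code: `low_mem`, `setup_paging_sim`, and `translate` are hypothetical names, and `translate` models the two-level lookup the MMU would perform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_WORDS 1024u

/* Models physical 0x0000..0x4FFF: pg_dir followed by pg0..pg3. */
uint32_t low_mem[5 * PAGE_WORDS];

void setup_paging_sim(void)
{
    /* cld; rep; stosl with ecx = 1024*5: zero all five pages. */
    memset(low_mem, 0, sizeof low_mem);

    /* pg_dir[i] = address of pgN + 7 (present, read/write, user). */
    for (uint32_t i = 0; i < 4; i++)
        low_mem[i] = 0x1000u * (i + 1) + 7;

    /* std; stosl loop: fill backwards from the last entry of pg3. */
    uint32_t *edi = &low_mem[4 * PAGE_WORDS + 1023];   /* pg3 + 4092 */
    int32_t eax = 0xfff007;
    do {
        *edi-- = (uint32_t)eax;   /* stosl with DF = 1 */
        eax -= 0x1000;            /* subl $0x1000,%eax */
    } while (eax >= 0);           /* jge 1b */
}

/* Two-level lookup: directory index, table index, then offset. */
uint32_t translate(uint32_t linear)
{
    uint32_t pde = low_mem[linear >> 22];
    assert(pde & 1);                               /* page table present */
    uint32_t table = (pde & 0xFFFFF000u) / 4;      /* word index of the page table */
    uint32_t pte = low_mem[table + ((linear >> 12) & 0x3FFu)];
    assert(pte & 1);                               /* page present */
    return (pte & 0xFFFFF000u) | (linear & 0xFFFu);
}
```

After `setup_paging_sim()`, the entry for page n sits at `low_mem[1024 + n]` and holds n * 0x1000 + 7, so `translate` returns its argument unchanged for any address below 16MB, which is the identity mapping described in the source comment.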
Page table entry example (0xFFF007):
bits 31..12                  bit 2    bit 1    bit 0
0xFFF000                     1        1        1
page address (20 bits)       U/S      R/W      present
111 means: user (U), read/write (RW), present (P)
000 means: supervisor (S), read-only (R), not present
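The same entry can be built and taken apart in C. A minimal sketch, assuming only the three low flag bits discussed here; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P  0x1u   /* bit 0: present */
#define PTE_RW 0x2u   /* bit 1: read/write */
#define PTE_US 0x4u   /* bit 2: user/supervisor */

/* Combine a 4KB-aligned page address with the three low flag bits. */
uint32_t make_pte(uint32_t page_addr, uint32_t flags)
{
    assert((page_addr & 0xFFFu) == 0);   /* must be page aligned */
    assert((flags & ~0x7u) == 0);        /* only P, R/W, U/S in this sketch */
    return page_addr | flags;
}

uint32_t pte_page_addr(uint32_t pte) { return pte & 0xFFFFF000u; }
int pte_present(uint32_t pte)        { return (pte & PTE_P) != 0; }
int pte_writable(uint32_t pte)       { return (pte & PTE_RW) != 0; }
int pte_user(uint32_t pte)           { return (pte & PTE_US) != 0; }
```

`make_pte(0xFFF000, PTE_P | PTE_RW | PTE_US)` produces 0xFFF007, the value loaded into eax before the backward fill, and the `+7` in `$pg0+7` sets the same three flags on the page directory entries.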
Then the physical address of the page directory (0x0000 0000) is loaded into the cr3 register; the upper 20 bits of cr3 hold the physical address of the page that stores the page directory.
Setting bit 31 (PG) of the cr0 register to 1 enables paging for address translation.
(PG can be set only when bit 0 (PE) of cr0 is 1; PE enables protected mode.)
Finally, ret pops the address of main that was pushed earlier, so execution continues in the main() function!