-
-
Notifications
You must be signed in to change notification settings - Fork 604
OSv early boot (MBR)
What happens after the computer is turned on? How do you write the very first lines of code the CPU would execute right after the computer starts?
Here, again, OSv is supplying a nice, self contained answer.
Let's limit our answer to the x86_64
architecture, even though OSv
supports ARM as well.
What happens right after an CPU is started? The CPU instruction pointer is
initialized to its reset vector. The x86 CPU sets their CS:EIP
address
to the fixed address 0xF000:0xFFF0
, this address contains some
code leading you eventually to the BIOS.
See coreboot's
documentation for additional information.
Let's see that in action
$ # Start paused (-S) QEMU with debugger (-s)
$ qemu-system-x86_64 -s -S -nographic
$ # on another tab
$ gdb
...
(gdb) set architecture i8086
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB. Attempting to continue with the default i8086 settings.
The target architecture is assumed to be i8086
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) info registers eip cs
eip 0xfff0 0xfff0
cs 0xf000 61440
(gdb)
As we start, the CPU start on 0xf000:0xfff0
. To see what the CPU is about to execute,
we'll have to translate the segmented address 0xf000:0xfff0
to a
linear address. Since we start in real mode, we simply have to multiply the segment
address by 0x10
, and add it to the IP, see Wikipedia
for additional details. Let's print the first instruction the CPU which should execute
at 0x10*0xf000+0xfff0=0xffff0
:
(gdb) x/i 0xffff0
0xffff0: ljmp $0xf000,$0xe05b
It's probably a jump to the BIOS code at 0xf000:0xe05b
. Let's step to the next instruction
(gdb) si
0x0000e05b in ?? ()
(gdb) i r eip cs
eip 0xe05b 0xe05b
cs 0xf000 61440
Indeed, we jumped to 0xf000:0xe05b
as expected.
The BIOS then loads the MBR from the disk at address 0x7c00
, and executes
it.
What's in the MBR? In order to understand that, let's check what does
the bare OSv image contains. Looking at build.mk
we can find the files
generating the basic OSv image, loader.img
.
Here is a simplified version of the instruction that creates it from build.mk
:
loader.img: boot.bin lzloader.elf
# first block, ie, 512 bytes, are simply boot.bin
dd if=boot.bin of=$@ > /dev/null 2>&1, DD $@ boot.bin
# Then, after 128 blocks of 512 bytes, ie, 64K, the lzloader.elf
dd if=lzloader.elf of=$@ conv=notrunc seek=128 > /dev/null 2>&1
# set number of blocks boot16.S fetches to lzloader.elf's size
$(src)/scripts/imgedit.py setsize $@ $(image-size), IMGEDIT $@)
# write the command line parameters right after the MBR, after 512 bytes
$(src)/scripts/imgedit.py setargs $@ $(cmdline), IMGEDIT $@)
The first 512 bytes, the MBR, is build/debug/boot.bin
. Then imgedit.py
inserts
the command line parameter for OSv. Next we have zeros until the 64Kth byte.
After that, we put lzloader.elf
, the loader. In ASCII art sketch:
0......512.................65536=64k.....
[boot.bin][cmdline]00000000[lzloader.elf]
Let's see MBR code in action. Let's start the virtual machine, and tell it to stop at load time:
$ ./scripts/run.py -d --wait
On a different terminal, let's connect with gdb, and break at 0x7c00
, the
address the BIOS loads the MBR to. Note that we're specifying commands that gdb
should run at startup with the -ex
switch, so that when gdb
is started,
it'll connect to QEMU, and the correct architecture is set:
$ gdb -ex 'set architecture i8086' -ex 'target remote localhost:1234'
...
(gdb) hbr *0x7c00
Hardware assisted breakpoint 1 at 0x7c00
(gdb) c
Continuing.
Breakpoint 1, 0x00007c00 in ?? ()
Now we finally started to run OSv code, the Master Boot Record, or the MBR.
Let's verify that this is the case.
Let's define a simple alias, that would display files
in gdb
's binary format:
$ alias gdbdump='hexdump -e '\''"0x%04_ax: " 8/1 "0x%02x\t" "\n"'\'''
Now let's verify that the BIOS is indeed loading boot.bin
:
(gdb) x/32b $eip
0x7c00: 0xea 0x5e 0x7c 0x00 0x00 0x00 0x00 0x00
0x7c08: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7c10: 0xdb 0x00 0x10 0x00 0x40 0x00 0x00 0x00
0x7c18: 0x00 0x80 0x80 0x00 0x00 0x00 0x00 0x00
$ gdbdump -n 32 build/release/boot.bin
0x0000: 0xea 0x5e 0x7c 0x00 0x00 0x00 0x00 0x00
0x0008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x0010: 0x00 0x10 0x10 0x00 0x40 0x00 0x00 0x00
0x0018: 0x00 0x80 0x80 0x00 0x00 0x00 0x00 0x00
Looks like this is the case.
How is boot.bin
generated?
The answer is, again, in build.mk
. First boot16.S
is compiled to boot16.o
with regular compilation, by the %.o: %.S
rule. Then, boot.bin
is being linked from boot16.o
and boot16.ld
linker script.
What does boot16.ld
do? At first it defines a memory section of the
available memory at boot time. The first 1MB.
MEMORY { BOOTSECT : ORIGIN = 0, LENGTH = 0x10000 }
Then, it'll take all text sections from the input, and relocate them to
0x7c00
, where the MBR would be loaded in to. It would verity that the text
section fits to the first megabyte of memory we defined previously. That
way, if boot16.S
accidently surpass 1MB, the linker would complain.
SECTIONS { .text 0x7c00 : { *(.text) } > BOOTSECT }
For example, if we'll add a megabyte of data at the end of boot16.S
,
we'll get the following:
$ echo .fill 0x10000, 1, 0 >> arch/x64/boot16.S
$ make mode=debug
...
make[1]: Entering directory `/home/elazar/dev/osv/build/debug'
GEN gen/include/osv/version.h
AS arch/x64/boot16.o
LD boot.bin
ld: address 0x17e00 of boot.bin section `.text' is not within region `BOOTSECT'
Finally, we'll instruct ld
to output the raw assembly instructions,
without any ELF headers.
OUTPUT_FORMAT(binary)
What does boot16.S
do?
At first we can see some x86 bookkeeping. Setting up the A20 line, and the segment registers.
Then, it'll try to load the command line arguments from disk, using interrupt 13
int1342_boot_struct:
.byte 0x10 # size of packet (16 bytes)
.byte 0 # should always be 0
.short 0x3f # fetch 0x3f sectors = 31.5k
.short cmdline # fetch to address $cmdline
.short 0 # fetch to segment 0
.quad 1 # start at LBA 1.
# That is, fetch the first 31.5k from the disk
...
lea int1342_boot_struct, %si
mov $0x42, %ah
mov $0x80, %dl
int $0x13
Indeed after the interrupt, we can see something in cmdline=0x7e00
(gdb) hbr *0x7c81
...
(gdb) x/32b 0x7e00
0x7e00: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7e08: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7e10: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7e18: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) si
0x00007c8c in ?? ()
(gdb) x/32b 0x7e00
0x7e00: 0x2f 0x75 0x73 0x72 0x2f 0x6d 0x67 0x6d
0x7e08: 0x74 0x2f 0x68 0x74 0x74 0x70 0x73 0x65
0x7e10: 0x72 0x76 0x65 0x72 0x2e 0x73 0x6f 0x26
0x7e18: 0x6a 0x61 0x76 0x61 0x2e 0x73 0x6f 0x20
Those bytes are indeed the command line arguments given to OSv
$ gdbdump -n 32 build/debug/cmdline
0x0000: 0x2f 0x75 0x73 0x72 0x2f 0x6d 0x67 0x6d
0x0008: 0x74 0x2f 0x68 0x74 0x74 0x70 0x73 0x65
0x0010: 0x72 0x76 0x65 0x72 0x2e 0x73 0x6f 0x26
0x0018: 0x6a 0x61 0x76 0x61 0x2e 0x73 0x6f 0x20
Let's move on. Now boot16.S
would load OSv's loader from disk.
We have a problem here. On the one hand, we need to be in real mode
in order to use int 0x13h
and access the disk with the BIOS. On the other
hand, we need to be in protected mode in order to access more than the first
1 MB of memory. What boot16.S
does, is, switch to real mode, fetch
a few KB from the disk, move to protected mode, and copy them to memory, back
to real mode, rinse and repeat.
In order to do that, we have to have a GDT that supports both 16-bit and 32-bit segments. Let's see how the GDT is configured:
gdt:
.short gdt_size - 1
.short gdt
.long 0
#
# base flag limit type base limit
.quad 0x00 c f 9b 000000 ffff # 32-bit code segment
.quad 0x00 c f 93 000000 ffff # 32-bit data segment
.quad 0x00 0 0 9b 000000 ffff # 16-bit code segment
.quad 0x00 0 0 93 000000 ffff # 16-bit data segment
...
# set the gdt
cli
lgdtw gdt
The first GDT entry is the zero descriptor, then
two 32 bit flat selectors limit = 0xfffff, base=0x0
whose flag
have the size and granularity bits on in flag
. Next two
identical flat 16 bit segments, so that we'll be able to jump back
to real mode. See GDT section in OSDev
for addition details.
Now let's see the snippets that takes us back and forth from protected mode to real mode:
# set protected mode bit in cr0
mov $0x11, %ax
lmsw %ax
# move to 32 bit code segment (0x8 = first) - protected mode
# ljmp to flush prefetch queue http://goo.gl/JBOnZ5
ljmp $8, $1f
...
# move to 16 bit code segment (0x18 = third)
# then set real mode in cr0
ljmpw $18, $1f
1:
.code16
# clear protected mode bit
mov $0x10, %eax
mov %eax, %cr0
ljmpw $0, $1f
Finally let's see the process of moving memory from the disk. First read_disk
is used to read 0x8000
bytes from the disk to tmp
:
read_disk:
lea int1342_struct, %si
mov $0x42, %ah
mov $0x80, %dl
int $0x13
Then in protected mode we copy them to xfer
, and increment the value
at xfer
by 0x8000
:
mov $0x10, %ax
mov %eax, %ds
mov %eax, %es
mov $tmp, %esi
mov xfer, %edi
mov $0x8000, %ecx
rep movsb
mov %edi, xfer
Now back in real mode, we read more from the disk, unless we already read
count32
bytes. The count32
memory location is being set to the loader's
size by OSv build process.
xor %ax, %ax
mov %ax, %ds
mov %ax, %es
sti
addl $(0x8000 / 0x200), lba
decw count32
jnz read_disk
Finally, the loader would save the memory map to mb_mmap_addr
with
int 15h e820.
The very last step is, jump into the lzloader.elf
code.
We go back to protected mode, jump to a predefined addresses
that would decompress the loader code and another call to the
decompressed loader code at start32
address.
mov $0x10, %ax
mov %eax, %ds
mov %eax, %es
mov %eax, %gs
mov %eax, %fs
mov %eax, %ss
call *lzentry
rdtsc
mov %eax, mb_uncompress_lo
mov %edx, mb_uncompress_hi
mov $mb_info, %ebx
call *start32
From now on, we're salvaged from the x86 assembly land and most of the code would be in C++.
The first part of the journey is done, from reset to the OS loader.