2011-06-08

Crafting a BIOS from scratch

Introduction

Ideally, there would have been an entry between this one and the last, where I'd give you some pointers on how to disassemble the VMware BIOS we extracted using IDA Pro. However, one of the points of this series is to explore ways of improving/revisiting the too often ignored x86 bootloader recovery process (a.k.a. 'panic room'), that really ought to be part of any system boot operations. As such, we might as well jump straight into the fun by creating a VMware BIOS from scratch.

Your first question might be "Why would anyone want to craft their own BIOS from scratch (apart from as an academic exercise)?". Well, in an ideal world, chipmakers such as Intel and AMD would follow the example of the few SoC manufacturers who got it right and provide both a UART and a small serial boot recovery ROM, on-die, to relegate the prospect of a non-functional system because of bad/uninitialized flash to the footnotes of history. Alas, with CPUs well into their 4th decade of existence, that still hasn't happened. Therefore, to compensate for this missing feature, we'll have to implement it ourselves, in the flash ROM, and that means writing a BIOS bootblock from scratch. And if you know how to write a BIOS bootblock, then you know how to write a complete BIOS. As to why one would actually want to replace a fully functional BIOS on actual hardware, just wait till you purchase a 3TB (or larger) HDD, or create a >2TB RAID array, on a not so old machine, with the intent of booting Windows from it...

Of course, crafting a fully featured BIOS, that can actually boot an OS, is something better left to large projects such as coreboot with a payload of either SeaBIOS or Tianocore (UEFI), so we're not going to do that here. Instead, our aim will be to produce a very simple BIOS, that just does serial I/O, and that can be used as a development base for more interesting endeavours, such as 'panic room' type flash recovery, or regular BIOS instrumentation, to help with the development of a UEFI 'BIOS' for legacy platforms. Trying things out with a virtual machine, before jumping onto actual hardware, seems like the smart thing to do.


Know thy enemy, a.k.a. "Which SuperIO?"

Hardware wise, the only subsystem we want to access is the SuperIO chip, since it provides the (virtual) UART we are after. We're not going to bother about PCI, RAM, Video, or even Cache as RAM (CAR) access: just plain good old serial debug will do. And while I'm not going to go as far as saying that implementing the items listed above would be trivial, the fact is, as long as you have serial debug, at least you don't have to shoot in the dark, and that's a big help.

In the case of VMware, all you need to know is that the SuperIO is a (virtual) National Semiconductor PC97338 (datasheet here). Unfortunately, neither coreboot's superiotool nor lm-sensors' sensors-detect seems to detect it at the moment (Didn't someone mention they were porting coreboot/LinuxBIOS to VMware some time ago? What happened?), so a lot of time was wasted on the SuperIO errand.

And maybe I missed a step somewhere, but from what I can see, the VMware virtual chip is not set to run in PnP mode. Thus, what I'm going to expose in the code below with regards to accessing the serial port is very specific to the VMware PC97338 non-PnP implementation, and may not translate so well to SuperIO chips that run in PnP mode. Oh well... Also, if you disassemble the VMware BIOS, you'll see some mucking around with a SuperIO chip located at port 0x398 early in the bootblock, with 0x398 being one of the possible bases for the PC97338... Except the VMware SuperIO base is indeed at the 0x2e location, so all that early stuff is a wild goose chase. Thanks a lot guys!

Therefore, just to reiterate, all you need to know is that the VMware SuperIO is a PC97338, at port 0x2e and running in non PnP mode. With that you can run along, and get going implementing early serial in your own BIOS.
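
To make the non-PnP access pattern concrete, here is a small host-side Python sketch (not BIOS code) of the port writes involved: the register index goes to the base port, and the value goes to the port right after it. The register values are the ones from the superio_conf table in the source further down. The function name is made up for illustration, and real PC97338 silicon may want more than this (the source notes the datasheet asks for the index port to be read twice at startup).

```python
SUPERIO_BASE = 0x2E   # index port; the data port is the next one up

# (register index, value) pairs, same as superio_conf in bios.S
SUPERIO_CONF = [(0x00, 0x0F), (0x01, 0x10), (0x02, 0x00)]

def superio_port_writes(conf, base=SUPERIO_BASE):
    """Expand (index, value) pairs into the (port, byte) writes to perform."""
    writes = []
    for index, value in conf:
        writes.append((base, index))      # select the register
        writes.append((base + 1, value))  # write its new content
    return writes

for port, value in superio_port_writes(SUPERIO_CONF):
    print(f"out 0x{port:02x}, 0x{value:02x}")
```

Each pair therefore costs two out instructions, which is exactly what the superio_out subroutine in the source does.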


Toolchain considerations and software constraints

With the hardware in check, and before we start writing anything, it might help to have a look at our other requirements.

First of all, as far as the development toolchain is concerned, and even more so as what follows is aimed at being usable by the largest number of people, we will use a GNU toolchain all the way. That means, as long as you have a gcc setup on your platform that can produce x86 code, you should be good to go. And for the record, I have verified that the files I'm presenting below can produce a BIOS ROM on Windows, with either MinGW32, MinGW-w64 or cygwin, as well as Linux x86 or x64, with regular gcc. OSX (with a proper gcc toolchain) as well as cross compilers on other UNIX architectures are expected to work too. So if you don't have gcc set up on your system, go get it now!

Then comes our choice of language. The coreboot and other projects seem to be quite adamant about developing as little as possible in assembly, but I don't see it that way for the two following reasons:
  1. We have no stack after reset but unless we plan on doing non 'panic room' type things, we actually don't have much use for one in the first place. From experience (with Realtek SoCs) I can tell you that if your 'panic room' needs any form of memory to be initialized to be able to run, and that applies to Cache as RAM, you're not doing it right.
  2. RAM space is infinite. BIOS bootcode blocks aren't. If there's one space you want to optimize it's that 4K or 8K BIOS recovery bootblock that you'll keep and never re-flash at the end of your BIOS. Flash manufacturers are providing features to help with flash recovery - make use of them dammit!!
Therefore, assembly it is.

Now, the one caveat is that the GNU assembler seems to be the only tool still around defaulting to the AT&T syntax which, while arguably more sensible than the Intel one, nobody else, and especially not IDA, uses. Instead the Intel syntax prevails. While this could have been an annoyance, any recent version of GNU as also supports the Intel syntax, which can be switched on in your code with .intel_syntax noprefix. Now that's better!

Finally, we know we'll have to abide by the following constraints:
  1. First instruction must be located at address 4GB-0x10 (or FFFF:FFF0 if you prefer), and the whole BIOS must reside at the very end of the 32 bit address space. This is an x86 CPU initialization requirement.
  2. The processor starts in real address mode on reset. Another x86 reset constraint. Now, some people choose to switch to protected mode as soon as they can (so that they can use C), but we have optimization in mind, so we'll keep real address mode all the way.
  3. The BIOS ROM size must be 512 KB. This time it's a VMware requirement.
  4. We are also supposed to be careful about far jumps in our code, as another x86 boottime constraint. But that won't be an issue for a bootblock section of a few KB, which we plan to locate at the end of the BIOS anyway.
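
The addresses these constraints imply are easy to get wrong by a digit, so here is the arithmetic as a short Python sketch. The 4 KB bootblock size is an assumption that matches the layout chosen later in this post, and the helper name is made up for illustration.

```python
FOUR_GB = 1 << 32
ROM_SIZE = 512 * 1024            # VMware requirement
RESET_VECTOR = FOUR_GB - 0x10    # first instruction fetched after reset
MAIN_SIZE = 4 * 1024             # assumption: bootblock lives in the last 4 KB

rom_base = FOUR_GB - ROM_SIZE
main_address = FOUR_GB - MAIN_SIZE

def file_offset(address):
    """Offset inside the ROM file of a physical address mapped to the ROM."""
    if not rom_base <= address < FOUR_GB:
        raise ValueError("address is not covered by the ROM")
    return address - rom_base

print(hex(rom_base))                   # 0xfff80000
print(hex(RESET_VECTOR))               # 0xfffffff0
print(hex(file_offset(main_address)))  # 0x7f000
print(hex(file_offset(RESET_VECTOR)))  # 0x7fff0
```

In other words, the last 16 bytes of the 512 KB file are what the CPU sees first.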


Producing a BIOS ROM

Now we jump into the gory details at last.

Since the reset vector is located at the end of the BIOS, we need to have at least two sections in our sourcecode: one that contains the bulk of our code, which I'll call main and which I'll arbitrarily set to start at 4 KB before the end of the ROM, and another, starting at FFFF:FFF0 and going to the end of the ROM, which I'll call reset and whose only purpose will be to jump into our entrypoint in the main section.

Below is an example of how one can establish these two sections in the assembly source, as well as the associated GNU ld script that ensures they will be located at the right destination address in the ROM. Because our BIOS is short, I'll use a single bios.S source for the code, and I'll call the ld script bios.ld. Hence bios.S:
.section main
init:   <insert useful code here>
        ...

.section reset
        jmp init
        .align 16
NB: the .align 16 at the end is there to ensure that the reset section is exactly 16 bytes. This way, we're sure that our reset section will occupy [FFFF:FFF0 - FFFF:FFFF] and we won't have to do extra padding.

bios.ld:
MEMORY { ROM (rx) : org = 4096M - 512K, len = 512K }
SECTIONS { 
        .main 4096M - 4K    : { *(main) }
        .reset 4096M - 0x10 : { *(reset) }
        }
As you can see above, the ld script simply sets the ROM to be 512 KB in size, located at the end of the 4 GB address space (=4096M, since GNU ld doesn't know the G suffix yet) and, as indicated, we placed our bootcode segment (main) to start at the last 4 KB block of ROM.
The script should sort out our addresses as we want them then, and once ld has churned through it and produced a new object file (which I'll call bios.out), we should be able to use objcopy with option -j to extract the various binary payloads of interest to us.

Now, the problem is that objcopy -j will only extract the payload data. We could of course use a trick like .align 4K-0x10 at the end of our main section, but that would mean we'd then have to edit our bootcode size in two separate files when we update it. The smarter approach is to use the --gap-fill option of objcopy, to conveniently fill any gap between sections main and reset.

Another problem we face is that the above script only produces the binary data starting with the 4K at the end of the ROM, since the first section we extract (main) starts there. So at most objcopy will create 4 KB of data, far from the 512 KB we actually need. The solution: create a dummy section in our source, which I'll call begin and which I'll also use to put a BIOS ID string, and tell ld either explicitly, or better simply with a >ROM directive (so that we don't have to fill in the ROM size a 3rd time in the script) where it should reside.

After that, if we extract the begin, main and reset sections in order, with the --gap-fill option, we should have a 512 KB binary file with everything mapped where it should be. Neat!
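
To see why the dummy begin section matters, the following Python sketch mimics what a flat binary extraction with gap filling does: the output starts at the lowest section address and any holes up to the end of the last section are padded. This is only a model of the behaviour described above, not objcopy itself, and the section contents are stand-ins.

```python
FOUR_GB = 1 << 32
ROM_SIZE = 512 * 1024

def flatten(sections, fill=0xFF):
    """Lay (address, data) sections out in one image, padding the gaps."""
    ordered = sorted(sections, key=lambda s: s[0])
    base = ordered[0][0]                      # image starts at lowest address
    end = max(addr + len(data) for addr, data in ordered)
    image = bytearray([fill]) * (end - base)  # pre-fill everything with 'fill'
    for addr, data in ordered:
        offset = addr - base
        image[offset:offset + len(data)] = data
    return bytes(image)

begin = (FOUR_GB - ROM_SIZE, b"VMBIOS v1.00")           # ID string
main = (FOUR_GB - 4096, b"\xfa\xfc")                    # cli; cld as stand-in
reset = (FOUR_GB - 0x10, b"\xe9\x0d\xf0" + b"\xff" * 13)

print(len(flatten([main, reset])))         # 4096
print(len(flatten([begin, main, reset])))  # 524288
```

Without begin, the image starts at main and is only 4 KB long. With it, the image spans the whole 512 KB.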


Caveats

Before I present the actual code, here is a quick summary of caveats & gotchas which might be of interest to you if you use this code as a base, and which clarify why everything in our sources isn't exactly as simple as what's exposed above:
  • Gotcha #1: Linux will bother you with a missing .igot.plt section. This looks like a known bug. As a workaround, we added a dummy section for it.
  • Gotcha #2: This is a minor annoyance, but GNU ld doesn't handle constants in the MEMORY section (it's a bug). So the ROM size has to be specified twice in the ld script, and we couldn't use two nice constants at the top, as anybody would think of doing.
  • Gotcha #3: objcopy can only extract sections that have the ALLOC attribute. This attribute is properly set on Windows as soon as you define a section, but not on Linux, where you have to add the flag explicitly (e.g. .section main, "ax" for 'ALLOC' and 'CODE'). Note that you can always check how the attributes of your sections are set with objdump -x bios.out
  • Gotcha #4: Using a jmp init in the reset section may result in a target address that is offset by 2 on some platforms (this seems to be a binutils bug). Thus we have to handcraft it.
  • Gotcha #5 (this is getting better and better): On Windows, when using MinGW32 or cygwin (but not MinGW-w64), if you don't define an entrypoint in the linker script, your antivirus may erroneously identify bios.out as containing a Trojan and delete it. "Holy mother of false positives, Batman!" So we need to add an ENTRY(init) statement at the top of our section list.
  • Gotcha #6: DON'T waste your time trying to use XCode on OSX. It is riddled with problems. Use a proper GNU suite instead.
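
Regarding Gotcha #4, handcrafting the jump means emitting opcode 0xE9 followed by a 16 bit displacement, counted from the end of the 3 byte instruction. The Python sketch below computes those bytes. The function name is made up, and the init address assumes main starts 4 KB before the end of the ROM with init as its first instruction.

```python
import struct

def near_jmp16(source, target):
    """Encode a 16 bit near jmp placed at 'source' that lands on 'target'."""
    if (source >> 16) != (target >> 16):
        raise ValueError("target is outside the 64 KB a near jump can reach")
    # The displacement is relative to the address following the instruction
    disp = (target - (source + 3)) & 0xFFFF
    return b"\xE9" + struct.pack("<H", disp)

code = near_jmp16(0xFFFFFFF0, 0xFFFFF000)
print(code.hex())  # e90df0
```

This is also why the linker script asserts that init resides in the last 64 KB: a near jump cannot leave the current segment.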

Sourcecode

bios.S:
/********************************************************************************/
/*                         VMware BIOS ROM example                              */
/*       Copyright (c) 2011 Pete Batard (pete@akeo.ie) -  Public Domain         */
/********************************************************************************/


/********************************************************************************/
/* GNU Assembler Settings:                                                      */
/********************************************************************************/
.intel_syntax noprefix  /* Use Intel assembler syntax (same as IDA Pro)         */
.code16                 /* After reset, the x86 CPU is in real / 16 bit mode    */
/********************************************************************************/


/********************************************************************************/
/* Macros:                                                                      */
/********************************************************************************/
/* This macro allows stackless subroutine calls                                 */
.macro  ROM_CALL addr
        mov  sp, offset 1f      /* Use a local label as we don't know the size  */
        jmp  \addr              /* of the jmp instruction (can be 2 or 3 bytes) */
1:      /* see http://sourceware.org/binutils/docs-2.21/as/Symbol-Names.html    */
.endm


/********************************************************************************/
/* Constants:                                                                   */
/********************************************************************************/
/* The VMware platform uses an emulated NS PC97338 as SuperIO                   */
SUPERIO_BASE  = 0x2e    /* Do NOT believe what you see in the BIOS bootblock:   */
                        /* the VMware SuperIO base is 0x2e and not 0x398.       */
PC97338_FER   = 0x00    /* PC97338 Function Enable Register                     */
PC97338_FAR   = 0x01    /* PC97338 Function Address Register                    */
PC97338_PTR   = 0x02    /* PC97338 Power and Test Register                      */

/* 16550 UART setup */
COM_BASE      = 0x3f8   /* Our default COM1 base, after SuperIO init            */
COM_RB        = 0x00    /* Receive Buffer (R)                                   */
COM_TB        = 0x00    /* Transmit Buffer (W)                                  */
COM_BRD_LO    = 0x00    /* Baud Rate Divisor LSB (when bit 7 of LCR is set)     */
COM_BRD_HI    = 0x01    /* Baud Rate Divisor MSB (when bit 7 of LCR is set)     */
COM_IER       = 0x01    /* Interrupt Enable Register                            */
COM_FCR       = 0x02    /* 16550 FIFO Control Register (W)                      */
COM_LCR       = 0x03    /* Line Control Register                                */
COM_MCR       = 0x04    /* Modem Control Register                               */
COM_LSR       = 0x05    /* Line Status Register                                 */
/********************************************************************************/


/********************************************************************************/
/* begin : Dummy section marking the very start of the BIOS.                    */
/* This allows the .rom binary to be filled to the right size with objcopy.     */
/********************************************************************************/
.section begin, "a"             /* The 'ALLOC' flag is needed for objcopy       */
        .ascii "VMBIOS v1.00"   /* Dummy ID string                              */
        .align 16
/********************************************************************************/


/********************************************************************************/
/* main:                                                                        */
/* This section will be relocated according to the bios.ld script.              */
/********************************************************************************/
/* 'init' doesn't have to be at the beginning, so you can move it around, as    */
/* long as it remains reachable, with a near jump, from the .reset section.     */
.section main, "ax"
.globl init             /* init must be declared global for the linker and must */
init:                   /* point to the first instruction of your code section  */
        cli             /* NOTE: This sample BIOS runs with interrupts disabled */
        cld             /* String direction lookup: forward                     */
        mov  ax, cs     /* A real BIOS would keep a copy of ax, dx as well as   */
        mov  ds, ax     /* initialize fs, gs and possibly a GDT for protected   */
        mov  ss, ax     /* mode. We don't do any of this here.                  */

init_superio:
        mov  dx, SUPERIO_BASE   /* The PC97338 datasheet says we are supposed   */
        in   al, dx             /* to read this port twice on startup, but the  */
        in   al, dx             /* VMware virtual chip doesn't seem to care...  */

        /* Feed the SuperIO configuration values from a data section            */
        mov  si, offset superio_conf    /* Don't forget the 'offset' here!      */
        mov  cx, (serial_conf - superio_conf)/2
write_superio_conf:
        mov  ax, [si]
        ROM_CALL superio_out
        add  si, 0x02
        loop write_superio_conf

init_serial:            /* Init serial port                                     */
        mov  si, offset serial_conf
        mov  cx, (hello_string - serial_conf)/2
write_serial_conf:
        mov  ax, [si]
        ROM_CALL serial_out
        add  si, 0x02
        loop write_serial_conf

print_hello:            /* Print a string                                       */
        mov  si, offset hello_string
        ROM_CALL print_string

serial_repeater:        /* End the BIOS with a simple serial repeater           */
        ROM_CALL readchar
        ROM_CALL putchar
        jmp serial_repeater

/********************************************************************************/
/* Subroutines:                                                                 */
/********************************************************************************/
superio_out:            /* AL (IN): Register index,  AH (IN): Data to write     */
        mov  dx, SUPERIO_BASE
        out  dx, al
        inc  dx
        xchg al, ah
        out  dx, al
        jmp  sp


serial_out:             /* AL (IN): COM Register index, AH (IN): Data to Write  */
        mov  dx, COM_BASE
        add  dl, al     /* Unless something is wrong, we won't overflow to DH   */
        mov  al, ah
        out  dx, al
        jmp  sp


putchar:                /* AL (IN): character to print                          */
        mov  dx, COM_BASE + COM_LSR
        mov  ah, al
tx_wait:
        in   al, dx
        and  al, 0x20   /* Check that transmit register is empty                */
        jz   tx_wait
        mov  dx, COM_BASE + COM_TB
        mov  al, ah
        out  dx, al
        jmp  sp


readchar:               /* AL (OUT): character read from serial                 */
        mov  dx, COM_BASE + COM_LSR
rx_wait:
        in   al, dx
        and  al, 0x01
        jz   rx_wait
        mov  dx, COM_BASE + COM_RB
        in   al, dx
        jmp  sp


print_string:           /* SI (IN): offset to NUL terminated string             */
        lodsb
        or   al, al
        jnz  write_char
        jmp  sp
write_char:
        shl  esp, 0x10  /* We're calling a sub from a sub => preserve SP        */
        ROM_CALL putchar
        shr  esp, 0x10  /* Restore SP                                           */
        jmp  print_string


/********************************************************************************/
/* Data:                                                                        */
/********************************************************************************/
superio_conf:
/* http://www.datasheetcatalog.org/datasheet/nationalsemiconductor/PC97338.pdf  */
        .byte PC97338_FER, 0x0f         /* Enable COM, PAR and FDC              */
        .byte PC97338_FAR, 0x10         /* LPT=378, COM1=3F8, COM2=2F8          */
        .byte PC97338_PTR, 0x00         /* Make sure COM1 test mode is cleared  */
serial_conf:    /* See http://www.versalogic.com/kb/KB.asp?KBID=1395            */
        .byte COM_MCR,     0x00         /* RTS/DTR off, disable loopback        */
        .byte COM_FCR,     0x07         /* Enable & reset FIFOs. DMA mode 0.    */
        .byte COM_LCR,     0x80         /* Set DLAB (access baudrate registers) */
        .byte COM_BRD_LO,  0x01         /* Baud Rate 115200 = 0x0001            */
        .byte COM_BRD_HI,  0x00
        .byte COM_LCR,     0x03         /* Unset DLAB. Set 8N1 mode             */
hello_string:
        .string "\r\nHello BIOS world!\r\n" /* .string adds a NUL terminator    */
/********************************************************************************/


/********************************************************************************/
/* reset: this section must reside at 0xfffffff0, and be exactly 16 bytes       */
/********************************************************************************/
.section reset, "ax"
        /* Issue a manual jmp to work around a binutils bug.                    */
        /* See coreboot's src/cpu/x86/16bit/reset16.inc                         */
        .byte  0xe9
        .int   init - ( . + 2 )
        .align 16, 0xff /* fills section to end of ROM (with 0xFF)              */
/********************************************************************************/
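
The 0x0001 divisor in serial_conf follows from the usual PC UART arithmetic, where the divisor is 115200 divided by the desired baud rate. The Python sketch below produces the two divisor entries for another speed. It assumes the standard 1.8432 MHz UART clock, and whether VMware's virtual UART honours other divisors is not something verified here. The function name is made up for illustration.

```python
UART_MAX_BAUD = 115200   # assumption: standard 1.8432 MHz clock / 16

COM_BRD_LO = 0x00        # same register offsets as in bios.S
COM_BRD_HI = 0x01

def baud_divisor_entries(baud):
    """Return the (register, value) pairs to write while DLAB is set."""
    if baud <= 0:
        raise ValueError("baud rate must be positive")
    divisor, remainder = divmod(UART_MAX_BAUD, baud)
    if remainder or not 1 <= divisor <= 0xFFFF:
        raise ValueError("baud rate cannot be produced exactly")
    return [(COM_BRD_LO, divisor & 0xFF), (COM_BRD_HI, divisor >> 8)]

print(baud_divisor_entries(115200))  # [(0, 1), (1, 0)]
print(baud_divisor_entries(9600))    # [(0, 12), (1, 0)]
```

The two pairs slot in between the COM_LCR entries that set and clear DLAB.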

bios.ld:
OUTPUT_ARCH(i8086)                      /* i386 for 32 bit, i8086 for 16 bit       */

/* Set the variable below to the address you want the "main" section, from bios.S, */
/* to be located. The BIOS should be located at the area just below 4GB (4096 MB). */
main_address = 4096M - 4K;              /* Use the last 4K block                   */

/* Set the BIOS size below (both locations) according to your target flash size    */
MEMORY {
        ROM (rx) : org = 4096M - 512K, len = 512K
}

/* You shouldn't have to modify anything below this                                */
SECTIONS {
        ENTRY(init)                     /* To avoid antivirus false positives      */
        /* Sanity check on the init entrypoint                                     */
        _assert = ASSERT(init >= 4096M - 64K, 
                "'init' entrypoint too low - it needs to reside in the last 64K.");
        .begin : {      /* NB: ld section labels MUST be 6 letters or less         */
                *(begin)
        } >ROM          /* Places this first section at the beginning of the ROM   */
        /* the --gap-fill option of objcopy will be used to fill the gap to .main  */
        .main main_address : {
                *(main)
        }
        .reset 4096M - 0x10 : {         /* First instruction executed after reset  */
                *(reset)
        }
        .igot 0 : {                     /* Required on Linux                       */
                *(.igot.plt)
        }
}

Makefile (IMPORTANT: if you copy/paste, you will have to restore the tabs at the beginning of each line that starts with a space):
ASM       = gcc
CC        = gcc
LD        = ld
OBJDUMP   = objdump
OBJCOPY   = objcopy

CFLAGS    = -m32
LDFLAGS   = -nostartfile

OBJECTS   = bios.o
TARGET    = bios
MEMLAYOUT = xMemLayout.map

.PHONY: all clean

all: $(TARGET).rom

clean:
 @-rm -f -v *.o $(TARGET).out $(MEMLAYOUT)

%.o: %.c Makefile
 @echo "[CC]  $@"
 @$(CC) -c -o $*.o $(CFLAGS) $<

%.o: %.S Makefile
 @echo "[AS]  $<"
 @$(ASM) -c -o $*.o $(CFLAGS) $<

# Produce a disassembly dump of the main section, for verification purposes
dis: $(TARGET).out
 @echo "[DIS] $<"
 @$(OBJCOPY) -O binary -j .main --set-section-flags .main=alloc,load,readonly,code $< main.bin
 @$(OBJDUMP) -D -bbinary -mi8086 -Mintel main.bin | less
 @-rm -f main.bin

$(TARGET).out: $(OBJECTS) $(TARGET).ld
 @echo "[LD]  $@"
 @$(LD) $(LDFLAGS) -T$(TARGET).ld -o $@ $(OBJECTS) -Map $(MEMLAYOUT)

$(TARGET).rom: $(TARGET).out
 @echo "[ROM] $@"
 @# Note: -j only works for sections that have the 'ALLOC' flag set
 @$(OBJCOPY) -O binary -j .begin -j .main -j .reset --gap-fill=0x0ff $< $@

Compiling and testing
  • Copy the Makefile, bios.S and bios.ld from above, or extract the files from the archive below to a directory
  • run make. You should end up with a 512 KB bios.rom file
  • Copy bios.rom to your target VMware image directory and manually edit your .vmx file to have the line:
    bios440.filename = "bios.rom"
  • Edit your virtual machine settings to make sure it has a serial port.
    The preferred method to access the serial console is to use a null modem emulator, such as com0com, a signed version of which I made available in the next post.
    Otherwise, you can either use an actual host COM port (you'll need a null modem cable to another serial port), output to file (which is the easiest way to confirm that the BIOS works, as you will see some output there, but you won't be able to test the repeater) or a named pipe (with the end of pipe set for an application such as putty - the problem with using a pipe however being that you can only connect to it after the VM is started, so you will likely miss the initial serial output).
    The serial port needs to be set to 115200 baud, 8N1.
  • Run the machine. You should see a "hello world" message printed out, and, provided your serial configuration allows input, anything you type should be echoed back on your terminal
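
If nothing shows up on the serial port, it can help to sanity check the ROM image before blaming the VM settings. The Python sketch below checks the size, the ID string and the reset vector jump, and returns the file offset the jump lands on. It runs against a synthetic image here, and the function name and the expected values are assumptions based on the layout used in this post.

```python
import struct

ROM_SIZE = 512 * 1024
RESET_OFFSET = ROM_SIZE - 0x10
SEGMENT_OFFSET = ROM_SIZE - 0x10000   # file offset of the last 64 KB

def check_rom(image):
    """Validate the ROM layout; return the file offset the reset jmp targets."""
    if len(image) != ROM_SIZE:
        raise ValueError(f"expected {ROM_SIZE} bytes, got {len(image)}")
    if not image.startswith(b"VMBIOS"):
        raise ValueError("ID string from the begin section not found")
    if image[RESET_OFFSET] != 0xE9:
        raise ValueError("no near jmp (0xE9) at the reset vector")
    (disp,) = struct.unpack_from("<H", image, RESET_OFFSET + 1)
    ip = (0xFFF0 + 3 + disp) & 0xFFFF   # IP wraps within the 64 KB segment
    return SEGMENT_OFFSET + ip

# Synthetic stand-in for bios.rom: ID string, 0xFF filler, jmp to the 4 KB block
image = bytearray(b"\xff" * ROM_SIZE)
image[:12] = b"VMBIOS v1.00"
image[-16:-13] = bytes.fromhex("e90df0")
print(hex(check_rom(bytes(image))))  # 0x7f000
```

To check a real build, read bios.rom in binary mode and pass its bytes to check_rom. With the sources above, the result should match the offset of init in the file.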

Goodies
  • vmbios-1.0.tgz: an archive containing all the files above, as well as the generated BIOS.
  • Note that you can issue 'make dis' to get a disassembly output of the main section if needed.
  • Note that for reference, a memory map called xMemLayout.map is also produced during the build.
  • BIOS Disassembly Ninjutsu Uncovered, by Darmawan Salihun (PDF): If you're going to start with BIOS modding, this should be your reference. Or see the author's page.

Happy BIOS hacking!

8 comments:

  1. Wow! I am really impressed.

    However, I have a small nit-pick: you were saying "the GNU assembler seems to be the only tool still around defaulting to the AT&T syntax [...]" and I disagree with you; and no, I am not speaking about Yasm, but rather about Sun's x86 assembler (one of the few pieces inherited from SysV which could not make its way into OpenSolaris in source form) on one side, and R. Nordier's (the guy who wrote FreeBSD's loader) "asx", available at http://www.nordier.com/software/asx.html

  2. WOOOW. Nice article. I'm not into assembler but I'm interested in CPU architectures, and one thing sent my brain into "brainstorm mode". Quote: ".code16 /* After reset, the x86 CPU is in real / 16 bit mode". Is it the BIOS switching the CPU into 16 bit mode, or do our 32/64 bit CPUs start in 16 bit mode by default after reset, without any program/BIOS/firmware? If the CPU starts in that mode without the BIOS, is that only on PC machines, or is it the same with Mac x86_64 machines? If Mac CPUs act in a different way, what part of the PC mainboard switches PC CPUs to 16 bit mode? Maybe there is a way to switch that off and, in theory, use Macintosh firmware on a PC. Conversely, if CPUs on Mac machines act the same way as on PCs and go into 16 bit real mode, then I think it is possible to put a PC BIOS on a Mac and turn the Mac into an ordinary PC. Am I right? If not, then which part of the motherboard makes the difference between PC hardware and Mac hardware? Boom, so many questions, but a brainstorm is a brainstorm. Sometimes you can't stop it until you get to the answers.

    Replies
    1. Hi Aleksander,
      The CPU is in 16bit mode after reset always. Even modern 64 bit CPUs start in 16 bit mode. This is for compatibility with the original x86 CPUs that were only 16 bit. It is up to the BIOS or whatever firmware gets executed after reset to enable 32 bit mode.
      And unfortunately for you, this has nothing to do with turning a Mac into a PC or a PC into a Mac.
      If you are interested in finding out what makes a Mac a Mac, I suggest you google 'Hackintosh', as it highlights the effort of people who want to turn a PC into a Mac, and it should clarify the differences (mostly, Macs have a specific UEFI firmware and OS X is set to recognize only specific hardware)

  3. Hi Pete,
    very interesting tutorial. I am not in ASM, too, but would like some help in controlling Southbridge I/O ports (writeblockers).
    If I go further with this article, will I be able to obtain such a result, or does it require a totally different approach?
    Thanks

    Replies
    1. You may want to check my code for UBRX where I've been doing some SB initialization for AMD chips in assembly.

  4. It's not working with VMware 8.0... also, I have not tested it with another VMware version

    Replies
    1. It works fine with VMWare Player/Workstation 12.1.1. Are you sure you added a serial port device to your Virtual Machine? As mentioned in the post, the VM must be configured with a serial port for anything to happen.