Code injection into SMM mode in 2021


Motivation

System Management Mode (SMM) is the most privileged execution level. It is protected from non-SMM accesses and locked during pre-boot, which makes it an ideal candidate to host attestation software.

We decided to inject a remote attestation prover into SMM for the aforementioned reason. The major drawback of this approach is that SMM is largely closed to anyone outside Intel and its partners, and one very often struggles to legitimately load software into this region.

For this specific project, we selected a Gigabyte GB-BKi5A-7200 host machine because of its compact form factor and its ability to host an M.2 form factor PCIe peripheral, to which we would connect a security peripheral (the remote attestation verifier).

CMID++ architecture

Several approaches are available to extend SMM:

Sign an NDA :)
Update the platform flash directly, using an available free firmware or a modified manufacturer firmware when one exists (signature?). This approach is very hard, and very likely to be time consuming and error prone.
A slightly easier second approach is to inject pre-boot firmware code using a PCIe expansion ROM, while SMRAM is still not locked.

We decided to use our security PCIe peripheral to go for the second approach since it supports expansion ROMs.

Unfortunately, due to COVID-19, we struggled to order the M.2 form factor PCIe FPGA we required for this experiment. Consequently, we had to find another solution.

This solution is Intel DCI (Direct Connect Interface), a USB-like connection to the processor's embedded hardware debugger, which is normally unavailable to anyone who is not an Intel partner.

This debugger basically enables us to trap System Management Interrupts (SMI) and write into the SMRAM.

Activating DCI

DCI is locked at boot time for obvious security reasons. We succeeded in unlocking it by modifying unprotected configuration data of DXE drivers, which is stored in the platform flash close to the EFI variables. Indeed, some of those configuration variables condition the DCI feature lock. To succeed, we adapted some publicly available reverse-engineered procedures from anonymous contributors.

Once DCI is enabled and we are able to talk to the debugger using the DFx Abstraction Layer (DAL) Python wrappers, we can move on to the arbitrary code injection step.

Injection method

We use the following steps:

Load and execute the smm_stage_2 ELF64 application using a custom-made bare-metal UEFI ELF64 loader. This application basically prepares itself to be called from SMM. It is the payload injected into SMM. We load it at address @a.
Then, we hook the last bytes of the firmware SMI handler, starting right at the rsm instruction. There we use DCI and the DAL Python wrapper to inject the smm_stage_1 hook. Its only job is to load cr3 with the identity-mapped page tables of smm_stage_2 (@a + pml4_offset) and jump to its entry point (@a + entry_offset), which is thereby executed for the second time.
The ultimate purpose of smm_stage_2 is to be an SMI Transfer Monitor loader.
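As a rough illustration of the cr3 switch in the second step, the following sketch builds the machine code bytes that load @a + pml4_offset into cr3. The constant values are the ones that appear in hook.s; the helper names (movabs_rax, cr3_switch_bytes) are hypothetical and are not part of the project sources.

```python
import struct

# Values taken from hook.s; the names are illustrative only.
SMM_STAGE_2_BASE = 0x100000000  # @a
PML4_OFFSET = 0x5000            # offset of the "pages" symbol

# mov %rax, %cr3 (0F 22 /r with ModRM 0xD8)
MOV_RAX_TO_CR3 = b"\x0f\x22\xd8"


def movabs_rax(imm64):
    """Encode `movabs $imm64, %rax`: REX.W, opcode B8, little-endian imm64."""
    return b"\x48\xb8" + struct.pack("<Q", imm64)


def cr3_switch_bytes(base, pml4_offset):
    """Bytes that load (base + pml4_offset) into rax, then into cr3."""
    return movabs_rax(base + pml4_offset) + MOV_RAX_TO_CR3


code = cr3_switch_bytes(SMM_STAGE_2_BASE, PML4_OFFSET)
print(code.hex())
```

The immediate form matters here: cr3 must receive the address of the PML4 table itself, not a value read from that address.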

Stage 1

sources/smm_stage_1/hook.s

.global hook
hook:

  // Save GPRs

  push %r15
  push %r14
  push %r13
  push %r12
  push %r11
  push %r10
  push %r9
  push %r8
  push %rdi
  push %rsi
  push %rbp
  push %rbx
  push %rdx
  push %rcx
  push %rax

  // Save cr3

  mov %cr3, %rax
  push %rax

  // Set smm_stage_2 overall identity mapping,
  //   see sbss symbol in smm_stage_2.elf
  // Page tables symbol is "pages".
  // It has to be an identity mapping to fit with
  //   the current EFI memory configuration
  // The $ prefix loads the address itself rather
  //   than the quadword stored at that address
  movabs $0x100005000, %rax
  mov %rax, %cr3

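  // Padding slot, so that two slots are used in total
  sub $0x8, %rsp
  // See kernel_start symbol in smm_stage_2.elf
  movq $0x100000000, %rax
  pushq %rax
  // Call smm_stage_2 entry point
  call *(%rsp)
  // Drop both the target address and the padding slot
  add $0x10, %rsp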

  // Restore cr3

  pop %rax
  mov %rax, %cr3

  // Restore GPRs

  pop %rax
  pop %rcx
  pop %rdx
  pop %rbx
  pop %rbp
  pop %rsi
  pop %rdi
  pop %r8
  pop %r9
  pop %r10
  pop %r11
  pop %r12
  pop %r13
  pop %r14
  pop %r15

  // Breakpoint placeholder
  nop

Stage 2

sources/smm_stage_2/main.c

void __attribute__((section(".start")))
    kernel_start(void) {

  // First call by the loader
  if (initialized == 0) {
    initialized = 1;
    kernel_init();
    return;
  }

  // Otherwise we are in SMM, we can play!
  INFO("Party harder\n");
}

Injection procedure using DAL python wrapper

Before going into details, here is an example of an IA-32e paging page-walk algorithm using DAL, just to illustrate the power of the interface.

hook.py

def pageWalking(addr):
  cr0 = t0.arch_register("cr0")
  cr3 = t0.arch_register("cr3")
  cr4 = t0.arch_register("cr4")
  efer = t0.arch_register("efer")

  print("Page walking @0x%x" % addr)

  print("efer.LMA %d" % ((efer >> 10) & 1))
  print("efer.LME %d" % ((efer >> 8) & 1))

  print("cr0.PG %d" % ((cr0 >> 31) & 1))
  print("cr0.PE %d" % ((cr0 >> 0) & 1))

  print("cr4.PAE %d" % ((cr4 >> 5) & 1))
  print("cr4.LA57 %d" % ((cr4 >> 12) & 1))

  # If IA-32e paging is active and 5-level paging is off
  if ((efer >> 10) & 1) and ((cr4 >> 12) & 1) == 0:
    print("cr3 0x%x" % cr3)
    # Keep the PML4 base address, bits 51:12
    cr3 = cr3 & 0x000ffffffffff000

    # 1st stage : PML4E
    pmle = t0.mem("%x" % \
        (cr3 + 8 * ((addr >> (12 + 3 * 9)) & 0x1ff)), 8)
    print("PML4E 0x%x" % pmle)

    if (pmle & 1) == 0:
      print("Non present PML4E")
      return
    # get the real address (bits 51:12 of the entry)
    pmle = pmle & ~(0xfff | (0xfff << 52))

    # 2nd stage : PDPTE
    pdpt = t0.mem("%x" % \
        (pmle + 8 * ((addr >> (12 + 2 * 9)) & 0x1ff)), 8)
    print("PDPTE 0x%x" % pdpt)
    if (pdpt & 1) == 0:
      print("Non present PDPTE")
      return

    if (pdpt >> 7) & 1:
      taddr = ((pdpt & ~(0x3fffffff | \
          (0xfff << 52))) | (addr & 0x3fffffff))
      print("PDPTE 1 GB mapped @%x" % taddr)
      return
    # get the real address (bits 51:12 of the entry)
    pdpt = pdpt & ~(0xfff | (0xfff << 52))

    # 3rd stage : PDE
    pd = t0.mem("%x" % \
        (pdpt + 8 * ((addr >> (12 + 1 * 9)) & 0x1ff)), 8)
    print("PDE 0x%x" % pd)
    if (pd & 1) == 0:
      print("Non present PDE")
      return

    if (pd >> 7) & 1:
      # A 2 MB page keeps the low 21 bits as offset
      taddr = ((pd & ~(0x1fffff | \
          (0xfff << 52))) | (addr & 0x1fffff))
      print("PDE 2 MB mapped @%x" % taddr)
      return
    # get the real address (bits 51:12 of the entry)
    pd = pd & ~(0xfff | (0xfff << 52))

    # 4th stage : PT
    pt = t0.mem("%x" % \
        (pd + 8 * ((addr >> (12 + 0 * 9)) & 0x1ff)), 8)
    print("PTE 0x%x" % pt)
    if (pt & 1) == 0:
      print("Non present PTE")
      return

    # Get the physical address :
    physicalAddr = ((pt & ~(0xfff | \
        (0xfff << 52))) | (addr & 0xfff))

    print("PT 4kB mapped @0x%x" % physicalAddr)
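To experiment with the same walk without a debugger attached, here is a minimal offline sketch of 4-level paging translation over a fake page-table dictionary. Everything in it is an assumption made for illustration: the names (split_virtual_address, walk, read_qword), the table addresses 0x5000 and 0x6000, and the single 1 GB identity-mapped page. It does not use DAL.

```python
PRESENT = 1 << 0
PAGE_SIZE = 1 << 7
ADDR_MASK = 0x000ffffffffff000  # bits 51:12 of an entry


def split_virtual_address(addr):
    """Split a 48-bit virtual address into its four table indexes and offset."""
    return {
        "pml4": (addr >> 39) & 0x1ff,
        "pdpt": (addr >> 30) & 0x1ff,
        "pd": (addr >> 21) & 0x1ff,
        "pt": (addr >> 12) & 0x1ff,
        "offset": addr & 0xfff,
    }


def walk(read_qword, cr3, addr):
    """Translate addr, or return None if an entry is not present.

    read_qword(physical_address) must return the 8-byte entry stored there.
    """
    idx = split_virtual_address(addr)
    # (level name, page shift if this level can map a page; 0 if it cannot)
    levels = (("pml4", 0), ("pdpt", 30), ("pd", 21), ("pt", 12))
    table = cr3 & ADDR_MASK
    for name, page_shift in levels:
        entry = read_qword(table + 8 * idx[name])
        if not entry & PRESENT:
            return None
        is_leaf = name == "pt" or bool(page_shift and entry & PAGE_SIZE)
        if is_leaf:
            low_mask = (1 << page_shift) - 1
            return (entry & ADDR_MASK & ~low_mask) | (addr & low_mask)
        table = entry & ADDR_MASK
    return None


# Hypothetical tables: PML4 at 0x5000, PDPT at 0x6000, and one 1 GB page
# identity-mapping the 4 GB..5 GB range.
fake_memory = {
    0x5000 + 8 * 0: 0x6000 | 0x3,
    0x6000 + 8 * 4: 0x100000000 | 0x83,
}


def read_qword(physical_address):
    return fake_memory.get(physical_address, 0)


print(split_virtual_address(0x100005000))
print(walk(read_qword, 0x5000, 0x100005000))
```

With DAL, read_qword would presumably wrap the t0.mem call used in the listing above; that wiring is left out here.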

We can see that every core register, the physical memory and so on are freely accessible through Python objects. We can now move on to the injection.

hook.py

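def brhook():
  # Stop on the rsm instruction of the SMI handler
  brsmm()
  # Overwrite the code starting at rsm with smm_stage_1
  t0.memload("abyme/sources/smm_stage_1/hook.bin", \
      "0x8b7c4172")
  # Break right before the smm_stage_2 call
  t0.brnew("0x8b7c41a2")
  itp.go()
  # Extend the SMRR range
  modifSMRR()
  # Break on the last instruction of smm_stage_1
  t0.brnew("0x8b7c41c4")
  itp.go()
  # Restore the original SMRR configuration
  t0.msr(0x1f3, 0xff800800)
  t0.msr(0x1f2, 0x8b000006)

  # Restore the rsm instruction (opcode 0f aa)
  t0.mem("0x8b7c4172", 1, 0x0f)
  t0.mem("0x8b7c4173", 1, 0xAA)

  # Point rip back at rsm
  t0.arch_register("rip", 0x8b7c4172)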

First we set an SMM entry breakpoint and generate an SMI using the brsmm() function.

hook.py

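def brsmm():
  itp.halt()
  # Break as soon as the core enters SMM
  t0.breaks.smmentry = 1
  # Writing to I/O port 0xb2 requests a software SMI
  t0.port(0xb2, 1)
  itp.go()
  t0.breaks.smmentry = 0
  t0.brremove()
  # Step through the handler up to its final rsm
  t0.step("Over", 50)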

We are now stopped at the last instruction of the SMI handler, which is rsm. We then overwrite the code starting at the rsm instruction with our smm_stage_1 bootstrap. We've made sure that there is enough space after it… ;) We break right before the call to smm_stage_2 and change the SMRR MSR configuration to extend the range treated as SMRAM: basically, it is extended to the whole memory :D.

hook.py

def modifSMRR():
  SMRR_PHYSBASE = 0x1f2
  SMRR_PHYSMASK = 0x1f3
  SMM_FEATURE_CONTROL = 0x4E0

  # SMM_Code_Chk_En is bit 2 of the MSR
  print("SMM_Code_Chk_En : %d" % \
      ((t0.msr(SMM_FEATURE_CONTROL) >> 2) & 1))

  print("Current state")
  # Bits 31:12 of each MSR. The name "physmask"
  # avoids shadowing the range() builtin.
  smbase = (t0.msr(SMRR_PHYSBASE) >> 12) & 0xfffff
  physmask = (t0.msr(SMRR_PHYSMASK) >> 12) & 0xfffff

  print("SMRAM : [%x ; %x]" % \
      (smbase, (smbase + physmask)))
  print("No worries...")

  # Clear bits 31:12 of the base and of the mask,
  # keep the low bits (memory type, valid bit)
  t0.msr(SMRR_PHYSBASE, t0.msr(SMRR_PHYSBASE) & \
      (0xffffffff00000fff))
  t0.msr(SMRR_PHYSMASK, t0.msr(SMRR_PHYSMASK) & \
      (0xffffffff00000fff))
  print("Done")

  smbase = (t0.msr(SMRR_PHYSBASE) >> 12) & 0xfffff
  physmask = (t0.msr(SMRR_PHYSMASK) >> 12) & 0xfffff
  print("New SMRAM :  [%x ; %x]" % \
      (smbase, (smbase + physmask)))
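To see why clearing the mask field extends the protected range, here is a small offline sketch of the SMRR match rule: an address is in range when (address AND mask) equals (base AND mask) and the valid bit of the mask register is set. The function names are hypothetical, the MSR values are the ones hard-coded in brhook(), and only physical addresses below 4 GB are modelled; what happens above that limit is not covered by this sketch.

```python
SMRR_VALID = 1 << 11      # valid bit of the mask register
FIELD_MASK = 0xfffff000   # bits 31:12 of both registers


def smrr_contains(physbase, physmask, addr):
    """True if a physical address below 4 GB matches the SMRR range."""
    if not physmask & SMRR_VALID:
        return False
    mask = physmask & FIELD_MASK
    return (addr & mask) == (physbase & mask)


def smrr_size(physmask):
    """Size in bytes of the range, assuming a contiguous mask."""
    mask = physmask & FIELD_MASK
    return (~mask & 0xffffffff) + 1


# Original values, as restored at the end of brhook()
base, mask = 0x8b000006, 0xff800800
print(hex(smrr_size(mask)), smrr_contains(base, mask, 0x8b7c4172))

# Same transformation as modifSMRR(): clear bits 31:12 of both registers
new_base = base & 0xffffffff00000fff
new_mask = mask & 0xffffffff00000fff
print(hex(smrr_size(new_mask)), smrr_contains(new_base, new_mask, 0x1000))
```

With the original values the range is the 8 MB starting at 0x8b000000, which contains the hooked rsm address. Once the mask field is zero, every address compared this way matches.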

We break at the end of smm_stage_1, right on its last instruction, and in between we let smm_stage_2 execute so that we can debug whatever we need. Lastly, we restore the rsm instruction and the original SMRR configuration, and jump back to rsm to resume normal system execution.

This procedure can be used to load and execute arbitrary code in SMM and then restore a normal state.

Lessons learned and future works

That’s doable, dirty and quite unstable (debugger connection).

But it works \°/ !

The next (big) step for this project is to load an SMI Transfer Monitor to virtualize SMM.

Statement

This project has been funded by the Toulouse Tech Transfer company. Many thanks to Camille Garo-Sail, who worked on the DAL Python wrapper during his first-year master's degree internship.

Benoît Morgan

ACADIE team @ IRIT

INP-ENSEEIHT University