Research and teaching in information system security
System Management Mode (SMM) is the most privileged execution level. It is protected from non-SMM accesses, locked during pre-boot, and is an ideal candidate to host attestation software.
We decided to inject a remote attestation prover into SMM for the aforementioned reasons. The major drawback of this approach is that this mode is largely closed to anyone outside Intel, and one very often struggles to legitimately load software into this region.
For this specific project, we selected a Gigabyte GB-BKi5A-7200 host machine because of its compact form factor and its ability to host an M.2 form factor PCIe peripheral, where we would connect a security peripheral (the remote attestation verifier).
Several approaches are available to extend SMM:
:)
We decided to use our security PCIe peripheral to go for the second approach since it supports expansion ROMs.
Unfortunately, due to COVID-19, we struggled to order the M.2 form factor PCIe FPGA we required for this experiment. Consequently, we had to find another solution.
This solution is Intel DCI (Direct Connect Interface), a USB-like connection to the processor's embedded hardware debugger, which is normally unavailable to non-Intel partners.
This debugger basically enables us to trap System Management Interrupts (SMI) and write into SMRAM.
DCI is locked at boot time for obvious security reasons. We succeeded in unlocking it by modifying the configuration data of unprotected DXE drivers, which is stored in the platform flash close to the EFI variables. Indeed, some of those configuration variables control the DCI feature lock. To do so, we adapted some publicly available reverse-engineered procedures from anonymous contributors.
Once DCI is enabled and we are able to talk to the debugger using the DFx Abstraction Layer (DAL) Python wrappers, we can move on to the arbitrary code injection step.
We use the following steps:
1. We load the smm_stage_2 ELF64 application using a custom-made bare-metal UEFI ELF64 loader. This application basically prepares itself to be called from SMM. It is the SMM injected payload. We load it at address @a.
2. We trigger an SMI and stop on the rsm instruction that ends the SMI handler. There we use DCI and the DAL Python wrappers to inject the smm_stage_1 hook. Its only job is to configure cr3 with the smm_stage_2 identity-mapped page tables (@a + pml4_offset) and jump to its entry point (@a + entry_offset), for the second time.
The eventual purpose of smm_stage_2 is to be an SMI Transfer Monitor loader.
sources/smm_stage_1/hook.s
.global hook
hook:
// Save GPRs
push %r15
push %r14
push %r13
push %r12
push %r11
push %r10
push %r9
push %r8
push %rdi
push %rsi
push %rbp
push %rbx
push %rdx
push %rcx
push %rax
// Save cr3
mov %cr3, %rax
push %rax
// Set smm_stage_2 overall id mapping,
// see sbss symbol in smm_stage_2.elf
// Page tables symbol is "pages".
// It has to be id mapping to fit with efi
// current memory configuration
// Immediate operand: the address of the page tables, not its content
movabs $0x100005000, %rax
mov %rax, %cr3
sub $0x8, %rsp
// See kernel_start symbol in smm_stage_2.elf
// Pushq $0x100000000
movq $0x100000000, %rax
pushq %rax
// Call hook
call *(%rsp)
// Drop both the pushed target and the 8-byte pad
add $0x10, %rsp
// Restore cr3
pop %rax
mov %rax, %cr3
// Restore GPRs
pop %rax
pop %rcx
pop %rdx
pop %rbx
pop %rbp
pop %rsi
pop %rdi
pop %r8
pop %r9
pop %r10
pop %r11
pop %r12
pop %r13
pop %r14
pop %r15
// Breakpoint placeholder
nop
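The two constants hard-coded in the hook follow from the load address @a of smm_stage_2. As a rough sketch of that arithmetic (not part of the project sources), the snippet below recomputes them. The offsets 0x5000 and 0x0 are simply read off the listing above, and the helper name stage2_addresses is made up for this illustration.

```python
# Illustrative only: recompute the constants used by hook.s from @a.
LOAD_ADDRESS = 0x100000000  # @a, as it appears in the listing
PML4_OFFSET = 0x5000        # assumed offset of the "pages" symbol
ENTRY_OFFSET = 0x0          # assumed offset of kernel_start


def stage2_addresses(load_address, pml4_offset, entry_offset):
    """Return (cr3_value, entry_point) for a payload loaded at load_address."""
    cr3_value = load_address + pml4_offset
    # cr3 expects a 4 kB aligned table address; its low 12 bits carry flags.
    if cr3_value & 0xfff:
        raise ValueError("PML4 table must be 4 kB aligned")
    return cr3_value, load_address + entry_offset


cr3_value, entry_point = stage2_addresses(LOAD_ADDRESS, PML4_OFFSET, ENTRY_OFFSET)
print("cr3   = 0x%x" % cr3_value)
print("entry = 0x%x" % entry_point)
```

If smm_stage_2 is rebuilt or loaded elsewhere, both values change together, so the hook has to be reassembled with the new constants.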
sources/smm_stage_2/main.c
void __attribute__((section(".start")))
kernel_start(void) {
    // First call by the loader
    if (initialized == 0) {
        initialized = 1;
        kernel_init();
        return;
    }
    // Otherwise we are in SMM, we can play!
    INFO("Party harder\n");
}
Before going into details, here is an example of an IA-32e paging page walk algorithm, using DAL, just to illustrate the power of the interface.
hook.py
def pageWalking(addr):
    cr0 = t0.arch_register("cr0")
    cr3 = t0.arch_register("cr3")
    cr4 = t0.arch_register("cr4")
    efer = t0.arch_register("efer")
    print("Page walking @0x%x" % addr)
    print("efer.LMA %d" % ((efer >> 10) & 1))
    print("efer.LME %d" % ((efer >> 8) & 1))
    print("cr0.PG %d" % ((cr0 >> 31) & 1))
    print("cr0.PE %d" % ((cr0 >> 0) & 1))
    print("cr4.PAE %d" % ((cr4 >> 5) & 1))
    print("cr4.LA57 %d" % ((cr4 >> 12) & 1))
    # If IA-32e paging && no 5-level paging
    if ((efer >> 10) & 1) and ((cr4 >> 12) & 1) == 0:
        print("cr3 0x%x" % cr3)
        # Keep the table address only (bits 51:12)
        cr3 = cr3 & 0x000ffffffffff000
        # 1st stage : PML4E
        pmle = t0.mem("%x" %
                      (cr3 + 8 * ((addr >> (12 + 3 * 9)) & 0x1ff)), 8)
        print("PML4E 0x%x" % pmle)
        if (pmle & 1) == 0:
            print("Non present PML4E")
            return
        # get the real address: drop the 12 flag bits and bits 63:52
        pmle = pmle & ~(0xfff | (0xfff << 52))
        # 2nd stage : PDPTE
        pdpt = t0.mem("%x" %
                      (pmle + 8 * ((addr >> (12 + 2 * 9)) & 0x1ff)), 8)
        print("PDPTE 0x%x" % pdpt)
        if (pdpt & 1) == 0:
            print("Non present PDPTE")
            return
        if (pdpt >> 7) & 1:
            # 1 GB page: the low 30 bits come from the address
            taddr = ((pdpt & ~(0x3fffffff |
                               (0xfff << 52))) | (addr & 0x3fffffff))
            print("PDPTE 1 GB mapped @%x" % taddr)
            return
        # get the real address
        pdpt = pdpt & ~(0xfff | (0xfff << 52))
        # 3rd stage : PDE
        pd = t0.mem("%x" %
                    (pdpt + 8 * ((addr >> (12 + 1 * 9)) & 0x1ff)), 8)
        print("PDE 0x%x" % pd)
        if (pd & 1) == 0:
            print("Non present PDE")
            return
        if (pd >> 7) & 1:
            # 2 MB page: the low 21 bits come from the address
            taddr = ((pd & ~(0x1fffff |
                             (0xfff << 52))) | (addr & 0x1fffff))
            print("PDE 2 MB mapped @%x" % taddr)
            return
        # get the real address
        pd = pd & ~(0xfff | (0xfff << 52))
        # 4th stage : PT
        pt = t0.mem("%x" %
                    (pd + 8 * ((addr >> (12 + 0 * 9)) & 0x1ff)), 8)
        print("PTE 0x%x" % pt)
        if (pt & 1) == 0:
            print("Non present PTE")
            return
        # Get the physical address :
        physicalAddr = ((pt & ~(0xfff |
                                (0xfff << 52))) | (addr & 0xfff))
        print("PT 4kB mapped @0x%x" % physicalAddr)
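The index and mask arithmetic of such a walk can be checked offline, without any debugger attached. The sketch below is a stand-alone variant that reads entries from a Python dict instead of DAL. The names walk, table_index and read_qword, as well as the fake memory contents, are assumptions made for this illustration only.

```python
PRESENT = 1 << 0
PAGE_SIZE_BIT = 1 << 7
ADDR_MASK = 0x000ffffffffff000  # physical address field, bits 51:12


def table_index(addr, level):
    """9-bit table index: level 3 = PML4, 2 = PDPT, 1 = PD, 0 = PT."""
    return (addr >> (12 + 9 * level)) & 0x1ff


def walk(read_qword, cr3, addr):
    """Translate addr with 4-level paging; return None if not mapped."""
    table = cr3 & ADDR_MASK
    for level in (3, 2, 1, 0):
        entry = read_qword(table + 8 * table_index(addr, level))
        if not entry & PRESENT:
            return None
        # Large pages: 1 GB at the PDPT level, 2 MB at the PD level.
        if level in (2, 1) and entry & PAGE_SIZE_BIT:
            offset_mask = (1 << (12 + 9 * level)) - 1
            return (entry & ADDR_MASK & ~offset_mask) | (addr & offset_mask)
        if level == 0:
            return (entry & ADDR_MASK) | (addr & 0xfff)
        table = entry & ADDR_MASK


# Fake physical memory: one 2 MB page mapping virtual 0 to 0x40000000.
memory = {
    0x1000: 0x2000 | PRESENT,                       # PML4E[0] -> PDPT
    0x2000: 0x3000 | PRESENT,                       # PDPTE[0] -> PD
    0x3000: 0x40000000 | PAGE_SIZE_BIT | PRESENT,   # PDE[0], 2 MB page
}


def read_qword(address):
    """Return the 8-byte entry at address, or 0 (non present) if unset."""
    return memory.get(address, 0)


print(hex(walk(read_qword, 0x1000, 0x123456)))  # 0x40123456
```

With DAL, read_qword would be replaced by a physical memory read through the debugger, which is all the interactive version needs.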
We can see that every core register, as well as physical memory, is freely accessible through Python objects. We can move on to injection.
hook.py
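def brhook():
    # Trigger an SMI and stop on the handler's final rsm
    brsmm()
    # Overwrite rsm with the smm_stage_1 hook
    t0.memload("abyme/sources/smm_stage_1/hook.bin",
               "0x8b7c4172")
    # Run up to the call into smm_stage_2, then extend SMRR
    t0.brnew("0x8b7c41a2")
    itp.go()
    modifSMRR()
    # Let smm_stage_2 run and stop at the end of smm_stage_1
    t0.brnew("0x8b7c41c4")
    itp.go()
    # Restore the original SMRR configuration
    t0.msr(0x1f3, 0xff800800)
    t0.msr(0x1f2, 0x8b000006)
    # Restore the rsm opcode (0x0f 0xaa) and point rip back at it
    t0.mem("0x8b7c4172", 1, 0x0f)
    t0.mem("0x8b7c4173", 1, 0xAA)
    t0.arch_register("rip", 0x8b7c4172)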
First we set an SMI breakpoint and generate an SMI using the brsmm() function.
hook.py
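def brsmm():
    itp.halt()
    # Break on SMM entry, then raise a software SMI through port 0xb2
    t0.breaks.smmentry = 1
    t0.port(0xb2, 1)
    itp.go()
    t0.breaks.smmentry = 0
    t0.brremove()
    # Step over 50 instructions to reach the end of the SMI handler
    t0.step("Over", 50)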
We are now stopped at the last instruction of the SMI handler, which is rsm.
Then we hook the rsm instruction with our smm_stage_1 bootstrap. We've made sure that we have enough space after… ;)
We break right before the smm_stage_2 call. We change the SMRR MSR configuration to extend the SMRAM region; basically it is extended to the whole memory :D
hook.py
def modifSMRR():
    SMRR_PHYSBASE = 0x1f2
    SMRR_PHYSMASK = 0x1f3
    SMM_FEATURE_CONTROL = 0x4E0
    # SMM_Code_Chk_En is bit 2 of SMM_FEATURE_CONTROL
    print("SMM_Code_Chk_En : %d" %
          ((t0.msr(SMM_FEATURE_CONTROL) >> 2) & 1))
    print("Current state")
    # Base and mask live in bits 31:12, i.e. in 4 kB page units
    smbase = (t0.msr(SMRR_PHYSBASE) >> 12) & 0xfffff
    mask = (t0.msr(SMRR_PHYSMASK) >> 12) & 0xfffff
    # Number of pages covered, derived from the mask
    size = (mask ^ 0xfffff) + 1
    print("SMRAM : [%x ; %x]" % (smbase, smbase + size))
    print("No worries...")
    # Clear bits 31:12 of both MSRs, keeping the type and valid bits
    t0.msr(SMRR_PHYSBASE,
           t0.msr(SMRR_PHYSBASE) & 0xffffffff00000fff)
    t0.msr(SMRR_PHYSMASK,
           t0.msr(SMRR_PHYSMASK) & 0xffffffff00000fff)
    print("Done")
    smbase = (t0.msr(SMRR_PHYSBASE) >> 12) & 0xfffff
    mask = (t0.msr(SMRR_PHYSMASK) >> 12) & 0xfffff
    size = (mask ^ 0xfffff) + 1
    print("New SMRAM : [%x ; %x]" % (smbase, smbase + size))
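To see what clearing bits 31:12 does to the protected range, the SMRR pair can be decoded offline. The sketch below is an illustration only: it uses the two values restored at the end of brhook(), models just the 32-bit base and mask fields (how addresses above 4 GB are handled is not modeled), and the helper names smrr_range and in_smrr are made up here.

```python
FIELD = 0xfffff000  # bits 31:12 of both SMRR MSRs
VALID = 1 << 11     # valid bit of the mask MSR


def smrr_range(physbase, physmask):
    """Return (base, size) in bytes described by an SMRR base/mask pair."""
    mask = physmask & FIELD
    base = physbase & mask
    # A contiguous mask of high bits describes a power-of-two sized region.
    size = 0x100000000 - mask
    return base, size


def in_smrr(address, physbase, physmask):
    """Match rule: (address & mask) == (base & mask), if the pair is valid."""
    if not physmask & VALID:
        return False
    mask = physmask & FIELD
    return (address & mask) == (physbase & mask)


ORIGINAL = (0x8b000006, 0xff800800)  # base, mask restored by brhook()
# Same masking as modifSMRR(): bits 31:12 cleared in both registers.
EXTENDED = tuple(value & 0xffffffff00000fff for value in ORIGINAL)

print("original: base=0x%x size=0x%x" % smrr_range(*ORIGINAL))
print("extended: base=0x%x size=0x%x" % smrr_range(*EXTENDED))
print(in_smrr(0x00100000, *ORIGINAL), in_smrr(0x00100000, *EXTENDED))
```

With the mask field cleared and the valid bit still set, every 32-bit address satisfies the match rule, which is why the region can be described as extended to the whole memory.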
We break at the end of smm_stage_1, right on its last instruction, having let smm_stage_2 execute in between so that we can debug whatever we need. Finally, we restore the rsm instruction and the original SMRR configuration, then jump back to rsm to resume system execution.
This procedure can be used to load and execute arbitrary code in SMM and then restore a normal state.
That’s doable, dirty and quite unstable (debugger connection).
But it works \°/ !
The next (big) step for this project is to load an SMI Transfer Monitor to virtualize SMM.
This project has been funded by the Toulouse Tech Transfer company. Many thanks to Camille Garo-Sail, who worked on the DAL Python wrapper during his 1st master's degree internship.