I was experimenting with running revNg or revamb (aka rev NextGen) on Singlesource benchmarks and found that they are able to generate at-least as many passes as McSema (master branch) is providing. As per, lifting-bits/mcsema#304, I have been asked to use McSema's remill branch, which at first had problems passing even the simple programs Issue 308 and Issue 156.
After fixing those, we got the following results which shows that we can switch to Mcsema people's recommended, remill, branch as its better ( like better FP instruction support) than revamb.
UnitTests
Tool | Total | Exec Pass | Exec Fail | Failed bc -> exe | Failed exe -> cfg | Failed cfg -> bc |
---|---|---|---|---|---|---|
McSema(Remill) | 111 | 98 | 12 | 1 | 0 | 0 |
Revamb | 101 | 88 | 13 | NA | NA | NA |
McSema(master) | 111 | 74 | 34 | 0 | 2 | 1 |
Regression
Tool | Total | Exec Pass | Exec Fail | Failed bc -> exe | Failed exe -> cfg | Failed cfg -> bc |
---|---|---|---|---|---|---|
McSema(Remill) | 65 | 52 | 10 | 3 | 0 | 0 |
Revamb | 65 | 52 | 12 (1 noexe) | NA | NA | NA |
McSema(master) | 65 | 40 | 9 | 0 | 5 | 11 |
Benchmark
Tool | Total | Exec Pass | Exec Fail | Failed bc -> exe | Failed exe -> cfg | Failed cfg -> bc |
---|---|---|---|---|---|---|
McSema(Remill) | 140 | 99 | 41 | 0 | 0 | 0 |
Revamb | 140 | 111 | 29 | NA | NA | NA |
There are the list of notable static decompilations projects.
Tool | Links | Open Sourced | Intermediate IR | Functional |
---|---|---|---|---|
ANgR | 1 | Yes | Not LLVM | Not |
BAP | 1 | Yes | BIL | Not |
Bitblaze | 1 | Yes | Not LLVM | Not |
CodeSurfer | 1 | No | Not LLVM | Not |
JackStab | 1 2 | Yes | not LLVM | Not |
Second Write | 1 | No | LLVM | Yes |
Rev.Ng | 1 2 | Yes | LLVM | Yes |
Revgen | 1 | Yes | LLVM | Limited |
- Mentioning "no" to Function means that there is no demonstrated/published prrof of the fact that the IR is function.
- Revgen have one dynamic soluion Link and they quote
We compare the time it takes to boot differentOSes with two variants of
QEMU 0.9.0. The first variant uses the original x86 to x86 DBT, while the
second uses the x86 to LLVM DBT. We evaluate OSes of increasing complexity:
MS-DOS 6.22, RedHat Linux 2.2.5, Debian Linux 2.6, and Windows XP SP2.
We observe that QEMU runs out of memory while booting Linux and Windows
crashes during logon (the Winlogon process terminates unexpectedly
shortly after the login screen appears).
and another static one Link and I quote
we convert an x86micro-benchmark to LLVMusing RevGen and run the result
in CoreDet [5] . The micro-benchmark has several threads that access
unprotected shared variables, whose value is printed at the end of each
run. Without CoreDet, the printed output differs fromrun to run. With
CoreDet, the output stays the same. This shows that RevGen enabled the
reuse of CoreDet to render binary programs behave deterministically.
- RevNg: Functional column is "Limited" because of the limited evidence [1] (https://github.com/revng/revamb#example-run) and 2
My personal take is still on McSema which is actively suported. Also as pointed out in soe literature that dynamic translators recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for current execution; consequently, various whole program analyses will provide incomplete information.
- Enhance an exiting tool?
- Where do we have the novelty: The binar lifting part or the CFI sing instrumentation or both? Binary liftng part
- What about targetting a subset of benchmarks We dont have to support all the benchmark but a subset.
- what is the reason that McSema lacks in floating point?
- Try to know whic constructs SR might have problems; by lloking at the becnhmarks they support or from the knowledge of the benchmark!
Currently we have ida generating the proto file and mcsema is consuming that. That proto file contain beside other information, the stack variable information (like name and offset with the function frame). The plan was to let mcsema consume two proto files instead of one; the second being the proto file from the debug information containg the debug to the stack and global variables. Mcsema was supposed to associate the stack variable information in the first proto file with the type information from the second file.
Proto file containing debug tpe information ----------------------- ---------|
|
proto file generated by [Mcsema Disass + IDA] containg variable information --> [McSema Lift] to to the binding
Now inorder to have minimum clutter in the mcsema code,
Proto file containing debug tpe information --------------------------------- |
|
proto file generated by [Mcsema Disasm + IDA] containg variable information --> [Type binder] --> proto file --> [McSema Lift]
Stack varibale promotion consist of 2 activities
- Create alloca for each stack with appropriate type [Done]
- Each instruction need to know which stack variabe is accessed and how (e.g. a struct variable V may be accessed like V.field_name OR a pointer variable P, may be accessed like *P)
- Binary Drarf reader link is almost (function pointers soon to be handled) ready and dumpng the output in protobuf binary.
- Basic variable recovery is done.source c file ll file
- The only changes are made in the IDA cfg recovery end. Very little changes in cfg to bc
- Structure recovery script (to extract structure info from IDA) is ready to be deployed.struct recovery
- Pointers : IDA does have much information, may need to look into the debug info.
Consisit of two phases:
- Basic type reconstruction which consists of
- Determine if an object holds a pointer, floating-point or integral value.
- Determine the size of the integral and floating-point type, stored in an object.
- For integral type determine if an object holds a signed or unsigned value.
-
The binary decompilation repository is now compatible to current master of McSema(formerly know as new_reg_assign)
-
All the driver related issues are resolved
- The driver is sharing the procress stack with the McSema regState stack. For example say the driver is calling the extracted fuction with mcsema_regstate.stack == rsp. The expectation is once the extracted function returns the condition mcsema_regstate.stack == rsp must again be true, becuase the driver then uses this mcsema_regstate.stack in the driver code.
- The driver code sometimes uses rbp register for storing result of some computation. Again when it calls to some extracted function, it stores rbp in McSema.regstate.rbp and again the expectation is once the function is returned McSema.regstate.rbp must contain the original rbp.
- Driver being in the assembly format prohibits allexe creation. Wrapped the asm as inline asm in a bc file and use that as the driver toi create allexe.
-
Current work on type recovery
- Read some basic litarature on type recovery.
- The first phase to do pointer recovery to distinguish poointers and integers. (Ongoing )
- The second phase will be to recover the type of the pointed to memory.
- Folloing is the change in supported count from the previous runs on llvm testsuite
Suite | Total | Prev | Unsupp | Supp | Seg | Diff | Pass | Curr | Unsupp | Supp | Seg | Diff | Pass |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UnitTests | 110 | 22 | 88 | 2 | 14 | 72 | 5 | 105 | 2 | 24 | 79 | ||
Regression | 65 | 17 | 48 | 1 | 7 | 40 | 13 | 52 | 2 | 8 | 42 | ||
Benchmark | 138 | 92 | 46 | 6 | 13 | 27 | 40 | 98 | 37 | 24 | 37 | ||
Multisource | 185 | 156 | 29 | NA | NA | NA | 63 | 122 | 38 | 29 | 55 |
NA
- Supported the following vector instructions
Vector Inst | Frequency | Extra variants implemented | Comments |
---|---|---|---|
SHUFPDrri | 169 | SHUFPDrmi | |
MOVUPDrm | 42 | ||
MULPSrr | 12 | ||
MOVUPDmr | 9 | ||
MOVHPDrm | 6 | ||
UNPCKHPDrr | 6 | ||
MULPDrr | 5 | ||
CMPSDrr | 4 | Not Implemented | |
CVTDQ2PSrr | 4 | ||
MULPDrm | 3 | ||
MAXPSrr | 2 | MAXPSrm MAXPDRR MAXPDrm | |
MULPSrm | 2 | ||
PSHUFHWri | 2 | PSHUFHWmi | |
DIVPDrr | 2 | ||
CMPPSrri | 2 | Not Implemented | |
CMPSSrr | 2 | Not Implemented | |
MOVHPDmr | 1 | ||
CMPSDrm | 1 | Not Implemented | |
DIVPDrm | 1 | ||
PMULLWrm | 1 | Not Implemented |
-
Following are already been supported in the recent commits in branch new_reg_assign
Vector Inst Frequency ADDPSrr 13 ADDPSrm 11 PSLLDQri 11 SUBPDrm 6 ADDPDrr 4 ADDPDrm 3 SUBPDrr 3
- McSema future
- Moving to new_reg_assing
- variable recovery: Not based on VSA; "We are starting with the stack variable deduction that IDA gives us during CFG recovery, and iterating from there."
-
After the cleanup, we want to start defining instruction semantics in C++ and building the semantics to bitcode with clang, instead of writing C++ that writes bitcode that implements the semantics.
- Supporing instructiions ??
- What variable info IDA can give us?
-
IDA license generated
-
The local testsuite is tested on IDA cfgs. Enhanced the code to handle some of the constrcts generated by McSema from IDA cfg. Jenkins is green again.
-
Experimented with dagger. Code seems to fail on recognizing external fucntion calls ABIs, which MCsema handles using map file with entries like
printf 1 C N
-
Todo: Recheck these results with the IDA we have. Fix few failures with stack deconstruction analysis
-
Use type info to transform the code so that the pointer types are promoted to pointer typed registers.
Testsuite | Total | Upsupported | Supported | Seg | Diff | Pass |
---|---|---|---|---|---|---|
UnitTests | 111 | 7 | 104 | 8 | 23 | 73 |
Regression | 65 | 13 | 52 | 4 | 8 | 40 |
Benchmark | 138 | 71 | 67 | 26 | 17 | 24 |
Mutisource | 206 | 178 | 28 | 2 | 3 | 23 |
- Single source test run
Testsuite | Total | Original Supported/Unsupported | Current Supported/Unsupported | Seg Faults | Diff | Pass |
---|---|---|---|---|---|---|
UnitTests | 111 | 94 /17 | (94 + 10) /(17 - 10) | 8 | (16 + 7) | (70 + 3) |
Regression | 65 | 49/16 | (49+3) / (16 - 3) | 4 | 7 + 1 | 38 + 2 |
Benchmark | 138 | 47/91 | (47+20) / (91 - 20) | 26 | (2 +15) | 19 + 5 |
Multisource | 206 | 48/158 | 19/187 | 124 | 35 | 126 |
- Original Unsupported: cfg there but bc cannot be obtained due to some errors in the cfg2bc process.
Testsuite | Total IDA CFG created | Supported(by mcsema cfg2bc) | Unsupported(by mcsema cfg2bc) | ^ Successfully generated non-vectorized executable |
---|---|---|---|---|
SingleSource | 314 | 190 | 124 | 76 |
Multisource | 206 | 45 | 161 | ^^ |
^: These are the test cases from the label "CFGS Unsupported(by mcsema cfg2bc)" which I can convert to non-vectored executable. The rest cannot be converted because of error "sse register return with sse disabled"
^^: In this case, most of the unsupported testcases (#161 ) are getting the error "sse register return with sse disabled" or the unsupported instructions are other than vector instructions (so no point in generating non-vectored instructions). Here is the full list (https://github.com/sdasgup3/llvm-test-suite-mcsema/blob/master/SingleSource/unsupported.txt)
- Ed's test suite status
Total cfg generated: 314
CFG cannot get converted to BIN: 124
Get converted to BIN : 190
- Single source test run
Testsuite | Total | Output Diff | CFG to BC Error | Seg Faults | Pass |
---|---|---|---|---|---|
UnitTests | 111 | 14 | 17 | 8 | 72 |
Regression | 65 | 7 | 16 | 3 | 39 |
Benchmark | 138 | 8 | 91 | 24 | 15 |
314 | 29 | 124 | 35 | 126 |
- The diff errors are due to fp instructions and va args.
- The cfg recovery failure are due to unsupported instructions list of unsupported inst
- For Multisource, the driver need to handle the input passed through registers. For Single source the inputs are embedded in the source file only.
-
After looking into the reasons why AA is not able to disambiguate some of the memory references, we tried applying the following series of passes
-mme2reg -dce -early-cse-memssa
.Lets first go through the reasons for applying these:
-
Use of -early-cse-memssa
%arr = alloca i8, i8 16 %ptr = alloca i8* store i8* %arr, i8** %ptr %loadptr1 = load i8*, i8** %ptr %gep_4_loadptr1 = getelementptr inbounds i8, i8* %loadptr1, i8 4 %loadptr2 = load i8*, i8** %ptr %gep_8_loadptr2 = getelementptr inbounds i8, i8* %loadptr2, i8 8
Ander's AA (-cfl-anders-aa) gives MAY_ALIAS(%gep_4_loadptr1, %gep_8_loadptr2) == true But if we can do commom sub-expression elimination, the later part of the code becomes
%loadptr1 = load i8*, i8** %ptr %gep_4_loadptr1 = getelementptr inbounds i8, i8* %loadptr1, i8 4 %gep_8_loadptr2 = getelementptr inbounds i8, i8* %loadptr1, i8 8
and AA can disambiguate %gep_4_loadptr1 and %gep_8_loadptr2 as NO_ALIAS
** NOTE: CSE may not be possible always, and that is when we have to live with spurious may alias results. **
- Use of -mme2reg -dce
%RAX = getelementptr inbounds %struct.regs, %struct.regs* %0, i64 0, i32 0 %1 = load i64, i64* %RAX store i64 %1, i64* %RAX_val ... store i64 %Y, i64* %RAX_val ... %87 = load i64, i64* %RAX_val store i64 %87, i64* %RAX
After mem2reg:
%RAX = getelementptr inbounds %struct.regs, %struct.regs* %0, i64 0, i32 0 %1 = load i64, i64* %RAX ... ... store i64 %Y, i64* %RAX
After dce:
%RAX = getelementptr inbounds %struct.regs, %struct.regs* %0, i64 0, i32 0 ... ... store i64 %Y, i64* %RAX
I tried the sequences of passes on test_1.c Transforming mcsema output test_1.clang.ll --> test_1.clang.trans.trio.ll
In the transformed output, AA can disambiguate
%_new_gep_ , %_new_gep_1, %_new_gep_4
because of the reason of cse mentioned earlier.Similarly, applying the same on test_0.c and able to disambiguate the pointer and the structure variable. Transformation is from mcsema generated IR test_0.clang.ll to test_0.clang.trans.trio.ll
In the transformed output, AA can disambiguate
%_new_gep_ , %_new_gep_1, %_new_gep_3, %_new_gep_6
-
- Memory dependence analysis in general
define i32 @foo() {
%x = alloca i32, align 4
%xp = alloca i32*, align 8
store i32 0, i32* %x
store i32* %x, i32** %xp, align 8
%1 = load i32*, i32** %xp, align 8
%2 = load i32, i32* %1, align 4
ret i32 0
}
Def from: %x = alloca i32, align 4
store i32 0, i32* %x
Def from: %xp = alloca i32*, align 8
store i32* %x, i32** %xp, align 8
Def from: store i32* %x, i32** %xp, align 8
%1 = load i32*, i32** %xp, align 8
Clobber from: store i32 0, i32* %x
%2 = load i32, i32* %1, align 4
define i32 @foo() {
%1 = alloca i32, align 4
%result = alloca i32, align 4
%x = alloca i32, align 4
%xp = alloca i32*, align 8
store i32 0, i32* %1
store i32 5, i32* %result, align 4
store i32 7, i32* %x, align 4
store i32* %x, i32** %xp, align 8
%2 = load i32, i32* %x, align 4
%3 = icmp eq i32 %2, 4
br i1 %3, label %4, label %7
; <label>:4 ; preds = %0
%5 = load i32*, i32** %xp, align 8
%6 = load i32, i32* %5, align 4
store i32 %6, i32* %result, align 4
br label %8
; <label>:7 ; preds = %0
store i32 42, i32* %result, align 4
br label %8
; <label>:8
ret i32 0
}
; opt -memdep -print-memdeps -gvn -analyze test_3.ll
;Def from: %1 = alloca i32, align 4
; store i32 0, i32* %1
;Def from: %result = alloca i32, align 4
; store i32 5, i32* %result, align 4
;Def from: %x = alloca i32, align 4
; store i32 7, i32* %x, align 4
;Def from: %xp = alloca i32*, align 8
; store i32* %x, i32** %xp, align 8
;Def from: store i32 7, i32* %x, align 4
; %2 = load i32, i32* %x, align 4
;Def in block %0 from: store i32* %x, i32** %xp, align 8
; %5 = load i32*, i32** %xp, align 8
;Unknown in block %4
; %6 = load i32, i32* %5, align 4
; Def in block %0 from: store i32 5, i32* %result, align 4
; store i32 %6, i32* %result, align 4
; Def in block %0 from: store i32 5, i32* %result, align 4
; store i32 42, i32* %result, align 4
; opt -basicaa -aa-eval -print-all-alias-modref-info test_2.ll -disable-output
;NoAlias: i32* %1, i32* %result
;NoAlias: i32* %1, i32* %x
;NoAlias: i32* %result, i32* %x
;NoAlias: i32* %1, i32** %xp
;NoAlias: i32* %result, i32** %xp
;NoAlias: i32* %x, i32** %xp
;NoAlias: i32* %1, i32* %5
;NoAlias: i32* %5, i32* %result
;MayAlias: i32* %5, i32* %x
;NoAlias: i32* %5, i32** %xp
- Memory dependence analysis on mcsema generated IR (test)[https://github.com/sdasgup3/binary-decompilation/blob/master/test/variable_recovery/test_1/Output/test_1.clang.trans.ll]
-
All the testsuite testcases are converted to allexe and tested
Format::binary --> tool::Mcsema --> Format::LLVM IR --> Tool::ALLIN --> Format::IR --> Tool::bc2allvm --> Format::allexe | | | | | | | |\ Tool::clang -> O2 |\ Tool::alley --> O3 | / / |__\ Tool::clang --> O1 Passing Definition: O 1 == O 2 == O 3 /
- Problem with indirect calls [calgrapph](Figs/test_23_1.callgraph.pdf)
- Till now first we identify the calls and agment them with actual arguments (parent %rsp and parent %rbp pointer) and go to the called function to augment the formal arguments.
- This is not possible for idirect calls
- Proposed Soln: Modify all the internal functions and call to non library function
- Experiment with `opt -cfl-anders-aa -aa-eval -print-all-alias-modref-info`
define void @main() { entry: %X = alloca i8* %Y = alloca i8*
%a = alloca i8, i64 32 %b = alloca i8 %c = alloca i8
store i8* %a, i8** %X store i8* %b, i8** %Y
%LX = load i8* , i8** %X %LY = load i8* , i8** %Y } Anders Query NoAlias: i8* %LX, i8* %LY
define void @main() { entry: %X = alloca i8* %Y = alloca i8*
%a = alloca i8, i64 32 %b = alloca i8 %c = alloca i8
store i8* %a, i8** %X store i8* %b, i8** %Y
%LX = load i8* , i8** %X %LY = load i8* , i8** %Y
%new_addr = getelementptr i8, i8* %LX, i64 8 %new_val = ptrtoint i8* %LY to i8 store i8 %new_val , i8* %new_addr store i8* %new_addr , i8** %X ; The folllowing makes %LX %LY May Alias store i8* %new_addr , i8** %Y }
[anders AA](Figs/anders_AA.jpg)
### 06 Oct 2016
#### Variable Recovery
##### Stage I: Transforming the mcsema generated IR so as to facilitate alias analysis
- Because of the presenseof int2ptr in mcsema generated code, AA like basic AA is not
giving any meaningful results. So we decide to avoid the int2ptr/ptrtoints as much as possible.
```llvm
; Consider the code before the transformation, test.ll
define internal void @foo() {
entry:
%RSP_val = alloca i64
%_local_stack_start_ptr_ = alloca i8, i64 32
%_local_stack_end_ptr_ = getelementptr inbounds i8, i8* %_local_stack_start_ptr_, i64 32
%_local_stack_end_ = ptrtoint i8* %_local_stack_end_ptr_ to i64
store i64 %_local_stack_end_, i64* %RSP_val
%0 = load i64, i64* %RSP_val
%1 = inttoptr i64 %0 to i64*
store i64 0, i64* %1
%2 = add i64 %0, -16
%3 = inttoptr i64 %2 to i64*
store i64 1, i64* %3
ret void
}
$ opt -basicaa -aa-eval -print-alias-sets test.ll -disable-output
Alias sets for function 'foo':
Alias Set Tracker: 1 alias sets for 3 pointer values.
AliasSet[0x27a2c90, 3] may alias, Mod/Ref Pointers: (i64* %RSP_val, 8), (i64* %1, 8), (i64* %3, 8)
$ opt -basicaa -aa-eval -print-all-alias-modref-info test.ll -disable-output
Function: foo: 5 pointers, 0 call sites
NoAlias: i64* %RSP_val, i8* %_local_stack_start_ptr_
NoAlias: i64* %RSP_val, i8* %_local_stack_end_ptr_
NoAlias: i8* %_local_stack_end_ptr_, i8* %_local_stack_start_ptr_
MayAlias: i64* %1, i64* %RSP_val
MayAlias: i64* %1, i8* %_local_stack_start_ptr_
MayAlias: i64* %1, i8* %_local_stack_end_ptr_
MayAlias: i64* %3, i64* %RSP_val
MayAlias: i64* %3, i8* %_local_stack_start_ptr_
MayAlias: i64* %3, i8* %_local_stack_end_ptr_
MayAlias: i64* %1, i64* %3
; Consider the code after the transformation, test.trans.ll
define internal void @foo() {
entry:
%_RSP_ptr_ = alloca i8*
%_local_stack_start_ptr_ = alloca i8, i64 32
%_local_stack_end_ptr_ = getelementptr inbounds i8, i8* %_local_stack_start_ptr_, i64 32
store i8* %_local_stack_end_ptr_, i8** %_RSP_ptr_
%_load_rsp_ptr_ = load i8*, i8** %_RSP_ptr_
%_allin_new_bt_ = bitcast i8* %_load_rsp_ptr_ to i64*
store i64 0, i64* %_allin_new_bt_
%_new_gep_ = getelementptr i8, i8* %_load_rsp_ptr_, i64 -16
%_allin_new_bt_2 = bitcast i8* %_new_gep_ to i64*
store i64 1, i64* %_allin_new_bt_2
ret void
}
$ opt -basicaa -aa-eval -print-alias-sets test.trans.ll -disable-output
Alias sets for function 'foo':
Alias Set Tracker: 3 alias sets for 3 pointer values.
AliasSet[0x3760c40, 1] must alias, Mod/Ref Pointers: (i8** %_RSP_ptr_, 8)
AliasSet[0x3760ce0, 1] must alias, Mod Pointers: (i64* %_allin_new_bt_, 8)
AliasSet[0x3760d80, 1] must alias, Mod Pointers: (i64* %_allin_new_bt_2, 8)
$ opt -basicaa -aa-eval -print-all-alias-modref-info test_2.trans.ll -disable-output
Function: foo: 7 pointers, 0 call sites
MustAlias: i64* %_allin_new_bt_, i8* %_load_rsp_ptr_
MustAlias: i64* %_allin_new_bt_2, i8* %_new_gep_
NoAlias: i8* %_local_stack_start_ptr_, i8** %_RSP_ptr_
NoAlias: i8* %_local_stack_end_ptr_, i8** %_RSP_ptr_
NoAlias: i8* %_local_stack_end_ptr_, i8* %_local_stack_start_ptr_
NoAlias: i8* %_load_rsp_ptr_, i8** %_RSP_ptr_
NoAlias: i64* %_allin_new_bt_, i8** %_RSP_ptr_
NoAlias: i8* %_new_gep_, i8** %_RSP_ptr_
NoAlias: i8* %_load_rsp_ptr_, i8* %_new_gep_
NoAlias: i64* %_allin_new_bt_, i8* %_new_gep_
NoAlias: i64* %_allin_new_bt_2, i8** %_RSP_ptr_
NoAlias: i64* %_allin_new_bt_2, i8* %_load_rsp_ptr_
NoAlias: i64* %_allin_new_bt_, i64* %_allin_new_bt_2
MayAlias: i8* %_load_rsp_ptr_, i8* %_local_stack_start_ptr_
MayAlias: i8* %_load_rsp_ptr_, i8* %_local_stack_end_ptr_
MayAlias: i64* %_allin_new_bt_, i8* %_local_stack_start_ptr_
MayAlias: i64* %_allin_new_bt_, i8* %_local_stack_end_ptr_
MayAlias: i8* %_local_stack_start_ptr_, i8* %_new_gep_
MayAlias: i8* %_local_stack_end_ptr_, i8* %_new_gep_
MayAlias: i64* %_allin_new_bt_2, i8* %_local_stack_start_ptr_
MayAlias: i64* %_allin_new_bt_2, i8* %_local_stack_end_ptr_
- With this transformation all the tests are passing.
Fromat::binary --> tool::Mcsema --> Format::LLVM IR --> Tool::ALLIN --> Format::IR
| |
| |
| |____\ Tool::clang --> Output2
| /
|__\ Tool::clang --> Output1 Passing Definition: Output 1 == Output 2
/
- Implementation Details
-
[test] (https://github.com/sdasgup3/binary-decompilation/blob/var_recovery/test/variable_recovery/test_0/test_0.c) diff
-
[test] (https://github.com/sdasgup3/binary-decompilation/blob/var_recovery/test/variable_recovery/test_1/test_1.c) diff
-
The transformation happens in 2 phases. First new instructions are added based on fact that loads/stores of i64* RSP_val (or RBP_val) need to be replaced with corresponding loads/stores of i8** RSP_ptr (or RBP_ptr). But the old insructions are still kept. In the second phase dce removes most if the dead instructions.
; %rax = MEM[rbp + 40] %84 = load i64, i64* %RBP_val %85 = add i64 %84, 40 %86 = inttoptr i64 %85 to i64* %87 = load i64, i64* %86 store i64 %87, i64* %RAX_val ; Phase 1 %_new_load_4 = load i8*, i8** %_RBP_ptr_ %87 = load i64, i64* %RBP_val %_new_gep_5 = getelementptr i8, i8* %_new_load_4, i64 40 %88 = add i64 %87, 40 %_new_bt_6 = bitcast i8* %_new_gep_5 to i64* %89 = inttoptr i64 %88 to i64* %90 = load i64, i64* %_new_bt_6 store i64 %90, i64* %RAX_val ; Phase 2; After dce %_new_load_4 = load i8*, i8** %_RBP_ptr_ %_new_gep_5 = getelementptr i8, i8* %_new_load_4, i64 40 %_new_bt_6 = bitcast i8* %_new_gep_5 to i64* %90 = load i64, i64* %_new_bt_6
-
- Whats is going to be the pointer type of RSP_ptr or RBP_ptr
/* if we use RSP_ptr as i64** */
RSP_ptr = i64* alloca
...
// Usage 1
%1 = load i64** RSP_ptr
%2 = bitcast i64* %1 to i8*
%3 = getelementptr i8, i8* %2, i32 offset
%4 = bitcast i8* %3 to i32*
store i32 val, i32* %4
// Usage 2
%1 = load i64** RSP_ptr
%2 = bitcast i64* %1 to i8*
%3 = getelementptr i8, i8* %2, i32 offset
%4 = bitcast i8* %3 to i64*
store i64* %4, i64** %RSP_ptr
/* if we use RSP_ptr as i8** */
RSP\_ptr = i8*
...
// Usage 1
%1 = load i8** RSP_ptr
// NOT REQUIRED %2 = bitcast i64* %1 to i8*
%3 = getelementptr i8, i8* %2, i32 offset
%4 = bitcast i8* %3 to i32*
store i32 val, i32* %4
// Usage 2
%1 = load i8** RSP_ptr
// NOT REQUIRED %2 = bitcast i64* %1 to i8*
%3 = getelementptr i8, i8* %2, i32 offset
// NOT REQUIRED %4 = bitcast i8* %3 to i64*
store i64* %4, i8** %RSP_ptr
Currently the register context contain all all the registers except the stack which we managed to chop off from it. The idea here is to get pass and return only those registers which are relevant.
Consider the folowing snippet of MCSema extracted IR This is to show that some registers like %CTX.XMM15 are not "really" used in the function as opposed to %CTX.RSP
define internal x86_64_sysvcc void @foo(%struct.regs* %CTX) {
entry:
; Allocating the local stack of size 32*64 bits, 32 is inferred from static stack height computation
%_local_stack_alloc_ = alloca i64, i64 32
%_local_stack_start_ptr_ = getelementptr inbounds i64* %_local_stack_alloc_, i32 0
%_local_stack_start_ = ptrtoint i64* %_local_stack_start_ptr_ to i64
%_local_stack_end_ = add i64 %_local_stack_start_, 32
; Local allocas for register variables
%XMM15_val = alloca i128
%RSP_val = alloca i64
; Storing values to to local allocas
%XMM15 = getelementptr inbounds %struct.regs* %0, i64 0, i32 69
%74 = load i128* %XMM15
store i128 %74, i128* %XMM15_val
store i64 %_local_stack_end_, i64* %RSP_val
; "Real" Use of RBP_val
; Code for mov %rbp to %rsp
%77 = load i64* %RBP_val, !mcsema_real_eip !2
%78 = load i64* %RSP_val, !mcsema_real_eip !2
%79 = add i64 %78, -8
%80 = inttoptr i64 %79 to i64*, !mcsema_real_eip !2
store i64 %77, i64* %80, !mcsema_real_eip !2
; Using just to store it back to register context
%168 = load i128* %XMM15_val, !mcsema_real_eip !8
store i128 %168, i128* %XMM15, align 1, !mcsema_real_eip !8
call bar();
%238 = load i128* %XMM15, align 1, !mcsema_real_eip !8
store i128 %238, i128* %XMM15_val, !mcsema_real_eip !8
...
%338 = load i128* %XMM15_val, !mcsema_real_eip !13
store i128 %338, i128* %XMM15, align 1, !mcsema_real_eip !13
The take away is
- Cuurently all the registers are passed on as arguments to all the functions
- All the registers are returned from all the function using the struct context register pointer.
- The aim for "Function prototype detection" to is to figure out
- The arguments to functions, which will then be passed as arguments
- The return values; which will be returned from the function and in case of multiple values, returned as a struct.
Consider the example which demonstrats that a register potentially written in a caller should be passed as argument of all the subsequent callee.
foo () { bar () { car() {
reg_defined car(); reg_used;
bar();
} } }
Consider the example which demonstrats that a register potentially written in a callee will be returned to all the ancestor callers
foo () { bar () { car() {
bar();
reg_used car(); reg_sritten;
} } }
The idea here is to use pointer analysis to partition the local stack which will give us start points of variables and using range analysis to figure out the sizes of those partitions.
; Allocating the local stack of size 32*64 bits, 32 is inferred from static stack height computation
%_local_stack_alloc_ = alloca i64, i64 32
%_local_stack_start_ptr_ = getelementptr inbounds i64* %_local_stack_alloc_, i32 0
%_local_stack_start_ = ptrtoint i64* %_local_stack_start_ptr_ to i64
%_local_stack_end_ = add i64 %_local_stack_start_, 32
store i64 %_local_stack_end_, i64* %RSP_val
; "Real" Use of RBP_val
; Code for mov %rbp to %rsp
%78 = load i64* %RSP_val, !mcsema_real_eip !2
%79 = add i64 %78, -8
%80 = inttoptr i64 %79 to i64*, !mcsema_real_eip !2
store i32 10, i64* %80, !mcsema_real_eip !2
%81 = add i64 %78, -12
%82 = inttoptr i64 %79 to i64*, !mcsema_real_eip !2
store i32 20, i64* %80, !mcsema_real_eip !2
So here we are looking for a AA which should say that %80 and %82 are not aliased. Tried with basicAA, globalsmodref-aa and scev-aa and all of them says that the locations are aliased. Need to try ander-aa and dsa-aa
We can model a simple type analysis as follows:
- Multiplication, substraction, shifting, xor, binary and, binary or and division force their “parameters” to be integers.
- Dereferencing forces its parameter to be a pointer.
- The return values of standard library functions are maintained.
- Any variable that is branched on is a boolean.
- If two variables are added together and one is a pointer, the other is an integer.
- If two variables are added together and one is an integer, the other is either a pointer or an integer.
- If two variables are compared with <, >, >= or <=, they are both integers.
- If two variables are compared with == or !=, they have the same type.
- If something is returned from main(), it is an integer.
- If the value of one variable is moved into another variable, they have the same type.
- If the dereferenced value of a pointer has type τ , then the pointer has type τ∗.
- The sum of a pointer of type τ ∗ and an integer is a pointer, but not necessarily of type τ
- Finished the okmplementation of stack decpnstruction
Before the transformation the mcsema code looks like this define fiunc(struct %0) { //For accessing register in the struct RAX_val = alloca RAX = getelementptr .. // read the register loc in the struct %temp = load RAX store %temp RAX_val //For accessing the stack in the struct RSP_val = alloca RSP = getelementptr .. // read the stack loc in the struct %temp = load RSP store %temp RSP_val // assign the start addres of the stack to local variable RSP_val .... ... = load RAX_val ... = load RSP_val // or any offset from RSP_val } After the moficication define func(struct %0, %parent_stack_end ) { %_local_stack_alloc_ = alloca i64, i64 0 %_local_stack_start_ptr_ = getelementptr inbounds i64* %_local_stack_alloc_, i32 0 %_local_stack_start_ = ptrtoint i64* %_local_stack_start_ptr_ to i64 %_local_stack_end_ = sub i64 %_local_stack_start_, 0 //To be passed to any called function //For accessing register in the struct RAX_val = alloca RAX = getelementptr .. // read the register loc in the struct %temp = load RAX store %temp RAX_val //For accessing the stack in the struct RSP_val = alloca RSP = getelementptr .. // read the stack loc in the struct %temp = load RSP store %_local_stack_start_ RSP_val // assign the start addres of the stack to local variable RSP_val .... if(RAX_val < _local_stack_start_) { // As the alloca of local stack _local_stack_alloc_ is before the alloca of RAX_val, this condition should always be true ... = load RAX_val } else { //compute the offset in the parent stack and load it } if(RSP_val < _local_stack_start_) { ... = load RSP_val }else { %offset = RSP_val - _local_stack_start_; %address_in_parent_stack = %offset + %parent_stack_end; ... = load %address_in_parent_stack }
-
Worked on deconstruction the global stack ( which is shared by all the procedures ) into per procedure stack frame
- Transforming all the dereferences into the following checks
Let curr_frame_start_ptr: Base address of the current stack frame parent_frame_start_ptr: Base address of the parent frame if(PTR < curr_frame_start_ptr) { //PTR corresponds to current procedure stack frame //dereference as usual } else { offset_in_parent_stack1 = PTR - curr_frame_start_ptr; offset_in_parent_stack = offset_in_parent_stack1 - 8; // This 8 bits is to get past the location used for return address storage dereference *[ (parent_frame_start_ptr + parent_frame_height) - offset_in_parent_stack)] }
Note: Currently all the load/store instructions are transformed like above. But if we have a static analysis like VSA, then for many cases we will be able to know the precise value of PTR and can prevent emitting these static checks. But we still be emiting the checks for those PTR's for which VSA fails to infer the values(e.g the values load from memory).
-
Getting rid of impreciseness in computing stack heights statically
Before doing this stack deconstruction, we were statically computing the stack heights and using that to compute the
parent_frame_start_ptr + parent_frame_height
. As we can imagine that computing the precise value the above needs precise value of the stack height, which is not possible with a static analysis.Currently we still be using the statically computed stack heights which is actually the maximum possible height that the stack can grow. And based on that will allocate a stack S. We will be instrumenting all the stack writes (writes within the boundaries of S) so as to track the last written location.
[mail link] (https://github.com/sdasgup3/binary-decompilation/blob/stable/docs/Retped.md)
-
Starting referring the dissertation: POLYMORPHIC TYPE INFERENCE FOR LANGUAGES WITH OVERLOADING AND SUBTYPING, Geoffrey Seward Smith
-
Few relevant questions w.r.t the polymorphic type inference paper.
-
Question 1 In reference to this particular paper, is variable recovery included in this type inference ? Or the variables (like scalar and aggregate variables) are already recovered before this particular type inference kicks in ? Or variable recovery is irrelevant in the context of this paper, because this paper only talks about inferring the type of stack offsets?
From the constraints generated (as given in Figure 2), it seems that some constraints considers the offset information of the fields of a structure (w.r.t the procedure's base pointer ), which after solving is going to help in knowing the type of the those fields. But it is not obvious to me if this particular paper is doing any kind of "structure variable" recovery using those offset information ?
More specifically, In figure 2 of the paper, we can see the following constraints corresponding to instructions mov eax ,dword [edx] and mov eax ,dword [edx + 4] respectively.
> τ.load.σ32@0 <: τ > τ.load.σ32@4 <: int ∧#FileDescriptor
Are these constraints also used to infer a "structure variable" with two fields at offset 0 and 4? Or its is used just to infer the type of the variables at offsets 0 and 4 from the base pointer of the procedure?
-
Question2
This question is related to my previous question.
As you mentioned that you do not need any pointer analysis information to infer that x has a the type struct S { struct S , ...}. I believe that the way you achieve this is by adding a new constraint for each instruction corresponding to the access of the fields of x and solving them later. (Please correct me if I am wrong here). If this is correct, then doesn't that mean that you already have the information that x is a struct (but of course without the type information of its fields) before starting this type inference?
-
- Review of "Polymorphic type inference for machine code"
The mean distance to the true type was 0.54 for Retypd, compared to 1.15 for dynamic TIE, 1.53 for REWARDS,
1.58 for static TIE, and 1.70 for SecondWrite. The mean interval size shrunk to 1.2 with Re- typd, compared to 1.7 for SecondWrite and 2.0 for TIE.
ElWazeer et al. [8] also in- troduced a multi-level pointer accuracy rate, which attempts to quantify how many “levels” of pointers were correctly in- ferred. On SecondWrite’s benchmark suite Retypd attained a mean multi-level pointer accuracy of 91%, compared with SecondWrite’s reported 73%. Across all benchmarks, Re- typd averages 88% pointer accuracy.
SecondWrite’s overall conservativeness is 96%, measured on a subset of the SPEC2006 benchmarks; Retypd attained a slightly lower 94% on this subset.
- Retyped uses subtyping based type inference as opposed to unificatiion based in Second write. And it seems that that is what gives all the better performance numbers
- This paper does not seem to do any kind of variable recovery. They quote in page 5, second column , last but one para:
These rules ensure that Retypd can perform “iterative
variable recovery”; lack of iterative variable recovery was cited by the authors of the Phoenix decompiler [27] as a common cause of incorrect decompilation when using TIE [15] for type recovery.
- From Second write paper:
Integrating our variable identification system with type recovery makes the type
recovery simpler because it will need only recover scalar types like integers, floats and doubles.
Structures and arrays are detected as part of the variable identification.
from paper
For the other operations in the table, we propagate the types using the function unifyType.
This function attempts to set the data type of all the given symbols andALocs to be the same.
At least one of the symbols or the ALocs given to that function should be typed. Whenever this
function finds conflicting types, it gives up and does not update any types. It is used for copy
operations like type casts and phi nodes. It is also used to propagate types through memory as shown
in the rules for stores and loads. Interprocedural information is propagated by unifying the formal and
actual arguments types at a call instruction. The return value data type at the call site is unified
with all the data types of all return values appearing in the return statements inside the called function body.
- Handling "Arguments passed to callee using parent stack frame"
- The call instrutions (to mcsema generated functions) are modified to add caller stack frame as an extra argument. DONE commit
- Planning to add checks after add instrutions to rsp and rbp such that if the access if beyond the current stack frame (in the positive) direction, then access the parent stack frame to access the argument. WIP
- Implemented and tested a basic pass which maps the access w.r.t to the global stack (provided by mcsema register context) to local stack per procedure.
- This is a transform pass on the mcsema generated llvm ir.
- This is done by replacing the following instructions in each procedure
%RSP_val = alloca i64, !mcsema_real_eip !2
%RSP = getelementptr inbounds %struct.regs* %0, i64 0, i32 6, !mcsema_real_eip !2 ; Reading the register context to get the stack pointer
%7 = load i64* %RSP, !mcsema_real_eip !2
store i64 %7, i64* %RSP_val, !mcsema_real_eip !2 ; Storing the stack pointer in the local variable %RSP_val
; All subsequesnt computations are using %RSP_val
by
%RSP_val = alloca i64, !mcsema_real_eip !2
; Following two are dead instructions
%RSP = getelementptr inbounds %struct.regs* %0, i64 0, i32 6, !mcsema_real_eip !2 ;
%7 = load i64* %RSP, !mcsema_real_eip !2
%_local_stack_alloc_ = alloca [32 x i64] ; The stack height 32 is determined by a previous dfa pass ; Newly inserted
%_local_stack_gep_ = getelementptr inbounds [32 x i64]* %_local_stack_alloc_, i32 0, i32 0 ; Newly inserted
%_local_stack_P2I_ = ptrtoint i64* %_local_stack_gep_ to i64 ; Newly inserted
store i64 %_local_stack_P2I_, i64* %RSP_val ; Newly inserted
; All subsequesnt computations are using %RSP_val
- Limitation: Arguments passed to callee using parent stack frame are not handled yet.
- Planning to pass the parent proc's local stack as an argument to callee so that callee can access them.
- Fixed a issue with the previous implementation
-
There is a difference in which gcc and clang generated there prologue and epilogue for each function.
- clang generated binary
//prologue push %rbp mov %rsp,%rbp sub $0x20,%rsp // Allocation for local space ... //epilogue add $0x20,%rsp // De-Allocation of local space // At this point the rsp is pointing to the stack loc containing // previously pushed rbp pop %rbp retq // pop the return address
and corresponding mcsema generated llvm ir for last 2 instructions
%254 = load i64* %RSP_val %268 = inttoptr i64 %254 to i64*, !mcsema_real_eip !12 ;popq %rbp %269 = load i64* %268, !mcsema_real_eip !12 ;popq %rbp store i64 %269, i64* %RBP_val, !mcsema_real_eip !12 ;popq %rbp %270 = add i64 %254, 16, !mcsema_real_eip !13 ;retq store i64 %270, i64* %RSP_val, !mcsema_real_eip !11 ;retq
- gcc generated binary
//prologue push %rbp mov %rsp,%rbp sub $0x20,%rsp // Allocation for local space ... //epilogue leaveq // %rsp = %rbp; pop %rbp retq // pop the return address
and corresponding mcsema generated llvm ir for last 2 instructions
%248 = load i64* %RBP_val, !mcsema_real_eip !10 ;leave store i64 %248, i64* %RSP_val, !mcsema_real_eip !10 ;leave %249 = inttoptr i64 %248 to i64*, !mcsema_real_eip !10 ;leave %250 = load i64* %249, !mcsema_real_eip !10 ;leave store i64 %250, i64* %RBP_val, !mcsema_real_eip !10 ;leave %251 = add i64 %248, 16, !mcsema_real_eip !11 ;retq store i64 %251, i64* %RSP_val, !mcsema_real_eip !11 ;retq
-
The issue (explained next) is with the gcc generated binary and related to leave instrcution.
-
In the previous implementation, before doing the global iterative dfa, we determine the local (i.e. restricted to a bb) constant (i.e. does not depend on In/Out) Gen <actual_rsp, max_disp_rsp, actual_rbp, max_disp_rbp> as follows:
Gen[bb]::actual_rsp = Actual displacement of rsp across the bb with initial value of rsp/rbp assumed as 0. Gen[bb]::max_disp_rsp = max (Out[I]::max_disp_rsp) for all I in bb. - correspondingly for rbp -
Note Gen is calculated with initial value of rsp/rbp as 0.
-
Consider the calculation of actual_rsp component of Gen for an exit block (which will have the epilouge) for gcc generated binary
-
The actual rsp calculation (which is supposed to be rsp = rbp; rsp += 16;) is wrong as we have not considered the fact that actual_rsp is dependent on the In::actual_rbp. In other words, the calculation of Gen is not a local property, but dependent on the In.
-
So we modified the global dfa so that gen are calculated during the iterative global dfa.
-
- Testing the previous implementation with icc (clang and gcc are working fine) generated binary
- icc generated binary gives error with external calls
- Started on the implementation on converting accesses on global mcsema stack to per function stack accesses.
- Augment the automated testing with switches to generate binaries (to be consumed by mcsema) using different coppilers.
- Planning to augment the automated testing with switches for different calling conventions
- Implemented a pass to "find the maximum stack height growth"
-
A forward data flow analysis (dfa).
-
Each program point is associated with the following data flow value : {
actual_rsp
,actual_rbp
,max_disp_rsp
,max_disp_rbp
} where-
actual_rsp
( oractual_rbp
) = Actual displacement of%rsp
(or%rbp
). For example, for a statementsub $0x20,%rsp
, if%rsp
value isx
before the statement, thenactual_rsp
becomesx - 32
after it. -
max_disp_rsp
( ormax_disp_rbp)
= Offset of the stack access w.r.t%rsp
(or%rbp
). For example, for a statementmov -0x4(%rsp),%esi
, if%rsp
value isx
before the statement, thenmax_disp_rsp
becomesx-4
after it. -
Note: Both
actual_rsp
andmax_disp_rsp
need to be separately tracked.- Problem with having only
actual_rsp
sub $0x8,%rsp mov -0xc(rsp), %edi ;actual_rsp = -8, but max stack height = -0xc - Ox8
- Problem with having only
max_disp_rsp
(in negative direction)
sub $0x8,%rsp sub $0xc,%rsp ;max_disp_rsp = -0xc, but max stack height = -0x14
- Also just adding the offsets will not do.
mov -0x8(rsp), %edi sub $0xc, %rsp ;Adding the constants gives max stack height as 0x14, but its actually -0xc.
- Problem with having only
-
-
Local dfa within a bb: Calculating
Gen[bb]
-
Each instruction
I
(which may potentially affect rsp or rbp) within a bb is tracked to obtain the data flow values before,In[I]
and after,Out[I]
. This example captures all kinds of instructions considered and how the data values are propagated from one instruction to other within a bb. The call instruction in the figure results in to%rsp += 8
because it is assumed that the function is well formed with conventional prologue and epilogue and the only change that can happen to%esp
is pop of return address. -
After the data value propagation, Gen[bb] is computed as follows:
Gen[bb]::actual_rsp = Actual displacement of rsp across the bb with initial value of rsp/rbp assumed as 0. Gen[bb]::max_disp_rsp = max (Out[I]::max_disp_rsp) for all I in bb.
-
In the running example,
Gen[bb] = { 8, -64, 0, 0}
-
-
Global dfa: Calculating
In[bb]
andOut[bb]
- Meet operator: Calculating
In[bb]
as a function ofOut[pped_bb]
,
//For any pair of predecessor pred_bb_x and pred_bb_y if ( Out[pred_bb_x]::actual_rsp == OUT[pred_bb_y]::actual_rsp && OUT[pred_bb_x]::actual_rbp == OUT[pred_bb_y]::actual_rbp) { In[bb]::actual_rsp = Out[pred_bb_x]::actual_rsp; In[bb]::actual_rbp = Out[pred_bb_x]::actual_rbp; In[bb]::max_disp_rsp = min ( OUT[pred_bb_x]::max_disp_rsp, OUT[pred_bb_y]::max_disp_rsp) In[bb]::max_disp_rbp = min ( OUT[pred_bb_x]::max_disp_rbp, OUT[pred_bb_y]::max_disp_rbp) } else { In[bb] = Bottom }
-
The value
Bottom
(no useful information) is for the cases where the%rsp
or%ebp
is updated differently in control flow paths before the join. This is going to solve two scenarios- Consider two branches of a conditional statement; in both of them
%rsp
is updated differently and then an ancestor variable is accessed . Now if we choose height = max of%rsp
updates, and use it to deconstruct the global stack to local stack frame, then in case of indirect access it will not be possible to put a static check to distinguish which stack frame the access belongs to. - If in the while loop body
%rsp
is updated, then it is not statically possible to figure out the value of stack height and we getBottom
in that scenario as well.
- Consider two branches of a conditional statement; in both of them
-
Transfer function: Calculating
Out[bb]
as a function ofGen[bb]
andIn[bb]
if(In[bb] == Bottom) { Out[bb] = Bottom; } else { Out[bb]::actual_rsp = In[bb]::actual_rsp + Gen[bb]::actual_rsp; Out[bb]::actual_rbp = In[bb]::actual_rbp + Gen[bb]::actual_rbp; Out[bb]::max_disp_rsp = min ( In[bb]::actual_rsp + Gen[bb]::max_disp_rsp, In[bb]::max_disp_rsp; Out[bb]::max_disp_rbp = min ( In[bb]::actual_rbp + Gen[bb]::max_disp_rbp, In[bb]::max_disp_rbp; }
- A
Bottom
inIn
orOut
prevents deconstruction of stack frames. During testing we do NOT get any cases withBottom
appering inIn
orOut
.
- A
-
[This example] (https://github.com/sdasgup3/binary-decompilation/blob/master/source/test/max-stack-height/test_5/cfg.png) shows two cfgs corresponding to main (bigger one) and draw routines of maze program with the following interpretation ( Note: The reported
In[bb]
,Gen[bb]
andOut[bb]
are fixedpoint values )
- Meet operator: Calculating
-
Max stack height of function F
max ( Out[bb]::max_disp_rsp, Out[bb]::max_disp_rsp ) for all bb.
- Tested the implementation.
- A [testsuite] (../source/test/max-stack-height/) is incrementally created.
- Added 25 test cases including all the demo testcases with which mcsema is tested.
- In none of cases, the value
Bottom
is reached. - Each test case folder contains a png file, showing the cfg and associaed fixed point values of
In
,Gen
andOut
. [This example] (https://github.com/sdasgup3/binary-decompilation/blob/master/source/test/max-stack-height/test_24/cfg.png) shows one of the complex cfgs handled.
- variable recovery algorithm
- Deconstruct the monolithic stack that mcsema uses into local stack for each procedure.
- Use a data flow analysis to identify the max stack height of each procedure. Its OK to have the a variable expresion for max stack height.
- Allocate an array of "max stack height" right at the beginnig of each procedure.
- For each procedure, convert the accesses on that monolithic stack into accesses on the local stack.
- For each procedure, identify the abstract locations on its local stack corresponding to recovered variable.
- Promote the abstract locations into variables.
Consider the source code,
int main(){
int z;
z = foo(10,20);
return z;
}
int foo(int a, int b) {
int temp1;
temp1 = a+b;
return temp3;
}
In the mcsema recovered IR, all the stack memory acceses are going on a monolithic array r.RSP.
int main(struct regcntx r){
r.RSP = r.RSP-2; //Local Allocation
r.RSP[1] = 20; //Outgoing argument
r.RSP[0] = 10;
int llvm_tmp_3 = rewritten_foo(r);
return llvm_tmp3;
}
int rewritten_foo(struct regcntx r) {
int* llvm_EBP = r.RSP; //Local Frame Pointer
llvm_RSP = llvm_RSP-10; //Local Allocation
int tmpIn1 = llvm_EBP[0]; //Incoming Arg
int tmpIn2 = llvm_EBP[1];
int llvm_tmp2 = tmpIn1+tmpIn2;
llvm_RSP[2] = llvm_tmp2;
return llvm_tmp2;
}
The first step is localize the monolithic array to each procedure.
int main(struct regcntx r){
int LOCAL\_FRAME[MAX\_MAIN];
LOCAL_FRAME[0] = 20; //Outgoing argument
LOCAL_FRAME[1] = 10; //Outgoing argument
int temp1 = LOCAL_FRAME[0];
int temp2 = LOCAL_FRAME[1];
int temp3 = rewritten_foo(temp1, temp2);
return temp3;
}
int rewritten_foo(int arg1, int arg2) {
int LOCAL_FRAME[MAX_FOO];
int temp = tmpIn1+tmpIn2;
LOCAL_FRAME[0] = temp;
return temp;
}