It’s been three weeks since GSoC started and I’m having an amazing time. I am working alongside dkreuter and have been learning tons from him too!
Here is the repository where you can track our progress and also give us suggestions :)
We chose Rust as our implementation language. Though at first I was a bit scared of this choice, I quickly realised how great the language is! Rust has allowed me to be far more productive, after of course my initial battles with the borrow checker. The zero-cost abstractions that Rust allows have been great so far!
This is a quick roundup of what I’ve been up to:
- Firstly, I got an ESIL parser up and running. We use this parser to convert ESIL to an IR which is easier to perform static analysis on. While ESIL is amazing for emulation purposes, it’s not so great for static analysis as it has a very large number of supported opcodes. The RadecoIR (name subject to change) is much simpler in terms of the number of opcodes. Also, ESIL is primarily just strings (as suggested by its name “Evaluable Strings Intermediate Language”), which can be pretty hard to work with.
- The next step was to build a Control Flow Graph (CFG) out of the IR. Building a control flow graph allows us to reason about the way control flows through the program, and hence helps us better understand the different programming constructs that go into making it. To make debugging and visualization easier, I first went ahead and implemented a dot format emitter (ok, I admit it, I probably did this first as I felt it was more fun :P). Just a brief word about dot: it is a graph description language which can be used as input to Graphviz to obtain visual representations of a graph. The current implementation is just a minimal dot format emitter and not all features of the dot format are supported. At this point, it still cannot be used on arbitrary graphs. I plan to expand this in the future to support more features and work on generic graphs.
- The last part was making the CFG out of the emitted RadecoIR. This turned out to be pretty simple, but there are still a ton of improvements to be made here :)
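To give a feel for the last two steps, here is a minimal sketch of building a CFG from a flat instruction list and emitting it as dot. This is not the actual Radeco code: the `Inst` enum, the `Block` struct and the functions `build_cfg`, `to_dot` and `example_program` are all made up for illustration, and the toy IR is far simpler than RadecoIR. The idea is the usual one: find the "leaders" (the first instruction, every jump target, and every instruction following a jump), cut the list into basic blocks at those points, then connect each block to its successors.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;

/// Toy instruction set (hypothetical, far simpler than RadecoIR).
/// Jump targets are indices into the instruction slice.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Op(&'static str), // any straight-line operation
    Jump(usize),      // unconditional jump
    Branch(usize),    // conditional jump, falls through otherwise
    Ret,              // end of the function
}

/// A basic block covering `insts[start..end]`, with successor block indices.
#[derive(Debug)]
struct Block {
    start: usize,
    end: usize,
    succs: Vec<usize>,
}

fn build_cfg(insts: &[Inst]) -> Vec<Block> {
    // Step 1: collect leaders, i.e. instructions that start a basic block.
    let mut leaders = BTreeSet::new();
    if !insts.is_empty() {
        leaders.insert(0);
    }
    for (i, inst) in insts.iter().enumerate() {
        match inst {
            Inst::Jump(t) | Inst::Branch(t) => {
                leaders.insert(*t);
                leaders.insert(i + 1);
            }
            Inst::Ret => {
                leaders.insert(i + 1);
            }
            Inst::Op(_) => {}
        }
    }
    // Drop leaders that point past the end of the program.
    let starts: Vec<usize> = leaders.into_iter().filter(|&l| l < insts.len()).collect();
    // Maps an instruction index that is a leader to its block index.
    let block_of = |addr: usize| starts.binary_search(&addr).ok();

    // Step 2: cut blocks at the leaders and connect successors.
    let mut blocks = Vec::new();
    for (b, &start) in starts.iter().enumerate() {
        let end = starts.get(b + 1).copied().unwrap_or(insts.len());
        let mut succs = Vec::new();
        match &insts[end - 1] {
            Inst::Jump(t) => succs.extend(block_of(*t)),
            Inst::Branch(t) => {
                succs.extend(block_of(*t)); // taken edge
                succs.extend(block_of(end)); // fall-through edge
            }
            Inst::Ret => {}
            Inst::Op(_) => succs.extend(block_of(end)),
        }
        blocks.push(Block { start, end, succs });
    }
    blocks
}

fn text(inst: &Inst) -> String {
    match inst {
        Inst::Op(s) => s.to_string(),
        Inst::Jump(t) => format!("jmp {}", t),
        Inst::Branch(t) => format!("br {}", t),
        Inst::Ret => "ret".to_string(),
    }
}

/// Minimal dot emitter: one box per block, one edge per successor.
/// Note: quotes inside instruction text are not escaped in this sketch.
fn to_dot(insts: &[Inst], blocks: &[Block]) -> String {
    let mut out = String::from("digraph cfg {\n    node [shape=box];\n");
    for (i, b) in blocks.iter().enumerate() {
        // "\l" is dot's left-justified line break.
        let label: String = insts[b.start..b.end]
            .iter()
            .map(|inst| format!("{}\\l", text(inst)))
            .collect();
        writeln!(out, "    bb{} [label=\"{}\"];", i, label).unwrap();
        for s in &b.succs {
            writeln!(out, "    bb{} -> bb{};", i, s).unwrap();
        }
    }
    out.push_str("}\n");
    out
}

/// A tiny counting loop used as sample input.
fn example_program() -> Vec<Inst> {
    vec![
        Inst::Op("x = 0"),
        Inst::Op("cmp x, 10"),
        Inst::Branch(5),
        Inst::Op("x = x + 1"),
        Inst::Jump(1),
        Inst::Ret,
    ]
}
```

Calling `to_dot(&example_program(), &build_cfg(&example_program()))` gives a string that can be saved to a file and rendered with Graphviz, for example with `dot -Tpng cfg.dot -o cfg.png`.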
Apart from Radeco itself, I also helped build r2pipe.rs, which allows communication with radare2 over pipes. In the future, r2pipe.rs will be used to connect Radeco to radare2. r2pipe.rs is great news if you’re a Rust guy, as you can now interact with radare2 and extend it to meet your needs!
Check out the documentation if you’re interested :)
This is just a glimpse of what’s in store for the upcoming weeks:
- Integration with radare2.
- Write tests and documentation for all features. The limited documentation we currently have can be found here. This will be updated soon.
- Convert the IR into the SSA form.
- Write dataflow analyses on the SSA form and also perform optimizations like dead code elimination.
- Constant folding.
- Type propagation.
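For anyone wondering what constant folding means in practice, here is a small sketch. It is purely illustrative: it works on a made-up expression tree (`Expr` and `fold` are hypothetical names) rather than on the SSA form we plan to use, and it only knows about addition and multiplication. The idea is simply to evaluate at analysis time any operation whose operands are all known constants.

```rust
/// Hypothetical expression tree, only for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(i64),
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Fold bottom-up: simplify the operands first, then the operation itself.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            // Both sides are constants: compute the result now.
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x.wrapping_add(y)),
            // Otherwise keep the addition with the simplified operands.
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x.wrapping_mul(y)),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        // Constants and variables are already as simple as they get.
        other => other,
    }
}
```

With this, `(2 + 3) * x` folds to `5 * x`, while `2 * 3 + 4` folds all the way down to the constant `10`.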
Bonus: Here is a small example of the CFG that we currently generate: