Methodology
We explored microcode via the IDA Python API. But in the beginning of our work (April 2019), the Hex-Rays Python API to handle microcode was not released yet.
This is why we started our investigations using ctypes to handle C++ structures1. When the microcode API was finally released in July 2019 (ida_hexrays, 7.3 version), we replaced all calls to ctypes libraries by the corresponding Python method. As the Python and C++ API share the same structure, it was a pretty straightforward operation.
As the official documentation for the ida_hexrays Python API is not up-to-date with the new version, we made extensive use of the C++ Hex-Rays documentation (thankfully, both API are overwhelmingly similar).
Introduction to microcode
Microcode is the intermediate language used by the Hex-Rays decompiler.
Its main goal is to turn the result of disassembly into a simpler and processor-independent representation of source code. (For a comprehensive overview of microcode syntax, one might refer to Guilfanov's conferences on microcode2.)
Microcode is generated then optimized during the first steps of decompilation.
At first sight it looks like RISC code:
mov ebx.4, eax.4 ; 4014FB u=ebx.4 d=eax.4
- Each microinstruction has one opcode and 3 operands (one or more may be missing depending on the instruction). Notice that the destination operand is at the far right:
opcode left, right, destination
- The '4014FB' part is the address of the instruction
The last part is the most interesting for us:
- the 'u' and 'd' elements represent use- and def-list of instructions. We will discuss these elements on the last part of this post (Dataflow information).
When generated, microcode is very verbose, but a signification portion of microinstructions gets deleted during successive optimization phases.
There are 7 steps of microcode optimization. As the first steps produce verbose codes and incomplete data flow information, we focused on the last two steps: Global Optimization 3 and Local Variable.
The reason we used mainly the 6th optimization level (Global Optimization 3) and not the highest one (Local Variable) is because the Local Variable optimization gets rid of the distinction between stack and register.
Tree-like structure
Generating microcode for a function results in a tree-like structure called mbl_array_t. As its name suggests, mbl_array_t is an array of microblocks, linked to each other according to the control flow graph.
Microblocks are groups of micro-instructions, with the first and last microblocks of a function being always empty.

Figure 1: Tree-like structure of microcode
mov call $"j_?divide"<cdecl:"int a" %j.4, "int b" #7.4>.4, %r.4
Instructions directly linked to a microblock are called top-level instructions, the other are subinstructions.
Micro-Operands
There are 15 different types of micro-operands. Operand types are valuable information, for they allow to easily distinguish between registers, stack variables, memory addresses or numeric constants.
In practice, each operand owns a 't' attribute stating its type, and each operand type holds different kinds of additional information.
|
Nom |
mop.t |
Operand type |
Useful data |
Corresponding attribute |
|
z |
0x0 |
None |
Empty operand |
|
|
r |
0x1 |
Register |
Register number |
int mop.r |
|
n |
0x2 |
Number constant |
Constant value |
long mop.nnn.value |
|
t |
0x3 |
String constant |
String |
String mop.cstr |
|
d |
0x4 |
Result of another instruction |
Subinstruction |
minsn_t mop.d |
|
S |
0x5 |
Local stack variable |
variable offset |
long mop.s.off |
|
v |
0x6 |
Global variable |
variable address |
long mop.g |
|
b |
0x7 |
Microbloc (jump destination) |
Microbloc number |
int mop.b |
|
f |
0x8 |
List of arguments |
List of arguments |
List<mop_t> mop.f. args |
|
l |
0x9 |
Local variable[1] |
Variable offset |
long mop.l.off |
|
a |
0xA |
Address of operand (l, v, S or r) |
Operand |
mop_t mop.a |
|
h |
0xB |
Helper function |
Helper (char_p) |
String mop.helper |
|
c |
0xC |
microcases[2] |
Not studied |
|
|
fn |
0xD |
Floating point constant |
Not studied |
|
|
p |
0xE |
Operand pair |
low and high operand |
mop_t mop.pair.lop mop_t mop.pair.hop |
|
sc |
0xF |
Scattered |
Not studied |
|
Navigation
Navigating through microcode is appreciably easy. Many features are implemented, such as visitor patterns and find_XX() methods.
Below is a set of useful functions for navigation, with use examples for the not self-explanatory ones :
Table 1 : ida_hexrays API for navigating microcode
|
Operation |
Methode / Attribut |
|
|
Generate microcode of function |
mbl_array_t ida_hexrays.gen_microcode( mba_ranges_t mbr, hexrays_failure_t hf, mlist_t retlist = None, int VRFLAGS = 0, int reqmat = MMAT_GLBOPT3 ) |
|
|
Generate microcode for function at address 'ea' |
mbr = ida_hexrays.mba_ranges_t() fn = idaapi.get_func(ea) eea = fn.endEA sea = fn.startEA mbr.ranges.push_back(ida_range.range_t(sea, eea)) hf = ida_hexrays.hexrays_failure_t() f = ida_hexrays.DECOMP_WARNINGS mmat = 7 mba = ida_hexrays.gen_microcode(mbr, hf, None, f, mmat) |
|
|
mbl_array_t |
||
|
Object information |
||
|
|
void print(vd_printer_t vp) |
|
|
vp = ida_hexrays.vd_printer_t() mba.print_t(vp) |
||
|
Entry address |
long entry_ea |
|
|
Number of blocks |
int qty |
|
|
Navigation |
||
|
First block |
mblock_t blocks |
|
|
Get block i |
mblock_t get_mblock(i) |
|
|
Visit all instructions (including subinstructions) |
int for_all_insns(minsn_visitor_t mv) |
|
|
Visit all top level instructions |
int for_all_topinsns(minsn_visitor_t mv) |
|
|
Visit all operands |
int for_all_ops(mop_visitor_t mv) |
|
|
mblock_t |
||
|
Object information |
||
|
|
void print(vd_printer_t vp) |
|
|
Print mblock |
vp = ida_hexrays.vd_printer_t() mblock.print_t(vp) |
|
|
Bloc number |
int serial |
|
|
Navigation |
||
|
mbl_array_t parent |
mbl_array_t mba |
|
|
Previous bloc (natural order) |
mblock_t prevb |
|
|
Next bloc (natural order) |
mblock_t nextb |
|
|
List of predecessors in CFG |
List<mblock_t> predset |
|
|
List of successor in CFG |
List<mblock_t> succset |
|
|
First instruction |
minsn_t head |
|
|
Last instruction |
minsn_t tail |
|
|
minsn_t |
||
|
Object Information |
||
|
|
String dstr() |
|
|
Address |
int ea |
|
|
Opcode |
int opcode |
|
|
Navigation |
||
|
Previous instruction in bloc |
minsn_t prev |
|
|
Next instruction in bloc |
minsn_t next |
|
|
left operand |
mop_t l |
|
|
right operand |
mop_t r |
|
|
destination operand |
mop_t d |
|
|
Visit instruction and subinstructions |
int for_all_insns(minsn_visitor_t) |
|
|
Example of visitor implementation |
class minsn_visitor(ida_hexrays.minsn_visitor_t): def __init__(self): ida_hexrays.minsn_visitor_t.__init__(self)
def visit_minsn(self): print self.blk.serial # show block number print self.topins.dstr() # show top-instruction print curins.dstr() # show current instruction return 0
visitor = minsn_visitor() mba.for_all_insns(visitor) |
|
|
Find a (sub)instruction with specific opcode |
bool contains_opcode(int opcode) |
|
|
Find a call (sub)instruction |
bool contains_call(bool with_helpers=False) |
|
|
Object Information |
||
|
|
String dstr() |
|
|
Operand type |
int t |
|
Data flow information
Two types of information are available:
• Location: which memory locations are being read or modified by one instruction. This information is stored in microlists (mlist_t class).
• Possible values of registers or data. This information is represented by value ranges (valrng_t class)
It is important to understand that these microlist and value range objects are inherently different: although they both represent sets of values and therefore share common methods, they stand for two entirely distinct concepts.
Microlists
Microlists represent memory locations used or modified by one element of microcode. They can be calculated for a microblock, a microinstruction or even a single micro-operand.
It is a composite class which allows common representation for register, stack and global memory locations.

Figure 2: microlist internals
- Attribute mem (ivlset_t class) is a list of memory ranges. Each range (ivl_t class) is characterized by one offset and one size.
Value ranges
Value ranges represent all possible values of an operand. They can refer to an entire microblock or a microinstruction. Internal structure is not publicly available.
Obtaining dataflow information
All data flow information is obtainable through microblocks, even if it refers to microinstructions or operands.
Constant values MAYMUST and VRFLAGS are defined in the second table.
Table 2 : Methods and attributes for exploring data flow
|
Information |
Methode / Attribut |
|
|
mblock_t |
|
|
|
Memory zones |
||
|
In block |
||
|
Data that must be used by the block |
mlist_t mustbuse |
|
|
Data that may be used by the block |
mlist_t maybuse |
|
|
Data that must be defined by the block |
mlist_t mustbdef
|
|
|
Data that may be defined by the block |
mlist_t maybdef |
|
|
In one instruction of the block |
||
|
Used data |
mlist_t build_use_list(minsn_t minsn, int MAYMUST) |
|
|
Defined data |
mlist_t build_def_list(minsn_t minsn, int MAYMUST) |
|
|
In one operand of the block |
||
|
Used data (return value in mlist) |
void append_use_list(mlist_t mlist, mop_t mop, int MAYMUST) |
|
|
Defined data (return value in mlist) |
void append_def_list(mlist_t mlist, mop_t mop, int MAYMUST) |
|
|
Memory zones values |
||
|
Find possible values for an instruction |
bool get_valranges(valrng_t valrng, vivl_t vivl, minsn_t minsn, int VRFLAGS) |
|
Table 3 : Constants values for data flow manipulation
|
Constant |
Defined values |
|
|
|
|
MUST_ACCESS = 0 MAY_ACCESS = 1 |
|
|
VR_AT_START= 0 get value ranges before the instruction or at the block start VR_AT_END= 1 get value ranges after the instruction or at the block end |
|
get_valranges() in practice
In theory, microblock's get_valranges() method seems like an exciting tool to work with: possible values of one operand are indeed a highly valuable information for static analysis.
Needless to say, we did not expect this function to operate without deficiency: Most of the time, information on memory values is simply not available through static analysis.
We mainly focused our investigation on retrieving value ranges for call arguments, but the method could also apply for other types of instructions.
The goal was to clearly define the limitations and possibilities of the get_valranges() method.
Method arguments
The arguments used by the get_valranges() method requires some explanations:
bool get_valranges(valrng_t valrng, vivl_t vivl, minsn_t minsn, int VRFLAGS)
-
The return Boolean indicates whether the operation went well
-
The calculated value ranges are added to the valrng argument.
Creating an empty valrng_t instance is easy:
valrng = ida_hexrays.valrng_t()
- The vivl argument represents the memory location we want to know value ranges of.
The vivl_t class represents memory locations (one memory offset and one size). Though it looks very similar to the ivl_t class, one is used for mlist_t and the other for valrng_t, and no direct conversion is possible from one to the other.
The easiest way of instantiating vivl_t is using a micro-operand mop:
vivl = ida_hexrays.vivl_t(mop)
-
minsn argument can be 'None' for considering the whole microblock. Obviously, minsn must belong to the involved microblock.
-
VRFLAGS values are defined above (table 3): they allow calculating the value ranges either just before or right after the instruction.
Operand types
This can seem counter-intuitive, but the get_valranges() method is only made for actual memory location (r, S or v-type operands) : The attempt of calculating value ranges for numeric constant (n-type operands) or local variable (l-type operands) will result in a RuntimeError or an empty valrng_t object.
This has some note-worthy implications:
-
First, get_valranges() is useful only for microcode below 'LOCAL VARIABLES' maturity level (level 7), i.e. without local variable.
-
And second, when trying to evaluate an operand, you need to take into consideration the cases where said operand is a numeric constant, which value is obviously known without the need of get_valranges().
Subinstruction operands
It is perfectly possible to retrieve value ranges of subinstruction operands. However, as mentioned above, the minsn argument of get_valranges() must belong to the microblock, i.e. it must be one of microblock's top instruction.
More specifically, using a subinstruction as argument in get_valranges() will raise a Runtime Error during execution.
The correct way of proceeding is to use a vivl_t corresponding to the desired operands, and then the top-level operand of its instruction.
Calculation pessimism
One thing we realized about value ranges propagation between instructions is that it always considers the worst-case scenario: in practical terms, if one instruction may modify the value of a memory location in an unpredictable way, then all previously calculated value ranges are considered irrelevant after this instruction.
This behavior can be particularly disappointing when working with call instructions or indirect addressing, as they may access and modify any memory location.
To work around this feature, we implemented our own way of propagating value ranges, ignoring may-be-defined elements of instruction (see code below). Our function works as expected, but it only works within a microblock, as we did not find a satisfactory way of propagating results between blocks.
MUST, VR_AT_START = 0
MAY, VR_AT_END = 1
MOP_N = 2
def get_valranges(maymust, mblock, minsn, mop):
# get valranges of mop before instruction minsn
valrng = ida_hexrays.valrng_t()
if (mop.t == MOP_N): # MOP_N = Numeric constant Operand
val.set_eq(mop.nnn.value)
else :
vivl = ida_hexrays.vivl_t(mop)
if maymust == MAY :
mblock.get_valranges(valrng, vivl, minsn, VR_AT_START)
elif maymust == MUST :
uselist = ida_hexrays.mlist_t()
mblock.append_use_list(uselist, mop, maymust)
mblock.get_valranges(valrng, vivl, None, VR_AT_START) # valrange at block entry
# if mop value is updated inside the microbloc, valrng is updated as well :
def_insn = find_last_redefinition(uselist, mblock, mblock.head, minsn, maymust)
if def_insn is not None :
mblock.get_valranges(valrng, vivl, def_insn, VR_AT_END)
return val
def find_last_redefinition(mlist, mblock, minsn_start, minsn_end, maymust):
# find last instruction redefining one element of mlist
# between minsn_start and minsn_end
minsn_start = mblock.find_redefinition(mlist, minsn_start, minsn_end, maymust)
minsn = minsn_start
while minsn_start is not None :
minsn = minsn_start
minsn_start = mblock.find_redefinition(mlist, minsn_start.next, minsn_end, maymust)
return minsn
Conclusion
In comparison with pseudocode, microcode is easier to navigate through using the ida_hexrays API, especially when considering retrieving call arguments.
Use- and def-list features are working exactly as expected and could be useful for some applications.
In regard to the value ranges feature, the most severe restriction of use concerns value ranges propagation between microblocks. As it is, it does not reveal much valuable information.
On the other hand, value ranges calculations within a block give satisfactory outcomes, although it might have limited practical applications.
-
A huge thanks to this github project : https://github.com/NeatMonster/MCExplorer ↩
-
https://i.blackhat.com/us-18/Thu-August-9/us-18-Guilfanov-Decompiler-Internals-Microcode-wp.pdf ↩
-
At 'LOCAL VARIABLES" optimization level, register(r) and local stack variable (S) operands are being replaced by local variable (l) ↩
-
c operands are used for representing 'switch' idioms ↩
-
At 'LOCAL VARIABLES" optimization level, register(r) and local stack variable (S) operands are being replaced by local variable (l) ↩
-
c operands are used for representing 'switch' idioms ↩