During our evaluations, we often need to quickly analyze large Windows products and pinpoint how their different components work together, especially their modules. In most cases, a module imports another module's functions, and these dependencies can easily be retrieved statically.
However, in some cases a module may export class constructors, which return objects containing references to their virtual methods. In other cases, callbacks may be registered by one module and called by another. In these situations, it is not trivial to pinpoint which method will be called by another module (and especially by which function).
We developed a small tool, "DIMCT" (Dirty Inter-Module Calls Tracer), which traces inter-module calls without too much overhead.
The tool may be found at https://github.com/AMOSSYS/DIMCT.
The usage is relatively straightforward:
- Run the provided IDAPython script in order to generate a configuration file;
- Start the monitored process;
- Run the provided executable with the process PID, the configuration file, and the delay before killing the process;
- Load the output with the IDAPython script in order to pinpoint which functions have been called;
- Manually parse the output file if you want more information, e.g. who called whom.
The underlying concepts are also quite simple: inline hooks are placed at the top of every identified function. Each hook redirects execution to a logging function, which only logs inter-module calls. Log entries are written to a dedicated memory area in the target process, which is periodically read and dumped by the monitoring process.
It follows this scheme:
Figure 1: DIMCT flow
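To make the filtering rule concrete, here is a minimal Python model of what the logging function decides for each hooked call. It is only a sketch under assumptions: the real logging routine is x86 code running inside the target without any Windows API, and the names `ModuleMap` and `log_call`, the 8-byte entry layout (caller's return address, hooked function address) and the use of the return address to identify the caller are hypothetical, not DIMCT's actual format. The module ranges are those printed by `parse_output` in the notepad example.

```python
import bisect
import struct

# Hypothetical 8-byte log entry: caller's return address, address of the hooked function.
ENTRY = struct.Struct("<II")


class ModuleMap:
    """Loaded modules as sorted (start, end, name) ranges, end exclusive."""

    def __init__(self, modules):
        self.ranges = sorted(modules)
        self.starts = [start for start, _, _ in self.ranges]

    def module_of(self, address):
        # Last module whose base is <= address, kept only if the address is below its end.
        i = bisect.bisect_right(self.starts, address) - 1
        if i >= 0:
            start, end, name = self.ranges[i]
            if start <= address < end:
                return name
        return None  # not inside any known module (e.g. dynamically generated code)


def log_call(area, used, modules, ret_addr, func_addr):
    """Append one entry to `area` if the call crosses a module boundary; return the new fill level."""
    if modules.module_of(ret_addr) == modules.module_of(func_addr):
        return used  # intra-module call: not logged
    if used + ENTRY.size > len(area):
        return used  # area full: this sketch drops the entry until the monitor drains it
    ENTRY.pack_into(area, used, ret_addr, func_addr)
    return used + ENTRY.size


modules = ModuleMap([
    (0x00380000, 0x003BE000, "notepad.exe"),
    (0x77C20000, 0x77DAE000, "ntdll.dll"),
])
area = bytearray(64)
used = log_call(area, 0, modules, 0x77C597C7, 0x0038A000)     # ntdll -> notepad: logged
used = log_call(area, used, modules, 0x0038B000, 0x0038A000)  # notepad -> notepad: ignored
print(used)  # 8
```

Filtering inside the hook, rather than in the monitor, keeps the shared log area from filling up with the far more frequent intra-module calls.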
We call this tool "dirty" for the following reasons:
- we do NOT use a shared memory section: the monitoring process keeps reading the remote memory area and wipes it when it is full (with two WriteProcessMemory calls, one to wipe the area and a second one to "release" the mutex). We simply gave the monitoring process a higher priority than the target process to minimize the impact;
- we do NOT use any Windows API in the logging function, so mutexes are implemented with a lock cmpxchg instruction (i.e. no OS benefits such as thread priority boosts).
Yeah, that's really dirty, but it actually worked without too many bugs, too much overhead or too many drops, so... we kept it as is. We also haven't needed to handle x64 binaries, so only x86 processes are currently supported (the concept remains the same; we will implement it soon, I guess).
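As an illustration of the second point, the sketch below models a `lock cmpxchg`-based spin mutex in Python. Python has no compare-and-exchange instruction, so the hypothetical `CmpxchgWord` class uses an internal `threading.Lock` to stand in for the LOCK prefix; only the acquire/spin/release logic mirrors the technique. In DIMCT, the "release" is presumably the monitoring process writing the lock word back to zero with its second WriteProcessMemory call, which corresponds to `release()` here.

```python
import threading


class CmpxchgWord:
    """A 32-bit lock word with compare-and-exchange semantics, like `lock cmpxchg`.

    The internal threading.Lock only emulates the atomicity of the LOCK prefix.
    """

    def __init__(self):
        self._value = 0
        self._bus_lock = threading.Lock()

    def cmpxchg(self, expected, new):
        # If the word equals `expected` (EAX), store `new`; always return the previous value.
        with self._bus_lock:
            previous = self._value
            if previous == expected:
                self._value = new
            return previous

    def store(self, value):
        # Plain write of the lock word.
        with self._bus_lock:
            self._value = value


class SpinMutex:
    """Busy-waiting mutex: no kernel object, hence no fairness and no priority boost."""

    def __init__(self):
        self.word = CmpxchgWord()

    def acquire(self):
        while self.word.cmpxchg(0, 1) != 0:
            pass  # spin until the word goes from 0 (free) to 1 (owned)

    def release(self):
        self.word.store(0)


counter = 0
mutex = SpinMutex()


def worker():
    global counter
    for _ in range(1000):
        mutex.acquire()
        counter += 1  # critical section protected by the spin mutex
        mutex.release()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

A waiting thread burns CPU instead of sleeping, which is why giving the monitoring process a higher priority matters in this design.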
The main problem we faced was handling relative instructions when moving the saved instructions. Moving a SHORT JMP or a CALL, whose operands are relative to the current instruction's position, is not that straightforward, and that's the main reason why we used an IDAPython script.
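The issue with relative instructions can be shown with a 5-byte CALL rel32 (E8) or JMP rel32 (E9): the encoded displacement is relative to the address of the next instruction, so copying the bytes elsewhere silently changes the destination. The hypothetical helpers below compute the target and re-encode the displacement for a new location; this is a generic sketch, not DIMCT's code.

```python
import struct


def branch_target(insn, addr):
    """Destination of a 5-byte CALL rel32 (E8) or JMP rel32 (E9) located at `addr`."""
    (disp,) = struct.unpack("<i", insn[1:5])  # signed, relative to the next instruction
    return (addr + 5 + disp) & 0xFFFFFFFF


def move_rel32(insn, old_addr, new_addr):
    """Re-encode a CALL/JMP rel32 copied from old_addr to new_addr so it keeps its destination."""
    if len(insn) != 5 or insn[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte E8/E9 instruction")
    target = branch_target(insn, old_addr)
    new_disp = (target - (new_addr + 5)) & 0xFFFFFFFF  # 32-bit wrap-around, as on x86
    return insn[:1] + struct.pack("<I", new_disp)


call = b"\xE8" + struct.pack("<i", 0x100)   # CALL at 0x1000 -> 0x1105
print(hex(branch_target(call, 0x2000)))     # copied as-is: 0x2105, wrong
fixed = move_rel32(call, 0x1000, 0x2000)
print(hex(branch_target(fixed, 0x2000)))    # re-encoded: 0x1105 again
```

A SHORT JMP (EB rel8) cannot generally be fixed this way, since its signed 8-bit displacement only reaches -128 to +127 bytes around the instruction.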
To solve this problem and use absolute addresses, we replaced relative JMP and CALL instructions with PUSH/RET instructions, and conditional jumps with their inverted counterparts followed by PUSH/RET instructions. For instance, a JNZ SHORT <addr> is replaced by JZ SHORT $+6 / PUSH <addr> / RET. Absolute addresses belonging to the module itself are stored relative to the module base address, and then "relocated" when the hook is installed. Absolute addresses are also recorded so that the program can relocate them.
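The conditional-jump rewrite can be sketched at the byte level. It relies on the fact that x86 short conditional jumps (opcodes 0x70 to 0x7F) come in complementary pairs differing only in the lowest opcode bit (JZ is 0x74, JNZ is 0x75), and that the inverted jump must skip exactly 6 bytes: PUSH imm32 (5 bytes) plus RET (1 byte), hence a displacement byte of 0x06. The function name and its RVA/base parameters are illustrative assumptions; per the description above, DIMCT splits this work between its IDAPython script and the hook installation.

```python
import struct

JCC_SHORT = range(0x70, 0x80)  # JO ... JG, 2-byte conditional jumps with a rel8 displacement


def jcc_short_to_absolute(insn, insn_rva, module_base):
    """Rewrite `Jcc SHORT <addr>` as `J!cc SHORT +6 / PUSH <addr> / RET`.

    The target is computed as an RVA (offset from the module base), then
    "relocated" with the base address known at hook installation time.
    """
    if len(insn) != 2 or insn[0] not in JCC_SHORT:
        raise ValueError("expected a 2-byte Jcc rel8 instruction")
    (disp,) = struct.unpack("<b", insn[1:2])        # signed 8-bit displacement
    target_rva = insn_rva + 2 + disp                  # relative to the next instruction
    target = (module_base + target_rva) & 0xFFFFFFFF  # relocation at install time
    inverted = insn[0] ^ 1  # condition pairs differ only in bit 0 (JZ 0x74 / JNZ 0x75)
    return (bytes([inverted, 0x06])                   # skip PUSH imm32 (5) + RET (1)
            + b"\x68" + struct.pack("<I", target)     # PUSH imm32
            + b"\xC3")                                # RET: pops the target into EIP


# JNZ SHORT +0x10 at RVA 0x1000 in a module loaded at 0x10000000
print(jcc_short_to_absolute(b"\x75\x10", 0x1000, 0x10000000).hex(" "))
# 74 06 68 12 10 00 10 c3
```

Since neither PUSH nor RET modifies the flags, the sequence branches under exactly the same condition as the original JNZ, wherever it is copied.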
As an example, here are the original function, the configuration file and the final result:
Figure 2: DIMCT trampolines
Let's now test it on KernelBase.dll and the 32-bit version of notepad.exe. First, load KernelBase.dll (the SysWOW64 version) in IDA Pro, load the script and run:
Python>create_config("C:\\Users\\user\\Desktop\\config.bin", True)
4374 subs will be monitored
On Windows 10 1709, this covers 4374 of the 4458 subs.
Now let's start a notepad.exe instance, then the DIMCT tool with notepad's PID and a 120-second delay. The interface remains quite responsive, although it may slow down, especially when opening the File/Open dialog. In the end, we get a log.bin file of approximately 6 MB.
Figure 3: DIMCT running
In order to show the results in IDA, we use the parse_output function, and here are the called functions:
Python>parse_output("C:\\Users\\User\\Desktop\\log.bin")
Modules list:
notepad.exe : 00380000 - 003be000
ntdll.dll : 77c20000 - 77dae000
[...]
Unique callers:
COMDLG32.dll
urlmon.dll
gdi32full.dll
TextInputFramework.dll
msvcrt.dll
CoreUIComponents.dll
dwmapi.dll
ntdll.dll
sechost.dll
PROPSYS.dll
cfgmgr32.dll
KERNEL32.DLL
IMM32.DLL
SHLWAPI.dll
USER32.dll
MPR.dll
combase.dll
notepad.exe
uxtheme.dll
OLEAUT32.dll
profapi.dll
SHELL32.dll
RPCRT4.dll
shcore.dll
clbcatq.dll
COMCTL32.dll
windows.storage.dll
MSCTF.dll
ucrtbase.dll
CoreMessaging.dll
twinapi.appcore.dll
ADVAPI32.dll
oleacc.dll
Unique called subs:
0x100f2800 GetProcessHeap
0x100fb300 DeactivateActCtx
0x100f77b0 sub_100F77B0
[...]
Sorting the called functions:
AccessCheck
ActivateActCtx
AddAccessAllowedAce
AddRefActCtx
[...]
Wow64DisableWow64FsRedirection
Wow64RevertWow64FsRedirection
lstrcmpW
lstrcmpiW
lstrlenW
sub_100D991D
sub_100EE882
sub_100F77B0
sub_100FD090
sub_10103281
sub_1010331F
Interestingly, 6 non-exported subs have been called. For instance, sub_100FD090 is only referenced by CreateThreadpoolIo. Let's see who called them:
Python>whocalled("sub_100D991D", "C:\\Users\\user\\Desktop\\log.bin")
Unique callers:
0x32c1f4d
Python>whocalled("sub_100EE882", "C:\\Users\\user\\Desktop\\log.bin")
Unique callers:
0x32c2e2a
Python>whocalled("sub_100F77B0", "C:\\Users\\user\\Desktop\\log.bin")
Unique callers:
ntdll.dll : 0x77c597c7
Python>whocalled("sub_100FD090", "C:\\Users\\user\\Desktop\\log.bin")
Unique callers:
ntdll.dll : 0x77c5d087
Python>whocalled("sub_10103281", "C:\\Users\\user\\Desktop\\log.bin")
Unique callers:
0x32c1c11
Python>whocalled("sub_1010331F", "C:\\Users\\user\\Desktop\\log.bin")
Unique callers:
0x32c42e0
Python>
Ntdll called the 2 thread pool callbacks; the other ones seem to have been called by jitted code, which is in fact... our own "trampolined" code (which moved several CALL instructions), and which we really should add to the white list.
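The `whocalled` output can be reproduced with a small offline sketch. The post does not describe the layout of log.bin, so the `(caller_address, called_sub)` records below are a hypothetical, already-decoded form, and `describe`/`unique_callers` are illustrative names; the module ranges come from the `parse_output` listing (itself truncated). A caller falling outside every listed module is printed as a bare address, as the trampoline-originated calls are.

```python
from collections import defaultdict

# (start, end, name) ranges copied from the parse_output "Modules list" (truncated there).
MODULES = [
    (0x00380000, 0x003BE000, "notepad.exe"),
    (0x77C20000, 0x77DAE000, "ntdll.dll"),
]


def describe(address, modules=MODULES):
    """Format a caller like whocalled does: 'module : 0x...' or a bare address."""
    for start, end, name in modules:
        if start <= address < end:
            return f"{name} : {address:#x}"
    return f"{address:#x}"  # outside every listed module (e.g. trampoline code)


def unique_callers(records, modules=MODULES):
    """Group hypothetical (caller_address, called_sub) records into distinct callers per sub."""
    callers = defaultdict(set)
    for caller, sub in records:
        callers[sub].add(caller)
    return {sub: [describe(a, modules) for a in sorted(addrs)]
            for sub, addrs in callers.items()}


records = [
    (0x77C597C7, "sub_100F77B0"),
    (0x77C597C7, "sub_100F77B0"),  # repeated calls collapse into one caller
    (0x032C1F4D, "sub_100D991D"),
]
for sub, who in unique_callers(records).items():
    print(sub, who)
```

Whitelisting the trampoline area would then amount to adding its address range to the module list, so that those callers stop showing up as anonymous addresses.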
We hope this basic tool and its source code will be useful to others too. We want it to remain simple, so the biggest improvements will probably be removing the "dirty" parts (i.e. using shared memory, Windows mutexes, and tuning the assembly code) and adding x64 support. We may also test it on intra-module calls in the future, but we're not really confident about the performance. We'll see. Feel free to contribute!