Recrutement
Si vous êtes intéressés pour bosser sur des sujets sympas tout en restant loin de Paris, consultez nos offres d'emploi et envoyez nous votre CV à rh@amossys.fr.

Sodinokibi / REvil Malware Analysis

This article details the behavior of the Sodinokibi ransomware using static analysis with IDA Pro.

Introduction

Sodinokibi, also called REvil, is a ransomware active since april 2019. Older version have already been analysed, but Sodinokibi receives frequent updates, tweaking its features and behavior. In this article we will be analysing a sample found during an Amossys CERT mission, compiled in march 2020 according to the PE timestamp.

The purpose of this article is to detail how the malware works, and to provide reverse engineering tips when possible. No dynamic analysis was conducted, as static reversing with IDA Pro proved sufficient.

Presentation of the malware

Sodinokibi is a "Ransomware as a Service" which means that the developers are not the one conducting attacks. Instead, they maintain a management / payment infrastructure and give or sell the malware to customers. Thoses custormers are the one spreading the malware. For each ransom paid, developers get a percentage. This approach has many advantages: infections sources are multiplied, developers can focus on the code and maintenance while customers can focus on attacking and infecting targets.

According to the cybersecurity blog Krebs on security, in june 2020, criminals behind Sodinokibi started selling stolen data if victims were not inclined to pay the ransom1. As data stealing features were not found in Sodinokibi, this lets suppose that infections are manual and targeted at already compromised system.

Sample information

We uploaded our sample to Virus Total to get signatures and information on the PE.

Virus Total score

Figure 1: Virus Total score

Sample Signatures

Figure 2: Sample Signatures

Obfuscated IAT

Right after loading the PE into IDA, we notice that its imports table (IAT) is probably obfuscated. Two points can lead to this conclusion. First, IDA only detects 5 imported functions, which is way too few to do anything significant.

IDA imports subview

Figure 3: IDA imports subview

Then, the program calls dwords which do not seems to point to any valid function. Thoses dwords might be the obfuscated IAT.

Unknown dwords calls

Figure 4: Unknown dwords calls

Probable obfuscated IAT

Figure 5: Probable obfuscated IAT

Before being able to do anything, the malware has to deobfuscate its IAT. By looking into the first function we can spot the following loop, where dword_40ff40 is a pointer to the start of the obfuscated IAT:

Deobfuscation loop

Figure 6: Deobfuscation loop

sub_406817 is used to resolve the unknown dwords into valid functions adresses. Here is how it works:

Dword resolution

Figure 7: Dword resolution

  1. The obfuscated dword is transformed (with XOR and bit shifting). The new dword is splitted in 2. The 11 most significants bits designate a DLL. The 21 least significant bits designate a function exported by this DLL
  2. The DLL corresponding to the 11 bits value is loaded into memory with the LoadLibrary() function.
  3. The name of each function exported by the library is hashed using a custom algorithm. The hash is compared to the 21 bits value. If they match, the obfuscated dword is replaced with the correct function address in memory.

Let's see what all these steps look like in IDA Pro.

Hash transformation

The obfuscated dword (named arg_iat_hash here) is tranformed with this line:

Obfuscated dword tranformation

Figure 8: Obfuscated dword tranformation

Switch / Case

Here, the switch / case statement is used to load a specific DLL depending on the dll_hash variable. Switch / Case statements are often compiled to look like this: Compiled switch / case statement

Figure 9: Compiled switch / case statement

The considered value (here dll_hash) get substracted and compared to 0 instead of being compared to direct values. Every compiler has its own way to process conditional statements, but we have seen this one multiple times.

LoadLibrary()

DLLs are loaded with the LoadLibrary() function.

LoadLibrary() function hash

Figure 10: LoadLibrary() function hash

LoadLibrary() is exported by the Kernel32 library. So when LoadLibrary() itself needs to be resolved in the obfuscated IAT, Kernel32 can not be loaded. Thanks to the few non-obfuscated functions (see Figure 3), Kernel32 is already loaded when the process starts up. When LoadLibrary() must be resolved, the program looks into its Process Environment Block for loaded modules, and retrieve the adresse of Kernel32 with yet another hash mechanism.

DLL retrieval from PEB structure

Figure 11: DLL retrieval from PEB structure

Function name and address resolution

Once the DLL address is known, the following code is executed:

Unknown code

Figure 12: Unknown code

The offset 0x3C and 0x78 are noticeable for a PE file. 0x3C is the offset of the PE header and 0x78 is the offset of the IMAGE_DATA_DIRECTORY structure in this PE header. Data directories contain various information about the PE file. The first one (offset 0x00 in the IMAGE_DATA_DIRECTORY) is the IMAGE_EXPORT_DIRECTORY, containing information about exported functions. Here is its layout:

struct _IMAGE_EXPORT_DIRECTORY
{
  DWORD Characteristics;
  DWORD TimeDateStamp;
  WORD MajorVersion;
  WORD MinorVersion;
  DWORD Name;
  DWORD Base;
  DWORD NumberOfFunctions;
  DWORD NumberOfNames;
  DWORD AddressOfFunctions;
  DWORD AddressOfNames;
  DWORD AddressOfNameOrdinals;
};

We can import this structure in IDA and change the v18 variable type. We now see which fields of the structure are accessed Exported function access

Figure 13: Exported function access

Each exported function name is then hashed and compared with the unknown hash. If they match, the unknown dword is replaced with the correct function address.

Function hashes comparison

Figure 14: Function hashes comparison

Function name hashing

Function names are hashed with the following code:

Hash function

Figure 15: Hash function

The for loop as displayed by IDA decompiler may not be very clear. Here is an equivalent in python:

def mw_get_fn_name_hash(fn_name):
  result = 0x2B
  for c in fn_name:
    result = ord(c) + 0x10F * result

  return result

This hash function do not need to be very robust, as potential inputs are limited to function names from common DLL.

Automation

Now that we understand the deobfuscation mechanism, we can rename every dword in the obfuscated IAT with the correct function name. The OALabs team provides IDA scripts to automate this process especially for Sodinokibi. The first script2 builds hashes for functions in commonly used DLL. The second script3 compare every dword in the obfuscated IAT to the previously build hashes. If they match, the dword is renamed to the corresponding function name.

After executing the scripts, functions are successfuly renamed. For exemple, here are dwords from Figure 4 resolved into CreateMutexW() and RtlGetLastWin32Error().

Resolved calls

Figure 16: Resolved calls

Encrypted strings

The malware contains no meaningful strings. They might be obfuscated or encrypted. Let's take a look at the CreateMutexW() call we just resolved:

Unknown mutex name

Figure 17: Unknown mutex name

Here, v2 has to be a string containing the mutex name. It is only used in sub_4056E3(), a function that is called in many other places with unk_4101C0 as the first argument:

sub_4056E3 xrefs

Figure 18: sub_4056E3 xrefs

Considering how frequently sub_4056E3() is called, it is probably used to somehow initialize strings. By looking into it, we can find the following code:

RC4 extract from binary

Figure 19: RC4 extract from binary

We notice two loops going from 0 to 255 (0x100 values). This scheme is characteristic of the RC4 encryption algorithm. Here is the equivalent pseudo code taken from the RC4 Wikipedia page:

RC4 pseudocode

Figure 20: RC4 pseudocode

RC4 needs to know the key, the data to encrypt or decrypt, and their respective size. We can rename sub_4056E3() arguments as follow:

String decryption function arguments

Figure 21: String decryption function arguments

unk_4101C0 (renamed ptr_to_enc_str) is a pointer to a data blob containing encrypted strings and their decryption key. The next argument is the offset to the key in the data blob. Then, the key and string sizes are given. The string offset in the blob is obtained by adding the key offset and the key size. This means that the key and the corresponding string are adjacent in the data blob.

Automation

Here again, OAlabs provides a script4 to automate stings decryption. For each call to sub_4056E3(), the script fetches the arguments and decrypt the string. The decrypted string is added as a comment next to the call. Here is the result for the mutex name:

Decrypted mutex name

Figure 22: Decrypted mutex name

Concurrency checking

In the two previous parts (Obfuscated IAT and Encrypted Strings), we took as exemple a short snippet of code, where the malware opens a Mutex and check if it already exist. This code is here to prevent two instances of the malware to run at the same time. Normally, the malware was designed to prevent this, but if files get encrypted twice, the victim may not be able to recover them, even after paying the ransom. However, Sodinokibi authors seem to attach great importance to the data recovery rate. Here is a snippet of the payment instruction:

In Q2 2019, victims who paid for a decryptor recovered 92% of their encrypted data. This statistic varied dramatically depending on the ransomware type.

For example, Ryuk ransomware has a relatively low data recovery rate, at ~ 87%, while Sodinokibi was close to 100%. "

Now you have a guarantee that your files will be returned 100 %.

If the malware has a low data recovery rate and acquires bad reputation, victims will be less inclined to pay, generating losses for the authors.

Privileges obtention

The malware needs administrators privileges to read and overwrite files on the system. Three tests are made to check if the malware has enough privileges:

Privileges verification

Figure 23: Privileges verification

First, it checks if the Windows version is Windows XP or lower. Then, it checks if the process Token rights can be elevated or not. Finally, it checks the process SID. If all of the tests fails (no administrator privileges), the malware will just spam the UAC prompt to get user consent:

Infinit user consent request

Figure 24: Infinit user consent request

The ShellExecuteExW() function is used to execute a binary with given parameters. The runas command executes it as Administrator, asking the user for consent. ShellExecuteExW() is called in an infinite loop. An unaware user might say "no" multiple time before getting annoyed and say "yes". Alternatively, the malware might be executed by an attacker already having administrator privileges on a compromised system.

Configuration

After getting administrator privileges, the malware reads its JSON configuration. As the strings, the configuration is RC4 encrypted. It is embedded into a special section of the binary called .11hix here. This name is probably changed with each version of the malware.

PE sections names

Figure 25: PE sections names

Here are all the fields in our sample's configuration :

JSON configuration fields

Figure 26: JSON configuration fields

Once decrypted, the configuration is parsed to load all fields data into memory. It is very unlikely that the malware authors spent time developping a JSON parser. They probably used an already existing solution. Lets search for commonly used parser:

JSON Parser search

Figure 27: JSON Parser search

Top 2 results are cJSON and json-parser. We can see that those two parser have a different way to handle JSON data types. cJSON types are defined like this:

/* cJSON Types: */
#define cJSON_Invalid (0)
#define cJSON_False  (1 << 0) //1
#define cJSON_True   (1 << 1) //2
#define cJSON_NULL   (1 << 2) //4
#define cJSON_Number (1 << 3) //8
#define cJSON_String (1 << 4) //16
#define cJSON_Array  (1 << 5) //32
#define cJSON_Object (1 << 6) //64
#define cJSON_Raw    (1 << 7) //128 /* raw json */

json-parser types are defined in an enum like this:

typedef enum
{
   json_none,     //0
   json_object,   //1
   json_array,    //2
   json_integer,  //3
   json_double,   //4
   json_string,   //5
   json_boolean,  //6  
   json_null      //7
} json_type;

For each configuration field, the malware seems to use a structure looking like this:

struct  mw_config_field
{
  DWORD field_name;
  DWORD unknown;
  DWORD parse_function;
};

Those structures are then stored in an array:

Parsing strucutures array

Figure 28: Parsing strucutures array

The unknown fields values are between 0 and 6. We can suppose that the unknown field corresponds to the json types from json-parser. By checking the type in IDA and in the configuration, we can confirm our supposition. For exemple, the svc field is an array, both in the configuration and in the IDA structure :

svc array in configuration

Figure 29: svc array in configuration

svc type in parsing structure

Figure 30: svc type in parsing structure

Now that we have the parser source code, we can avoid wasting time reversing it in IDA.

Payment instructions

The payment instructions are base 64 encoded in the nbody field of the configuration. Once decoded, we can see informations are missing :

Payment instruction excerpt

Figure 31: Payment instruction excerpt

The three fields UID, KEY and EXT are generated and replaced at run time.

UID

The UID is the victim identifier, generated with the Windows volume serial number, and CPU attributes obtained with the cpuid ASM instruction.

UID generation

Figure 32: UID generation

KEY

The KEY is a JSON dictionnary containing various information about the system, the user and the malware. The dictionnary is AES encrypted with a key embedded in the binary, and base 64 encoded :

KEY generation

Figure 33: KEY generation

KEY base 64 encoding

Figure 34: KEY base 64 encoding

EXT

The EXT is the file extension that will be added to encrypted files. It is a randomly generated string between 5 and 10 characters, containing numbers and / or lowercase letters. The file extension is saved into a registry key to be used if the malware is executed multiple times.

Command line arguments

The malware accept 5 command line arguments.

Command line argument parsing

Figure 35: Command line argument parsing

By default, the malware will encrypt all files on local and shared network drives. The following arguments change this behavior :

  • nolan : do not encrypt files on local drives
  • nolocal : do not encrypt files on shared drives
  • path : ignore local and network to only encrypt files in a specific path

The malware supports multiple encryption types:

  • fast : only encrypt the first MB of each file
  • full : encrypt the whole file

A third encryption type can be set in the configuration. We will come back to that later.

Region whitelisting

The malware will now verify that the infected system is not in a whitelisted region. To do so, it checks the system language and the keyboard layout:

System language whitelist

Figure 36: System language whitelist

Keyboard layout whitelist

Figure 37: Keyboard layout whitelist

Both the keyboard layout and the language need to be whitelisted for the malware to stop.

Persistence

The malware may be interrupted before being able to encrypt all files, either by an antivirus, or by a user shutting down the process or the machine. Supposedly to prevent this scenario, the malware registers itself in the SOFTWARE\Microsoft\Windows\CurrentVersion\Run registry key to be relaunched at boot time. To pick up where it left off and avoid encrypting files multiple times, the malware saves some information in registry keys. In particular, it saves the EXT file extension, and intermediate secret keys, that are sadly not sufficient to decrypt files (more details about encryption keys are given in the Keys management section).

Registration in CurrentVersion\\Run key

Figure 38: Registration in CurrentVersion\Run key

However, when finished, the malware deletes himself rendering the path in the registry key invalid.

Registration for deletion

Figure 39: Registration for deletion

From MSDN Documentation:

If dwFlags specifies MOVEFILE_DELAY_UNTIL_REBOOT and lpNewFileName is NULL, MoveFileEx registers the lpExistingFileName file to be deleted when the system restarts.

Processes and sevices shutdown

The malware configuration contains a list of services and processes name (svc and prc fields). These services and processes are stopped by the malware before files encryption. They usually are backup / snapshot services or antiviruses. They can also be database services like sql. By stopping these services, the databases files are no longer opened in other processes and can be encrypted.

The exact list may vary from samples to samples. As attacks are very targeted, we can assume that attackers adapt the configuration to suit the victim system.

In addition to stopping services / processes, the malware also deletes shadow copies. Shadow copies are files or volumes snapshots made by the Volume Snapshot Service (VSS) included in Windows.

For Windows versions over Windows XP, the following command is used :

powershell -e RwBlAHQALQBXAG0AaQBPAGIAagBlAGMAdAAgAFcAaQBuADMAMgBfAFMAaABhAGQAbwB3AGMAbwBwAHkAIAB8ACAARgBvAHIARQBhAGMAaAAtAE8AYgBqAGUAYwB0ACAAewAkAF8ALgBEAGUAbABlAHQAZQAoACkAOwB9AA==

Once decoded :

Get-WmiObject Win32_Shadowcopy | ForEach-Object {$\_.Delete();}

For windows XP and below, the following command is used :

cmd.exe /c vssadmin.exe Delete Shadows /All /Quiet & bcdedit /set {default} recoveryenabled No & bcdedit /set {default} bootstatuspolicy ignoreallfailures

Files listing

By default, the malware encrypt all local and network files. Depending on command line arguments, it can ignore local or network files, or ignore both to only encrypt a given path.

File listing options

Figure 40: File listing options

mw_enum_path_files() is the function that recursively list all files in a given path. mw_enum_local_files() and mw_enum_network_resources() are used to list high level directories, and call mw_enum_path_files().

Local directories

mw_enum_local_files() brainlessly list all disks from A:\ to Z:\. Each disk is sent to mw_enum_path_files() :

Local file listing

Figure 41: Local file listing

Network resources

Network resources are listed with the WNetOpenEnumW() and WNetEnumResourceW() functions. Resources of type RESOURCETYPE_DISK are sent to mw_enum_path_files().

Network file listing

Figure 42: Network file listing

Specific directory listing

As we can see, both mw_enum_local_files() and mw_enum_network_resources() call mw_enum_path_files() to recursively list files in a given directory.

The malware configuration contains a whitelist of folders (fld), files (fls) and file extensions (ext) that must not be encrypted (like the windows installation folder for exemple). Our sample's configuration was the following:

"wht": {
    "fld": [
      "msocache", "intel", "$recycle.bin", "google", "perflogs",
      "system volume information", "windows", "mozilla", "appdata",
      "tor browser", "$windows.~ws", "application data", "$windows.~bt",
      "boot", "windows.old"
    ],
    "fls": [
      "bootsect.bak", "autorun.inf", "iconcache.db", "thumbs.db", "ntuser.ini",
      "boot.ini", "bootfont.bin", "ntuser.dat", "ntuser.dat.log", "ntldr",
      "desktop.ini"
    ],
    "ext": [
      "com", "ani", "scr", "drv", "hta", "rom", "bin", "msc", "ps1", "diagpkg",
      "shs", "adv", "msu", "cpl", "prf", "bat", "idx", "mpa", "cmd", "msi",
      "mod", "ocx", "icns", "ics", "spl", "386", "lock", "sys", "rtp", "wpx",
      "diagcab", "theme", "deskthemepack", "msp", "cab", "ldf", "nomedia", "icl",
      "lnk", "cur", "dll", "nls", "themepack", "msstyles", "hlp", "key", "ico",
      "exe", "diagcfg"
    ]
  },

mw_enum_path_files() will start by checking if the given path is a whitelisted folder. If not, it proceeds by writing ransom instructions and listing items with FindFirstFile() and FindNextFile() functions.

Each item found is ignored if on the whitelist. Otherwise, if the item is a folder, the ransom instructions are written in a text file named {EXT}-readme.txt. This name is defined in the nname field of the configuration. If the item is a file, it is encrypted.

Encryption parallelisation

To shorten encryption time and take full advantage of the victim's calculation power, the malware uses "I/O Completion Port":

I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system. When a process creates an I/O completion port, the system creates an associated queue object for requests whose sole purpose is to service these requests. Processes that handle many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports in conjunction with a pre-allocated thread pool than by creating threads at the time they receive an I/O request.

The IOCP API is composed of three functions :

  • CreateIoCompletionPort() : called without or with a file handle, to respectively create a port or add a file to it;
  • GetQueuedCompletionStatus() : wait for a completion paquet to be posted to the port;
  • PostQueuedCompletionStatus() : post a completion paquet to the port. Completion paquet are also automatically posted when supported I/O operation are finished (ReadFile(), WriteFile(), etc...)

The malware uses IOCP like this :

  1. The IOCP is created.
  2. A pool of threads is created.
  3. All threads wait an event with GetQueuedCompletionStatus().
  4. When a file is found with mw_enum_path_files(), it is added to the IOCP.
  5. A completion paquet is posted with PostQueuedCompletionStatus() to notify a thread that a file has to be encrypted.

Here is how a port and its threads are created :

IOCP creation

Figure 43: IOCP creation

Thread pool creation

Figure 44: Thread pool creation

We can see that threads take an argument called encryption_routine. This is a pointer to the encryption function that threads will execute. It starts with a call to GetQueuedCompletionStatus() to wait for a file.

When a file is added to a completion port, a completion paquet is posted to trigger a thread :

File addition to the IOCP

Figure 45: File addition to the IOCP

File encryption

A file encryption is done in four steps.

  1. A 1 MB data block is read from the file (or the entire file if its size is less than 1 MB)
  2. The data block is encrypted and written back to the file. Depending on the encryption type, step 1 and 2 can be repeated multiple times.
  3. Metadata are added at the end of the file.
  4. The {EXT} extension is added to the file name.

Encryption steps

Figure 46: Encryption steps

Encryption types

We already presented encryption type full and fast, selectable from command line arguments. The encryption type can also be choosed from the configuration with the et field. Here, a third encryption type is available, which we will call mixed.

For very long files, encrypting only the first MB leaves a lot of unencrypted data, but a full encryption would take too much time. The mixed encryption type allows to encrypt multiple blocks of 1 MB within a file, leaving some data between blocks unencrypted. The size of data left unencrypted between blocks is defined in the spsize configuration field. The malware first read the encryption type from the configuration (cfg_et), but overwrites it if a command line argument is given.

Encryption type selection

Figure 47: Encryption type selection

Encryption algorithms

To find out which encryption algorithm is used by the malware, we must look for known constants in the binary file (AES Sbox for exemple). Various plugins and scripts were made to automate this search. A well known plugin is FindCrypt. If you don't want / can not install IDA plugins or yara-python, here is an alternative IDAPython only implementation we used for this analysis.

The script detect constants and rename them accordingly : FindCrypt output

Figure 48: FindCrypt output

Salsa20 and AES constants

igure 49: Salsa20 and AES constants

Salsa20 and AES constants are present in the binary. By studying these constants references, we find out that files are encrypted with Salsa20. For each file found in the mw_enum_path_files() function, a Salsa20 Matrix is setted up with a unique encryption key, a unique IV, and the Salsa20 constants.

Salsa20 matrix preparation for file encryption

Figure 50: Salsa20 matrix preparation for file encryption

Processing structure

On the previous screenshot, you can see a processing_info variable. It is a structure containing data about the file being encrypted. It is used to transfer information between threads:

struct mw_file_processing
{
  DWORD ptrOverlapped;
  DWORD dword4;
  DWORD NbBytesProcessed_low;
  DWORD NbBytesProcessed_high;
  DWORD dword16;
  DWORD FileHandle;
  DWORD CurrentFileName;
  DWORD dword28;
  DWORD NbBytesToProcess_low;
  DWORD NbBytesToProcess_high;
  BYTE secret_1[88];
  BYTE secret_2[88];
  BYTE file_public_key[32];
  BYTE file_IV[8];
  DWORD file_public_key_crc;
  DWORD encryption_type;
  DWORD spsize;
  DWORD encrypted_null;
  salsa20_matrix salsa20_matrix;
  DWORD current_processing_step;
  DWORD next_processing_step;
  DWORD NbBytesToRead;
  BYTE  EncryptionBuffer[4];
};

Keys management

The malware uses a complex key system to make the ransom payment mandatory for file recovery.

Session keys and encrypted keys

When infecting a new victim, the malware starts by generating a session key pair with the Elliptic-Curve Diffie-Hellman (ECDH) algorithm. The curve used is Curve25519.

Session pair generation

Figure 51: Session pair generation

An attacker's key which we will call attackers_public_1 is stored in the pk configuration field. attackers_public_1 and a newly generated private_1 key are used with the ECDH algorithm to generate the shared_1 key. This shared key is used to encrypt session_private with the AES algorithm.

secret_1 generation

Figure 52: secret_1 generation

This exact same process is repeated to generate an encrypted_2 key, but this time using an attackers key embedded in the binary file (attackers_public_2).

secret_2 generation

Figure 53: secret_2 generation

The secret_1 and secret_2 data are saved in memory and in registry keys, as well as the session_public and attackers_public_1 keys. Other data or keys are freed from memory as they will not be used for file encryption.

File encryption keys and metadata

For each file, a new key pair is generated. The new file_private key and the session_public key are used with the ECDH algorithm to generate the file_encryption key. This key is used with the Salsa20 algorithm to encrypt the file content. Thus, each file is encrypted with a different key.

File encryption

Figure 54 File encryption

Metadata are added at the end of every encrypted file :

Encrypted file metadata

Figure 55: Encrypted file metadata

File decryption

To decrypt a file, one needs the metadata added at its end, and the private key corresponding to either attackers_public_1 or attackers_public_2. The decryption mechanism is supposedly the following:

File decryption

Figure 56: File decryption

The attackers private keys are obviously unknown to us and victims. This is the keystone of Sodinokibi security, preventing victims to decrypt files without paying. This whole key management system also allows the malware to operate without having to communicate with a C2 server. We did not have access to the decryption module, so this process is only a supposition we made based on other analysis5,6.

Cryptographic library identification

ECDH

In this part we wanted to explain how we understood that the binary was actually using ECDH to generate key pairs and shared keys.

We already knew where keys were generated. For exemple, here is the creation of the session keys, and encrypted keys :

Sessions keys and encrypted keys generation

Figure 57: Sessions keys and encrypted keys generation

But when looking into mw_generate_key_pair() sub-functions, the pseudocode quickly ended up looking like this :

Unreadable code

Figure 58: Unreadable code

Here, we searched some distinguishable elements allowing us to identify the algorithm, or better yet, the library used. We finally found a value that seemed oddly specific. Exactly what we wanted :

Oddly specific value

Figure 59: Oddly specific value

After using different search terms, we finally got this result :

Algorithm search

Figure 60: Algorithm search

We successfuly identified the algorithm (ECDH). Can we find the exact library used? Let's look for implementation :

Implementation search

Figure 61: Implementation search

The first result is a github repository in which we can find back our 121665 value :

Curve25519-donna excerpt

Figure 62: Curve25519-donna excerpt

And comparing to what IDA gives us:

Unknown pseudocode

Figure 63: Unknown pseudocode

The two codes have many similarities. We notice common memcpy() and memset() calls. The fscalar_product() function implements a loop from 0 to 10 in the donna library. We find this loop in IDA too. It just seems to have been inlined by the compiler.

We can confidently say that the malware uses this library or a fork. The documentation gives instructions to generate key pairs :

Curve25519-donna usage instructions

Figure 64: Curve25519-donna usage instructions

We can find these steps in the malware :

Private key generation

Figure 65: Private key generation
Public key generation
Figure 66: Public key generation
Shared key generation
Figure 67: Shared key generation

AES

Thanks to the FindCrypt script, we know that the malware uses T-Tables instead of SBoxes for AES encryption. We looked for AES implementations using T-Tables. OpenSSL seemed like a good option, but some function calls and arguments were not matching.

Server Communication

When the net configuration field is set to true, the malware will use the WinHTTP API to send the KEY data (see Payment instructions) to a server. Here, KEY is not base 64 encoded.

The configuration contains 1223 domain names in the dmn field. For each domain, the malware will generate an URL and send the victim's KEY. Each url will have the following pattern :

https://<domain>/<path1>/<path2>/<filename>.<ext>.

  • domain is the current domain.
  • path1 is randomly chosen between the following values:
    • wp-content, static, content, include, uploads, news, data, admin.
  • path2 is randomly chosen between the following values:
    • images, pictures, image, temp, tmp, graphic, assets, pics, game.
  • filename is a randomly generated strings composed of 1 to 9 pairs of lowercase letters.
  • ext is randomly chosen between the following values:
    • jpg, png, gif.

Many domains seem legitimate, but one or more could be compromised and used by the malware authors. The use of so many domains allows to hide which servers really belong to the attackers. Using a simple python script, we sent one POST request to a valid URL on all domains. We received code 200 75 times and code 404 784 times. Many server do not recognise the url and respond with error code 404. Thus, they can not receive data from the malware. With further tests we could continue reducing the list of potential malicious or compromised servers, but this is out of scope of this article.

Conclusion

In this article we explained what the Sodinokibi malware do, and how it operates. The sample analysed was found in march 2020.

Sodinokibi implements two obfuscation mechanism : IAT obfuscation and Strings encryption. While these mechanism do not prevent reverse engineering, they prevent antiviruses solutions to easily detect the threat.

The malware was spread via spam and phishing campaigns, but it was also manually executed by attackers on already compromised system. Indeed, it was designed to be adaptable to the victim's system with a JSON configuration and command line arguments.

To obtain administrator privileges, the malware will spam the UAC window for user consent. Then, it will stop services and processes listed in its configuration. These processes usually are antiviruses, databases, backup or snapshot solutions, etc.

Sodinokibi uses I/O Completion Port to parallelise file encryption, and make it as fast as possible. Files are encrypted with the Salsa20 algorithm, each with a unique encryption key. Encryption keys are protected with a complex key system, preventing file decryption without a private key owned by the attackers.

The malware can operate without contacting a C2 server, but if correctly configured, it will communicate a victim identification key to one or multiple servers. These servers are hidden in a list of thousand of domains.

Detecting Sodinokibi is not easy. Signatures are not reliable as the malware can be recompiled for each victim. Various names like the configuration PE section or registry keys can be randomly generated at each compilation so they can not be used for detection either. Furthermore, the encryption process showed in this blogpost shows that's it's impossible to decrypt the files without paying the ransom.

If you are victim of a ransomware like this one, or if you encounter a security incident on your network, you can contact the CERT-Amossys to help you manage and investigate it.

References