Tutorial: How to reverse unknown protocols using Netzob

This article presents the main features of Netzob for reverse engineering unknown protocols. It walks through learning the message formats of a simple protocol as well as its state machine, and gives some insight into how to generate traffic in order to communicate with a real implementation. Finally, we show how to apply some basic fuzzing to the server implementation.

Netzob introduction

Netzob is an open source tool for reverse engineering, traffic generation and fuzzing of communication protocols. It infers the message format and state machine of a protocol through passive and active processes. The resulting model can then be used to simulate realistic and controllable traffic, as well as to fuzz a target implementation.

Through this tutorial, we will present the main features of Netzob regarding the inference of message formats and grammar of a simple toy protocol, and some basic fuzzing of the implementation at the end. The described features cover the following capabilities:

  • Import a file containing traces we want to reverse
  • Infer the message format
    • Partitioning of messages following a specific delimiter
    • Regrouping of messages following a specific key field
    • Partitioning of a subset of each message following a sequence alignment
    • Search for relationships in each group of messages
    • Modification of the message format to apply found relationships
  • Infer the grammar
    • Generation of an automaton with one main state according to a captured sequence of messages
    • Generation of an automaton with a sequence of states according to a captured sequence of messages
    • Generation of a Prefix Tree Acceptor (PTA) automaton according to a captured sequence of messages
  • Generate traffic and fuzz the server
    • Generation of messages following the inferred message format of each group and through visiting the inferred automata
    • Fuzzing of an implementation by generating altered message formats

Install Netzob and download the tutorial resources

First, retrieve the source code of Netzob, install its dependencies and compile the underlying libraries. If required, more details on the installation process are provided in the README file.

$ git clone https://dev.netzob.org/git/netzob
$ cd ./netzob/
$ sudo apt-get install python python-dev python-impacket python-setuptools build-essential python-numpy
$ python setup.py build
$ python setup.py develop --user

Then, you can retrieve the source code of the toy protocol implementation used in this tutorial, as well as some PCAP files of sequences of messages:

The next sections of this article go through the different steps that can be followed to reverse this toy protocol. Before diving into Netzob features, you can have a look at its documentation, especially the description of the API.

Message format inference

Import messages from a PCAP file

The first step in most Protocol Reverse Engineering (PRE) processes is to collect and import communication samples. In this tutorial, samples take the form of PCAP files. Reading packets from a PCAP file is done through the PCAPImporter.readFile() static function. This function can optionally take more parameters to specify a BPF filter, the import layer or the number of packets to capture, as shown in the documentation:

def readFile(filePath, bpfFilter="", importLayer=5, nbPackets=0):
     """Read all messages from the specified PCAP file. A BPF filter
     can be set to limit the captured packets. The layer of import
     can also be specified:
      - When layer={1, 2}, it means we want to capture a raw layer (such as Ethernet).
      - If layer=3, we capture at the network level (such as IP).
      - If layer=4, we capture at the transport layer (such as TCP or UDP).
      - If layer=5, we capture at the applicative layer (such as the TCP or UDP payload).
     Finally, the number of packets to capture can be specified.

    :param filePath: the pcap path
    :type filePath: :class:`str`
    :param bpfFilter: a string representing a BPF filter.
    :type bpfFilter: :class:`str`
    :param importLayer: an integer representing the protocol layer to start importing.
    :type importLayer: :class:`int`
    :param nbPackets: the number of packets to import
    :type nbPackets: :class:`int`
    :return: a list of captured messages
    :rtype: a list of :class:`netzob.Common.Models.Vocabulary.Messages.AbstractMessage`
    """

This function can be used to extract the messages from the PCAPs we collected while stimulating our toy protocol implementation. For example, the following code creates a symbol based on the messages extracted from the PCAPs. A symbol represents all the messages that share the same syntax and semantics. In other words, a symbol is an abstraction of a group of similar messages that have the same impact from the protocol's point of view. At first, all the messages imported from the PCAP files are grouped into a single symbol. We will then apply different methods to this symbol to identify the message formats of the protocol.

from netzob.all import *

# Import of two PCAP files representing two sessions of the protocol (i.e. two instances of a communication between the client and the server)
messages_session1 = PCAPImporter.readFile("target_src_v1_session1.pcap").values()
messages_session2 = PCAPImporter.readFile("target_src_v1_session2.pcap").values()
messages = messages_session1 + messages_session2

# Group the messages of the two sessions into a single symbol
symbol = Symbol(messages = messages)

# Display symbol content
print symbol 
Field                                                
-----------------------------------------------------
'CMDidentify#\x07\x00\x00\x00Roberto'                
'RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00'       
'CMDinfo#\x00\x00\x00\x00'                           
'RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info'       
'CMDstats#\x00\x00\x00\x00'                          
'RESstats#\x00\x00\x00\x00\x05\x00\x00\x00stats'     
'CMDauthentify#\n\x00\x00\x00aStrongPwd'             
'RESauthentify#\x00\x00\x00\x00\x00\x00\x00\x00'     
'CMDencrypt#\x06\x00\x00\x00abcdef'                  
"RESencrypt#\x00\x00\x00\x00\x06\x00\x00\x00$ !&'$"  
"CMDdecrypt#\x06\x00\x00\x00$ !&'$"                  
'RESdecrypt#\x00\x00\x00\x00\x06\x00\x00\x00abcdef'  
'CMDbye#\x00\x00\x00\x00'                            
'RESbye#\x00\x00\x00\x00\x00\x00\x00\x00'            
'CMDidentify#\x04\x00\x00\x00fred'                   
'RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00'       
'CMDinfo#\x00\x00\x00\x00'                           
'RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info'       
'CMDstats#\x00\x00\x00\x00'                          
'RESstats#\x00\x00\x00\x00\x05\x00\x00\x00stats'     
'CMDauthentify#\t\x00\x00\x00myPasswd!'              
'RESauthentify#\x00\x00\x00\x00\x00\x00\x00\x00'     
'CMDencrypt#\n\x00\x00\x00123456test'                
"RESencrypt#\x00\x00\x00\x00\n\x00\x00\x00spqvwt6'16"
"CMDdecrypt#\n\x00\x00\x00spqvwt6'16"                
'RESdecrypt#\x00\x00\x00\x00\n\x00\x00\x00123456test'
'CMDbye#\x00\x00\x00\x00'                            
'RESbye#\x00\x00\x00\x00\x00\x00\x00\x00'            
-----------------------------------------------------

Apply a format partitioning with a delimiter

A quick review of the displayed messages shows that the character # looks interesting, as it appears in the middle of each message. Thus, a first step in our inference process is to split each message according to the delimiter #. As stated in the documentation, the function splitDelimiter() plays this role:

def splitDelimiter(field, delimiter):
    """Split a field (or symbol) with a specific delimiter. The
    delimiter can be passed either as an ASCII, a Raw, an
    HexaString, or any objects that inherit from AbstractType.

    :param field : the field to consider when splitting
    :type: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    :param delimiter : the delimiter used to split messages of the field
    :type: :class:`netzob.Common.Models.Types.AbstractType.AbstractType`
    """

So let's use the delimiter # with the function splitDelimiter(). We can then display the resulting field structure with the _str_debug() method. This method displays an ASCII representation of the symbol structure, thus showing the definition of each field.

# Apply a split by delimiter method on the symbol
Format.splitDelimiter(symbol, ASCII("#"))

# Display symbol structure
print symbol._str_debug()
Symbol
|--  Field-0
     |--   Alt
           |--   Data (Raw='CMDidentify' ((0, 88)))
           |--   Data (Raw='RESidentify' ((0, 88)))
           |--   Data (Raw='CMDinfo' ((0, 56)))
           |--   Data (Raw='RESinfo' ((0, 56)))
           |--   Data (Raw='CMDstats' ((0, 64)))
           |--   Data (Raw='RESstats' ((0, 64)))
           |--   Data (Raw='CMDauthentify' ((0, 104)))
           |--   Data (Raw='RESauthentify' ((0, 104)))
           |--   Data (Raw='CMDencrypt' ((0, 80)))
           |--   Data (Raw='RESencrypt' ((0, 80)))
           |--   Data (Raw='CMDdecrypt' ((0, 80)))
           |--   Data (Raw='RESdecrypt' ((0, 80)))
           |--   Data (Raw='CMDbye' ((0, 48)))
           |--   Data (Raw='RESbye' ((0, 48)))
|--  Field-sep-23
     |--   Alt
           |--   Data (ASCII=# ((0, 8)))
           |--   Data (Raw=None ((0, 0)))
|--  Field-2
     |--   Alt
           |--   Data (Raw='\x07\x00\x00\x00Roberto' ((0, 88)))
           |--   Data (Raw='\x00\x00\x00\x00\x00\x00\x00\x00' ((0, 64)))
           |--   Data (Raw='\x00\x00\x00\x00' ((0, 32)))
           |--   Data (Raw='\x00\x00\x00\x00\x04\x00\x00\x00info' ((0, 96)))
           |--   Data (Raw='\x00\x00\x00\x00\x05\x00\x00\x00stats' ((0, 104)))
           |--   Data (Raw='\n\x00\x00\x00aStrongPwd' ((0, 112)))
           |--   Data (Raw='\x06\x00\x00\x00abcdef' ((0, 80)))
           |--   Data (Raw="\x00\x00\x00\x00\x06\x00\x00\x00$ !&'$" ((0, 112)))
           |--   Data (Raw="\x06\x00\x00\x00$ !&'$" ((0, 80)))
           |--   Data (Raw='\x00\x00\x00\x00\x06\x00\x00\x00abcdef' ((0, 112)))
           |--   Data (Raw='\x04\x00\x00\x00fred' ((0, 64)))
           |--   Data (Raw='\t\x00\x00\x00myPasswd!' ((0, 104)))
           |--   Data (Raw='\n\x00\x00\x00123456test' ((0, 112)))
           |--   Data (Raw="\x00\x00\x00\x00\n\x00\x00\x00spqvwt6'16" ((0, 144)))
           |--   Data (Raw="\n\x00\x00\x00spqvwt6'16" ((0, 112)))
           |--   Data (Raw='\x00\x00\x00\x00\n\x00\x00\x00123456test' ((0, 144)))

The partitioned messages now look like this:

# Display partitioned messages
print symbol
'CMDidentify'   | '#' | '\x07\x00\x00\x00Roberto'                 
'RESidentify'   | '#' | '\x00\x00\x00\x00\x00\x00\x00\x00'        
'CMDinfo'       | '#' | '\x00\x00\x00\x00'                        
'RESinfo'       | '#' | '\x00\x00\x00\x00\x04\x00\x00\x00info'    
'CMDstats'      | '#' | '\x00\x00\x00\x00'                        
'RESstats'      | '#' | '\x00\x00\x00\x00\x05\x00\x00\x00stats'   
'CMDauthentify' | '#' | '\n\x00\x00\x00aStrongPwd'                
'RESauthentify' | '#' | '\x00\x00\x00\x00\x00\x00\x00\x00'        
'CMDencrypt'    | '#' | '\x06\x00\x00\x00abcdef'                  
'RESencrypt'    | '#' | "\x00\x00\x00\x00\x06\x00\x00\x00$ !&'$"  
'CMDdecrypt'    | '#' | "\x06\x00\x00\x00$ !&'$"                  
'RESdecrypt'    | '#' | '\x00\x00\x00\x00\x06\x00\x00\x00abcdef'  
'CMDbye'        | '#' | '\x00\x00\x00\x00'                        
'RESbye'        | '#' | '\x00\x00\x00\x00\x00\x00\x00\x00'        
'CMDidentify'   | '#' | '\x04\x00\x00\x00fred'                    
(...)    
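
To make the effect of the delimiter split concrete, here is a minimal sketch in plain Python 3 that does not use Netzob at all. The helper name split_on_delimiter is hypothetical, and the choice to cut on the first occurrence of the delimiter only is an assumption of this sketch, not a documented property of splitDelimiter().

```python
# Plain-Python illustration of a delimiter split (no Netzob required).
# A few messages copied from the imported traces.
MESSAGES = [
    b"CMDidentify#\x07\x00\x00\x00Roberto",
    b"RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00",
    b"CMDinfo#\x00\x00\x00\x00",
]


def split_on_delimiter(message, delimiter=b"#"):
    """Return (before, delimiter, after), cutting on the first occurrence only.

    If the delimiter is absent, the last two items are empty.
    """
    return message.partition(delimiter)


for message in MESSAGES:
    print(split_on_delimiter(message))
```

Cutting on the first occurrence only keeps the third field intact even if a binary payload happens to contain the byte 0x23 (#).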

Cluster according to a key field

Now that we have a first approximation of the decomposition of the symbol in different fields, let's try to regroup some messages together: this is the purpose of the clustering methods in Netzob.

In this example, the first field seems interesting, as it contains some kind of commands (CMDencrypt, CMDidentify, etc.). Let's thus cluster the symbol according to the first field (i.e. group messages that have the same value for the first field). We use the function clusterByKeyField(), which has the following description:

def clusterByKeyField(field, keyField):
    """Create and return new symbols according to a specific key
    field.

    :param field: the field we want to split in new symbols
    :type field: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    :param keyField: the field used as a key during the splitting operation
    :type keyField: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    :raise Exception if something bad happens
    """

Here, we use the function clusterByKeyField() to generate a list of symbols from the captured messages:

# Apply a cluster by key field method on the symbol
symbols = Format.clusterByKeyField(symbol, symbol.fields[0])

# Display the resulting symbols
print "[+] Number of symbols after clustering: {0}".format(len(symbols))
print "[+] Symbol list:"
for keyFieldName, s in symbols.items():
    print "  * {0}".format(keyFieldName)

The clustering algorithm produces 14 different symbols, where each symbol has a unique value in the first field.

[+] Number of symbols after clustering: 14
[+] Symbol list:
  * RESdecrypt
  * RESbye
  * RESidentify
  * CMDbye
  * RESencrypt
  * CMDidentify
  * RESstats
  * CMDencrypt
  * RESauthentify
  * CMDdecrypt
  * CMDinfo
  * CMDauthentify
  * RESinfo
  * CMDstats
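
Conceptually, clustering by key field amounts to grouping messages by the value found in one field. The following plain Python 3 sketch illustrates the idea on a handful of traces. It is not Netzob's implementation, and cluster_by_key_field is a hypothetical helper that takes the text before the first # as the key.

```python
from collections import OrderedDict


def cluster_by_key_field(messages, delimiter=b"#"):
    """Group raw messages by the value of their first field.

    The first field is taken to be everything before the first delimiter.
    Returns an ordered mapping: key value -> list of messages.
    """
    clusters = OrderedDict()
    for message in messages:
        key, _, _ = message.partition(delimiter)
        clusters.setdefault(key, []).append(message)
    return clusters


messages = [
    b"CMDidentify#\x07\x00\x00\x00Roberto",
    b"RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00",
    b"CMDidentify#\x04\x00\x00\x00fred",
    b"RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00",
]

clusters = cluster_by_key_field(messages)
for key, members in clusters.items():
    print(key, len(members))
```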

Apply a format partitioning with a sequence alignment on the third field of each symbol

At this step, we have regrouped messages that share the same purpose, and have a basic decomposition of each message into three fields: the command field, the delimiter field (i.e. #) and a third field which seems to have a dynamic size and variable content. Let's now focus on this last field. A field with a dynamic size is a good candidate for what Netzob calls a "sequence alignment". This feature lets us align static and dynamic sub-fields together. To do this, we have the function splitAligned(), which has the following documentation:

def splitAligned(field, useSemantic=True, doInternalSlick=False):
    """Split the specified field according to the variations of message bytes.
    Relies on a sequence alignment algorithm.
    (...)

In the following snippet, we want to align the last field of each symbol through a sequence alignment algorithm:

# Apply a format partitionment on the third field (the last one) of each symbol
for symbol in symbols.values():
    Format.splitAligned(symbol.fields[2], doInternalSlick=True)
    print "[+] Partitionned messages:"
    print symbol

For the symbol CMDencrypt, the sequence alignment of the last field produces the following format, where we can observe a static field of \x00\x00\x00 surrounded by two variable fields. The last field seems to be the buffer we want to encrypt, as the key field name suggests (i.e. CMDencrypt).

(...)
[+] Partitionned messages:
'CMDencrypt' | '#' | '\n'   | '\x00\x00\x00' | '123456test'
'CMDencrypt' | '#' | '\x06' | '\x00\x00\x00' | 'abcdef'   
(...)
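
The static/dynamic cut shown above can be approximated by hand with a much simpler technique than a real sequence alignment. The plain Python 3 sketch below compares the values column by column from the left and groups consecutive offsets that are either constant or varying. This naive comparison is a stand-in chosen for illustration, not the alignment algorithm used by splitAligned(), and both helper names are hypothetical.

```python
def column_segments(values):
    """Return (kind, start, end) segments; end is None for an open-ended tail.

    An offset is 'static' when every value holds the same byte there,
    'dynamic' otherwise. Bytes beyond the shortest value are dynamic.
    """
    shortest = min(len(v) for v in values)
    same_length = all(len(v) == shortest for v in values)
    segments = []
    for offset in range(shortest):
        column = set(v[offset:offset + 1] for v in values)
        kind = "static" if len(column) == 1 else "dynamic"
        if segments and segments[-1][0] == kind:
            segments[-1][2] = offset + 1      # extend the current segment
        else:
            segments.append([kind, offset, offset + 1])
    if not same_length:
        # Some values are longer: the tail has a variable size.
        if segments and segments[-1][0] == "dynamic":
            segments[-1][2] = None
        else:
            segments.append(["dynamic", shortest, None])
    return [tuple(s) for s in segments]


def split_by_segments(value, segments):
    """Cut one value according to the computed segments."""
    return [value[start:end] for _, start, end in segments]


# Third field of the two CMDencrypt messages from the traces
values = [b"\n\x00\x00\x00123456test", b"\x06\x00\x00\x00abcdef"]
segments = column_segments(values)
for value in values:
    print(split_by_segments(value, segments))
```

A left-anchored comparison like this one breaks as soon as a variable-size field appears before a static one, which is precisely the case a true sequence alignment handles.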

Find field relationships in each symbol

Now let's try to find relationships in these messages. The Netzob API provides the static function RelationFinder.findOnSymbol(), which identifies potential relationships between the fields of messages that belong to the same symbol, as described in the documentation:

def findOnSymbol(symbol):
    """Find exact relations between fields in the provided
    symbol/field.

    :param symbol: the symbol in which we are looking for relations
    :type symbol: :class:`netzob.Common.Models.Vocabulary.AbstractField.AbstractField`
    """

The following snippet shows how to find relationships on our unknown protocol and how to handle the results:

# For each symbol, find potential relationships between its fields
for symbol in symbols.values():
    rels = RelationFinder.findOnSymbol(symbol)

    print "[+] Relations found: "
    for rel in rels:
        print "  " + rel["relation_type"] + ", between '" + rel["x_attribute"] + "' of:"
        print "    " + str('-'.join([f.name for f in rel["x_fields"]]))
        p = [v.getValues()[:] for v in rel["x_fields"]]
        print "    " + str(p)
        print "  " + "and '" + rel["y_attribute"] + "' of:"
        print "    " + str('-'.join([f.name for f in rel["y_fields"]]))
        p = [v.getValues()[:] for v in rel["y_fields"]]
        print "    " + str(p)

In the extracted result below, we have found a relationship in the symbol CMDencrypt between the value of one field (the third one) and the size of another (the last one, which presumably contains the buffer we want to encrypt).

(...)
[+] Relations found: 
  SizeRelation, between 'value' of:
    Field
    [['\n', '\x06']]
  and 'size' of:
    Field
    [['123456test', 'abcdef']]
(...)
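
The kind of check behind a size relation can be reproduced by hand. The plain Python 3 sketch below looks for pairs of fields where the integer value of one field equals the byte length of another in every message. It is an illustration only: find_size_relations is a hypothetical helper, and reading values as little-endian integers is an assumption made for this sketch.

```python
def find_size_relations(rows):
    """Return (i, j) pairs where value(field i) == len(field j) in every row.

    rows is a list of messages, each one a list of field values (bytes).
    Field values are read as little-endian unsigned integers (assumption).
    """
    if not rows:
        return []
    nb_fields = len(rows[0])
    relations = []
    for i in range(nb_fields):
        for j in range(nb_fields):
            if i == j:
                continue
            if all(
                len(row[i]) > 0
                and int.from_bytes(row[i], "little") == len(row[j])
                for row in rows
            ):
                relations.append((i, j))
    return relations


# The two CMDencrypt messages, already cut into five fields
rows = [
    [b"CMDencrypt", b"#", b"\n", b"\x00\x00\x00", b"123456test"],
    [b"CMDencrypt", b"#", b"\x06", b"\x00\x00\x00", b"abcdef"],
]
relations = find_size_relations(rows)
print(relations)
```

With only two messages, such a match could be a coincidence. The more messages a symbol contains, the more trustworthy a relation that holds for all of them.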

Apply found relationships to the symbol structure

So we have just found a field whose value corresponds to the size of another field. We can now modify the message format to apply this relationship. We do this by creating a "Size" field whose value depends on the content of a targeted field. We also specify a factor of 1/8: field sizes are expressed in bits by default, so the value of the size field, which counts bytes, is one eighth of the size of the buffer field.

# For each symbol, apply the first relationship found (if any) to the model
for symbol in symbols.values():
    rels = RelationFinder.findOnSymbol(symbol)

    if rels:
        # Apply first found relationship
        rel = rels[0]
        rel["x_fields"][0].domain = Size(rel["y_fields"], factor=1/8.0)

    print "[+] Symbol structure:"
    print symbol._str_debug()

As a result, the CMDencrypt symbol structure now looks like this:

(...)
[+] Symbol structure:
Symbol_CMDencrypt
|--  Field-0
     |--   Data (ASCII=CMDencrypt ((0, 80)))
|--  Field-sep-23
     |--   Data (ASCII=# ((0, 8)))
|--  Field-2
     |--   Data (Raw=None ((0, None)))
|--  |--  Field
          |--   Size(['Field']) - Type:Raw=None ((8, 8))
|--  |--  Field
          |--   Data (Raw='\x00\x00\x00' ((0, 24)))
|--  |--  Field
          |--   Data (Raw=None ((0, 80)))
(...)

We've just displayed the result for the CMDencrypt symbol, but the same steps can be applied to the other symbols of our protocol. As a result, we are able to retrieve the complete definition of each one.
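
As a sanity check of the 1/8 factor, the arithmetic can be replayed in plain Python 3: a 6-byte buffer is 48 bits long, and 48 × 1/8 = 6, which is the value carried by the size field. In the sketch below, size_field_value and build_cmdencrypt are hypothetical helpers, not Netzob functions, and the one-byte size followed by three zero bytes mirrors the inferred CMDencrypt structure.

```python
def size_field_value(target, factor=1 / 8.0, nb_bytes=1):
    """Compute the content of a size field for the given target buffer.

    The target size is first expressed in bits, then scaled by the factor.
    Raises OverflowError if the result does not fit in nb_bytes.
    """
    size_in_bits = len(target) * 8
    return int(size_in_bits * factor).to_bytes(nb_bytes, "little")


def build_cmdencrypt(payload):
    """Assemble a CMDencrypt message following the inferred structure."""
    return (
        b"CMDencrypt"                 # command field
        + b"#"                        # delimiter field
        + size_field_value(payload)   # size field (1 byte)
        + b"\x00\x00\x00"             # static field
        + payload                     # buffer to encrypt
    )


print(repr(build_cmdencrypt(b"abcdef")))
print(repr(build_cmdencrypt(b"123456test")))
```

Both generated messages are byte-for-byte identical to the CMDencrypt messages found in the imported traces.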

OK, that's all for the message format inference. As we now understand the structure of each symbol, let's reverse the state machine of the protocol.

State machine inference

Generate a chained states automaton

The first part of the tutorial focused on reversing the protocol message formats. We will now work on reversing the state machine, i.e. the grammar that specifies the authorized sequences of messages/symbols. In this part, we generate three kinds of automata by learning the observed sequences of messages. A sequence of messages is represented in Netzob by a Session object. Moreover, when working with symbols (which are an abstraction of a group of similar messages), a sequence of abstracted messages is represented by an abstract session. This object is thus used to infer state machines.

In this section, we present three approaches to generating automata from captured PCAP files.

Based on the symbols we have learned, we will first generate a basic automaton that illustrates the sequence of commands and responses extracted from a PCAP file. For each message sent, this will create a new transition to a new state, hence the name "chained states automaton".

# Create a session of messages
session = Session(messages_session1)

# Abstract this session according to the inferred symbols
abstractSession = session.abstract(symbols.values())

# Generate an automaton according to the observed sequence of messages/symbols
automata = Automata.generateChainedStatesAutomata(abstractSession, symbols.values())

# Print the dot representation of the automata
dotcode = automata.generateDotCode()
print dotcode

The resulting automaton can finally be converted into Dot code in order to render a graphical version of it.

Automaton result
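
To give an idea of what a chained states automaton looks like, here is a plain Python 3 sketch that builds one from a list of symbol names and renders it as Dot code. It assumes the session strictly alternates a sent symbol and its response, and the state names, label format and helper names are all choices made for this illustration. The Dot output of Netzob itself may differ.

```python
def chained_states_automaton(symbol_names):
    """Build one transition per (sent, received) pair, each to a new state.

    Returns a list of (source state, destination state, label) tuples.
    """
    pairs = zip(symbol_names[0::2], symbol_names[1::2])
    transitions = []
    for index, (sent, received) in enumerate(pairs):
        source = "State %d" % index
        destination = "State %d" % (index + 1)
        transitions.append((source, destination, "%s / %s" % (sent, received)))
    return transitions


def to_dot(transitions):
    """Render the transitions as a Graphviz digraph."""
    lines = ["digraph G {"]
    for source, destination, label in transitions:
        lines.append('  "%s" -> "%s" [label="%s"];' % (source, destination, label))
    lines.append("}")
    return "\n".join(lines)


# Beginning of the first captured session, as symbol names
session = ["CMDidentify", "RESidentify", "CMDinfo", "RESinfo"]
transitions = chained_states_automaton(session)
print(to_dot(transitions))
```

The printed text can be saved to a file and rendered with Graphviz, for example with dot -Tpng.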

Generate a one state automaton

This time, instead of converting a PCAP into a sequence of states, one per observed message, we generate a single state that accepts any of the observed sent messages as the trigger of a transition. In response to each sent message (for example CMDencrypt), we expect a specific response (for example RESencrypt).

# Create a session of messages
session = Session(messages_session1)

# Abstract this session according to the inferred symbols
abstractSession = session.abstract(symbols.values())

# Generate an automaton according to the observed sequence of messages/symbols
automata = Automata.generateOneStateAutomata(abstractSession, symbols.values())

# Print the dot representation of the automata
dotcode = automata.generateDotCode()
print dotcode

The obtained automaton is finally converted into Dot code in order to render a graphical version of it.

Automaton result
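
A one state automaton boils down to a lookup table: every observed command loops on the same state and is associated with the responses seen for it. The plain Python 3 sketch below illustrates this under the assumption that the session alternates a sent symbol and its response. The helper names are hypothetical and this is not how Netzob stores its model.

```python
def one_state_automaton(symbol_names):
    """Map each sent symbol to the set of responses observed for it.

    Every transition loops on a single state, so no state needs to be stored.
    """
    expected = {}
    for sent, received in zip(symbol_names[0::2], symbol_names[1::2]):
        expected.setdefault(sent, set()).add(received)
    return expected


def accepts(expected, sent, received):
    """Tell whether the (sent, received) exchange was observed in the session."""
    return received in expected.get(sent, set())


session = [
    "CMDidentify", "RESidentify",
    "CMDencrypt", "RESencrypt",
    "CMDbye", "RESbye",
]
model = one_state_automaton(session)
print(accepts(model, "CMDencrypt", "RESencrypt"))
print(accepts(model, "CMDencrypt", "RESbye"))
```

Unlike the chained states automaton, this model says nothing about the order of commands: CMDbye is accepted before CMDidentify.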

Generate a PTA-based automaton

Finally, we take multiple sequences of messages from different PCAP files and generate an automaton in which identical prefixes are merged. The resulting structure is called a Prefix Tree Acceptor (PTA).

# Create sessions of messages
messages_session1 = PCAPImporter.readFile("target_src_v1_session1.pcap").values()
messages_session3 = PCAPImporter.readFile("target_src_v1_session3.pcap").values()

session1 = Session(messages_session1)
session3 = Session(messages_session3)

# Abstract these sessions according to the inferred symbols
abstractSession1 = session1.abstract(symbols.values())
abstractSession3 = session3.abstract(symbols.values())

# Generate an automaton according to the observed sequences of messages/symbols
automata = Automata.generatePTAAutomata([abstractSession1, abstractSession3], symbols.values())

# Print the dot representation of the automata
dotcode = automata.generateDotCode()
print dotcode

The obtained automaton is finally converted into Dot code in order to render a graphical version of it.

Automaton result
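
The prefix merging performed by a PTA can be sketched in a few lines of plain Python 3: sessions are inserted one after the other into a tree, and a new state is created only when a session leaves the paths already present. The two sessions below are made up for the illustration (the content of the third PCAP file is not reproduced in this article), and build_pta is a hypothetical helper.

```python
def build_pta(sessions):
    """Build a prefix tree acceptor from sessions of (sent, received) steps.

    Returns (transitions, nb_states), where transitions maps
    (state, step) -> next state and state 0 is the root.
    """
    transitions = {}
    nb_states = 1
    for session in sessions:
        state = 0
        for step in session:
            key = (state, step)
            if key not in transitions:
                # First time this step is seen from this state: new branch
                transitions[key] = nb_states
                nb_states += 1
            state = transitions[key]
    return transitions, nb_states


# Two invented sessions that share their first exchange
session_a = [("CMDidentify", "RESidentify"), ("CMDinfo", "RESinfo"), ("CMDbye", "RESbye")]
session_b = [("CMDidentify", "RESidentify"), ("CMDstats", "RESstats"), ("CMDbye", "RESbye")]

transitions, nb_states = build_pta([session_a, session_b])
print(nb_states)
for (state, step), next_state in sorted(transitions.items()):
    print(state, step, "->", next_state)
```

The shared CMDidentify exchange is stored once, while the two CMDbye exchanges stay separate because they are reached through different paths.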

Traffic generation and fuzzing

Generate messages according to the inferred model

We now have a pretty good knowledge of the message format and grammar of the targeted protocol. Let's play with this model by trying to communicate with a real server implementation.

First, let's start the server so that we can talk to it.

$ cd src_v1/
$ ./server

Ready to read incomming messages

(...)

Then, we create a UDP client that will communicate with the server (on 127.0.0.1:4242) by exchanging messages generated from the inferred symbols. In Netzob, an actor is a high-level representation of an entity that participates in a communication with a remote peer. This actor is able to send and receive data that respects the state machine (the Automata) as well as the message formats (the Symbols) of a previously learned protocol. In order to convert symbols into concrete messages, or to convert received concrete messages into symbols, an abstraction layer is used. This component ensures the specialization of sent symbols and the abstraction of received messages.

# Create a UDP client instance
channelOut = UDPClient(remoteIP="127.0.0.1", remotePort=4242)
abstractionLayerOut = AbstractionLayer(channelOut, symbols.values())
abstractionLayerOut.openChannel()

# Visit the automaton for n iterations
state = automata.initialState
for n in xrange(8):
    state = state.executeAsInitiator(abstractionLayerOut)

We go through eight iterations in the automaton.

1454: [INFO] AbstractionLayer:openChannel: Going to open the communication channel...
1454: [INFO] AbstractionLayer:openChannel: Communication channel opened.
1454: [INFO] State:executeAsInitiator: Next transition: Open.
1454: [INFO] AbstractionLayer:openChannel: Going to open the communication channel...
1454: [INFO] AbstractionLayer:openChannel: Communication channel opened.
1454: [INFO] State:executeAsInitiator: Transition 'Open' leads to state: State 1.
1455: [INFO] State:executeAsInitiator: Next transition: Transition.
1455: [INFO] AbstractionLayer:writeSymbol: Going to specialize symbol: 'Symbol_CMDidentify' (id=dbea29b9-7e9f-4c2b-be14-625f675569f3).
1455: [INFO] AbstractionLayer:writeSymbol: Data generated from symbol 'Symbol_CMDidentify': 'CMDidentify#\x03\x00\x00\x00\xfc{\xdb'.
1456: [INFO] AbstractionLayer:writeSymbol: Going to write to communication channel...
1456: [INFO] AbstractionLayer:writeSymbol: Writing to commnunication channel donne..
1456: [INFO] AbstractionLayer:readSymbol: Going to read from communication channel...
1456: [INFO] AbstractionLayer:readSymbol: Received data: ''RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00''
1457: [INFO] AbstractionLayer:readSymbol: Received symbol on communication channel: 'Symbol_RESidentify'
1457: [INFO] Transition:executeAsInitiator: Possible output symbol: 'Symbol_RESidentify' (id=49c24e1c-3751-412e-9f6a-f006a7de7492).
1457: [INFO] State:executeAsInitiator: Transition 'Transition' leads to state: State 2.
1457: [INFO] State:executeAsInitiator: Next transition: Transition.
1457: [INFO] AbstractionLayer:writeSymbol: Going to specialize symbol: 'Symbol_CMDinfo' (id=5eb47a57-eccf-4d06-8231-0b1ae87f96a7).
1458: [INFO] AbstractionLayer:writeSymbol: Data generated from symbol 'Symbol_CMDinfo': 'CMDinfo#\x00\x00\x00\x00'.
1458: [INFO] AbstractionLayer:writeSymbol: Going to write to communication channel...
1458: [INFO] AbstractionLayer:writeSymbol: Writing to commnunication channel donne..
1458: [INFO] AbstractionLayer:readSymbol: Going to read from communication channel...
1458: [INFO] AbstractionLayer:readSymbol: Received data: ''RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info''
1462: [INFO] AbstractionLayer:readSymbol: Received symbol on communication channel: 'Symbol_RESinfo'
1462: [INFO] Transition:executeAsInitiator: Possible output symbol: 'Symbol_RESinfo' (id=b41502e3-21ea-4cb9-9c1e-dc171f715685).
1462: [INFO] State:executeAsInitiator: Transition 'Transition' leads to state: State 3.
1462: [INFO] State:executeAsInitiator: Next transition: Transition.
(...)
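
The responses that appear in this log can also be decoded by hand, which is a convenient way to double check what the abstraction layer receives. The plain Python 3 sketch below assumes that a response body is made of a 4-byte little-endian return value, a 4-byte little-endian size, and then the data buffer. This layout is an assumption deduced from the traces, and parse_response is a hypothetical helper, not part of Netzob.

```python
import struct


def parse_response(message):
    """Decode a RES* message into (name, return value, data buffer).

    Assumed layout after the '#': <uint32 return value><uint32 size><data>.
    """
    name, separator, body = message.partition(b"#")
    if not separator or len(body) < 8:
        raise ValueError("not a well-formed response: %r" % (message,))
    return_value, size = struct.unpack("<II", body[:8])
    data = body[8:8 + size]
    if len(data) != size:
        raise ValueError("truncated data buffer in %r" % (message,))
    return name, return_value, data


print(parse_response(b"RESinfo#\x00\x00\x00\x00\x04\x00\x00\x00info"))
print(parse_response(b"RESidentify#\x00\x00\x00\x00\x00\x00\x00\x00"))
```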

On the server side, we can see that the received messages are well formed, as the server is able to parse them and send correct responses.

$ ./server 

Ready to read incomming messages
-> Read: CMDidentify#.
   Command: CMDidentify
   Arg size: 2
   Arg content: ..
<- Send: 
   Return value: 0
   Size of data buffer: 0
   Data buffer: 
    ""

-> Read: CMDinfo#
   Command: CMDinfo
   Arg size: 0
<- Send: 
   Return value: 0
   Size of data buffer: 4
   Data buffer: 
   DATA: 69 6e 66 6f                                        "info"

-> Read: CMDstats#
   Command: CMDstats
   Arg size: 0
<- Send: 
   Return value: 0
   Size of data buffer: 5
   Data buffer: 
   DATA: 73 74 61 74 73                                     "stats"

-> Read: CMDauthentify#.
   Command: CMDauthentify
   Arg size: 6
   Arg content: ......
<- Send: 
   Return value: 0
   Size of data buffer: 0
   Data buffer: 
    ""

-> Read: CMDencrypt#.
   Command: CMDencrypt
   Arg size: 2
   Arg content: ..
<- Send: 
(...)

Do some fuzzing on a specific symbol

Finally, we deliberately alter the message format of the CMDencrypt symbol in order to try some fuzzing. The modification extends the allowed size of the buffer field (i.e. the one that holds the data to encrypt).

def send_and_receive_symbol(symbol):
    data = symbol.specialize()
    print "[+] Sending: {0}".format(repr(data))
    channelOut.write(data)
    data = channelOut.read()
    print "[+] Receiving: {0}".format(repr(data))

# Update symbol definition to allow a broader payload size
symbols["CMDencrypt"].fields[2].fields[2].domain = Raw(nbBytes=(10, 120))

for i in range(10):
    send_and_receive_symbol(symbols["CMDencrypt"])

We can see that Netzob now sends CMDencrypt messages with a potentially long last field:

[+] Sending: 'CMDencrypt#6\x00\x00\x00&\xe0*\xb3\xa8A(\x0b\xd2yA\xb5\xb8\rw\x0fGi\xee\xb3\xd6\xb0<\xfc\xc0\xa7m\xbd\xbc\xde2~\xceE\xe5\xda@\xd4\xed\xed\xf2\xb4\xe7\t\xfbC\xbf\x05\xc6\xce\xfb\x83\xf2\x00'
(...)
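
In the message above, the byte printed as 6 right after the delimiter is 0x36, i.e. 54, which is the length of the random buffer that follows. The same kind of test case can be produced without Netzob, as in the plain Python 3 sketch below. The helper fuzzed_cmdencrypt is hypothetical, the 10 to 120 byte range mirrors the Raw(nbBytes=(10, 120)) domain, and the sketch only builds messages: sending them to the server is left out.

```python
import random
import struct


def fuzzed_cmdencrypt(min_len=10, max_len=120, rng=random):
    """Build a CMDencrypt message with a random buffer and a consistent size.

    The size is packed as a little-endian uint32, which yields the same bytes
    as a one-byte size followed by three zero bytes while length < 256.
    """
    length = rng.randint(min_len, max_len)
    payload = bytes(bytearray(rng.getrandbits(8) for _ in range(length)))
    return b"CMDencrypt#" + struct.pack("<I", length) + payload


# A seeded generator makes a crashing test case reproducible
rng = random.Random(0)
for _ in range(3):
    message = fuzzed_cmdencrypt(rng=rng)
    print(len(message), repr(message[:20]))
```

Keeping the size field consistent with the buffer lets the message pass the first parsing steps, so that the oversized buffer actually reaches the code that processes it.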

On the server side, we quickly get a segmentation fault, due to a bug in the parsing of the last field.

$ gdb ./server
(gdb) run
Starting program: /home/fgy/travaux/netzob/git/netzob-resources/experimentations/tutorial_target/src_v1/server 

Ready to read incomming messages
(...)
-> Read: CMDencrypt#6
   Command: CMDencrypt
   Arg size: 54
   Arg content: &?*??A(
wGi???<???m???2~?E??@????????    ?C??

Program received signal SIGSEGV, Segmentation fault.
0x08048bc0 in api_encrypt (in=0x45ce7e32 <Address 0x45ce7e32 out of bounds>, len=3561020133, out=0xb4f2eded <Address 0xb4f2eded out of bounds>) at amo_api.c:80
80        tmpData[i] = (in[i] ^ key) % 0xff;

And that's all folks for this introductory tutorial! You guys should now be able to use Netzob to infer the message format, and even the grammar when necessary.

Regarding this tutorial, remember you can get the entire source code of the script used to infer and play with the protocol here: https://dev.netzob.org/attachments/download/183/inference_target_src_v1.py

Also, don't forget to read the API documentation or talk to the Netzob development team on IRC (#netzob on Freenode) if you have any questions.