DEF CON Forum Site Header Art



ParseAndC Demolab at DEF CON 29


  • ParseAndC Demolab at DEF CON 29

    ParseAndC - A Universal Parser and Data Visualization Tool for Security Testing

    Short Abstract:
    Parsing is the process of extracting the values of individual fields by mapping the data format (known) onto the datastream (known) from a certain offset (known). Parsing is often an integral part of hacking - even when we do not know the exact format of the data, we usually have some vague idea of it, and we want to parse the data against our assumed format to see whether our hunch is right. While it is trivial to write a parser that outputs the values of the fields of a single C structure, that parser becomes useless as soon as we have to deal with a different C structure. A parser that can take any and all C structures as its input is essentially a compiler, since even C header files contain plenty of complexity (#define constants, macros calling macros, variadic macros, conditional code via #if-#else etc., included files, packed/aligned attributes, pragmas, bitfields, complex variable declarations, nested and anonymous structure declarations etc.). This tool can map any C structure(s) onto any datastream from any offset, and then visually display the 1:1 correspondence between the variables and the data in a colorful, intuitive display, so that it is easy to see which field has what value.
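
    To make the "trivial single-structure parser" concrete, here is a minimal sketch using Python's standard struct module. It is not how ParseAndC works internally; the header structure, its field names, and the sample bytes are all made up for illustration.

```python
import struct

# Hypothetical C structure (not taken from the tool):
#   struct header { uint32_t magic; uint16_t version; uint16_t flags; };
HEADER_FORMAT = "<IHH"                      # little-endian, no padding
HEADER_FIELDS = ("magic", "version", "flags")

def parse_header(data, offset=0):
    """Map the fixed header layout onto `data`, starting at `offset`."""
    values = struct.unpack_from(HEADER_FORMAT, data, offset)
    return dict(zip(HEADER_FIELDS, values))

# Made-up datastream: 3 bytes of leading junk, then the header.
stream = b"\x00\x00\x00" + struct.pack(HEADER_FORMAT, 0xCAFEBABE, 2, 1)
parsed = parse_header(stream, offset=3)
print(parsed)
```

    The format string and the field list are hard-wired to one structure, which is exactly why such a parser has to be rewritten for every new structure.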

    This tool is extremely portable - it is a single 800KB Python text file, runs on both Python 2 and Python 3, is cross-platform (Windows/Mac/Unix), and also works in terminal/batch mode without the GUI. For multi-byte datatypes (e.g. integer or float) it supports both endiannesses (little/big) and displays values in both decimal and hex formats. The tool needs no internet connection and works fully offline. It is self-contained - it imports almost nothing, to the extent that it implements its own C compiler (front-end) from scratch!
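
    As a small illustration of why both endiannesses and both number bases matter, the sketch below reads the same four bytes in both byte orders. It uses Python's standard struct module; the sample bytes and the helper name show_uint32 are assumptions for this example, not part of the tool.

```python
import struct

raw = b"\x12\x34\x56\x78"                   # made-up 4-byte field

def show_uint32(data, offset=0):
    """Read the same 4 bytes as a little-endian and a big-endian unsigned int."""
    little = struct.unpack_from("<I", data, offset)[0]
    big = struct.unpack_from(">I", data, offset)[0]
    return little, big

little, big = show_uint32(raw)
print("little-endian: %d (0x%08X)" % (little, little))
print("big-endian:    %d (0x%08X)" % (big, big))
```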

    This tool is useful for security and non-security testing alike (reverse engineering, network traffic analysis, packet processing etc.). It is currently widely used at Intel, and in the users' own words, it has reduced days of work to minutes. The author of this tool has led many security hackathons at Intel, where the tool was found to be very useful.

    Short Developer Bio:
    Parbati Kumar Manna got his Bachelor of Technology from Indian Institute of Technology, Kharagpur in 1997. After spending a bit of time in the software industry, he went back to school to earn his MS and PhD in Computer Science from University of Florida in 2008. His dissertation involved the creation and detection of some of the smartest malware (particularly internet worms) that leave a minimal footprint during their spread yet propagate at maximal speed. After his PhD he joined the premier security group within Intel, working with other like-minded security researchers on the security of various Intel products, including hardware, firmware and software. He has published in, and reviewed for, eminent conferences and journals.

    URL to any additional information:
    The tool has just been open-sourced, but no public announcement has been made yet (we don't want to steal the thunder from DEF CON).

    Detailed Explanation of Tool:
    If one knows the data format of a datastream (basically, if you have access to the source code), parsing is easy, since it takes less than 5 minutes to write a parser for a single C structure. However, if one's job involves looking at many different datastreams, each with a different data format (basically, a different C structure), the process becomes very tedious, as you have to write a fresh parser for every new structure. As part of Intel's in-house core hacking team, the author faced this very problem: he had to parse many different datastreams based on their individual data formats. So, to rid himself of the trouble of writing a new parser every time, he chose to write a tool that can parse any datastream with any data format (a C structure) in just two clicks.

    The other big problem that this tool handles is data visualization. The mapping between code and data is not always 1:1 - it can be one-to-many (for arrays) or many-to-one (many union members pointing to the same chunk of data). For example, a single line of code like int a[30][40][50]; suddenly gives us sixty thousand chunks of 4-byte data. This tool handles these many-to-one and one-to-many relationships between code and data very gracefully (just try hovering your cursor over the variables in the Interpreted code window or the data windows, and you will see). Also, if you double-click on any variable, the tool re-displays the datastream centered on the place the variable maps to. Similarly, if you double-click on any data byte, it scrolls the Interpreted code window to pinpoint the variable(s) that map to that data.
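
    The two relationships can be sketched with Python's standard ctypes module. This is only an illustration of the layout arithmetic, not the tool's code; the helper element_offset and the union Value are hypothetical, and the element size is whatever a C int is on the machine running the sketch (4 bytes on most platforms).

```python
import ctypes

# One-to-many: the single declaration  int a[30][40][50];  covers many elements.
DIMS = (30, 40, 50)
ELEM_SIZE = ctypes.sizeof(ctypes.c_int)     # usually 4 (platform-dependent)

def element_offset(i, j, k, dims=DIMS, elem_size=ELEM_SIZE):
    """Byte offset of a[i][j][k] in C row-major order."""
    _, d1, d2 = dims
    return ((i * d1 + j) * d2 + k) * elem_size

total_elements = DIMS[0] * DIMS[1] * DIMS[2]

# Many-to-one: every member of a union starts at the same byte.
class Value(ctypes.Union):                  # hypothetical union for illustration
    _fields_ = [("as_uint", ctypes.c_uint32),
                ("as_float", ctypes.c_float),
                ("as_bytes", ctypes.c_ubyte * 4)]

member_offsets = {name: getattr(Value, name).offset for name, _ in Value._fields_}

print("elements:", total_elements)
print("offset of a[1][0][0]:", element_offset(1, 0, 0))
print("union member offsets:", member_offsets)
```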

    You can see all that just by clicking the "Run Demo" button on the tool. :-)

    Supporting Files, Code, etc:
    The tool needs no supporting files to run. To see what it can do, just run the Demo (see below how). There is a huge README explaining everything right at the top of the script itself (the same README is also available in the Open Source repo), but in case you don't have time to read that, below is a TL;DR version.

    Just download the tool source (a single Python file) anywhere (Windows/Linux/Mac), run it using Python 2 or 3, and click on the "Run Demo" button in the top right corner. It will load a datafile (the tool script itself), choose a built-in data format (expressed via C structures and variable declarations), compile/interpret that code, and finally map the variables in the data format onto the datafile. Once this happens, the Interpreted code window and the Data window will contain colorful items. Just hover your cursor over those colorful items (or double-click) and see the magic happen!

    There is also a bottom window which lays out a Tree-like view of the data format. You can expand/collapse all the structures and arrays in the data format here using left/right arrows (or mouse click).
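
    For readers who want a feel for what a tree-like view of a data format contains, here is a rough text-only sketch built on Python's standard ctypes module. It does not reproduce the tool's GUI or its internals; the structures Point and Shape and the helper tree_lines are hypothetical.

```python
import ctypes

class Point(ctypes.Structure):              # hypothetical nested structure
    _fields_ = [("x", ctypes.c_int16), ("y", ctypes.c_int16)]

class Shape(ctypes.Structure):              # hypothetical top-level structure
    _fields_ = [("kind", ctypes.c_uint8),
                ("corners", Point * 2)]

def tree_lines(ctype, name, depth=0):
    """Return indented lines for `ctype`, recursing into arrays and structs."""
    lines = ["  " * depth + "%s (%d bytes)" % (name, ctypes.sizeof(ctype))]
    if issubclass(ctype, ctypes.Array):
        # One child node per array element.
        for index in range(ctype._length_):
            lines += tree_lines(ctype._type_, "%s[%d]" % (name, index), depth + 1)
    elif issubclass(ctype, (ctypes.Structure, ctypes.Union)):
        # One child node per member.
        for field in ctype._fields_:
            lines += tree_lines(field[1], field[0], depth + 1)
    return lines

shape_lines = tree_lines(Shape, "shape")
print("\n".join(shape_lines))
```

    Each level of indentation corresponds to one level that could be expanded or collapsed in a real tree widget.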

    The tool also creates a snapshot.csv file listing all the data format variables with their values, and prints the same information to the console in the background.

    The tool is currently in Beta stage (a lot of new features have been added lately), but it will absolutely be mature by the time of the actual conference.

    Target Audience:
    The target audience for this tool is pretty broad - it includes White Hat and Black Hat researchers alike. Basically, anybody who tests C programs, or reverse engineers any datastream produced by a C program, will find this tool extremely useful. Examples of actual usage of this tool are noted below.

    White Hat Testing (has access to source code):
    At Intel, of course, we have access to our own source code, so we do not need to speculate about the data format of Intel products. At Intel, this tool is widely used in driver testing, network packet analysis, firmware reversing etc., where testers use it to confirm that the intended values are indeed showing up in the datastream.

    Black Hat Testing (no access to source code):
    Here is an example of how this tool is useful even for Black Hat hackers. Suppose you believe that a certain executable or datastream should begin with a certain magic number, followed by a version number, followed by a header, followed by data etc. You can just write a C structure corresponding to your "hunch", and then use this tool to map that hypothetical structure onto the datastream to see if the values corresponding to the fields "make sense" visually. This is where the visualization part of this tool becomes immensely useful - you can hover your cursor over any variable and see its corresponding data value, or hover your cursor over any data byte and see its corresponding variable(s). If some of the supposed fields in the structure make sense but others do not, you know for which fields you have hit the jackpot, and for which you haven't. So, you modify your structure accordingly, and just two more clicks will give you a new visualization of the data mapped with the new structure. This way, you can use this tool iteratively to figure out the format of the datastream.
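
    The guess-check-revise loop described above can be imitated in a few lines with Python's standard struct module. This is only a sketch of the idea, not the tool's workflow; both candidate layouts, the sample datastream, and the helpers map_guess and plausible are invented for illustration.

```python
import struct

# Two competing hunches about the header, written as (struct format, field names).
# Both layouts are hypothetical; "<" means little-endian with no padding.
GUESS_A = ("<IHI", ("magic", "version", "length"))
GUESS_B = ("<IHHI", ("magic", "version", "flags", "length"))

def map_guess(guess, data, offset=0):
    """Overlay one guessed layout on `data` and return {field: value}."""
    fmt, names = guess
    return dict(zip(names, struct.unpack_from(fmt, data, offset)))

def plausible(fields, data):
    """Crude sanity check: the claimed length must fit inside the datastream."""
    return fields["length"] <= len(data)

# Made-up datastream: magic "DEMO", version 1, flags 0, length 16, then payload.
stream = b"DEMO" + b"\x01\x00" + b"\x00\x00" + b"\x10\x00\x00\x00" + b"\xAA" * 16

for label, guess in (("guess A", GUESS_A), ("guess B", GUESS_B)):
    fields = map_guess(guess, stream)
    shown = dict((name, hex(value)) for name, value in fields.items())
    print(label, shown, "plausible" if plausible(fields, stream) else "implausible")
```

    With the first guess the length field lands on the wrong bytes and comes out far larger than the datastream, which is the kind of "does not make sense" signal that prompts a revised structure.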

    To summarize, this tool has been widely used at Intel for the last two years by both security researchers and regular non-security testers, and both groups have found it extremely useful.

    The tool, per se, is not targeted ONLY at security, but it has proven to be extremely useful for security research (just like a binary disassembler).

    To the best of the author's knowledge, no such hacking tool currently exists. Thus, this tool can definitely bring a new perspective to DEF CON.
    PGP Key: