First steps in converting analyzeMFT to a Python module, plus improved error handling

I started rewriting analyzeMFT so that it can be loaded as a module and called from other programs. The primary reason is to enable including it in plaso, but perhaps other programs will find a need for it.

The work isn’t done yet, but analyzeMFT still works as a standalone program and handles corrupt MFT records better than before, so I decided to release it.

Quick install:

Once I finish the work I’ll also make a zip file available.

Notes:

  1. All output between the new and old version is identical except in cases where records are corrupt or incomplete. In those cases, the new output is more accurate.
  2. There is a lot of strangeness going on in MFT records. In restructuring analyzeMFT, I found a number of conditions that I failed to check for but which accidentally didn’t throw errors. For example, there are MFT records with no Standard Information attributes.
  3. Detection of Orphan records, my term, has been improved. Additional research is required to determine what causes them to occur.
  4. Processing time improved slightly.

Improved bodyfile support

April 26, 2013 3 comments

With more thanks to Jamie for the prompting, I’ve improved bodyfile support in the latest version of analyzeMFT.

  • You can now specify just a bodyfile for output and do not need to create a normal output file as well.
  • The real (not allocated) file size is included
  • If you use the --bodypath option, it writes out the full path to the file rather than just the file name
  • If you use the --bodystd option, it uses the STD_INFO timestamps rather than the FN timestamps. I find STD_INFO to be more interesting.
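
For readers unfamiliar with the format, a bodyfile is a pipe-delimited text file with one line per file. In The Sleuth Kit 3.x layout the columns are MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime. The sketch below is not analyzeMFT’s code: the function name and the dictionary keys are made up for illustration, and the timestamps are assumed to be Unix epoch seconds. It only shows how the options above (full path or file name, which timestamp set) would change a line.

```python
def bodyfile_line(record, use_full_path=False, use_std_info=False):
    """Format one hypothetical MFT record dict as a TSK 3.x bodyfile line."""
    name = record["path"] if use_full_path else record["name"]
    # Pick the timestamp set: $STANDARD_INFORMATION or $FILE_NAME.
    times = record["std_info"] if use_std_info else record["fn"]
    fields = [
        "0",                       # MD5 (not computed in this sketch)
        name,
        str(record["recordnum"]),  # inode column
        "0", "0", "0",             # mode, UID, GID (not modeled here)
        str(record["real_size"]),  # real, not allocated, size
        str(times["atime"]),
        str(times["mtime"]),
        str(times["ctime"]),
        str(times["crtime"]),
    ]
    return "|".join(fields)


# Made-up record used only to exercise the function.
example = {
    "name": "notes.txt",
    "path": "/Users/demo/notes.txt",
    "recordnum": 42,
    "real_size": 1234,
    "std_info": {"atime": 1366934400, "mtime": 1366934401,
                 "ctime": 1366934402, "crtime": 1366934403},
    "fn": {"atime": 1366848000, "mtime": 1366848001,
           "ctime": 1366848002, "crtime": 1366848003},
}
print(bodyfile_line(example, use_full_path=True, use_std_info=True))
```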

This is a pretty significant fix and I would suggest upgrading if you create timelines with analyzeMFT.

Links:

Git: git clone https://github.com/dkovar/analyzeMFT.git
Code: https://github.com/dkovar/analyzeMFT/blob/master/analyzeMFT.py

Updated analyzeMFT – fixed MFT record number reporting

When I originally wrote analyzeMFT I assumed that the MFT record numbers would start at zero and politely increase by one for each record so “recordNumber = recordNumber + 1” would be valid. Happily, this worked, apparently for years. That is, until Jamie threw corrupted MFT files at it, such as MFT records extracted from memory.

  1. The sequence numbers had gaps
  2. If there was a gap, then the actual sequence number wouldn’t match the reported sequence number
  3. Determination of the file path might be off as the parent record number pulled from the entry might now point to the wrong entry

Oooops.

This has been fixed.
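
One way to avoid the counting assumption is to read the record number out of each record’s own header. On NTFS 3.1 and later (Windows XP onward) the FILE record header stores its MFT record number as a 32-bit little-endian value at offset 0x2C; older volumes do not carry this field. The sketch below illustrates the idea on a synthetic record. It is not the actual analyzeMFT change, and the function name is hypothetical. It falls back to a caller-supplied value when the header cannot be trusted.

```python
import struct

RECORD_NUMBER_OFFSET = 0x2C  # NTFS 3.1+ FILE record header field


def record_number(raw, fallback):
    """Return the record number stored in the header, or fallback."""
    if len(raw) < RECORD_NUMBER_OFFSET + 4 or raw[:4] != b"FILE":
        return fallback  # too short, or the signature is damaged
    return struct.unpack_from("<I", raw, RECORD_NUMBER_OFFSET)[0]


# Synthetic 1024-byte record that claims to be record 5000.
raw = bytearray(1024)
raw[:4] = b"FILE"
struct.pack_into("<I", raw, RECORD_NUMBER_OFFSET, 5000)
print(record_number(bytes(raw), fallback=17))
```

With a lookup like this, a gap in the input no longer shifts every later record number, so parent references still resolve to the intended entries.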

I also fixed the handling of orphan files, those files that had a null parent or whose parent was a file.

This is a pretty significant fix and I would suggest upgrading.

Links:

Git: git clone https://github.com/dkovar/analyzeMFT.git
Code: https://github.com/dkovar/analyzeMFT/blob/master/analyzeMFT.py

DFIR Fiction Reading List

January 12, 2013 4 comments

The Digital Forensics and Incident Response fiction reading list, in no particular order:

  • Ender’s Game – Orson Scott Card
  • Jumper and Reflex - Steven Gould
  • Most anything by John Grisham
  • Daemon – Daniel Suarez 
  • Zero Day and Trojan Horse - Mark Russinovich (yes, that Mark)
  • Pretty much anything on the Access Data or Guidance Software support web sites
  • Blue Nowhere - Jeffrey Deaver
  • Halting State and Rule 34 – Charles Stross
  • Zero History – William Gibson
  • American Gods – Neil Gaiman
  • The Magicians – Lev Grossman
  • The Night Watch Trilogy - Sergei Lukyanenko

Dissecting a Blackhole 2 PDF (mostly) with peepdf.

November 19, 2012 5 comments

I’m fairly new to malware analysis, having spent most of the last ten years doing IT consulting, computer forensics, ediscovery, and some related work. I’m now doing a lot of incident response and am taking on some malware analysis responsibilities, at least on a triage and management level.

We got phished the other day, and a rather nice phish it was. Kudos to the mail team for shutting it down quickly and to an alert user who escalated it as well. Some quick dynamic analysis led to a PDF, and there our story starts.

If you open the PDF up you’ll get a rectangle and possibly an error message.

So what do we as analysts do with this? There are a lot of analysis tools out there and I worked my way through quite a few of them, partly just to see what worked, and how they worked. The one I ended up using was peepdf. I barely scratched the surface of its capabilities, but the following features sold me on it for this project:

  • It handled the malformed references that many other PDF analysis tools failed to handle.
  • It is command line based and scriptable. If you develop a peepdf workflow, you can put it into a file and execute it each time.
  • It can search raw and decoded objects.
  • It has Spidermonkey built in.

So, I loaded my malicious PDF into peepdf and got the following output:

Well, that’s pretty simple, there is only one object with JavaScript in it. Let’s take a look at it:

One thing immediately leaps out at you and another one follows soon after that. First off, this is very ugly code. I mean, rather than just saying “ff = charCode” the construction of ff is broken up over multiple lines. This is a classic obfuscation technique, though a lot easier to detect in JavaScript than in assembly code. If you look through this code you’ll start to see other similar techniques and will eventually be able to see some pretty simple structure. A hint – everything between /* */ pairs is a comment.

I cleaned up the code a bit and rewrote it in pseudocode (because I don’t know JavaScript yet) to try to figure out what was going on. As you run into JavaScript calls just do a Google search for them, just as you would for Windows APIs. You don’t need to be an expert programmer to figure out what the code is doing. In this case, the bulk of the code amounts to this:

s = s + char(str(int(concat(b1, b2), 0x1a)))

It is building a string up by concatenating pairs of bytes, converting that to an integer with radix 0x1a, and converting that to a string and then into a character.
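
As a minimal sketch of that step, here is my reading of the pseudocode line in Python. It is not the code from the PDF and not the script used later in this post; the function name and the sample input are made up. Radix 0x1a is 26, so each pair uses the digits 0-9 and a-p.

```python
def decode_pairs(encoded, radix=0x1A):
    """Decode a string of two-character base-`radix` numbers into text."""
    out = []
    # Walk the input two characters at a time; a trailing odd character is ignored.
    for i in range(0, len(encoded) - 1, 2):
        pair = encoded[i:i + 2]
        out.append(chr(int(pair, radix)))
    return "".join(out)


# Made-up sample: three base-26 pairs.
decoded = decode_pairs("4e3j4a")
print(decoded)
```

Python’s built-in int() accepts any base from 2 to 36, so no custom digit table is needed for base 26.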

So that is the second part, a decryption routine, but it needs something to decrypt. I guessed that the first part located the stuff to decrypt and that it found it using the keyword “creation date”. So, how to find that stuff? Back to peepdf:

The search for “creation date” didn’t turn up anything, but searching for “creation” produced hits in object 3 and object 43. We’re already looking at object 43 so let us see what is in object 3. Lo and behold, there is CreationDate and a lot of … stuff. Working on the assumption that the code in object 43 will decode the stuff in object 3, I proceeded as follows. (Yes, there is probably a way to do this all within peepdf, but I’m still learning how to use spidermonkey properly so I took this route.)

First, dump object 3 out.

PPDF> object 3 > object3.txt

Write a Python version of the code in object 43:

Strip the noise off the front of object3.txt (“<< /Title asdasdsad/CreationDate %#^&*%^#@&%#@3”) and then run the Python code against the object 3 stuff we saved earlier:

> jsparse.py -f object3.txt > object3.js

And then jump back into peepdf and clean up the newly created code:

PPDF> js_beautify file object3.js > object3-clean.js

This illustrates one of the things that I love about peepdf – it includes a lot of very useful functionality in the application so you don’t need to jump in and out of the tool all the time. (My foray into Python is due to my own issues and not peepdf’s.)

object3-clean.js now contains the second stage of the malicious PDF.

There is a lot more that can be done with this, such as noticing that the JavaScript coding style looks a lot like the php code used elsewhere in this phishing attack, but I’ll leave that and leave decoding the second stage for another day. Readers interested in carrying on will note that var1 and var2 are awfully similar and may be headers for shellcode.

This was a pretty high level run through of a relatively simple problem done by someone rather new to the subject, but hopefully it left you with the confidence to dive into this sort of thing yourself. There are a lot of good tools out there, lots of examples to work on, and many good people to help you out. (Tip of the hat to Willi and to the folks from rem-alumni.)

It isn’t APT, it is SASPDT – Sometimes Advanced, Sometimes Persistent, Definitely a Threat.

November 19, 2012 1 comment

I’m human (thankfully) and I get irked by simple things at times. Today it is due to conversations such as this one:

Them: “That malware wasn’t very advanced, it is just a version of <insert commodity malware here>”
Me: “Interesting. What’d they do with it?”
Them: “Moved laterally to our domain controller, dumped all the hashes, and shipped them out via FTP.”
Me: <silent>

OK, so it isn’t APT, it is SASPDT – Sometimes Advanced, Sometimes Persistent, Definitely a Threat.

“Advanced” isn’t required if they (insert your favorite description of the threat actor) can get into your environment using commodity malware, move laterally and collect sensitive data due to poor security controls, and exfiltrate the data via FTP because you don’t have any DLP in place. Similarly, “Persistent” isn’t required if they can phish their way in at will.

As long as the less sophisticated attacks will work, there is no need for malicious actors to deploy more advanced tools. Why was Stuxnet used on Iran and why aren’t you seeing Stuxnet in your environment? Because the attackers needed something sophisticated to get into the Iranian nuclear program environment but don’t need the same level of sophistication to get into your environment.

I normally don’t get too hung up on the term “APT”. For me, it is a convenient shorthand for “groups of often well funded malicious threat actors who may or may not be state sponsored but who are definitely capable of breaking into most environments and taking sensitive data.” Dismissing an attack because it wasn’t advanced, or because it didn’t come from China, seems unwise to me. If they pose a significant risk to your business, then they’re DT – definitely a threat.

Digital photography and social networking anti-forensics

November 11, 2012

I attended a superb class on OSINT the other week. One of the topics covered using geolocation data in digital photographs found on social networking sites to gather intelligence on suspects.

Geolocation is all the rage, and numerous complaints and even lawsuits have been directed towards companies collecting and (mis)using geolocation data. Despite this, the public is sharing more of their location data every day, and companies are spinning up new services to encourage them to do so. Photographs are one of the primary sources for geolocation data, and Flickr, Facebook, and Instagram are but some of the major players making use of the data. Many of the services accept an uploaded photograph, store the geolocation data for their own use, and then strip it out of the photograph so that users can only see what the service presents to them.

But what if you lie to the service? You can do so through some of their GUIs, but there is a better way – lie in the data you upload.

I took these four photographs in Prague this year.

But, when I run the following command:

./spoofexif.py -sd 01/01/2001 -ed 12/31/2010 -sh 0 -eh 8 -l "Waya, Fiji" -b 50 -d photos

and then upload the resulting photographs to Flickr, they appear to be taken in Fiji sometime in the last decade, always between midnight and 8AM.

Flickr’s representation of the spoofed images.


At the moment, spoofexif.py can do the following:

usage: spoofexif.py [-h] [-sd BEGINDATE] [-ed ENDDATE] [-sh BEGINHOUR]
 [-eh ENDHOUR] [-l LOCATION] [-b BOXSIDE]
 [-d DIRECTORY | -i IMAGES]
optional arguments:
 -h, --help show this help message and exit
 -sd BEGINDATE Start date
 -ed ENDDATE End date
 -sh BEGINHOUR Start hour
 -eh ENDHOUR End hour
 -l LOCATION Location to place photograph
 -b BOXSIDE Length of side of bounding box
 -d DIRECTORY Directory containing photographs to modify
 -i IMAGES Name of image file to modify
  • It takes a date range and randomly spreads the specified photos out over the entire range. You can modify a set of winter photographs with “-sd 11/01/2006 -ed 02/28/2007”
  • It takes an optional pair of hours and spreads the photos out over that range of hours. So, if you have a collection of dinner photos, you’d use “-sh 18 -eh 23”
  • It takes a location and spreads the photographs randomly in a box with a side of length -b. This allows you to scatter your Prague photos around Paris, or to make it appear that you vacationed in Fiji when you were really in New Jersey.
  • And you can specify an entire directory or just a single file. You can send your entire photo library back in time ….
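
The spreading logic described above can be sketched in a few lines. This is not spoofexif.py itself: the function names are hypothetical, the location is assumed to be already resolved to a latitude/longitude pair, and the box side is assumed to be in kilometres (the usage text does not state the unit). Writing the values into the EXIF data is left out.

```python
import math
import random
from datetime import datetime, timedelta


def random_timestamp(start, end, start_hour=0, end_hour=24, rng=random):
    """Pick a random day in [start, end] and a time in [start_hour, end_hour)."""
    days = (end.date() - start.date()).days
    day = start.date() + timedelta(days=rng.randint(0, days))
    seconds = rng.randrange(start_hour * 3600, end_hour * 3600)
    return datetime(day.year, day.month, day.day) + timedelta(seconds=seconds)


def random_point(lat, lon, box_km, rng=random):
    """Pick a random point in a square of side box_km centred on (lat, lon)."""
    half = box_km / 2.0
    # Roughly 111.32 km per degree of latitude; longitude shrinks with cos(lat).
    dlat = rng.uniform(-half, half) / 111.32
    dlon = rng.uniform(-half, half) / (111.32 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


rng = random.Random(2012)  # seeded so repeated runs give the same values
when = random_timestamp(datetime(2001, 1, 1), datetime(2010, 12, 31), 0, 8, rng)
where = random_point(-17.3, 177.1, 50, rng)  # approximate coordinates, assumed
print(when, where)
```

Calling both functions once per photograph gives each image its own date, time of day, and position inside the box.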

I can detect my own modifications, and it has given me some ideas on how to detect other people’s. But can Facebook, Twitter, or Instagram detect your time machine/teleporter?

Code available by request.
