Archive

Posts Tagged ‘python’

Updated analyzeMFT, $MFT sequence numbers, and NTFS documentation

February 10, 2010 Leave a comment

analyzeMFT updates:

At the request of Harlan Carvey and Rob Lee I made some changes to analyzeMFT and fixed a few bugs along the way.

  • Version 1.1: Split parent folder reference and sequence into two fields. I’m still trying to figure out the significance of the parent folder sequence number, but I’m convinced that what some documentation refers to as the parent folder record number is really two values – the parent folder record number and the parent folder sequence number.
  • Version 1.2:
    • Fixed problem with non-printable characters in filenames. Any Unicode character is legal in a filename, including newlines. This presented some problems in my output. Characters that do not render well are now converted to hex and a note is added to the Notes column indicating this.
    • Added “compile time” flag to turn off the inclusion of any GUI related modules and libraries for systems missing tk/tcl support. (Set noGUI to True in the code)
  • Version 1.3: Added new column to hold log entries relating to each record. For example, a note stating that some characters in the filename were converted to hex as they could not be printed

The code and more details are available at www.integriography.com

Quick note on $MFT sequence numbers:

Microsoft tells us that each record in the $MFT has a FILE_RECORD_SEGMENT_HEADER Structure. Within this structure is a sequence number, defined as follows:

“This value is incremented each time that a file record segment is freed; it is 0 if the segment is not used.”

Ok, that’s pretty straightforward. At least until you look at teh first 16 entries in any $MFT as all of their sequence numbers match their record number. I’ve been told that since these files can never be deleted, repurposing the sequence number adds an additional sanity check and disaster recovery option. However, I’ve found one volume where this behavior continues for 12,000 records or more. Still looking into that one.

NTFS Documentation:

One of the best sources for NTFS documentation isn’t Microsoft, it comes from the Linux NTFS developers and is available here.

Advertisement
Categories: analyzeMFT Tags: , ,

analyzeMFT – a Python tool to deconstruct the Windows NTFS $MFT file

January 20, 2010 Leave a comment

Three elements combined last week to inspire me to write a tool to deconstruct the Windows NTFS $MFT file:

  1. I’ve been wanting to learn Python for quite awhile. (I found a “Learning Python” book on my shelf published in 1999.
  2. Mark Menz’s MFT Ripper started me wondering about the significance of the MFT sequence number.
  3. I’d been trying to get through the SANS 508.1 book but couldn’t bear to read about NTFS structures yet again.
  4. I wanted to start building a framework for doing more detailed timeline analysis.

So, last week I sat down and wrote analyzeMFT.py.  Please keep in mind that this is a novice Python programmer’s code and is definitely a work in progress. A simple project page and a link to the source can be found here.

If you have any comments, suggestions, or improvements, please do let me know. I’d like to keep building on this and making it as useful as possible.

Categories: analyzeMFT Tags: , , ,

Using Python to parse and present Windows 64 bit timestamps

January 16, 2010 1 comment

I’m working on learning Python since Perl, even after 20 years, still doesn’t stick in my head. The phrase “like a duck to water” doesn’t quite apply to my experience with Python, but I’m certainly swimming along nicely.

Since I learn languages and tools more effectively when I have a real problem to work on, I set out to parse the NTFS MFT record using Python. This was going swimmingly until I hit the file MAC times. According to Microsoft:

“A file time is a 64-bit value that represents the number of 100-nanosecond intervals that have elapsed since 12:00 A.M. January 1, 1601 Coordinated Universal Time (UTC).”

So, two problems: 1) It is a 64 bit value and python only has 32 bit longs and b) it doesn’t use the Unix definition of epoch.

After a lot of false starts, I learned the power of dictionaries and classes and came up with the following:

from datetime import date, datetime
class WindowsTime:
    def __init__(self, low, high):
        self.low = long(low)
        self.high = long(high)
        self.unixtime = self.GetUnixTime()
        self.utc = datetime.utcfromtimestamp(self.unixtime)
        self.utcstr = self.utc.strftime("%Y/%m/%d %H:%M:%S.%f")
# Windows NT time is specified as the number of 100 nanosecond intervals since January 1, 1601. 
# UNIX time is specified as the number of seconds since January 1, 1970. 
# There are 134,774 days (or 11,644,473,600 seconds) between these dates.
    def GetUnixTime(self):
        t=float(self.high)*2**32 + self.low
        return (t*1e-7 - 11644473600)

I have a module that defines a dictionary and parses chunks of data read from the MFT:

# Decodes the Standard Information attribute in a MFT record
def decodeSIrecord(s):

 d = {}
 d['crtime'] = WindowsTime(struct.unpack("<L",s[:4])[0],
                           struct.unpack("<L",s[4:8])[0])
 ...

Then just do:

SIobject = decodeSIrecord(data)

And you can print the times out directly from the dictionary:

print "CRTime: %s" % (SIobject['crtime'].utcstr)

This isn’t rocket science, and there’s probabaly a better way to do it, but if you’re trying to use Python to work with Windows filesystems and metadata, this could come in handy.