The Digital Forensics and Incident Response fiction reading list, in no particular order:
- Ender’s Game – Orson Scott Card
- Jumper and Reflex – Steven Gould
- Most anything by John Grisham
- Daemon – Daniel Suarez
- Zero Day and Trojan Horse – Mark Russinovich (yes, that Mark)
- Pretty much anything on the Access Data or Guidance Software support web sites
- Blue Nowhere – Jeffrey Deaver
- Halting State and Rule 34 – Charles Stross
- Zero History – William Gibson
- American Gods – Neil Gaiman
- The Magicians – Lev Grossman
- The Night Watch Trilogy – Sergei Lukyanenko
I attended a superb class on OSINT the other week. One of the topics covered was using geolocation data from digital photographs found on social networking sites to gather intelligence on suspects.
Geolocation is all the rage, and numerous complaints and even lawsuits have been directed towards companies collecting and (mis)using geolocation data. Despite this, the public is sharing more of their location data every day, and companies are spinning up new services to encourage them to do so. Photographs are one of the primary sources for geolocation data, and Flickr, Facebook, and Instagram are but some of the major players making use of the data. Many of the services accept an uploaded photograph, store the geolocation data for their own use, and then strip it out of the photograph so that users can only see what the service presents to them.
But what if you lie to the service? You can do so through some of their GUIs, but there is a better way – lie in the data you upload.
These four photographs were taken by me in Prague earlier this year.
But, when I run the following command:
./spoofexif.py -sd 01/01/2001 -ed 12/31/2010 -sh 0 -eh 8 -l "Waya, Fiji" -b 50 -d photos
and then upload the resulting photographs to Flickr, they appear to be taken in Fiji sometime in the last decade, always between midnight and 8AM.
At the moment, spoofexif.py can do the following:
usage: spoofexif.py [-h] [-sd BEGINDATE] [-ed ENDDATE] [-sh BEGINHOUR] [-eh ENDHOUR] [-l LOCATION] [-b BOXSIDE] [-d DIRECTORY | -i IMAGES]

optional arguments:
  -h, --help     show this help message and exit
  -sd BEGINDATE  Start date
  -ed ENDDATE    End date
  -sh BEGINHOUR  Start hour
  -eh ENDHOUR    End hour
  -l LOCATION    Location to place photograph
  -b BOXSIDE     Length of side of bounding box
  -d DIRECTORY   Directory containing photographs to modify
  -i IMAGES      Name of image file to modify
- It takes a date range and randomly spreads the specified photos out over the entire range. You can modify a set of winter photographs with "-sd 11/01/2006 -ed 02/28/2007"
- It takes an optional pair of hours and spreads the photos out over that range of hours. So, if you have a collection of dinner photos, you'd use "-sh 18 -eh 23"
- It takes a location and spreads the photographs randomly in a box with a side of length -b. This allows you to scatter your Prague photos around Paris, or to make it appear that you vacationed in Fiji when you were really in New Jersey.
- And you can specify an entire directory or just a single file. You can send your entire photo library back in time ….
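The randomization behind those options is straightforward. Here is a minimal sketch of the idea (hypothetical names, not the actual spoofexif.py, which remains available by request); actually rewriting the EXIF tags would be handled by a library such as piexif:

```python
import math
import random
from datetime import datetime, timedelta

def random_spoof(begin_date, end_date, begin_hour, end_hour, lat, lon, box_km):
    """Pick a random timestamp within the date range (hour clamped to
    [begin_hour, end_hour)) and a random point inside a square box of
    side box_km centered on (lat, lon)."""
    start = datetime.strptime(begin_date, "%m/%d/%Y")
    end = datetime.strptime(end_date, "%m/%d/%Y")
    day = start + timedelta(days=random.randint(0, (end - start).days))
    stamp = day.replace(hour=random.randint(begin_hour, end_hour - 1),
                        minute=random.randint(0, 59),
                        second=random.randint(0, 59))
    # Roughly 111 km per degree of latitude; longitude degrees shrink
    # with the cosine of the latitude.
    half_lat = (box_km / 2.0) / 111.0
    half_lon = half_lat / max(0.01, math.cos(math.radians(lat)))
    return (stamp,
            lat + random.uniform(-half_lat, half_lat),
            lon + random.uniform(-half_lon, half_lon))
```

Each photo gets its own draw, so a batch run scatters the collection across the whole date range and bounding box rather than stamping every file identically.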
I can detect my own modifications, and it has given me some ideas on how to detect other people’s. But can Facebook, Twitter, or Instagram detect your time machine/teleporter?
Code available by request.
Dan Mares has been writing command line utilities for computer forensics, ediscovery, and other purposes for years. The quality and capability of each utility demonstrates how long he’s been doing this, and how well he knows these fields. Unfortunately, his site now has a warning that reads “All Maresware is command line driven, and as such has gone out of style so it is being discontinued.” I’m here to say that the command line is a long way from going out of style for a significant number of us.
First off, I went through college earning most of my CS degree on Linux. The command line is an old friend, and stringing processes together with utilities is second nature. But even if you're fresh out of college and have never seen Linux, you will quickly find that the GUI-driven tools just don't cover all of your needs, and probably never will. This is particularly true if you're working with a client on a small budget or a client who lacks in-house litigation support. Why? You can't deliver your work via load files or an expensive review platform. Instead, you need to send over zip files and massage the contents so they can be reviewed with commonly available applications. But even in large ediscovery and forensics projects, the GUI-driven tools don't give you 100% coverage.
Case in point. Using dtSearch I had identified 700 files spanning four volumes mounted using FTK Imager. The list of files was in a single text file. I needed to pack all of these files up in multiple zip containers due to bandwidth issues for delivery to a client without modifying their MAC times. And, by the way, the filenames weren’t unique so I couldn’t just zip them up, and I couldn’t copy them to one location and then zip up that location. I also couldn’t put them in a traditional evidence container using FTK Imager because the client didn’t have FTK Imager or MIP.
I eventually wrote my own utility that drove xxcopy, because robocopy is designed for directories rather than files, xcopy doesn't preserve MAC times, and neither will take a list of files to work on as a command line option. It got the job done, but I spent a lot of time thrashing around before I stumbled on a better way.
Enter Dan Mares and the upcopy utility. It has an incredible number of useful options, but for my purposes, three really stood out:
- It preserves MAC times
- The --flatten option will take a tree structure and copy all the files to a single directory
- The --nodupe option will detect duplicate files that would result in name collisions and add a unique suffix to each duplicate file
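To illustrate what the flatten and nodupe behaviors buy you, here is a rough Python approximation (my sketch, not Mares's code; note that shutil.copy2 carries over modification and access times, while Windows creation times need platform-specific calls):

```python
import os
import shutil

def flatten_copy(file_list, dest):
    """Copy every file in file_list into the single directory dest,
    suffixing colliding basenames and preserving timestamps."""
    os.makedirs(dest, exist_ok=True)
    seen = {}   # basename -> number of times encountered so far
    copied = []
    for src in file_list:
        name = os.path.basename(src)
        n = seen.get(name, 0)
        seen[name] = n + 1
        if n:  # name collision: add a unique suffix, like --nodupe
            root, ext = os.path.splitext(name)
            name = "{}_{}{}".format(root, n, ext)
        out = os.path.join(dest, name)
        shutil.copy2(src, out)  # copy2 preserves modified/accessed times
        copied.append(out)
    return copied
```

Feed it the file list from your search hits and you get a single reviewable directory with no silent overwrites, which is exactly the problem the zip delivery posed.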
http://www.dmares.com/index.htm (follow the various links in the direct links section.)
Please note that, despite the disclaimer, Dan is still actively supporting his tools and is still very active in the community.
We’ve all seen articles about the looming death of forensics due to the increase in data volume and data containers. The calmer folk generally just chuckle and get back to work, knowing that they’re gainfully employed for as long as they wish to work. For the less calm, and just to give everyone a few more things to think about, let me offer the following three thoughts:
1) As data volumes and the number of devices increase, clients may need to be willing to pay more for the analysis. The cost of the work isn’t nearly proportional to the number of custodians these days. Just because data volumes are increasing doesn’t mean that the work doesn’t need to be done. The successful practitioners will be the ones who figure out how to process all that data while keeping their clients happy.
2) Then again, does all the data need to be processed immediately? The successful practitioners may also be the ones who successfully triage the problem and can defend those triage decisions to their client and in court. Just because you don't process all the data immediately doesn't mean you cannot go for a deeper look later when justified.
3) Approaching the problem as a team rather than as an individual will yield better results. In addition to splitting the problem over multiple cores (technical solution), split the problem over multiple people (organizational solution), each with deep domain knowledge and appropriate skills. The amount of work done by each individual may go down a bit, the total work done by the team will scale with the volume of data and number of devices, and there will be some additional overhead due to coordination. The overall efficiency, given a good team, should increase quite a bit. I know I’m much more efficient with additional eyes on the problem working in concert. The solo practitioner may need to limit the jobs they take on, or form partnerships that allow them to share the work efficiently.
The problem is hardly insurmountable, and in any such challenge there are opportunities. We can wail and gnash our teeth or we can quietly (or, if you’re in marketing, noisily) step up and meet the challenge, ensuring quality services for our clients and a secure job for ourselves.
So, there I was …. Or, in other words, once upon a time. Or, …. Anyhow, I'm off doing a really "interesting" collection job. It's a mix of ediscovery and forensics, with all the typical issues – custodians available only for a day, unexpectedly large hard drives, systems that cannot come down at all, 3 Sony Vaios with just one power cord, etc. And, par for the course, no real idea of what I'm getting into prior to showing up on site, despite efforts to gather information. So, what made this a fun collection rather than a nightmare? The ultimate collection kit:
- WinFE with FTK Imager, IEF, and X-Ways. This successfully imaged a Vaio laptop with dual SSDs in a RAID configuration without a hitch.
- Tableau TD1 – if this thing would write to multiple destination drives simultaneously, I’d kiss it. Even without the dual destinations, it is a rock solid imaging solution. (Bring a USB keyboard to make things a bit easier.)
- FTK Imager CLI – Ok, I know how to use dd and its brethren, but FTK is a bit more full featured, and being able to use one software tool across all the platforms was great.
- FTK Imager – FTK Imager doing logical folder collections made packaging the loose files very easy. And, again, one software tool.
A few more reasons WinFE earns its spot:
- It will boot any Intel system, including Macs.
- It is forensically sound.
- It is (relatively) easy to add your own tools.
To make a destination drive writable from within WinFE:
> diskpart (to run DiskPart)
> list disk (to see the media connected to the system)
> select disk "N" (where "N" is the number of your destination drive)
> online disk (to bring the disk online)
> attributes disk clear READONLY (to allow writing to the disk)
> list volume (to choose the volume on the destination disk to write to)
> select volume "V" (where "V" is the volume number of your destination disk)
> attributes volume clear READONLY (to allow writing to the volume)
> assign letter=Z (any letter you choose, to which your image will be written)
Of course, there are all sorts of other things in my collection kit – two Pelican cases full of stuff, in fact, but everything mentioned here will fit in one case and will allow you to handle quite a bit of what might be thrown at you.
I started in the digital forensics community about five years ago, and I already feel old, even though I am a Johnny-come-lately. This post may come off as a "Hey, you kids, get offa my lawn!" rant. Rather than a rant, I really hope that people start talking about a way to find a small number of safe lawns for all the kids to play on.
In those five years I've noticed that the computer forensics community has become *less* supportive, not more supportive. This runs contrary to trends in other communities such as software engineering tools, web frameworks, and startups. I have some feelings and thoughts on why this is. I wish I had some good ideas on how to turn this trend around.
I think there are four major problems:
1) Fragmentation of the sites supporting the community.
When I showed up, there was Forensic Focus, the CCE list, and HTCIA. (And other people probably had their three or four sources that don't overlap with mine.) Now, I've got Forensic Focus, CCE, HTCIA, HTCC, DFCB, wn4n6s, and a host of OS and tool specific sites. Then there is LinkedIn, with an almost one to one mapping of all the external groups, plus subgroups, plus additional new groups not represented elsewhere. It seems that everyone wants their own lawn to play on rather than contributing to the health of an existing lawn. How often have you seen a post along the lines of "Hey, I set up a new forensics wiki! Come check it out and help it grow!" Or found yet another computer forensics LinkedIn group?
This leads to two related problems: Where do you post, and where do you go looking for information? I belong to a lot of the mailing lists and use my personal mail archive as a research tool when I have questions, but that doesn’t reach into the various web based forums. And if I want to post a question, where does it go? Some people blast every mailing list they’re on, hoping for an answer. And the more we balkanize, the more likely those questions are to go unanswered.
I still use FF and the CCE list mostly, but then there are items #2 and #3.
2) Web of trust.
When I joined the CCE list, I held certification #832. There's no way I'd ever meet all 832 people, but by proxy, we knew of most people on the list. It was a small, tight community. Forensic Focus was similar – it was a place where we had a pretty good sense of most of the people posting, and most of the new people took some time to get up to speed on the community. I don't know how many CCEs there are on the list now, but it seems that I know fewer of the people posting now than I did two years ago. People I used to see regularly on Forensic Focus are rarely seen, often replaced by very new people who are unfamiliar with the community. Many of these new posters seem to be looking for a solution to some university project. There are now people on the HTCC list posting anonymously.
3) Archiving, auditing, and reach of social media.
The growth in the number of forums, and the number of participants in those forums, greatly increases the number of potential employers, detractors, auditors, etc. Five years ago I felt pretty comfortable about asking stupid questions on the CCE list (a closed list) and even on Forensic Focus. Now, I’m very reluctant to ask anything that might display a lack of knowledge in an area where I am an expert.
We all know that none of us knows everything, and we're all better for the support and feedback of our community. But when those questions can be spun, taken out of context, or turned back on us in some way, it makes us wonder if the potential downsides are worth it. Since there are almost always other people with the same question who aren't speaking up, our failure to ask those questions means the entire community is worse off.
4) Pointing out that the Emperor might not be wearing any clothes is discouraged, actively and passively.
Some of this is due to "there but for the grace of God go I", some due to over sensitivity to political correctness, some due to fear of legal action, and some due to fear of getting dragged into the mud. ("Never wrestle with a pig: You both get all dirty, and the pig likes it.") The end result is that bad information lingers in the community, bad behavior persists, and people get fed up and move on to other places to invest their time and energy. And once you lose people, getting them to come back is often very hard.
I know I’ve become far more of a content consumer than generator over the last few years, though I still go through bouts of trying to contribute. My solution was to grow a small group of people I can trust to bounce ideas off of and I’ll turn to them rather than the larger community.
I am poorer for this fragmentation, and if you aggregate the loss of many people such as myself, the community is poorer as well.
Back in November, we applied for funding through a BAA (Broad Agency Announcement) entitled ADAMS – Anomaly Detection at Multiple Scales. We should find out if we won any funding some time this month. In the meantime, Fast Company found one of my partners and, through him, me. The article stemming from those interviews can be found here. It's worth a read.
Take a moment and do some research on the ADAMS problem. If you’ve any experience with ediscovery, or complex computer forensics cases, you might begin to think that you’ve seen this problem before on a smaller scale. Note that the ADAMS announcement specifies that the providers must provide test data – the providers need to prove that their products work in a controlled, instrumented environment before they’re released into the wild. Further, the people running the project must see the results before the solutions are accepted.
Hmm. What if we could do the same for ediscovery? What if you could have three vendors on site and compare them, on known data, head to head?
And, what if you could run known data through an ediscovery tool or process and accurately measure that process? What if, in so doing, you found that the process was flawed? What if it is your process? Your vendor's? Your opponent's?
Oddly enough, we’re developing tools to help you answer some of those “What if”s.
In the course of the interview, I came up with an analogy for our process which the reporter captured quite well – we’re creating virtual crime scenes. Crime scenes that can be adjusted, wiped clean, rebuilt, or used over and over again. Further, we’re populating these entirely electronic crime scenes with real evidence – documents with accurate metadata, email messages with legitimate headers, SMS messages with topical content.
To digress a bit, the last item is the most difficult, and the most interesting. It is easy to sanitize existing content, and fairly easy to generate responsive content wrapped in digital noise, but can we create a reasonable approximation of human generated content, and keep it on topic? Can we create, out of whole cloth, email conversations that appear to discuss a particular business topic in a manner that ensures they will be, or will not be, responsive to particular criteria?
No, not immediately, but we're on the right path. And please don't get too distracted by our desire to include natural language processing at some point, as there is an enormous amount of value we can add now, and in the near future.
We already build virtual crime scenes – digital corpora representing the corporate computing environment to be processed by ediscovery tools. And knowing how the corpus was built, down to the last byte, allows us to determine the accuracy of the ediscovery process, down to the last byte.
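One way to picture that measurement (an illustrative sketch, not our actual tooling): record every generated file in a manifest keyed by hash, with its known responsiveness, then score a tool's production exactly against that ground truth.

```python
import hashlib

def sha256(data: bytes) -> str:
    """Fingerprint a generated artifact so it can be tracked exactly."""
    return hashlib.sha256(data).hexdigest()

def score(manifest, produced_hashes):
    """manifest maps hash -> known responsiveness (built when the corpus
    is generated); produced_hashes is the set of hashes the tool under
    test marked responsive. Returns (precision, recall)."""
    responsive = {h for h, is_resp in manifest.items() if is_resp}
    produced = set(produced_hashes)
    true_positives = len(produced & responsive)
    precision = true_positives / len(produced) if produced else 1.0
    recall = true_positives / len(responsive) if responsive else 1.0
    return precision, recall
```

Because the corpus is synthetic, there is no sampling or estimation involved: every false positive and every missed document is identifiable by name.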
Stay tuned, interesting times are coming for the ediscovery world.