I started in the digital forensics community about five years ago, and I already feel old, and I am a Johnny-come-lately. This post may come off as a “Hey, you kids, get offa my lawn!” rant. Rather than a rant, I really hope that people start talking about a way to find a small number of safe lawns for all the kids to play on.
In those five years I’ve noticed that the computer forensics community has become *less* supportive, not more supportive. This runs contrary to trends to other communities such as software engineering tools, web frameworks, and startups. I have some feelings and thoughts on why this is. I wish I had some good ideas on how to turn this trend around.
I think there are four major problems:
1) Fragmentation of the sites supporting the community.
When I showed up, there was Forensic Focus, the CCE list, and HTCIA. (And other people probably had their three or four sources that don’t overlap with mine.) Now, I’ve got Forensic Focus, CCE, HTCIA, HTCC, DFCB, wn4n6s, and a host of OS and tool specific sites. Then there is LinkedIn, with an almost one to one mapping of all the external groups, plus subgroups, plus additional new groups not represented elsewhere.It seems that everyone wants their own lawn to play on rather than contributing to the health of an existing lawn. How often have you seen a post along the lines of “Hey, I set up a new forensics wiki! Come check it out and help it grow!” Or found yet another computer forensics LinkedIn group?
This leads to two related problems: Where do you post, and where do you go looking for information? I belong to a lot of the mailing lists and use my personal mail archive as a research tool when I have questions, but that doesn’t reach into the various web based forums. And if I want to post a question, where does it go? Some people blast every mailing list they’re on, hoping for an answer. And the more we balkanize, the more likely those questions are to go unanswered.
I still use FF and the CCE list mostly, but then there are items #2 an #3.
2) Web of trust.
When I joined the CCE list with certification #832. There’s no way I’d ever meet all 832 people, but by proxy, we knew of most people on the list. It was a small, tight community. Forensic Focus was similar – it was a place where we had a pretty good sense of most of the people posting, and most of the new people took some time to get up to speed on the community.I don’t know how many CCEs there are on the list now, but it seems that I know fewer of the people who are posting now that I did two years ago. People I used to see regularly on Forensic Focus are rarely seen, often replaced by very new people who are unfamiliar with the community. Many of these new posters seems to be looking for a solution to some university project. There are now people on the HTCC list posting anonymously.
3) Archiving, auditing, and reach of social media.
The growth in the number of forums, and the number of participants in those forums, greatly increases the number of potential employers, detractors, auditors, etc. Five years ago I felt pretty comfortable about asking stupid questions on the CCE list (a closed list) and even on Forensic Focus. Now, I’m very reluctant to ask anything that might display a lack of knowledge in an area where I am an expert.
We all know that none of us knows everything, and we’re all better for the support and feedback of our community. But when those questions can be spun, taken out of context, or turned back on us in some way, it makes us wonder if the potential downsides are worth it. Since there are almost always other people with the same question who aren’t speaking up, our failure to ask those questions means the entire community is worse off for these questions not being asked.
4) Pointing out that the Emperor might not be wearing any clothes is discouraged, actively and passively.
Some of this is due to “there but for the grace of God go I”, some due to over sensitivity to political correctness, some due to fear of legal action, and some due to fear of getting dragged into the mud. (“Never wrestle with a pig: You both get all dirty, and the pig likes it.)The end result is that bad information lingers in the community, bad behavior persists, and people get fed up and move on to other places to invest their time and energy. And once you lose people, getting them to come back is often very hard.
I know I’ve become far more of a content consumer than generator over the last few years, though I still go through bouts of trying to contribute. My solution was to grow a small group of people I can trust to bounce ideas off of and I’ll turn to them rather than the larger community.
I am poorer for this fragmentation, and if you aggregate the loss of many people such as myself, the community is poorer as well.
Back in November, we applied for funding through a BAA grant entitled ADAMS – Anomaly Detection At Massive Scales. We should find out if we won any funding some time this month. In the meantime, Fast Company found one of my partners and through him, me. The article stemming from those interviews can be found here. It’s worth a read.
Take a moment and do some research on the ADAMS problem. If you’ve any experience with ediscovery, or complex computer forensics cases, you might begin to think that you’ve seen this problem before on a smaller scale. Note that the ADAMS announcement specifies that the providers must provide test data – the providers need to prove that their products work in a controlled, instrumented environment before they’re released into the wild. Further, the people running the project must see the results before the solutions are accepted.
Hmm. What if we could do the same for ediscovery? What if you could have three vendors on site and compare them, on known data, head to head?
And, what if you could run known data through an ediscovery tool or process and accurately measure that process? What if, in so doing, you found that the process was flawed? If it is your process? Your vendor’s process? Your opponent’s process?
Oddly enough, we’re developing tools to help you answer some of those “What if”s.
In the course of the interview, I came up with an analogy for our process which the reporter captured quite well – we’re creating virtual crime scenes. Crime scenes that can be adjusted, wiped clean, rebuilt, or used over and over again. Further, we’re populating these entirely electronic crime scenes with real evidence – documents with accurate metadata, email messages with legitimate headers, SMS messages with topical content.
To digress a bit, the last item is the most difficult, and the most interesting. It is easy to sanitize existing content, and fairly easy to generate responsive content wrapped in digital noise, but can we create a reasonable approximation of human generated content, and keep it on topic? Can we create, out of the whole cloth, email conversations that appear to discuss a particular business topic in a manner that ensures they will be, or will not be, responsive to particular criteria?
No, not immediately, but we’re on the right path. And please don’t get too distracted by our desire to include natural language processing at some point as there is an enormous amount of value we can add now, and in the near future.
We already build virtual crime scenes, or digital corpus representing the corporate computing environment to be processed by ediscovery tools. And knowing how the corpus was built down to the last byte allows us to determine the accuracy of the ediscovery process, down to the last byte.
Stay tuned, interesting times are coming for the ediscovery world.