Dissecting a Blackhole 2 PDF (mostly) with peepdf.
I’m fairly new to malware analysis, having spent most of the last ten years doing IT consulting, computer forensics, e-discovery, and some related work. I’m now doing a lot of incident response and am taking on some malware analysis responsibilities, at least on a triage and management level.
We got phished the other day, and a rather nice phish it was. Kudos to the mail team for shutting it down quickly and to an alert user who escalated it as well. Some quick dynamic analysis led to a PDF, and there our story starts.
If you open the PDF up you’ll get a rectangle and possibly an error message.
So what do we as analysts do with this? There are a lot of analysis tools out there and I worked my way through quite a few of them, partly just to see what worked, and how they worked. The one I ended up using was peepdf. I barely scratched the surface of its capabilities, but the following features sold me on it for this project:
- It handled the malformed references that many other PDF analysis tools failed to handle.
- It is command-line based and scriptable. Once you develop a peepdf workflow, you can put the commands into a file and execute that file each time.
- It can search both raw and decoded objects.
- It has Spidermonkey built in.
So, I loaded my malicious PDF into peepdf and got the following output:
s = s + char(str(int(concat(b1, b2), 0x1a)))
It builds up a string by concatenating pairs of bytes, parsing each pair as an integer with radix 0x1a (that is, base 26), converting that integer to a string, and then turning it into a character.
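To make the arithmetic concrete, here is a minimal Python sketch of that step. It is an illustration of the idea rather than the code from the PDF: the names `decode_pair`, `decode` and `encode` are my own, and `encode` exists only so the round trip can be checked. It assumes each character is stored as exactly two base-26 digits (0-9, then a-p).

```python
RADIX = 0x1A  # 26, the radix used in the line above

DIGITS = "0123456789abcdefghijklmnop"  # the 26 digits of base 26


def decode_pair(b1: str, b2: str) -> str:
    """Join two digits, parse them in base 26, return the matching character."""
    return chr(int(b1 + b2, RADIX))


def decode(encoded: str) -> str:
    """Walk the input two digits at a time and build up the output string."""
    s = ""
    for i in range(0, len(encoded) - 1, 2):
        s = s + decode_pair(encoded[i], encoded[i + 1])
    return s


def encode(plain: str) -> str:
    """Inverse of decode(); illustration only, handles code points below 676."""
    return "".join(DIGITS[ord(c) // RADIX] + DIGITS[ord(c) % RADIX] for c in plain)


if __name__ == "__main__":
    # 'a' is 97 = 3 * 26 + 19, and digit 19 is 'j', so "3j" decodes to "a".
    print(decode_pair("3", "j"))
    print(decode(encode("var x = 1;")))
```

Two base-26 digits can represent values from 0 to 675, which is more than enough to cover every ASCII character.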
So that is the second part, a decryption routine, but it needs something to decrypt. I guessed that the first part located the stuff to decrypt and that it found it using the keyword “creation date”. So, how to find that stuff? Back to peepdf:
The search for “creation date” didn’t turn up anything, but searching for “creation” produced hits in object 3 and object 43. We’re already looking at object 43 so let us see what is in object 3. Lo and behold, there is CreationDate and a lot of … stuff. Working on the assumption that the code in object 43 will decode the stuff in object 3, I proceeded as follows. (Yes, there is probably a way to do this all within peepdf, but I’m still learning how to use spidermonkey properly so I took this route.)
First, dump object 3 out.
PPDF> object 3 > object3.txt
Write a Python version of the code in object 43:
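The script itself is not reproduced here, so what follows is only a sketch of what such a script could look like, not the exact code that was used. It assumes the input file has already had its leading noise stripped, that the remaining payload is nothing but two-digit base-26 pairs (whitespace is ignored), and that the `-f` option simply names the input file. The function names are hypothetical.

```python
import argparse
import sys

RADIX = 26  # 0x1a, the radix used by the routine in object 43


def decode_text(encoded: str) -> str:
    """Decode consecutive two-character base-26 pairs into characters."""
    cleaned = "".join(encoded.split())  # drop newlines and other whitespace
    chars = []
    for i in range(0, len(cleaned) - 1, 2):
        chars.append(chr(int(cleaned[i:i + 2], RADIX)))
    return "".join(chars)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Decode a dumped PDF object")
    parser.add_argument("-f", "--file", required=True,
                        help="object dump with the leading noise removed")
    args = parser.parse_args(argv)
    with open(args.file, "r", errors="replace") as handle:
        sys.stdout.write(decode_text(handle.read()))
    return 0


# Only act as a command-line tool when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Because the decoded text goes to standard output, it can be redirected into a file from the shell. A pair containing a character outside 0-9 and a-p raises a `ValueError`, which is a useful sign that some noise was left in the input.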
Strip the noise off the front of object3.txt (“<< /Title asdasdsad/CreationDate %#^&*%^#@&%#@3”) and then run the Python code against the object 3 stuff we saved earlier:
> jsparse.py -f object3.txt > object3.js
And then jump back into peepdf and clean up the newly created code:
PPDF> js_beautify file object3.js > object3-clean.js
This illustrates one of the things that I love about peepdf – it includes a lot of very useful functionality in the application so you don’t need to jump in and out of the tool all the time. (My foray into Python is due to my own issues and not peepdf’s.)
object3-clean.js now contains the second stage of the malicious PDF.
This was a pretty high-level run-through of a relatively simple problem, done by someone rather new to the subject, but hopefully it left you with the confidence to dive into this sort of thing yourself. There are a lot of good tools out there, lots of examples to work on, and many good people to help you out. (Tip of the hat to Willi and to the folks from rem-alumni.)