Cloud Blog: Empowering Gemini for Malware Analysis with Code Interpreter and Google Threat Intelligence

Nov 19, 2024

—

Source URL: https://cloud.google.com/blog/topics/threat-intelligence/gemini-malware-analysis-code-interpreter-threat-intelligence/
Source: Cloud Blog
Title: Empowering Gemini for Malware Analysis with Code Interpreter and Google Threat Intelligence

Feedly Summary: One of Google Cloud’s major missions is to arm security professionals with modern tools to help them defend against the latest threats. Part of that mission involves moving closer to a more autonomous, adaptive approach in threat intelligence automation.
In our latest advancements in malware analysis, we’re equipping Gemini with new capabilities to address obfuscation techniques and obtain real-time insights on indicators of compromise (IOCs). By integrating the Code Interpreter extension, Gemini can now dynamically create and execute code to help deobfuscate specific strings or code sections, while Google Threat Intelligence (GTI) function calling enables it to query GTI for additional context on URLs, IPs, and domains found within malware samples. These tools are a step toward transforming Gemini into a more adaptive agent for malware analysis, enhancing its ability to interpret obfuscated elements and gather contextual information based on the unique characteristics of each sample.
Building on this foundation, we previously explored critical preparatory steps with Gemini 1.5 Pro, using its 2-million-token input window to process substantial sections of decompiled code in a single pass. To improve scalability, we then incorporated Gemini 1.5 Flash and added automated binary unpacking through Mandiant Backscatter before the decompilation phase to tackle certain obfuscation techniques. Yet, as any seasoned malware analyst knows, the true challenge often begins once the code is exposed. Malware developers frequently use obfuscation to conceal critical IOCs and underlying logic, and malware may also download additional malicious code, which makes it hard to fully understand the behavior of a given sample.
For large language models (LLMs), obfuscation techniques and additional payloads create unique challenges. When dealing with obfuscated strings such as URLs, IPs, domains, or file names, LLMs often “hallucinate” unless they apply an explicit decoding method. LLMs also cannot access external resources, such as URLs that host additional payloads, which often leads to speculative interpretations of the sample’s behavior.
To help with these challenges, Code Interpreter and GTI function calling tools provide targeted solutions. Code Interpreter enables Gemini to autonomously create and execute custom scripts, as needed, using its own judgment to decode obfuscated elements within a sample, such as strings encoded with XOR-based algorithms. This capability minimizes interpretation errors and enhances Gemini’s ability to reveal hidden logic without requiring manual intervention.
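To make the XOR case concrete, here is a minimal sketch of the kind of throwaway script Code Interpreter might write for the simplest scheme, a repeating-key XOR. The key `k3y` and the plaintext are hypothetical; this is not the RC4-like routine from the sample analyzed below, and it is not Gemini's actual generated code.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed.

    XOR is its own inverse, so the same function both encodes and decodes.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical example: hide an indicator, then recover it.
key = b"k3y"
hidden = xor_bytes(b"example.com", key)
print(hidden.hex())                   # unreadable without the key
print(xor_bytes(hidden, key).decode())  # example.com
```

Because the decoder actually runs, the recovered string comes from computation rather than from the model's guess, which is the point of handing this step to Code Interpreter.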
Meanwhile, GTI function calling expands Gemini’s reach by retrieving contextualized information from Google Threat Intelligence on suspicious external resources such as URLs, IPs, or domains, providing verified insights without speculative guesses. Together, these tools equip Gemini to better handle obfuscated or externally hosted data, bringing it closer to the goal of functioning as an autonomous agent for malware analysis.
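Before querying GTI, an agent has to decide what kind of indicator it holds, assuming URLs, IPs, and domains are looked up through separate functions. The sketch below classifies an indicator with Python's standard library and builds a function-call request. The tool names `lookup_ip_address`, `lookup_url`, and `lookup_domain` are hypothetical; they do not correspond to the actual GTI API or to Gemini's function declarations.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_indicator(value: str) -> str:
    """Return 'ip', 'url', or 'domain' for a (non-defanged) indicator string."""
    try:
        ipaddress.ip_address(value)  # accepts IPv4 and IPv6
        return "ip"
    except ValueError:
        pass
    parts = urlsplit(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    return "domain"  # simplification: anything else is treated as a domain


# Hypothetical routing table: indicator type -> lookup function name.
LOOKUP_TOOLS = {
    "ip": "lookup_ip_address",
    "url": "lookup_url",
    "domain": "lookup_domain",
}


def route(value: str) -> dict:
    """Build a function-call request that a GTI wrapper could execute."""
    kind = classify_indicator(value)
    return {"name": LOOKUP_TOOLS[kind], "args": {kind: value}}


print(route("203.0.113.7"))
print(route("https://example.com/payload.msi"))
print(route("example.org"))
```

Keeping the classification deterministic means the model only has to choose *whether* to look something up, not how to parse it.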
To illustrate how these enhancements boost Gemini’s capabilities, let’s look at a practical example. In this case, we’re analyzing a PowerShell script that contains an obfuscated URL hosting a second-stage payload. This sample was previously analyzed with some of the most advanced publicly available LLMs, which incorporate code generation and execution as part of their reasoning process. Despite these capabilities, each model “hallucinated,” generating completely fabricated URLs instead of revealing the correct one.

Obfuscated PowerShell code sample to be analyzed by Gemini
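A first step in decoding a sample like this is pulling the obfuscated bytes out of the script. As a minimal sketch, assuming the bytes appear as `0xNN` literals in a single array (as they do in the deobfuscation code later in this post), a regular expression is enough. The `$data` line is a hypothetical snippet built from the first four bytes of that array.

```python
import re
from typing import List

# Matches 0x followed by one or two hex digits, ending at a word boundary,
# so a longer literal such as 0x255 is not mistaken for a byte.
HEX_BYTE = re.compile(r"0x([0-9A-Fa-f]{1,2})\b")


def extract_byte_array(powershell_source: str) -> List[int]:
    """Collect every 0xNN literal from a PowerShell snippet as an int.

    Simplification: assumes the snippet holds one byte array and no
    other hex literals.
    """
    return [int(h, 16) for h in HEX_BYTE.findall(powershell_source)]


snippet = "$data = [byte[]](0x25, 0x7B, 0x5A, 0xD8)"
print(extract_byte_array(snippet))  # [37, 123, 90, 216]
```

The resulting list has the same shape as the array `a` that the report's `deobfuscate` function consumes.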

Using Code Interpreter and GTI function calling as part of its reasoning process, Gemini autonomously generated the following report without any human intervention, applying these tools whenever it deemed them necessary to process the sample and extract additional information.

Final Report: PowerShell Malware with URL Deobfuscation
This report analyzes a PowerShell script designed to download and execute a file from an obfuscated URL. The script employs a custom obfuscation routine resembling the RC4 stream cipher to conceal the malicious URL.
Functionality:

Obfuscation Routine: The script defines a function F that performs byte-level XOR encryption using a key derived from the string "tox2". This function is used to deobfuscate the malicious URL.

URL Decryption: The obfuscated URL, stored as a byte array, is decrypted using the F function with "tox2" as the key.

File Download: The script downloads a file from the decrypted URL (hxxps://filedn[.]eu/lODWTgN8sswHA6Pn8HXWe1J/tox2/Scan_docs%2398097960[.]msi) and saves it to the system’s temporary directory under a randomly generated filename.

Execution: Finally, the script executes the downloaded file.

Indicators of Compromise (IOCs):

URL: hxxps://filedn[.]eu/lODWTgN8sswHA6Pn8HXWe1J/tox2/Scan_docs%2398097960[.]msi

Deobfuscation Code:
def deobfuscate(x, y):
    # Key scheduling: permute the state array s using the key bytes (RC4 KSA)
    s = list(range(256))
    j = 0
    k = [ord(c) for c in y]
    for i in range(256):
        j = (j + s[i] + k[i % len(k)]) % 256
        s[i], s[j] = s[j], s[i]
    # Keystream generation: XOR each input byte with the next keystream byte (RC4 PRGA)
    i = 0
    j = 0
    output = []
    for m in range(len(x)):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        output.append(x[m] ^ s[(s[i] + s[j]) % 256])
    return output

# Obfuscated URL bytes as stored in the PowerShell script
a = [0x25, 0x7B, 0x5A, 0xD8, 0x9C, 0xFF, 0xC3, 0xA4, 0x95, 0xB7,
     0x9D, 0x6E, 0x5F, 0x4D, 0x49, 0xFB, 0x78, 0x32, 0xB8, 0x17, 0x72,
     0x01, 0x31, 0xFB, 0xC0, 0x58, 0x45, 0xCB, 0x5C, 0x3C, 0xDD, 0x00,
     0x60, 0x87, 0x25, 0xB2, 0x44, 0xA6, 0xC5, 0xEA, 0x71, 0xAA, 0x9D,
     0xA1, 0x05, 0x5F, 0xBD, 0x41, 0x28, 0x70, 0x30, 0x6B, 0x96, 0x24,
     0x92, 0xA0, 0x9D, 0x92, 0x23, 0x79, 0x69, 0x2C, 0x1B, 0x1E, 0xEB,
     0x69, 0x8F, 0x04, 0x10, 0x74, 0xBC]
b = "tox2"  # key passed to F in the script
c = deobfuscate(a, b)
decoded_url = "".join(chr(i) for i in c)
print(decoded_url)

Google Threat Intelligence (GTI) Lookup:
The GTI lookup provides valuable context regarding the identified IOC:

Threat Actor: UNC5687, known for using MESHAGENT in phishing campaigns impersonating the Security Service of Ukraine.

Campaign: This specific URL is linked to a campaign spreading MESHAGENT via phishing emails with malicious links. The campaign used a CAPTCHA page that led to the download of either a malicious MSI or EXE file.

Malware: The campaign delivers MESHAGENT, an open-source remote access framework, configured to communicate with C2 domains linked to a service called "AnonVNC."

Conclusion:
The analyzed PowerShell script is confirmed to be malicious, associated with a phishing campaign by UNC5687 distributing the MESHAGENT remote access tool.
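The report defangs its IOC (`hxxps`, `[.]`) so it cannot be clicked by accident, and an indicator has to be refanged before it is handed to a lookup tool. The helpers below are a minimal sketch of that convention; they cover only the substitutions used in this report, and the example URL is a placeholder rather than the real IOC.

```python
def refang(indicator: str) -> str:
    """Undo the defanging used in this report so the indicator can be parsed."""
    return (
        indicator.replace("hxxps://", "https://")
        .replace("hxxp://", "http://")
        .replace("[.]", ".")
    )


def defang(indicator: str) -> str:
    """Defang an indicator (scheme and every dot) for safe sharing in a report."""
    return (
        indicator.replace("https://", "hxxps://")
        .replace("http://", "hxxp://")
        .replace(".", "[.]")
    )


print(refang("hxxps://example[.]com/files/sample[.]msi"))
# https://example.com/files/sample.msi
```

Defanging every dot, including the one before the file extension, matches the style of the IOC listed above.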

Gemini identified that the script uses an XOR-based obfuscation algorithm resembling RC4 to conceal the download URL. Recognizing this pattern, Gemini autonomously generated and executed a Python deobfuscation script within the Code Interpreter sandbox, successfully revealing the external resource.
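The "XOR-based, resembling RC4" description follows from how RC4 works: it derives a keystream from the key and XORs that keystream with the data, so one routine both encrypts and decrypts. The sketch below is textbook RC4 with the two phases separated, whereas the report's `deobfuscate` function interleaves them. Reusing the key "tox2" is for illustration only, and the plaintext URL is a hypothetical placeholder, not the URL from the sample.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 keystream (KSA followed by PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm: permute s with the key
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(length):  # pseudo-random generation algorithm
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    stream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


ciphertext = rc4(b"tox2", b"https://example.com/payload.msi")
print(ciphertext.hex())
print(rc4(b"tox2", ciphertext).decode())  # https://example.com/payload.msi
```

Checking a local implementation against published RC4 test vectors is a cheap way to confirm that a generated decoder is correct before trusting its output.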
With the URL in hand, Gemini then used GTI function calling to query Google Threat Intelligence for further context. This lookup linked the URL to UNC5687, a threat cluster known for using a remote access tool in phishing campaigns impersonating the Security Service of Ukraine.
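The decode-then-look-up sequence described above follows a general tool-use pattern: on each turn the model either requests a tool call or returns a final answer. A minimal sketch of that loop follows. It is not Gemini's implementation; the `model_step` interface, the stub tools `run_code` and `lookup_url`, and the scripted model are hypothetical stand-ins so the loop can run offline.

```python
from typing import Callable, Dict


# Hypothetical stub tools; a real system would call a code sandbox and a GTI wrapper.
def run_code(source: str) -> str:
    return f"executed {len(source)} chars"


def lookup_url(url: str) -> str:
    return f"stub context for {url}"


TOOLS: Dict[str, Callable[[str], str]] = {"run_code": run_code, "lookup_url": lookup_url}


def agent_loop(model_step, prompt: str, max_turns: int = 5) -> str:
    """Drive a tool-using model until it returns a final answer.

    Assumed interface: model_step(history) returns either
    ("tool", name, argument) or ("final", text).
    """
    history = [("user", prompt)]
    for _ in range(max_turns):
        action = model_step(history)
        if action[0] == "final":
            return action[1]
        _, name, argument = action
        result = TOOLS[name](argument)
        history.append(("tool", f"{name} -> {result}"))
    raise RuntimeError("model did not finish within max_turns")


def scripted_model(history):
    """Stand-in for the model: decode first, then look up, then report."""
    tool_turns = sum(1 for role, _ in history if role == "tool")
    if tool_turns == 0:
        return ("tool", "run_code", "print('decoded')")
    if tool_turns == 1:
        return ("tool", "lookup_url", "hxxps://example[.]com/a[.]msi")
    return ("final", f"report after {tool_turns} tool calls")


print(agent_loop(scripted_model, "Analyze this PowerShell sample"))
# report after 2 tool calls
```

The `max_turns` cap is a design choice that keeps a model which never settles on an answer from looping indefinitely.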
As we’ve seen, integrating these tools has strengthened Gemini’s ability to function as a malware analyst that adapts its approach to address obfuscation and gathers vital context on IOCs. With Code Interpreter and GTI function calling, Gemini is better equipped to navigate complex samples by autonomously interpreting hidden elements and contextualizing external references.
While these are significant advancements, many challenges remain, especially given the vast diversity of malware and scenarios that exist in the threat landscape. We’re committed to making steady progress, and future updates will continue to enhance Gemini’s capabilities, moving us closer to a more autonomous, adaptive approach in threat intelligence automation.

AI Summary and Description: Yes

Summary: The text discusses Google’s advancements in threat intelligence automation with Gemini. Two new capabilities stand out: the Code Interpreter extension, which lets Gemini write and run code to undo malware obfuscation, and Google Threat Intelligence function calling, which supplies real-time context on indicators of compromise. Both aim to make malware analysis more efficient and accurate.

Detailed Description:
The text outlines improvements Google Cloud has made to Gemini for tackling modern malware threats. The discussion centers on advanced malware analysis techniques, particularly handling obfuscation and obtaining real-time context on indicators of compromise (IOCs).

Key Points:
– **Security Mission**: One of Google Cloud’s major missions is to equip security professionals with modern tools to combat evolving threats.
– **Gemini’s Innovation**: Gemini’s expanded malware analysis capabilities specifically target the obfuscation techniques malware uses to conceal its malicious intent.
– **Code Interpreter Integration**:
– Enables Gemini to dynamically generate and execute code for decoding obfuscation, leading to reduced interpretation errors.
– Enhances autonomous functioning in analyzing and understanding complex obfuscated samples.
– **Google Threat Intelligence (GTI)**:
– GTI function calling allows Gemini to enrich malware analysis with contextual information about suspicious URLs, IP addresses, or domains.
– Improves accuracy and reliability when identifying potential threats linked to malware samples.
– **Challenges with LLMs**:
– Large language models (LLMs) struggle with obfuscation and additional payloads in malware analysis, which often leads to “hallucinations.”
– The integration of Code Interpreter and GTI is positioned as a solution to these challenges, providing a more accurate analysis.
– **Real-world Example**:
– An analysis of a PowerShell script associated with a phishing campaign shows how the new capabilities enabled successful deobfuscation and retrieval of valuable threat context.
– Confirming the script’s malicious nature and its connection to a known threat actor, UNC5687, demonstrates how the enhancements perform in a practical setting.
– **Future Commitment**: Google emphasizes the ongoing development of Gemini toward a more autonomous and adaptive model for threat intelligence that continues to innovate and respond to emerging security challenges.

In sum, these advancements mark a notable step toward a more intelligent approach to handling cybersecurity threats, specifically the complications that obfuscation introduces into malware analysis. Security professionals can use these tools to augment their malware analysis capabilities, streamline workflows, and ultimately strengthen their organization’s security posture.
