name: malware-analysis
description: Professional malware analysis workflow for PE executables and suspicious files. Triggers on file uploads with requests like "analyze this malware", "analyze this sample", "what does this executable do", "check this file for malware", or any request to examine suspicious files. Performs static analysis, threat intelligence triage, behavioral inference, and produces analyst-grade reports with reasoned conclusions.

Malware Analysis Skill

This skill produces analyst-grade threat reports — not data dumps. Every conclusion must be backed by evidence and reasoning.

Core Principles

Evidence-based reasoning: Never state a conclusion without explaining WHY
Connect the dots: Link indicators to behaviors to capabilities to impact
Assess confidence: State how confident you are and why
Actionable output: Reports should enable decisions, not just inform

Analysis Workflow

Step 1: Collect Data

Run all scripts to gather raw data:

# Static analysis - get hashes, PE info, strings, APIs, entropy
python3 scripts/static_analysis.py /path/to/sample -f json > static.json

# Threat intelligence - check reputation across sources
python3 scripts/triage.py -t file /path/to/sample -f json > triage.json

# IOC extraction - extract network/host indicators
python3 scripts/extract_iocs.py /path/to/sample -f json > iocs.json
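The three collection commands can also be driven from one Python helper. The sketch below only reuses the command lines shown above; the function names `build_commands` and `collect`, and the choice to return exit codes, are illustrative assumptions and not part of the skill's scripts.

```python
import subprocess
from pathlib import Path


def build_commands(sample: str) -> dict[str, list[str]]:
    """Return the argv for each collection script, keyed by its output file name."""
    return {
        "static.json": ["python3", "scripts/static_analysis.py", sample, "-f", "json"],
        "triage.json": ["python3", "scripts/triage.py", "-t", "file", sample, "-f", "json"],
        "iocs.json": ["python3", "scripts/extract_iocs.py", sample, "-f", "json"],
    }


def collect(sample: str, out_dir: str = ".") -> dict[str, int]:
    """Run each script, save its stdout under out_dir, and return the exit codes."""
    exit_codes = {}
    for out_name, argv in build_commands(sample).items():
        result = subprocess.run(argv, capture_output=True, text=True)
        Path(out_dir, out_name).write_text(result.stdout)
        exit_codes[out_name] = result.returncode
    return exit_codes
```

Checking the returned exit codes before moving to Step 2 helps avoid reasoning over an empty or partial JSON file.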

Step 2: Analyze and Reason (THIS IS THE KEY STEP)

Using the collected data, perform analyst-grade reasoning:

2.1 Threat Intelligence Assessment

Ask yourself:

Is this sample known? If found in MalwareBazaar/ThreatFox, it has already been reported as malware
What's the VT detection rate?
- 0 detections: New or undetected sample, or clean; requires behavioral analysis
- 1-4 detections: Possibly a new variant, a targeted sample, or a false positive; suspicious
- 5-14 detections: Flagged as malicious by multiple vendors
- 15+ detections: Well-known malware
What family is it attributed to? Research that family's typical behavior
When was it first seen? A recent first-seen date may indicate an active campaign
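As a rough illustration, the detection-rate bands above can be encoded as a small lookup. Counts of exactly 5 and 15 are placed in the higher band here; that choice, like the function name and the short labels, is an assumption made for this sketch.

```python
def classify_vt_detections(detections: int) -> str:
    """Map a VirusTotal detection count to the triage band described above."""
    if detections < 0:
        raise ValueError("detection count cannot be negative")
    if detections == 0:
        return "needs-behavioral-analysis"  # new, undetected, or clean
    if detections < 5:
        return "suspicious"  # possibly a new variant or targeted sample
    if detections < 15:
        return "confirmed-malicious"  # multiple vendors agree
    return "well-known"
```

The band is only a starting point; the report should still explain what the count means alongside the other TI findings.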

Always explain your reasoning:

"This sample is identified as RedLine Stealer by MalwareBazaar with 45/70 VT detections. The high detection rate and presence in curated malware repositories confirm this is a known threat, not a false positive."

2.2 Behavioral Analysis from Static Indicators

API Analysis - Map APIs to behaviors:

API Pattern	Likely Behavior	Reasoning
VirtualAlloc + VirtualProtect + WriteProcessMemory + CreateRemoteThread	Process Injection	This is the classic injection pattern: allocate memory, make it executable, write code, execute in target
CredEnumerate, CryptUnprotectData	Credential Theft	These APIs specifically access Windows credential stores and DPAPI-protected data (browser passwords)
InternetOpen + URLDownloadToFile	Downloader	Initializes an HTTP session and downloads files, which is classic downloader behavior
RegSetValueEx + Run key paths in strings	Persistence	Writing to Run keys ensures execution at startup
IsDebuggerPresent, GetTickCount, NtQuerySystemInformation	Anti-Analysis	Multiple evasion checks suggest the malware hides its behavior during analysis
CryptEncrypt + file enumeration APIs	Possible Ransomware	Encryption capability combined with file discovery, though the crypto APIs could also serve encrypted C2 communication
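A minimal sketch of how the API-only rows of this table could be matched against an import list. The dictionary and function names are hypothetical, the A/W suffix handling is an assumption about how imports appear, and the persistence and ransomware rows are left out because they also depend on strings and file-enumeration context.

```python
# Behaviors and the APIs that suggest them, taken from the table above.
API_PATTERNS = {
    "Process Injection": {"VirtualAlloc", "VirtualProtect", "WriteProcessMemory", "CreateRemoteThread"},
    "Credential Theft": {"CredEnumerate", "CryptUnprotectData"},
    "Downloader": {"InternetOpen", "URLDownloadToFile"},
    "Anti-Analysis": {"IsDebuggerPresent", "GetTickCount", "NtQuerySystemInformation"},
}
KNOWN_APIS = set().union(*API_PATTERNS.values())


def normalize(api: str) -> str:
    """Strip an ANSI/Unicode suffix (e.g. InternetOpenA) when the base name is known."""
    if api not in KNOWN_APIS and api[-1:] in ("A", "W") and api[:-1] in KNOWN_APIS:
        return api[:-1]
    return api


def match_behaviors(imports: list[str]) -> dict[str, list[str]]:
    """Return each behavior with the sorted list of matching APIs that were imported."""
    found = {normalize(api) for api in imports}
    matches = {}
    for behavior, apis in API_PATTERNS.items():
        hits = sorted(found & apis)
        if hits:
            matches[behavior] = hits
    return matches
```

The output lists evidence only; the reasoning about why a combination matters still has to be written by the analyst.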

Always explain your reasoning:

"The presence of VirtualAlloc, VirtualProtect, and CreateRemoteThread together strongly suggests process injection capability. Individually these APIs have legitimate uses, but this specific combination is the textbook pattern for injecting code into other processes."

Packing Analysis:

Indicator	Meaning	Confidence
Entropy > 7.0	Compressed/encrypted content	High
Section entropy > 7.0 (especially .text)	Packed code section	High
UPX0, UPX1, .aspack, .packed sections	Known packer signatures	Very High
RWX sections	Self-modifying code	Medium
Small import table with GetProcAddress/LoadLibrary only	Dynamic API resolution	High
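The per-section indicators in this table can be checked mechanically once section data is available. The sketch below assumes each section is a dict with `name`, `entropy`, and `readable`/`writable`/`executable` keys; that shape is hypothetical and may not match what static_analysis.py actually emits.

```python
KNOWN_PACKER_SECTIONS = {"UPX0", "UPX1", ".aspack", ".packed"}
RESOLVER_APIS = {"GetProcAddress", "LoadLibrary", "LoadLibraryA", "LoadLibraryW"}


def packing_indicators(sections: list[dict], imports: list[str]) -> list[tuple[str, str]]:
    """Return (finding, confidence) pairs using the thresholds from the table above."""
    findings = []
    for sec in sections:
        name = sec["name"]
        if name in KNOWN_PACKER_SECTIONS:
            findings.append((f"packer section name {name}", "Very High"))
        if sec["entropy"] > 7.0:
            findings.append((f"section {name} entropy {sec['entropy']:.1f}", "High"))
        if sec.get("readable") and sec.get("writable") and sec.get("executable"):
            findings.append((f"section {name} is RWX", "Medium"))
    # An import table containing only resolver APIs hints at dynamic API resolution.
    if imports and set(imports) <= RESOLVER_APIS:
        findings.append(("imports limited to GetProcAddress/LoadLibrary", "High"))
    return findings
```

An empty result does not prove the sample is unpacked; it only means none of these specific indicators fired.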

If packed, state the implication:

"This sample shows multiple packing indicators (entropy 7.4, UPX sections). The static analysis findings represent the unpacker stub, NOT the actual payload. Dynamic analysis is required to reveal true functionality."

2.3 Capability Assessment

Based on the evidence, determine what the malware CAN DO:

Capability	Required Evidence	Confidence Level
Process Injection	2+ injection APIs	High if 3+, Medium if 2
Credential Theft	Any cred access API	High (these are specific)
Keylogging	SetWindowsHookEx	Medium (has legit uses)
Network C2	2+ network APIs + extracted URLs/IPs	High
File Download	URLDownloadToFile or similar	High
Persistence	Registry/service APIs + relevant strings	Medium
Encryption/Ransomware	Crypto APIs + file enumeration	Medium (needs context)
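Three of these rows depend only on imported APIs and can be sketched as simple rules. The set of "injection APIs" is taken from the API table in section 2.2, which is an assumption about what that phrase covers; the function name is hypothetical, and import names are assumed to be already normalized (no A/W suffix).

```python
INJECTION_APIS = {"VirtualAlloc", "VirtualProtect", "WriteProcessMemory", "CreateRemoteThread"}
CREDENTIAL_APIS = {"CredEnumerate", "CryptUnprotectData"}


def assess_capabilities(imports: list[str]) -> dict[str, str]:
    """Return capability -> confidence for the API-only rows of the table above."""
    found = set(imports)
    result = {}
    injection_hits = len(found & INJECTION_APIS)
    if injection_hits >= 3:
        result["Process Injection"] = "High"
    elif injection_hits == 2:
        result["Process Injection"] = "Medium"
    if found & CREDENTIAL_APIS:
        result["Credential Theft"] = "High"
    if "SetWindowsHookEx" in found:
        result["Keylogging"] = "Medium"  # has legitimate uses
    return result
```

The remaining rows need strings or extracted network indicators as well, so they are better judged by hand against the collected JSON.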

State confidence and reasoning:

"Credential Theft Capability: HIGH CONFIDENCE. CryptUnprotectData is present, which decrypts DPAPI-protected data, including browser-stored passwords. Outside of browsers, password managers, and similar software, few legitimate applications call this API."

2.4 Risk Assessment

Determine risk level with justification:

Risk Level	Criteria
CRITICAL	Credential theft APIs, process injection, confirmed malware family known for data theft/ransomware
HIGH	Multiple malicious capabilities, network C2, persistence mechanisms
MEDIUM	Suspicious indicators but no confirmed malicious capability, or packing hiding true behavior
LOW	Few indicators, possibly legitimate software with suspicious patterns
UNKNOWN	Insufficient evidence, heavily packed, or no TI hits

Step 3: Write the Report

Structure your report as follows:

# Threat Analysis Report: [MALWARE_NAME or "Unknown Sample"]

| | |
|---|---|
| **Risk Level** | [CRITICAL/HIGH/MEDIUM/LOW/UNKNOWN] |
| **Confidence** | [High/Medium/Low] |
| **Analysis Date** | [DATE] |

---

## Executive Summary

[2-3 sentences: What is this? Is it malicious? What can it do? How do we know?]

**Key Finding:** [One sentence bottom line]

---

## Threat Intelligence Assessment

[What do TI sources tell us? Explain what each finding means]

- **VirusTotal:** [X/Y detections] — [what this means]
- **MalwareBazaar:** [Found/Not found] — [what this means]  
- **Family Attribution:** [Family name] — [what this family typically does]

**Assessment:** [Your reasoned conclusion based on TI]

---

## Behavioral Analysis

### Identified Capabilities

#### [Capability 1: e.g., "Process Injection"]
- **Confidence:** [High/Medium/Low]
- **Evidence:** [List the specific APIs/strings found]
- **Reasoning:** [Explain WHY this evidence indicates this capability]

#### [Capability 2: e.g., "Credential Theft"]
...

### Packing Assessment

[Is it packed? What does this mean for the analysis?]

### Anti-Analysis Techniques

[What evasion techniques were identified?]

---

## MITRE ATT&CK Mapping

| Tactic | Technique | ID | Evidence |
|--------|-----------|----|----------|
| [Tactic] | [Technique] | [ID] | [Evidence; only include techniques you can justify] |

---

## Indicators of Compromise

### File Indicators
[Hashes]

### Network Indicators  
[Defanged IPs, domains, URLs - only if extracted]

### Host Indicators
[Registry keys, file paths, mutexes - only if found]

---

## Risk Assessment

**Overall Risk: [LEVEL]**

This assessment is based on:
1. [Reason 1]
2. [Reason 2]
3. [Reason 3]

**Confidence in Assessment: [High/Medium/Low]**
- [Why this confidence level]

---

## Recommendations

### Immediate Actions
[What should be done RIGHT NOW based on risk level]

### Detection Opportunities
[How to detect this threat]

### Further Analysis Needed
[What questions remain unanswered]
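The Network Indicators section of the template calls for defanged values. One common convention, sketched below, rewrites the `http` scheme to `hxxp` and brackets every dot. Conventions vary between teams, so the exact substitutions and the function name `defang` are assumptions.

```python
import re


def defang(indicator: str) -> str:
    """Make a URL, domain, or IP non-clickable for safe inclusion in a report."""
    out = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    return out.replace(".", "[.]")
```

For example, `defang("http://evil.example.com/a")` yields `hxxp://evil[.]example[.]com/a`. Dots in the URL path are bracketed too, which is harmless for readers but should be reversed before feeding indicators back into tooling.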

Entropy Interpretation

Entropy	Meaning
0-1	Highly structured (empty, repetitive)
4-5	Plain text, readable strings
5-6	Compiled code (normal .text section)
6-7	Compressed data, some obfuscation
7-8	Encrypted/compressed (PACKED)
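For reference, the entropy values in this table are Shannon entropy in bits per byte. The sketch below computes it and looks up the matching band; boundary values are assigned to the higher band and ranges the table does not list return `None`, both of which are assumptions of this example.

```python
import math
from collections import Counter
from typing import Optional

# (low, high, meaning) rows copied from the table above.
BANDS = [
    (0, 1, "Highly structured (empty, repetitive)"),
    (4, 5, "Plain text, readable strings"),
    (5, 6, "Compiled code (normal .text section)"),
    (6, 7, "Compressed data, some obfuscation"),
    (7, 8, "Encrypted/compressed (PACKED)"),
]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def interpret(entropy: float) -> Optional[str]:
    """Return the table's meaning for an entropy value, or None if no band covers it."""
    for low, high, meaning in reversed(BANDS):
        if low <= entropy <= high:
            return meaning
    return None
```

Running `shannon_entropy` over each section's raw bytes, not just the whole file, lines up with the per-section checks in the packing table.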

File Signatures

Bytes	Type
4D 5A (MZ)	PE executable
50 4B (PK)	ZIP/Office document
7F 45 4C 46	ELF executable
D0 CF 11 E0	OLE/Legacy Office
25 50 44 46	PDF
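These signatures can be checked against the first bytes of a file before choosing an analysis path. A minimal sketch, with hypothetical function names; it only covers the five signatures listed above.

```python
# (magic bytes, type) pairs from the table above.
SIGNATURES = [
    (bytes.fromhex("4D5A"), "PE executable"),
    (bytes.fromhex("504B"), "ZIP/Office document"),
    (bytes.fromhex("7F454C46"), "ELF executable"),
    (bytes.fromhex("D0CF11E0"), "OLE/Legacy Office"),
    (bytes.fromhex("25504446"), "PDF"),
]


def identify(header: bytes) -> str:
    """Return the file type whose magic bytes prefix the header, else 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 8 bytes of a file and identify it by signature."""
    with open(path, "rb") as handle:
        return identify(handle.read(8))
```

A matching signature says nothing about the extension or the contents beyond the header, so a mismatch between the two is itself worth noting in the report.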

Example Analysis Reasoning

BAD (data dump):

"Found APIs: VirtualAlloc, CreateRemoteThread, RegSetValueEx. Entropy: 7.2. VT: 34/70."

GOOD (analyst reasoning):

"This sample demonstrates process injection capability (MEDIUM CONFIDENCE, two injection APIs) based on the presence of VirtualAlloc and CreateRemoteThread. Used together, these APIs form part of the classic code injection pattern, in which memory is allocated for the payload and a remote thread is created to execute it. The high entropy (7.2) suggests the sample is packed, meaning the observed APIs may belong to the unpacker stub rather than the final payload. The 34/70 VirusTotal detection rate indicates this is recognized malware, with multiple vendors identifying it as a variant of Agent Tesla, an info-stealer known for credential harvesting. Given the injection capability and association with a credential-stealing family, this sample poses a CRITICAL risk to credential security on any system where it executes."

Scripts Reference

static_analysis.py

python3 scripts/static_analysis.py <file> -f [text|json]

Extracts: hashes, file type, PE headers, sections, entropy, imports, strings, suspicious indicators

triage.py

python3 scripts/triage.py <ioc> -f [text|json]
python3 scripts/triage.py -t file <filepath> -f json
python3 scripts/triage.py --status  # Check API config

Queries: MalwareBazaar, ThreatFox, URLhaus, VirusTotal, AbuseIPDB

extract_iocs.py

python3 scripts/extract_iocs.py <file> -f [text|json|csv]

Extracts: IPs, domains, URLs, emails, hashes, registry keys, file paths, crypto wallets, mutexes