What exactly is malware and how does it work?
A technical overview of malicious software: taxonomy, classification dimensions, and modern detection pipelines, with a defender-first perspective
TL;DR
- Malware is any software intentionally designed to disrupt, damage, exfiltrate, or gain unauthorized control over systems.
- Samples can be classified along several dimensions, including propagation (worms), persistence (bootkits), payload (ransomware), stealth (rootkits), operator model (botnets), and target (OT/ICS).
- Antivirus/endpoint protection engines combine static analysis (signatures, heuristics, ML) with dynamic analysis (emulation/sandboxing, behavior monitoring), reputation, and cloud-assisted verdicting.
- Evasion techniques (packing, polymorphism, process injection, API hashing, LOLBins) force defenders to normalize inputs, correlate telemetry, and focus on behavior and context, not strings alone.
1. Definitions and Scope
Malware (malicious software) refers to programs intentionally crafted to achieve unauthorized objectives such as code execution, persistence, lateral movement, data theft, disruption, or extortion. In modern enterprise environments, “malware” spans:
- User-mode binaries and scripts (PE/ELF/Mach-O, PowerShell, JS/VBS, HTA)
- Kernel components (drivers), bootkits, firmware implants
- Fileless techniques (registry-only, WMI, in-memory only)
- Living-off-the-land (LOLBins/LOLBAS) abuse of signed system tools
Security products commonly use the broader term “malicious content,” which also covers documents with exploit payloads, macro droppers, weaponized PDFs, malicious LNKs/shortcuts, and archive-based stagers.
2. A Practical Taxonomy of Malware
There is no single canonical taxonomy. Useful dimensions include behavior, objective, propagation, and stealth characteristics.
2.1 By Objective/Payload
- Ransomware: Encrypts data to extort payment; many families also exfiltrate it first and threaten to leak it (double extortion).
- Infostealer: Harvests credentials, cookies, crypto wallets, browser data.
- Banking Trojan: Targets financial transactions, injects web overlays, steals OTPs.
- Spyware: Persistent surveillance, keylogging, screenshots, clipboard capture.
- Wiper/Sabotage: Destroys data or corrupts systems to deny service.
- Backdoor/RAT: Provides remote C2 access, command execution, file ops.
- Cryptominer: Hijacks CPU/GPU resources to mine cryptocurrency.
2.2 By Propagation/Delivery
- Worm: Self-propagates over networks via vulnerabilities or weak creds.
- Dropper/Downloader: Initial stage that unpacks an embedded payload (dropper) or fetches one from a remote server (downloader).
- Supply Chain: Abuse of update channels, signed packages, or CI/CD artifacts.
- Drive‑by/Exploit Kit: Browser/plug‑in exploitation for silent compromise.
2.3 By Stealth/Persistence
- Rootkit: Hides presence by altering OS internals (user or kernel mode).
- Bootkit: Persists in boot process (MBR/VBR/UEFI) to execute before OS.
- Fileless: Lives in memory/registry; relies on scripts and system tools.
- Signed/Sideloaded: Masquerades via valid signatures or DLL sideloading.
2.4 By Target/Platform
- Desktop OS (Windows/macOS/Linux), Mobile (Android/iOS), Cloud workloads (containers, serverless), OT/ICS, Network devices (routers, VPNs), and Firmware (UEFI/BIOS, BMCs).
2.5 By Operator Model
- Botnets: Large fleets under C2, often with modules/plugins.
- APT/Targeted: Hands‑on‑keyboard campaigns with specific objectives.
- Crimeware: Monetization-driven, commodity tooling and affiliates.
3. Classification Dimensions and Labels
Labeling a sample often requires multiple axes:
| Axis | Examples | Detection Signals |
|---|---|---|
| Payload | ransomware, infostealer, RAT, miner | crypto API usage, file rename/encrypt patterns, credential store access |
| Propagation | worm, spearphish, supply chain | SMB brute‑force, mass scanning, email attachments, update channel abuse |
| Stealth | packer, rootkit, fileless | high entropy sections, SSDT/inline hooks, WMI persistence, AMSI bypass attempts |
| Target | Windows, Linux, OT/ICS | platform-specific APIs, protocol usage (Modbus, DNP3), filesystem paths |
| Tactics/Techniques | MITRE ATT&CK | process injection, credential dumping, persistence mechanisms |
Additional practical labels:
- Packed/Obfuscated: High-entropy payloads, UPX or custom packers, code virtualization.
- Modular: Stage loaders, plugin frameworks, reflective DLLs.
- Operator Playbook: Time-of-day execution, geo-fencing, LOLBins (living-off-the-land binaries).
4. Delivery and Initial Access (Defensive View)
- Phishing/Malspam: Attachments (Office, ISO/IMG, LNK, ZIP) or links to staged payloads.
- Web Exploits: Browser/renderer RCE, drive-by downloads, SEO poisoning.
- Supply Chain: Compromised installers, trojanized dependencies, update servers.
- Lateral Movement as Delivery: Wormable vulns (e.g., SMB), stolen creds, RDP.
- Removable Media: USB autorun-style tricks are rarer now but still appear in operations against air‑gapped networks.
Defensive controls: attachment hardening, macro policies, Mark‑of‑the‑Web (MOTW) enforcement, browser isolation, EDR with real‑time content inspection, and application control.
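As a rough illustration of MOTW enforcement, the sketch below parses the text of a `Zone.Identifier` stream and decides whether a file should be treated as coming from an untrusted zone. The function names and the "zone 3 or higher is untrusted" policy are assumptions made for this example, not the behavior of any particular product.

```python
import configparser
from typing import Optional

# Windows URL security zones: 0 = Local machine, 1 = Local intranet,
# 2 = Trusted sites, 3 = Internet, 4 = Restricted sites.
INTERNET_ZONE = 3


def zone_id(stream_text: str) -> Optional[int]:
    """Extract ZoneId from Zone.Identifier stream text, or None if absent/malformed."""
    # interpolation=None so '%' characters in URLs are not treated specially.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    try:
        parser.read_string(stream_text)
        # Option names are lower-cased by configparser by default.
        return int(parser["ZoneTransfer"]["zoneid"])
    except (configparser.Error, KeyError, ValueError):
        return None


def has_untrusted_motw(stream_text: str) -> bool:
    """Hypothetical policy: treat Internet and Restricted zones as untrusted."""
    zone = zone_id(stream_text)
    return zone is not None and zone >= INTERNET_ZONE
```

On NTFS the stream text is typically obtained by opening `<path>:Zone.Identifier`; a file with no such stream carries no mark, and a policy has to decide separately how to treat that case.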
5. Common Capabilities and TTPs (High‑Level)
- Environment Awareness: VM/sandbox checks, language/locale checks, user activity checks.
- Evasion: Packing, polymorphism/metamorphism, API hashing/import resolution at runtime, string encryption, indirect syscalls.
- Privilege and Persistence: UAC bypass attempts, scheduled tasks, services, run keys, WMI subscriptions, startup folders.
- Credential Access: LSASS memory access, browser DBs, keychains, token theft.
- Lateral Movement: SMB/WinRM/PSRemoting, RDP, stolen tickets, named pipes.
- Exfiltration/Comms: HTTPS with domain fronting, DNS tunneling, Tor, dead‑drop resolvers.
Each capability above is also a detection surface. Defenders should combine pre-execution controls, real-time telemetry, and post-execution correlation to cover them.
6. How Antivirus and Endpoint Protection Work
Modern engines combine several layers. A typical pipeline:
6.1 Normalization and Unpacking
- Container and archive traversal (ZIP/RAR/7z, nested archives).
- Portable Executable (PE) parsing; section entropy and import tables.
- Packer/cryptor emulation and stub detonation (UPX/custom packers).
- Macro and script de‑obfuscation (VBA, JS, PowerShell), string decoding.
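The section-entropy signal mentioned above can be sketched as a Shannon entropy calculation over raw bytes. This is a minimal illustration: the 7.2 bits-per-byte threshold is an assumption borrowed from the YARA example in this post, and `looks_packed` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(section: bytes, threshold: float = 7.2) -> bool:
    """Hypothetical heuristic: flag a section whose entropy exceeds `threshold`."""
    return shannon_entropy(section) > threshold
```

Compressed or encrypted data tends toward 8 bits per byte, while plain code and text usually sit well below that. Legitimate installers and media files are also high-entropy, so this signal is best used as one input among several.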
6.2 Static Analysis
- Signature Matching: Exact byte patterns and wildcard masks over normalized content.
- Heuristics: Expert-crafted rules (e.g., suspicious PE flags, import mixes, section anomalies).
- ML/Statistical: N‑gram features, byte histograms, metadata embeddings, tree-based or neural-network models.
- Reputation/Prevalence: First‑seen/rare file, signer reputation, download source.
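To make "wildcard masks" concrete, here is a small sketch of byte-signature matching in which `??` stands for any byte. The signature syntax and the function names are assumptions for illustration; production engines use far faster multi-pattern algorithms than this linear scan.

```python
from typing import List, Optional

# A pattern is a list of byte values; None marks a wildcard position.
Pattern = List[Optional[int]]


def parse_signature(text: str) -> Pattern:
    """Parse a hex signature such as '4D 5A ?? 00', where '??' matches any byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def find_signature(data: bytes, pattern: Pattern) -> int:
    """Return the first offset where `pattern` matches `data`, or -1 if none."""
    n = len(pattern)
    if n == 0:
        return -1
    for start in range(len(data) - n + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            return start
    return -1
```

For example, `find_signature(b"\x00\x4d\x5a\x90\x00", parse_signature("4D 5A ?? 00"))` returns offset 1, because the third pattern byte is a wildcard.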
6.3 Dynamic/Behavioral Analysis
- Sandboxing/Emulation: Observe API calls, filesystem/registry ops, process tree.
- Real‑Time Monitoring (EDR): Kernel callbacks, ETW, AMSI, script block logging.
- Policy/Rules: Block on suspicious behaviors (e.g., mass file renames + crypto APIs).
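The "mass file renames + crypto APIs" rule can be sketched as a sliding-window correlation over per-process events. Everything here is illustrative: the `Event` shape, the event kind strings, and the window and count thresholds are assumptions, not a real EDR schema.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Event:
    ts: float   # seconds since an arbitrary epoch
    pid: int
    kind: str   # hypothetical kinds: "file_rename" or "crypto_api"


def flag_ransomware_like(events: Iterable[Event],
                         window: float = 10.0,
                         min_renames: int = 50) -> Set[int]:
    """Return pids with >= min_renames renames and a crypto API call within `window` seconds."""
    renames = defaultdict(deque)          # pid -> recent rename timestamps
    last_crypto: Dict[int, float] = {}    # pid -> most recent crypto API timestamp
    flagged: Set[int] = set()
    for ev in sorted(events, key=lambda e: e.ts):
        recent = renames[ev.pid]
        if ev.kind == "file_rename":
            recent.append(ev.ts)
        elif ev.kind == "crypto_api":
            last_crypto[ev.pid] = ev.ts
        else:
            continue
        # Drop renames that have aged out of the window.
        while recent and ev.ts - recent[0] > window:
            recent.popleft()
        crypto_ts = last_crypto.get(ev.pid)
        if (crypto_ts is not None and ev.ts - crypto_ts <= window
                and len(recent) >= min_renames):
            flagged.add(ev.pid)
    return flagged
```

Requiring both signals in the same window is what keeps the rule from firing on backup tools (many file operations, no crypto) or on ordinary TLS use (crypto, few renames).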
6.4 Cloud-Assisted Verdicting
- Hash lookup (MD5/SHA‑1/SHA‑256), fuzzy hashes (ssdeep), similarity.
- Detonation at scale, reputation graph, URL/domain intelligence.
- Model offloading for heavier analysis and quick signature distribution.
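A hash lookup reduces to computing a digest and querying a verdict store. The sketch below uses an in-memory dictionary as a stand-in for a cloud reputation service; the store contents, the verdict strings, and the function names are all hypothetical.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical verdict store; a real deployment would query a remote service.
KNOWN_VERDICTS: Dict[str, str] = {
    sha256_hex(b"known-bad-sample"): "malicious",
}


def lookup_verdict(data: bytes, store: Dict[str, str] = KNOWN_VERDICTS) -> str:
    """Return the stored verdict for `data`, or "unknown" if the hash is not present."""
    return store.get(sha256_hex(data), "unknown")
```

An "unknown" result is where the other layers come in: exact hashes change with a single flipped byte, which is why engines pair them with fuzzy hashing, similarity search, and detonation.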
6.5 Response Actions
- Block or quarantine the file, kill the process, roll back changes (e.g., from volume shadow copies), and isolate the host from the network.
- Alerting and ticketing; auto‑tuning policies to reduce false positives.
7. Detection Engineering: Rules and Telemetry
YARA rules are common for hunting and detection on files/memory. Example (educational):
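    import "pe"
    import "math"

    rule Suspicious_Packed_PE_Heuristics
    {
        meta:
            description = "High-entropy sections and uncommon characteristics"
            author = "defender"
        condition:
            // "MZ" magic at offset 0 (0x5A4D when read little-endian)
            uint16(0) == 0x5A4D and
            pe.number_of_sections > 4 and
            // Section entropy comes from the math module, not the pe module
            for any i in (0..pe.number_of_sections - 1) : (
                math.entropy(pe.sections[i].raw_data_offset, pe.sections[i].raw_data_size) > 7.2
            )
    }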
Behavioral detections leverage process trees, command lines, registry/FS ops, and network. Map rules to MITRE ATT&CK for common TTPs (e.g., T1055 Process Injection, T1059 Command and Scripting Interpreter).
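A behavioral rule over process trees can be as simple as a parent-child check tagged with an ATT&CK technique. The sketch below flags an Office application spawning a script interpreter and maps the hit to T1059. The `ProcessEvent` fields, the process lists, and the rule name are assumptions for illustration and would need tuning against real telemetry.

```python
from dataclasses import dataclass
from typing import Optional

OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
SCRIPT_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe"}


@dataclass(frozen=True)
class ProcessEvent:
    parent_image: str   # full path of the parent executable
    image: str          # full path of the new process
    command_line: str


def image_name(path: str) -> str:
    """Return the lower-cased file name from a Windows or POSIX style path."""
    return path.replace("\\", "/").rsplit("/", 1)[-1].lower()


def match_office_spawns_interpreter(ev: ProcessEvent) -> Optional[dict]:
    """Return an alert dict when an Office app launches a script interpreter, else None."""
    if (image_name(ev.parent_image) in OFFICE_PARENTS
            and image_name(ev.image) in SCRIPT_CHILDREN):
        return {
            "rule": "office_spawns_script_interpreter",  # hypothetical rule name
            "attack_technique": "T1059",  # Command and Scripting Interpreter
            "command_line": ev.command_line,
        }
    return None
```

Keeping the technique ID in the alert makes it easier to measure coverage per technique and to group related alerts during triage.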
8. Evasion Techniques and Defensive Countermeasures
- Packing/Obfuscation → Counter: Strong unpacking/normalization, entropy heuristics, code-similarity.
- Polymorphism/Metamorphism → Counter: Structural features, behavior correlation, similarity search.
- Fileless/Scripted → Counter: AMSI integration, script block logging, constrained language mode.
- Process Injection/Hollowing → Counter: ETW and kernel-callback telemetry, memory scanners, user-mode API hooking.
- LOLBins/LOLBAS → Counter: Application control, parent-child policy, signed binary restrictions.
- DGA/C2 Malleability → Counter: DNS analytics, JA3/JA3S fingerprinting, beaconing frequency analysis.
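Beaconing frequency analysis can be approximated by measuring how regular the gaps between connections are. The sketch below uses the coefficient of variation of inter-arrival times; the 0.1 cutoff and the minimum sample count are arbitrary assumptions, and the function names are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def beacon_score(timestamps: Sequence[float]) -> Optional[float]:
    """Return the coefficient of variation of inter-arrival gaps, or None if undefined."""
    ts = sorted(timestamps)
    if len(ts) < 4:
        return None  # too few connections to judge regularity
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    mean_gap = statistics.fmean(gaps)
    if mean_gap == 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


def looks_like_beacon(timestamps: Sequence[float], max_cv: float = 0.1) -> bool:
    """Hypothetical heuristic: very regular intervals suggest automated check-ins."""
    cv = beacon_score(timestamps)
    return cv is not None and cv <= max_cv
```

Implants that add jitter to their sleep interval raise this score, so a low value is a useful hint rather than a verdict. Legitimate software such as update checkers and monitoring agents also beacons regularly and needs allow-listing.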
9. Limits and Trade-offs
- False Positives/Negatives: Tuning thresholds is delicate; layered detections help.
- Performance: On-access scanning must be fast; heavy analysis is offloaded to the cloud or deferred to post-execution.
- Privacy: Content scanning vs. data minimization; jurisdictional constraints.
- Adversarial Pressure: Models and heuristics degrade as actors adapt; continuous retraining is required.
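The false positive/negative trade-off becomes clearer with a small calculation. The sketch below counts outcomes for a score-based detector at a given threshold; the scores and labels in the usage note are made-up illustrative values, not measurements.

```python
from typing import Dict, Sequence


def rates_at_threshold(scores: Sequence[float],
                       labels: Sequence[bool],
                       threshold: float) -> Dict[str, float]:
    """Count TP/FP/TN/FN when flagging every sample with score >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_malicious in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_malicious:
            tp += 1
        elif flagged:
            fp += 1
        elif is_malicious:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "fpr": fpr, "fnr": fnr}
```

With scores `[0.1, 0.4, 0.35, 0.8, 0.95]` and labels `[False, False, True, True, True]`, a threshold of 0.5 misses one malicious sample with no false alarms, while lowering it to 0.3 catches all three at the cost of flagging one benign file.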
10. Practical Defender Checklist
- Enforce MOTW and block macros by default; harden archives and attachments.
- Maintain EDR with AMSI and ETW telemetry; enable script block logging.
- Use application control for high-risk endpoints; restrict LOLBins usage.
- Monitor for mass file operations combined with crypto API use (a sign that ransomware encryption may be in progress).
- Keep signatures/engines up-to-date; leverage cloud reputation and detonation.
- Build detections around behaviors and context, not strings alone.
References (Further Reading)
- MITRE ATT&CK: https://attack.mitre.org/
- YARA: https://yara.readthedocs.io/
- Windows Internals (Pavel Yosifovich et al.)
- Practical Malware Analysis (Sikorski & Honig)
- NIST SP 800‑83 Rev. 1: Guide to Malware Incident Prevention and Handling for Desktops and Laptops
Disclaimer: This post is for educational and defensive purposes. It avoids operational details that would facilitate creating or deploying malware.