A lightweight capability tagger for Java malware — a "capa-lite" for the JVM.
Mandiant's capa is great, but it does not
support Java bytecode (it handles PE, ELF, .NET, shellcode, and sandbox traces — not
.class / JAR files). capa-java fills that gap: it runs a rule set of Java API and
string signatures over a sample and tells you which class shows which capability.
It is a triage aid — heuristic, not proof. It points you at the interesting classes fast; you confirm the hits by reading the cited line.
- No dependencies. Pure Python standard library. Clone and run.
- Two input modes. Point it at a decompiled .java source tree (richest results), or straight at a .jar. In JAR mode it reads printable strings out of each .class constant pool, so you don't even need to decompile first.
- Survives string obfuscation. Malware can encrypt string literals ("Login Data", schtasks), but it cannot hide API/type references (java.awt.Robot, Cipher.getInstance, com.sun.jna...); those must stay linkable in the constant pool. Every rule is labelled api (survives obfuscation) or str (does not), so you know how much to trust each hit.
- MITRE-style capability names. persistence/scheduled-task, creds/browser-chromium, c2/reverse-shell, collection/keylogging, and ~50 more.
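To make the JAR mode concrete, here is a minimal sketch of pulling printable strings out of every .class entry in an archive. It is an illustration under assumptions, not the code in java_capa.py: the function name `class_strings`, the 4-character minimum run length, and the ASCII-only scan are all choices made for this example. It also does not parse the constant pool structure; it simply scans the raw class bytes.

```python
import re
import zipfile

# Runs of 4 or more printable ASCII bytes. The minimum length is an
# assumed threshold for this sketch, not a value taken from capa-java.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def class_strings(jar_source):
    """Map each .class entry in a JAR to its printable strings, in file order.

    jar_source may be a path or a binary file-like object.
    """
    dumps = {}
    with zipfile.ZipFile(jar_source) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue  # skip manifests, resources, nested archives
            data = jar.read(name)  # bytes are only read, never loaded or run
            dumps[name] = [m.group().decode("ascii") for m in PRINTABLE.finditer(data)]
    return dumps
```

A type reference such as java/awt/Robot appears in the class bytes in its slash-separated internal form, which is why a byte-level scan like this can still see API usage when string literals are encrypted.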
git clone https://github.com/boredchilada/capa-java.git
cd capa-java

Requires Python 3.8+. No pip install needed.
# Run on a decompiled .java tree (richest), or straight on a .jar
python3 java_capa.py <dir-or-jar>
# Show the matched string + kind (api/str) and notable imports per class
python3 java_capa.py <dir-or-jar> --evidence --imports
# Only list classes with at least N capabilities
python3 java_capa.py <dir-or-jar> --min 3

| Flag | Effect |
|---|---|
| --evidence | Show a sample match, its kind (api/str), and class:line per capability |
| --imports | List each class's notable API imports (the survives-obfuscation signal) |
| --min N | Only print classes with >= N capabilities |
The output has three sections:
- Summary — classes scanned, how many carry a capability, distinct capabilities seen.
- Class → capabilities — every flagged class, sorted by capability count, with evidence and imports when requested.
- Capability → classes — the inverse index, so you can jump straight to "which classes do persistence?"
See examples/strrat-sample-output.txt for real
output against a STRRAT sample.
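The two listings are views of the same data. As a rough sketch (the function names and the class names in the usage note are hypothetical, and the tool's real internals may differ), the inverse index and the count-based ordering could be derived from a class-to-capabilities mapping like this:

```python
from collections import defaultdict


def invert(class_caps):
    """Turn {class: [capabilities]} into {capability: sorted list of classes}."""
    index = defaultdict(set)
    for cls, caps in class_caps.items():
        for cap in caps:
            index[cap].add(cls)
    return {cap: sorted(classes) for cap, classes in sorted(index.items())}


def by_capability_count(class_caps):
    """Order classes by number of capabilities, most first; ties break by name."""
    return sorted(class_caps, key=lambda c: (-len(class_caps[c]), c))
```

For example, with `{"a.Loader": ["c2/reverse-shell", "crypto/cipher"], "b.Stealer": ["crypto/cipher"]}`, `invert` lists both classes under crypto/cipher, and `by_capability_count` puts a.Loader first.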
For each unit (.java file or .class entry) the tool builds the raw text plus a
slash→dot normalized copy, so .class references like java/awt/Robot also match dotted
API patterns while raw forms like cmd /c stay intact. Each capability is a list of
(kind, regex) pairs; the first match wins and is recorded with its line number.
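The matching step described above can be sketched roughly as follows. This is an illustration, not the tool's actual code: `scan_unit` and `DEMO_RULES` are hypothetical names, the two capability names in the demo rule set are made up for the example, and the slash-to-dot normalization is applied per line here for simplicity.

```python
import re

# Hypothetical two-rule set in the documented (kind, regex) shape.
DEMO_RULES = {
    "collection/screen-capture": [("api", r"java\.awt\.Robot")],
    "execution/shell": [("str", r"cmd /c")],
}


def scan_unit(text, rules=DEMO_RULES):
    """Return {capability: (kind, matched_text, line_number)} for one unit."""
    lines = text.splitlines()
    hits = {}
    for cap, signatures in rules.items():
        for kind, pattern in signatures:
            rx = re.compile(pattern, re.IGNORECASE)
            hit = None
            for lineno, line in enumerate(lines, start=1):
                # Try the raw line first, then its slash-to-dot copy, so that
                # "cmd /c" stays matchable and "java/awt/Robot" matches too.
                m = rx.search(line) or rx.search(line.replace("/", "."))
                if m:
                    hit = (kind, m.group(), lineno)
                    break
            if hit:
                hits[cap] = hit
                break  # first matching signature wins for this capability
    return hits
```

Running `scan_unit("java/awt/Robot\ncmd /c whoami\n")` reports the Robot reference on line 1 through the normalized copy and the shell command on line 2 through the raw text.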
The whole rule set is the RULES dict at the top of java_capa.py.
Add a capability or a signature like this:
RULES = {
    # ...
    "persistence/scheduled-task": [
        ("str", r"schtasks"),  # literal string: defeated by obfuscation
        ("str", r"/sc minute"),
    ],
    "crypto/cipher": [
        ("api", r"Cipher\.getInstance"),  # API ref: survives obfuscation
        ("api", r"AES/"),
    ],
}

api rules key on imported APIs/types and survive string-literal obfuscation; str rules
key on literal strings and do not. Patterns are case-insensitive regexes.
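When adding rules by hand, a quick sanity check can catch a mistyped kind or a regex that does not compile. The helper below is a hypothetical sketch, not part of capa-java; it assumes only the documented shape of RULES, a dict mapping a capability name to a list of (kind, regex) pairs.

```python
import re

VALID_KINDS = {"api", "str"}


def check_rules(rules):
    """Return a list of problems found in a RULES-shaped dict (empty if none)."""
    problems = []
    for cap, signatures in rules.items():
        if not signatures:
            problems.append(f"{cap}: no signatures")
        for kind, pattern in signatures:
            if kind not in VALID_KINDS:
                problems.append(f"{cap}: unknown kind {kind!r}")
            try:
                # Compile the same way the patterns are described: case-insensitive.
                re.compile(pattern, re.IGNORECASE)
            except re.error as exc:
                problems.append(f"{cap}: bad regex {pattern!r} ({exc})")
    return problems
```

Calling `check_rules(RULES)` after an edit and confirming the result is an empty list is one way to verify a new signature before running it against a sample.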
- Heuristic by design — expect false positives and false negatives. Always confirm a hit by reading the cited class and line.
- In JAR mode, the "line number" is the index within the per-class string dump, not a source line. Decompile for true source lines.
- Rules currently lean toward Windows-targeting commodity Java RATs/stealers.
This tool only reads samples (string extraction and regex matching) — it never
executes anything. That said, the inputs are live malware: analyse them in an isolated VM,
and never commit real samples to a repo (see .gitignore).