capa-java

A lightweight capability tagger for Java malware — a "capa-lite" for the JVM.

Mandiant's capa is great, but it does not support Java bytecode (it handles PE, ELF, .NET, shellcode, and sandbox traces — not .class / JAR files). capa-java fills that gap: it runs a rule set of Java API and string signatures over a sample and tells you which class shows which capability.

It is a triage aid — heuristic, not proof. It points you at the interesting classes fast; you confirm the hits by reading the cited line.

Why it's useful

No dependencies. Pure Python standard library. Clone and run.
Two input modes. Point it at a decompiled .java source tree (richest results), or straight at a .jar — in JAR mode it reads printable strings out of each .class constant pool, so you don't even need to decompile first.
Survives string obfuscation. Malware can encrypt string literals ("Login Data", schtasks), but direct API/type references (java.awt.Robot, Cipher.getInstance, com.sun.jna...) must stay resolvable in the constant pool, so they remain visible unless the sample looks them up reflectively from decrypted names. Every rule is labelled api (survives string obfuscation) or str (does not), so you know how much to trust each hit.
MITRE-style capability names. persistence/scheduled-task, creds/browser-chromium, c2/reverse-shell, collection/keylogging, and ~50 more.

Install

git clone https://github.com/boredchilada/capa-java.git
cd capa-java

Requires Python 3.8+. No pip install needed.

Usage

# Run on a decompiled .java tree (richest), or straight on a .jar
python3 java_capa.py <dir-or-jar>

# Show the matched string + kind (api/str) and notable imports per class
python3 java_capa.py <dir-or-jar> --evidence --imports

# Only list classes with at least N capabilities
python3 java_capa.py <dir-or-jar> --min 3

| Flag | Effect |
| --- | --- |
| `--evidence` | Show a sample match, its kind (`api`/`str`), and `class:line` per capability |
| `--imports` | List each class's notable API imports (the survives-obfuscation signal) |
| `--min N` | Only print classes with `>= N` capabilities |

Output

Three sections:

Summary — classes scanned, how many carry a capability, distinct capabilities seen.
Class → capabilities — every flagged class, sorted by capability count, with evidence and imports when requested.
Capability → classes — the inverse index, so you can jump straight to "which classes do persistence?"
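
As a rough illustration of how the last two sections relate, the sketch below inverts a class-to-capabilities mapping and applies a `--min`-style threshold. The function names and the class names in the usage note are hypothetical and are not taken from java_capa.py.

```python
from collections import defaultdict


def invert(class_caps):
    """Turn {class: [capabilities]} into {capability: [classes]}, both sorted."""
    index = defaultdict(list)
    for cls, caps in class_caps.items():
        for cap in caps:
            index[cap].append(cls)
    return {cap: sorted(classes) for cap, classes in sorted(index.items())}


def by_capability_count(class_caps, minimum=1):
    """Classes with at least `minimum` capabilities, most capabilities first."""
    kept = [(cls, caps) for cls, caps in class_caps.items() if len(caps) >= minimum]
    # Sort by descending capability count, then by class name for stable output.
    return sorted(kept, key=lambda item: (-len(item[1]), item[0]))
```

For example, with a made-up result such as `{"a.Loader": ["c2/reverse-shell", "persistence/scheduled-task"], "a.Task": ["persistence/scheduled-task"]}`, `invert(...)["persistence/scheduled-task"]` lists both classes, and `by_capability_count(..., minimum=2)` keeps only `a.Loader`.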

See examples/strrat-sample-output.txt for real output against a STRRAT sample.

How it works

For each unit (.java file or .class entry) the tool builds the raw text plus a slash→dot normalized copy, so .class references like java/awt/Robot also match dotted API patterns while raw forms like cmd /c stay intact. Each capability is a list of (kind, regex) pairs; the first match wins and is recorded with its line number.
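
The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the code in java_capa.py: the two-entry rule set and the `scan_unit` name are assumptions, and "first match wins" is read here as "the first pattern in the list that matches anywhere in the unit".

```python
import re

# Hypothetical rules in the (kind, regex) shape described above.
RULES = {
    "collection/screen-capture": [("api", r"java\.awt\.Robot")],
    "execution/shell": [("str", r"cmd /c")],
}


def scan_unit(text, rules=RULES):
    """Return {capability: (kind, matched_text, line_number)} for one unit."""
    # Replacing "/" with "." keeps the length, so match offsets in the
    # normalized copy point at the same lines as in the raw text.
    normalized = text.replace("/", ".")
    hits = {}
    for cap, patterns in rules.items():
        for kind, pattern in patterns:
            # Try the raw text first so forms like "cmd /c" stay intact.
            m = re.search(pattern, text, re.IGNORECASE) or re.search(
                pattern, normalized, re.IGNORECASE
            )
            if m:
                lineno = text.count("\n", 0, m.start()) + 1
                hits[cap] = (kind, m.group(0), lineno)
                break  # first matching pattern wins for this capability
    return hits
```

In this sketch a unit containing `java/awt/Robot` on its second line would be reported as `("api", "java.awt.Robot", 2)`: the dotted pattern only matches the normalized copy, while the line number still refers to the raw text.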

Extending the rules

The whole rule set is the RULES dict at the top of java_capa.py. Add a capability or a signature like this:

RULES = {
    # ...
    "persistence/scheduled-task": [
        ("str", r"schtasks"),       # literal string — defeated by obfuscation
        ("str", r"/sc minute"),
    ],
    "crypto/cipher": [
        ("api", r"Cipher\.getInstance"),  # API ref — survives obfuscation
        ("str", r"AES/"),                 # transformation string literal, defeated by obfuscation
    ],
}

api rules key on imported APIs/types and survive string-literal obfuscation; str rules key on literal strings and do not. Patterns are case-insensitive regexes.
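
When adding rules, a quick sanity check can catch a mistyped kind or a regex that does not compile before you run a scan. The helper below is a hypothetical sketch, assuming only the RULES shape shown above; `check_rules` is not part of java_capa.py.

```python
import re

VALID_KINDS = {"api", "str"}


def check_rules(rules):
    """Return a list of problems found in a RULES-shaped dict (empty if none)."""
    problems = []
    for cap, patterns in rules.items():
        if not patterns:
            problems.append(f"{cap}: no signatures")
        for kind, pattern in patterns:
            if kind not in VALID_KINDS:
                problems.append(f"{cap}: unknown kind {kind!r}")
            try:
                # Compile with the same case-insensitive flag the rules assume.
                re.compile(pattern, re.IGNORECASE)
            except re.error as exc:
                problems.append(f"{cap}: bad regex {pattern!r} ({exc})")
    return problems
```

Calling `check_rules(RULES)` after an edit and printing anything it returns is one cheap way to use it.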

Limitations

Heuristic by design — expect false positives and false negatives. Always confirm a hit by reading the cited class and line.
In JAR mode, the "line number" is the index within the per-class string dump, not a source line. Decompile for true source lines.
Rules currently lean toward Windows-targeting commodity Java RATs/stealers.
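
To make the JAR-mode caveat concrete, here is a rough sketch of what a per-class string dump can look like. It is an assumption-laden illustration: it scans the raw bytes of each .class entry for printable ASCII runs instead of parsing the constant pool, and the `jar_string_dumps` name and the minimum run length of 4 are arbitrary choices, not details of java_capa.py.

```python
import re
import zipfile

# Runs of 4 or more printable ASCII bytes; the threshold is an arbitrary choice.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def jar_string_dumps(jar_path):
    """Map each .class entry in a JAR to its list of printable strings."""
    dumps = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                data = jar.read(name)  # read only, nothing is executed
                dumps[name] = [s.decode("ascii") for s in PRINTABLE.findall(data)]
    return dumps
```

A position in one of these lists says where a string sits in the dump, which is why it cannot be mapped back to a source line without decompiling.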

Safety

This tool only reads samples (string extraction and regex matching) — it never executes anything. That said, the inputs are live malware: analyse them in an isolated VM, and never commit real samples to a repo (see .gitignore).

License

MIT
