
capa-java

A lightweight capability tagger for Java malware — a "capa-lite" for the JVM.

Mandiant's capa is great, but it does not support Java bytecode (it handles PE, ELF, .NET, shellcode, and sandbox traces — not .class / JAR files). capa-java fills that gap: it runs a rule set of Java API and string signatures over a sample and tells you which class shows which capability.

It is a triage aid — heuristic, not proof. It points you at the interesting classes fast; you confirm the hits by reading the cited line.

Why it's useful

  • No dependencies. Pure Python standard library. Clone and run.
  • Two input modes. Point it at a decompiled .java source tree (richest results), or straight at a .jar — in JAR mode it reads printable strings out of each .class constant pool, so you don't even need to decompile first.
  • Survives string obfuscation. Malware can encrypt string literals ("Login Data", schtasks), but it cannot hide the API/type references it calls directly (java.awt.Robot, Cipher.getInstance, com.sun.jna...): those must stay as symbolic references in the constant pool so the JVM can link them. The exception is code that routes every call through reflection with encrypted names. Every rule is labelled api (survives obfuscation) or str (does not), so you know how much to trust each hit.
  • MITRE-style capability names. persistence/scheduled-task, creds/browser-chromium, c2/reverse-shell, collection/keylogging, and ~50 more.
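The README says JAR mode reads printable strings out of each .class constant pool, but not exactly how. As a hedged illustration of why API references survive string encryption, the sketch below walks the constant pool as laid out in the JVM class-file format and returns its CONSTANT_Utf8 entries. Those entries include class names like java/awt/Robot and method names like getInstance alongside any plaintext literals. The function names `constant_pool_utf8` and `jar_class_strings` are hypothetical, not part of java_capa.py.

```python
import struct
import zipfile
from typing import BinaryIO, Dict, List, Union

# Payload size (in bytes, after the tag byte) of each fixed-size
# constant-pool entry. Utf8 (tag 1) is variable-length, handled separately.
_FIXED_SIZES = {
    3: 4, 4: 4,          # Integer, Float
    5: 8, 6: 8,          # Long, Double (each occupies two pool slots)
    7: 2, 8: 2,          # Class, String
    9: 4, 10: 4, 11: 4,  # Fieldref, Methodref, InterfaceMethodref
    12: 4,               # NameAndType
    15: 3, 16: 2,        # MethodHandle, MethodType
    17: 4, 18: 4,        # Dynamic, InvokeDynamic
    19: 2, 20: 2,        # Module, Package
}


def constant_pool_utf8(data: bytes) -> List[str]:
    """Return every CONSTANT_Utf8 entry from a .class file's constant pool."""
    magic, _minor, _major, count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    offset, index, strings = 10, 1, []
    while index < count:  # valid pool indices run 1 .. count-1
        tag = data[offset]
        offset += 1
        if tag == 1:  # CONSTANT_Utf8: u2 length, then the bytes
            (length,) = struct.unpack_from(">H", data, offset)
            raw = data[offset + 2 : offset + 2 + length]
            # Java stores "modified UTF-8"; decoding as UTF-8 with
            # replacement handles ordinary identifiers and literals.
            strings.append(raw.decode("utf-8", errors="replace"))
            offset += 2 + length
        elif tag in _FIXED_SIZES:
            offset += _FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant-pool tag {tag}")
        index += 2 if tag in (5, 6) else 1  # Long/Double take two slots
    return strings


def jar_class_strings(jar_file: Union[str, BinaryIO]) -> Dict[str, List[str]]:
    """Map each .class entry in a JAR to its constant-pool strings.

    Entries are read in memory only: nothing is extracted or executed.
    """
    out = {}
    with zipfile.ZipFile(jar_file) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            try:
                out[name] = constant_pool_utf8(jar.read(name))
            except (ValueError, struct.error, IndexError):
                continue  # skip malformed or truncated class entries
    return out
```

A literal encrypted by an obfuscator still shows up as a Utf8 entry, just as unreadable bytes, which is why str rules miss it while the class and method names that api rules key on remain legible.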

Install

git clone https://github.com/boredchilada/capa-java.git
cd capa-java

Requires Python 3.8+. No pip install needed.

Usage

# Run on a decompiled .java tree (richest), or straight on a .jar
python3 java_capa.py <dir-or-jar>

# Show the matched string + kind (api/str) and notable imports per class
python3 java_capa.py <dir-or-jar> --evidence --imports

# Only list classes with at least N capabilities
python3 java_capa.py <dir-or-jar> --min 3

Flag        Effect
--evidence  Show a sample match, its kind (api/str), and class:line for each capability
--imports   List each class's notable API imports (the survives-obfuscation signal)
--min N     Only print classes with >= N capabilities

Output

Three sections:

  1. Summary — classes scanned, how many carry a capability, distinct capabilities seen.
  2. Class → capabilities — every flagged class, sorted by capability count, with evidence and imports when requested.
  3. Capability → classes — the inverse index, so you can jump straight to "which classes do persistence?"

See examples/strrat-sample-output.txt for real output against a STRRAT sample.
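The three sections are views of the same per-class results. Assuming those results are held as a mapping from class name to a set of capability names (an assumption; java_capa.py's internal layout isn't documented here), the ranking and the inverse index can be sketched as follows. `rank_classes`, `invert`, and the `min_caps` parameter are hypothetical; `min_caps` mirrors what --min is described to do.

```python
from collections import defaultdict
from typing import Dict, List, Set


def rank_classes(class_caps: Dict[str, Set[str]], min_caps: int = 1) -> List[str]:
    """Classes with >= min_caps capabilities, most capabilities first, ties by name."""
    flagged = [cls for cls, caps in class_caps.items() if len(caps) >= min_caps]
    return sorted(flagged, key=lambda cls: (-len(class_caps[cls]), cls))


def invert(class_caps: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Capability -> sorted list of classes showing it (the inverse index)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for cls, caps in class_caps.items():
        for cap in caps:
            index[cap].append(cls)
    return {cap: sorted(classes) for cap, classes in sorted(index.items())}
```

Under the same assumption, the Summary numbers fall out directly: classes scanned is `len(class_caps)`, classes carrying a capability is `len(rank_classes(class_caps))`, and distinct capabilities seen is `len(invert(class_caps))`.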

How it works

For each unit (.java file or .class entry) the tool builds the raw text plus a slash→dot normalized copy, so .class references like java/awt/Robot also match dotted API patterns while raw forms like cmd /c stay intact. Each capability is a list of (kind, regex) pairs; the first match wins and is recorded with its line number.
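A minimal sketch of that matching loop, under stated assumptions: rules have the `{capability: [(kind, regex), ...]}` shape shown under "Extending the rules", the raw text is tried before the dotted copy, and line numbers are 1-based. `match_unit` is a hypothetical name, not the function in java_capa.py.

```python
import re
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (kind, regex), where kind is "api" or "str"


def match_unit(text: str, rules: Dict[str, List[Rule]]) -> Dict[str, Tuple[str, str, int]]:
    """Return {capability: (kind, matched_text, line_no)} for one unit."""
    # Slash->dot copy so java/awt/Robot also matches java\.awt\.Robot.
    normalized = text.replace("/", ".")
    hits: Dict[str, Tuple[str, str, int]] = {}
    for capability, pairs in rules.items():
        for kind, pattern in pairs:
            rx = re.compile(pattern, re.IGNORECASE)  # re caches compiled patterns
            # Raw text first, so forms like "cmd /c" still match as written.
            for variant in (text, normalized):
                m = rx.search(variant)
                if m:
                    line_no = variant.count("\n", 0, m.start()) + 1
                    hits[capability] = (kind, m.group(0), line_no)
                    break
            if capability in hits:
                break  # first matching (kind, regex) pair wins
    return hits
```

The replacement swaps one character for one character, so a match offset falls on the same line in either copy and one newline count serves both. The raw copy matters because normalization would turn `cmd /c` into `cmd .c`.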

Extending the rules

The whole rule set is the RULES dict at the top of java_capa.py. Add a capability or a signature like this:

RULES = {
    # ...
    "persistence/scheduled-task": [
        ("str", r"schtasks"),             # literal string: defeated by obfuscation
        ("str", r"/sc minute"),
    ],
    "crypto/cipher": [
        ("api", r"Cipher\.getInstance"),  # API ref: survives obfuscation
        ("str", r"AES/"),                 # transformation string literal ("AES/CBC/..."): can be encrypted
    ],
}

api rules key on imported APIs/types and survive string-literal obfuscation; str rules key on literal strings and do not. Patterns are case-insensitive regexes.
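Depending on how java_capa.py handles it, a typo in RULES (a misspelled kind, an unbalanced parenthesis) may only surface at scan time or silently mislabel hits. A hypothetical helper, `compile_rules`, could check the dict up front: it accepts only the api and str kinds described above and compiles every pattern case-insensitively, failing with a message that names the offending capability. This is a suggested workflow, not something the tool is documented to provide.

```python
import re
from typing import Dict, List, Tuple

VALID_KINDS = {"api", "str"}


def compile_rules(rules: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[Tuple[str, "re.Pattern"]]]:
    """Validate kinds and compile every pattern with re.IGNORECASE."""
    compiled: Dict[str, List[Tuple[str, "re.Pattern"]]] = {}
    for capability, pairs in rules.items():
        compiled[capability] = []
        for kind, pattern in pairs:
            if kind not in VALID_KINDS:
                raise ValueError(f"{capability}: unknown kind {kind!r} (expected api or str)")
            try:
                rx = re.compile(pattern, re.IGNORECASE)
            except re.error as exc:
                raise ValueError(f"{capability}: bad regex {pattern!r}: {exc}") from exc
            compiled[capability].append((kind, rx))
    return compiled
```

Running it once after editing RULES turns a bad entry into an immediate, named error instead of a quiet miss during triage.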

Limitations

  • Heuristic by design — expect false positives and false negatives. Always confirm a hit by reading the cited class and line.
  • In JAR mode, the "line number" is the index within the per-class string dump, not a source line. Decompile for true source lines.
  • Rules currently lean toward Windows-targeting commodity Java RATs/stealers.

Safety

This tool only reads samples (string extraction and regex matching) — it never executes anything. That said, the inputs are live malware: analyse them in an isolated VM, and never commit real samples to a repo (see .gitignore).

License

MIT
