From 0558b54c88041431efd259643dc4dfa066e1cefe Mon Sep 17 00:00:00 2001
From: bgagent <bgagent@noreply.github.com>
Date: Tue, 23 Jun 2026 14:00:39 -0400
Subject: [PATCH 1/2] =?UTF-8?q?docs(skills):=20fix=20deploy/submit-task/tr?=
 =?UTF-8?q?oubleshoot/status=20=E2=80=94=20stale=20claims,=20node=20PATH,?=
 =?UTF-8?q?=20onboard=20path?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Audit of the four remaining plugin skills against issues found in live use:

deploy:
- Correct the inverted least-privilege claim. It said bootstrap grants
  AdministratorAccess by default and scoping is an optional prod step — the
  opposite of reality: the custom least-privilege bootstrap (ADR-002) IS the
  default. Rewritten to describe that, plus the add-a-resource→add-an-action
  consequence (the #402/#404/#407/#409 class).
- Deploy command now passes `--require-approval never` (avoids the non-TTY
  approval hang); add an arch/binfmt pre-check; add DELETE_FAILED/Hyperplane-ENI
  teardown and ROLLBACK_COMPLETE notes to destroy; add "trust the exit code"
  guidance for the noisy build; onboard via `bgagent repo onboard`, not
  Blueprint-by-default.

troubleshoot:
- REPO_NOT_ONBOARDED fix is `bgagent repo onboard`, not "needs a Blueprint".
- 403-model fix: point the repo at an already-granted model, or add a model via
  the shared `bedrockModels` context (#433) + account-level Bedrock access —
  not "edit grantInvoke in agent.ts".

submit-task / troubleshoot / status:
- Note that `node cli/lib/bin/bgagent.js …` needs `mise exec --` in a
  non-interactive / mise-managed shell (the live `node: command not found`).

Docs-only; abca-plugin skills are not Starlight-mirrored (no docs:sync needed).
---
 docs/abca-plugin/skills/deploy/SKILL.md       | 28 +++++++++----------
 docs/abca-plugin/skills/status/SKILL.md       |  2 ++
 docs/abca-plugin/skills/submit-task/SKILL.md  |  2 ++
 docs/abca-plugin/skills/troubleshoot/SKILL.md | 12 +++++---
 4 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/docs/abca-plugin/skills/deploy/SKILL.md b/docs/abca-plugin/skills/deploy/SKILL.md
index af7550d0..f96a3f4c 100644
--- a/docs/abca-plugin/skills/deploy/SKILL.md
+++ b/docs/abca-plugin/skills/deploy/SKILL.md
@@ -28,18 +28,21 @@ Before any deployment action, verify:
    export MISE_EXPERIMENTAL=1
    mise run build
    ```
-   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails.
+   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails. Note: a passing build is noisy — it prints many `ERROR`/`WARN` and cdk-nag lines from test fixtures. Trust the **exit code (0 = pass)**, not the log volume.
 
-2. **Docker is running** — Required for CDK asset bundling
-3. **AWS credentials are configured** — `aws sts get-caller-identity`
+2. **Docker is running** — Required for CDK asset bundling.
+3. **Build host architecture** — The agent image targets `linux/arm64` (AgentCore is Graviton). On an **x86_64** host without QEMU/binfmt, the deploy fails partway with `exec /bin/sh: exec format error`. Register emulation once with `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or deploy from a native arm64 host (Graviton / Apple Silicon). Skip on arm64 hosts.
+4. **AWS credentials are configured** — `aws sts get-caller-identity` (confirm it's the intended account/region).
 
 ## Deploy Workflow
 
 ```bash
 export MISE_EXPERIMENTAL=1
-mise //cdk:deploy
+mise //cdk:deploy -- --require-approval never
 ```
 
+`--require-approval never` lets the deploy run unattended. **In a non-interactive shell (CI, agent, script) it's required** — without it, `cdk deploy` hangs forever on the IAM/security-group approval prompt. Drop the flag if you're deploying interactively and want to review those changes.
+
 After successful deployment, retrieve and display stack outputs:
 ```bash
 aws cloudformation describe-stacks --stack-name backgroundagent-dev \
@@ -66,6 +69,8 @@ export MISE_EXPERIMENTAL=1
 mise //cdk:destroy
 ```
 
+**Teardown can stall in `DELETE_FAILED`** on a security group / private subnet: AgentCore injects service-managed (Hyperplane) ENIs into the VPC, and AWS reclaims them **asynchronously (~20–40 min)** after the runtime is gone. Wait for the ENIs to clear, then retry `mise //cdk:destroy`. Do **not** force the delete past the stuck VPC resources, whether with `--deletion-mode FORCE_DELETE_STACK` or by retaining them: either way orphans the VPC, and VPCs are quota-capped per Region. Also note: a stack whose first create fails is left in `ROLLBACK_COMPLETE`, which can't be updated, so destroy it and redeploy fresh.
+
 ## Synth Workflow
 
 ```bash
@@ -79,18 +84,13 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem
 
 After a successful deploy, remind the user to:
 - Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
-- Onboard repositories via Blueprint constructs if needed
+- Onboard repositories with `bgagent repo onboard <owner/repo>` (a runtime operation — no redeploy). A CDK `Blueprint` construct is only needed for declarative config (Cedar policies, egress allowlist, system-prompt overrides) — see the `onboard-repo` skill
 - Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
 
-## Least-Privilege Deployment
+## Least-Privilege Bootstrap (the default)
 
-By default, CDK bootstrap grants `AdministratorAccess` to the CloudFormation execution role. For production or security-sensitive accounts, re-bootstrap with a scoped execution policy:
+`mise //cdk:bootstrap` provisions a **custom least-privilege** CloudFormation execution role by default — NOT `AdministratorAccess` (ADR-002). It deploys `cdk/bootstrap/bootstrap-template.yaml`, which creates scoped `IaCRole-ABCA-*` managed policies (Infrastructure / Application / Observability) generated from `cdk/src/bootstrap/policies/`.
 
-```bash
-cdk bootstrap aws://ACCOUNT/REGION \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Infrastructure" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Application" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Observability"
-```
+A consequence worth knowing when you **add a new resource type or a new feature on an existing resource**: the scoped role must allow the IAM action CloudFormation will call, or the deploy rolls back with `AccessDenied` on that action (e.g. `s3:PutBucketVersioning`, `lambda:TagResource`). The fix is to add the action to the relevant policy in `cdk/src/bootstrap/policies/`, regenerate (`mise //cdk:bootstrap:generate`), re-bootstrap, and redeploy. The policy source and the `DEPLOYMENT_ROLES.md` golden doc are kept in sync by tests.
 
-See `docs/design/DEPLOYMENT_ROLES.md` in the repo root for the complete least-privilege IAM policies, trust policy, runtime role inventory, and iterative tightening recommendations.
+See `docs/design/DEPLOYMENT_ROLES.md` for the complete IAM policies, trust policy, runtime role inventory, and tightening recommendations.
diff --git a/docs/abca-plugin/skills/status/SKILL.md b/docs/abca-plugin/skills/status/SKILL.md
index 9c13c4b4..3ac24af3 100644
--- a/docs/abca-plugin/skills/status/SKILL.md
+++ b/docs/abca-plugin/skills/status/SKILL.md
@@ -10,6 +10,8 @@ allowed-tools:
 
 Check the current state of the ABCA platform and report a concise status summary.
 
+> **Running the CLI:** the `node cli/lib/bin/bgagent.js …` checks below need `node` on `PATH`; in a non-interactive or mise-managed shell, prefix with `mise exec --`.
+
 ## Checks to Run
 
 Run these in parallel where possible:
diff --git a/docs/abca-plugin/skills/submit-task/SKILL.md b/docs/abca-plugin/skills/submit-task/SKILL.md
index 342685b4..793d78a1 100644
--- a/docs/abca-plugin/skills/submit-task/SKILL.md
+++ b/docs/abca-plugin/skills/submit-task/SKILL.md
@@ -13,6 +13,8 @@ argument-hint: <repo> [description]
 
 You are helping the user submit a well-crafted coding task to the ABCA platform. Good prompts are critical — the agent works autonomously without asking clarifying questions.
 
+> **Running the CLI:** examples below call `node cli/lib/bin/bgagent.js …` from the repo root. In a **non-interactive or mise-managed shell** `node` may not be on `PATH` (`command not found`) — prefix with `mise exec --` (e.g. `mise exec -- node cli/lib/bin/bgagent.js submit …`), or use a global `bgagent` if installed. If `cli/lib/bin/bgagent.js` is missing, run `mise run build` first.
+
 **Quick mode:** If the user provided a repo and description inline (e.g. "submit task to owner/repo: fix the login bug"), auto-detect the task type from the description and skip to Step 5. Infer the type:
 - PR number or "review PR" → `--review-pr`
 - "iterate on PR" or "fix PR feedback" → `--pr`
diff --git a/docs/abca-plugin/skills/troubleshoot/SKILL.md b/docs/abca-plugin/skills/troubleshoot/SKILL.md
index 4afc15fa..38d3e058 100644
--- a/docs/abca-plugin/skills/troubleshoot/SKILL.md
+++ b/docs/abca-plugin/skills/troubleshoot/SKILL.md
@@ -12,6 +12,8 @@ description: >-
 
 You are diagnosing an issue with the ABCA platform. Follow a systematic approach: gather symptoms, check the most common causes, and apply targeted fixes.
 
+> **Running the CLI:** commands below call `node cli/lib/bin/bgagent.js …`. In a non-interactive or mise-managed shell `node` may not be on `PATH`; prefix with `mise exec --`. Note that `node: command not found` is itself a common symptom: it means the shell hasn't activated mise, so the fix is the missing prefix, not a reinstall.
+
 ## Step 1: Identify the Problem Category
 
 Determine which area the issue falls into:
@@ -71,8 +73,9 @@ aws cognito-idp admin-get-user \
 
 ## Task Submission Issues (422 / 400)
 
-**"Repository not onboarded" (422):**
-- The repo needs a Blueprint construct. Use the `onboard-repo` skill.
+**"Repository not onboarded" / `REPO_NOT_ONBOARDED` (422):**
+- The repo isn't registered. Fastest fix: `bgagent repo onboard <owner/repo>` (operator path — writes the RepoTable record at runtime, no redeploy). A CDK Blueprint is only needed for declarative config. Use the `onboard-repo` skill for details.
+- Also confirm the `owner/repo` matches **exactly** what you pass to `bgagent submit --repo`.
 
 **"GUARDRAIL_BLOCKED" (400):**
 - Task description triggered Bedrock Guardrails content screening
@@ -108,8 +111,9 @@ node cli/lib/bin/bgagent.js events <TASK_ID> --output json
 - Common: repo build/test commands not documented in CLAUDE.md
 
 **403 "not authorized to perform bedrock:InvokeModelWithResponseStream":**
-- The Blueprint specifies a model that the runtime IAM role doesn't have permissions for
-- Fix: add `grantInvoke` for the model and its cross-region inference profile in `cdk/src/stacks/agent.ts`, then redeploy
+- The repo's `model_id` names a model the runtime IAM role wasn't granted. The runtime only has `grantInvoke` for the models in the stack's configured set (Sonnet 4.6, Opus 4, Haiku 4.5 by default).
+- **Quick fix:** point the repo at an already-granted model — `bgagent repo onboard <owner/repo> --model us.anthropic.claude-sonnet-4-6` (no redeploy).
+- **To add a new model to the runtime:** grant it in the stack and redeploy. The model set is the shared list in `cdk/src/constructs/bedrock-models.ts`; add the model via the `bedrockModels` CDK context (`cdk.json`) so both the AgentCore and ECS backends grant it (#433). Adding a model also requires **account-level Bedrock access** for it (separate from IAM; see the next entry).
 
 **Model not enabled / "not available on your Bedrock deployment" (often immediate failure, few turns, zero or near-zero tokens):**
 - **IAM is necessary but not sufficient.** The AgentCore role may already have `bedrock:InvokeModel*`, but the **account** must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html): Marketplace subscription flow on first serverless use (with `aws-marketplace:Subscribe` / `ViewSubscriptions` where needed), Anthropic **first-time use** details (`PutUseCaseForModelAccess` or the console model catalog), and a valid payment method for Marketplace-backed models.

From 617c05562091450d087380449963f7ca93be35bd Mon Sep 17 00:00:00 2001
From: bgagent <bgagent@noreply.github.com>
Date: Tue, 23 Jun 2026 16:15:16 -0400
Subject: [PATCH 2/2] =?UTF-8?q?docs(deploy=20skill):=20address=20review=20?=
 =?UTF-8?q?=E2=80=94=20clarify=20CLI-onboard=20scope,=20add=20platform=20d?=
 =?UTF-8?q?octor?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per @krokoko on #435:
- The `bgagent repo onboard` line over-claimed. Clarify it's the runtime path
  for repos that fit the platform/default-blueprint setup (default token,
  already-granted model, default egress); a repo needing its own token, an
  ungranted model, custom egress, Cedar policies, or system-prompt overrides
  still needs a dedicated CDK Blueprint + redeploy with correct permissions.
- Add `bgagent platform doctor` to post-deploy as the readiness check before
  submitting a task (smoke-checks API/Cognito/token/Bedrock/onboarded repos);
  keep the raw curl as a lower-level alternative.
---
 docs/abca-plugin/skills/deploy/SKILL.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/abca-plugin/skills/deploy/SKILL.md b/docs/abca-plugin/skills/deploy/SKILL.md
index f96a3f4c..09d65513 100644
--- a/docs/abca-plugin/skills/deploy/SKILL.md
+++ b/docs/abca-plugin/skills/deploy/SKILL.md
@@ -83,9 +83,10 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem
 ## Post-Deployment
 
 After a successful deploy, remind the user to:
-- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
-- Onboard repositories with `bgagent repo onboard <owner/repo>` (a runtime operation — no redeploy). A CDK `Blueprint` construct is only needed for declarative config (Cedar policies, egress allowlist, system-prompt overrides) — see the `onboard-repo` skill
-- Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
+- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment.
+- Onboard a repository. `bgagent repo onboard <owner/repo>` is a runtime operation (no redeploy) that works when the repo can use the **platform/default-blueprint** setup — the default GitHub token secret, an already-granted model, and the default egress allowlist. A repo that needs its **own** config — a per-repo GitHub token, a model not yet granted to the runtime, custom egress domains, Cedar HITL policies, or system-prompt overrides — needs a dedicated CDK `Blueprint` construct and a redeploy (with the correct permissions). See the `onboard-repo` skill for both paths.
+- **Verify readiness before submitting a task:** `bgagent platform doctor` smoke-checks the API, Cognito, GitHub token, Bedrock model access, and onboarded repos — confirm everything is green first.
+- (Lower-level alternative) raw API smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`.
 
 ## Least-Privilege Bootstrap (the default)