From 0558b54c88041431efd259643dc4dfa066e1cefe Mon Sep 17 00:00:00 2001
From: bgagent
Date: Tue, 23 Jun 2026 14:00:39 -0400
Subject: [PATCH 1/2] =?UTF-8?q?docs(skills):=20fix=20deploy/submit-task/tr?=
 =?UTF-8?q?oubleshoot/status=20=E2=80=94=20stale=20claims,=20node=20PATH,?=
 =?UTF-8?q?=20onboard=20path?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Audit of the four remaining plugin skills against issues found in live use:

deploy:
- Correct the inverted least-privilege claim. It said bootstrap grants
  AdministratorAccess by default and scoping is an optional prod step —
  the opposite of reality: the custom least-privilege bootstrap (ADR-002)
  IS the default. Rewritten to describe that, plus the
  add-a-resource→add-an-action consequence (the #402/#404/#407/#409 class).
- Deploy command uses `--require-approval never` (non-TTY hang); add
  arch/binfmt pre-check; add DELETE_FAILED/Hyperplane-ENI teardown +
  ROLLBACK_COMPLETE notes to destroy; "trust the exit code" on the noisy
  build; onboard via `bgagent repo onboard`, not Blueprint-by-default.

troubleshoot:
- REPO_NOT_ONBOARDED fix is `bgagent repo onboard`, not "needs a Blueprint".
- 403-model fix: point the repo at an already-granted model, or add a
  model via the shared `bedrockModels` context (#433) + account-level
  Bedrock access — not "edit grantInvoke in agent.ts".

submit-task / troubleshoot / status:
- Note that `node cli/lib/bin/bgagent.js …` needs `mise exec --` in a
  non-interactive / mise-managed shell (the live `node: command not found`).

Docs-only; abca-plugin skills are not Starlight-mirrored (no docs:sync needed).
---
 docs/abca-plugin/skills/deploy/SKILL.md       | 28 +++++++++----------
 docs/abca-plugin/skills/status/SKILL.md       |  2 ++
 docs/abca-plugin/skills/submit-task/SKILL.md  |  2 ++
 docs/abca-plugin/skills/troubleshoot/SKILL.md | 12 +++++---
 4 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/docs/abca-plugin/skills/deploy/SKILL.md b/docs/abca-plugin/skills/deploy/SKILL.md
index af7550d0..f96a3f4c 100644
--- a/docs/abca-plugin/skills/deploy/SKILL.md
+++ b/docs/abca-plugin/skills/deploy/SKILL.md
@@ -28,18 +28,21 @@ Before any deployment action, verify:
    export MISE_EXPERIMENTAL=1
    mise run build
    ```
-   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails.
+   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails. Note: a passing build is noisy — it prints many `ERROR`/`WARN` and cdk-nag lines from test fixtures. Trust the **exit code (0 = pass)**, not the log volume.

-2. **Docker is running** — Required for CDK asset bundling
-3. **AWS credentials are configured** — `aws sts get-caller-identity`
+2. **Docker is running** — Required for CDK asset bundling.
+3. **Build host architecture** — The agent image targets `linux/arm64` (AgentCore is Graviton). On an **x86_64** host without QEMU/binfmt, the deploy fails partway with `exec /bin/sh: exec format error`. Register emulation once with `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or deploy from a native arm64 host (Graviton / Apple Silicon). Skip on arm64 hosts.
+4. **AWS credentials are configured** — `aws sts get-caller-identity` (confirm it's the intended account/region).

 ## Deploy Workflow

 ```bash
 export MISE_EXPERIMENTAL=1
-mise //cdk:deploy
+mise //cdk:deploy -- --require-approval never
 ```

+`--require-approval never` lets the deploy run unattended. **In a non-interactive shell (CI, agent, script) it's required** — without it, `cdk deploy` hangs forever on the IAM/security-group approval prompt. Drop the flag if you're deploying interactively and want to review those changes.
+
 After successful deployment, retrieve and display stack outputs:
 ```bash
 aws cloudformation describe-stacks --stack-name backgroundagent-dev \
@@ -66,6 +69,8 @@ export MISE_EXPERIMENTAL=1
 mise //cdk:destroy
 ```

+**Teardown can stall in `DELETE_FAILED`** on a security group / private subnet: AgentCore injects service-managed (Hyperplane) ENIs into the VPC, and AWS reclaims them **asynchronously (~20–40 min)** after the runtime is gone. Wait for the ENIs to clear, then retry `mise //cdk:destroy`. Do **not** force-delete past the stuck VPC resources (`--deletion-mode FORCE_DELETE_STACK` / retaining them) — that orphans the VPC, and VPCs are quota-capped per Region. Also note: a first-create failure leaves the stack in `ROLLBACK_COMPLETE`, which can't be updated — destroy and redeploy fresh.
+
 ## Synth Workflow

 ```bash
@@ -79,18 +84,13 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem

 After a successful deploy, remind the user to:
 - Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
-- Onboard repositories via Blueprint constructs if needed
+- Onboard repositories with `bgagent repo onboard ` (a runtime operation — no redeploy). A CDK `Blueprint` construct is only needed for declarative config (Cedar policies, egress allowlist, system-prompt overrides) — see the `onboard-repo` skill
 - Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`

-## Least-Privilege Deployment
+## Least-Privilege Bootstrap (the default)

-By default, CDK bootstrap grants `AdministratorAccess` to the CloudFormation execution role. For production or security-sensitive accounts, re-bootstrap with a scoped execution policy:
+`mise //cdk:bootstrap` provisions a **custom least-privilege** CloudFormation execution role by default — NOT `AdministratorAccess` (ADR-002). It deploys `cdk/bootstrap/bootstrap-template.yaml`, which creates scoped `IaCRole-ABCA-*` managed policies (Infrastructure / Application / Observability) generated from `cdk/src/bootstrap/policies/`.

-```bash
-cdk bootstrap aws://ACCOUNT/REGION \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Infrastructure" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Application" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Observability"
-```
+A consequence worth knowing when you **add a new resource type or a new feature on an existing resource**: the scoped role must allow the IAM action CloudFormation will call, or the deploy rolls back with `AccessDenied` on that action (e.g. `s3:PutBucketVersioning`, `lambda:TagResource`). The fix is to add the action to the relevant policy in `cdk/src/bootstrap/policies/`, regenerate (`mise //cdk:bootstrap:generate`), re-bootstrap, and redeploy. The policy source and the `DEPLOYMENT_ROLES.md` golden doc are kept in sync by tests.

-See `docs/design/DEPLOYMENT_ROLES.md` in the repo root for the complete least-privilege IAM policies, trust policy, runtime role inventory, and iterative tightening recommendations.
+See `docs/design/DEPLOYMENT_ROLES.md` for the complete IAM policies, trust policy, runtime role inventory, and tightening recommendations.
diff --git a/docs/abca-plugin/skills/status/SKILL.md b/docs/abca-plugin/skills/status/SKILL.md
index 9c13c4b4..3ac24af3 100644
--- a/docs/abca-plugin/skills/status/SKILL.md
+++ b/docs/abca-plugin/skills/status/SKILL.md
@@ -10,6 +10,8 @@ allowed-tools:

 Check the current state of the ABCA platform and report a concise status summary.

+> **Running the CLI:** the `node cli/lib/bin/bgagent.js …` checks below need `node` on `PATH`; in a non-interactive or mise-managed shell, prefix with `mise exec --`.
+
 ## Checks to Run

 Run these in parallel where possible:
diff --git a/docs/abca-plugin/skills/submit-task/SKILL.md b/docs/abca-plugin/skills/submit-task/SKILL.md
index 342685b4..793d78a1 100644
--- a/docs/abca-plugin/skills/submit-task/SKILL.md
+++ b/docs/abca-plugin/skills/submit-task/SKILL.md
@@ -13,6 +13,8 @@ argument-hint: [description]

 You are helping the user submit a well-crafted coding task to the ABCA platform. Good prompts are critical — the agent works autonomously without asking clarifying questions.

+> **Running the CLI:** examples below call `node cli/lib/bin/bgagent.js …` from the repo root. In a **non-interactive or mise-managed shell** `node` may not be on `PATH` (`command not found`) — prefix with `mise exec --` (e.g. `mise exec -- node cli/lib/bin/bgagent.js submit …`), or use a global `bgagent` if installed. If `cli/lib/bin/bgagent.js` is missing, run `mise run build` first.
+
 **Quick mode:** If the user provided a repo and description inline (e.g. "submit task to owner/repo: fix the login bug"), auto-detect the task type from the description and skip to Step 5. Infer the type:
 - PR number or "review PR" → `--review-pr`
 - "iterate on PR" or "fix PR feedback" → `--pr`
diff --git a/docs/abca-plugin/skills/troubleshoot/SKILL.md b/docs/abca-plugin/skills/troubleshoot/SKILL.md
index 4afc15fa..38d3e058 100644
--- a/docs/abca-plugin/skills/troubleshoot/SKILL.md
+++ b/docs/abca-plugin/skills/troubleshoot/SKILL.md
@@ -12,6 +12,8 @@ description: >-

 You are diagnosing an issue with the ABCA platform. Follow a systematic approach: gather symptoms, check the most common causes, and apply targeted fixes.

+> **Running the CLI:** commands below call `node cli/lib/bin/bgagent.js …`. In a non-interactive or mise-managed shell `node` may not be on `PATH` — prefix with `mise exec --`. Ironically, `node: command not found` is itself a common symptom (the shell hasn't activated mise); that's a missing prefix, not a broken install.
+
 ## Step 1: Identify the Problem Category

 Determine which area the issue falls into:
@@ -71,8 +73,9 @@ aws cognito-idp admin-get-user \

 ## Task Submission Issues (422 / 400)

-**"Repository not onboarded" (422):**
-- The repo needs a Blueprint construct. Use the `onboard-repo` skill.
+**"Repository not onboarded" / `REPO_NOT_ONBOARDED` (422):**
+- The repo isn't registered. Fastest fix: `bgagent repo onboard ` (operator path — writes the RepoTable record at runtime, no redeploy). A CDK Blueprint is only needed for declarative config. Use the `onboard-repo` skill for details.
+- Also confirm the `owner/repo` matches **exactly** what you pass to `bgagent submit --repo`.

 **"GUARDRAIL_BLOCKED" (400):**
 - Task description triggered Bedrock Guardrails content screening
@@ -108,8 +111,9 @@ node cli/lib/bin/bgagent.js events --output json
 - Common: repo build/test commands not documented in CLAUDE.md

 **403 "not authorized to perform bedrock:InvokeModelWithResponseStream":**
-- The Blueprint specifies a model that the runtime IAM role doesn't have permissions for
-- Fix: add `grantInvoke` for the model and its cross-region inference profile in `cdk/src/stacks/agent.ts`, then redeploy
+- The repo's `model_id` is a model the runtime IAM role wasn't granted. The runtime only has `grantInvoke` for the models in the stack's configured set (Sonnet 4.6, Opus 4, Haiku 4.5 by default).
+- **Quick fix:** point the repo at an already-granted model — `bgagent repo onboard --model us.anthropic.claude-sonnet-4-6` (no redeploy).
+- **To add a new model to the runtime:** grant it in the stack and redeploy. The model set is the shared list in `cdk/src/constructs/bedrock-models.ts` — add the model via the `bedrockModels` CDK context (`cdk.json`) so both the AgentCore and ECS backends grant it (#433). Adding a model also requires **account-level Bedrock access** for it (separate from IAM — see the next row).

 **Model not enabled / "not available on your Bedrock deployment" (often immediate failure, few turns, zero or near-zero tokens):**
 - **IAM is necessary but not sufficient.** The AgentCore role may already have `bedrock:InvokeModel*`, but the **account** must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html): Marketplace subscription flow on first serverless use (with `aws-marketplace:Subscribe` / `ViewSubscriptions` where needed), Anthropic **first-time use** details (`PutUseCaseForModelAccess` or the console model catalog), and a valid payment method for Marketplace-backed models.

From 617c05562091450d087380449963f7ca93be35bd Mon Sep 17 00:00:00 2001
From: bgagent
Date: Tue, 23 Jun 2026 16:15:16 -0400
Subject: [PATCH 2/2] =?UTF-8?q?docs(deploy=20skill):=20address=20review=20?=
 =?UTF-8?q?=E2=80=94=20clarify=20CLI-onboard=20scope,=20add=20platform=20d?=
 =?UTF-8?q?octor?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per @krokoko on #435:
- The `bgagent repo onboard` line over-claimed. Clarify it's the runtime
  path for repos that fit the platform/default-blueprint setup (default
  token, already-granted model, default egress); a repo needing its own
  token, an ungranted model, custom egress, Cedar policies, or
  system-prompt overrides still needs a dedicated CDK Blueprint +
  redeploy with correct permissions.
- Add `bgagent platform doctor` to post-deploy as the readiness check
  before submitting a task (smoke-checks API/Cognito/token/Bedrock/onboarded
  repos); keep the raw curl as a lower-level alternative.
---
 docs/abca-plugin/skills/deploy/SKILL.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/abca-plugin/skills/deploy/SKILL.md b/docs/abca-plugin/skills/deploy/SKILL.md
index f96a3f4c..09d65513 100644
--- a/docs/abca-plugin/skills/deploy/SKILL.md
+++ b/docs/abca-plugin/skills/deploy/SKILL.md
@@ -83,9 +83,10 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem
 ## Post-Deployment

 After a successful deploy, remind the user to:
-- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
-- Onboard repositories with `bgagent repo onboard ` (a runtime operation — no redeploy). A CDK `Blueprint` construct is only needed for declarative config (Cedar policies, egress allowlist, system-prompt overrides) — see the `onboard-repo` skill
-- Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
+- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment.
+- Onboard a repository. `bgagent repo onboard ` is a runtime operation (no redeploy) that works when the repo can use the **platform/default-blueprint** setup — the default GitHub token secret, an already-granted model, and the default egress allowlist. A repo that needs its **own** config — a per-repo GitHub token, a model not yet granted to the runtime, custom egress domains, Cedar HITL policies, or system-prompt overrides — needs a dedicated CDK `Blueprint` construct and a redeploy (with the correct permissions). See the `onboard-repo` skill for both paths.
+- **Verify readiness before submitting a task:** `bgagent platform doctor` smoke-checks the API, Cognito, GitHub token, Bedrock model access, and onboarded repos — confirm everything is green first.
+- (Lower-level alternative) raw API smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`.

 ## Least-Privilege Bootstrap (the default)
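
Reviewer note (not part of the patch): the add-an-action fix described under "Least-Privilege Bootstrap" in PATCH 1/2 (edit the policy, regenerate, re-bootstrap, redeploy) could be sketched roughly as below. This is only a sketch under assumptions: the three task invocations are copied from the patch text, while the `run` helper, the `DRY_RUN` switch, and the function name `rebootstrap_and_deploy` are hypothetical names introduced here for illustration. It also assumes `mise //cdk:bootstrap` takes no extra arguments, which the patch does not state.

```bash
#!/usr/bin/env bash
# Sketch only: re-run the bootstrap cycle after adding an IAM action to a
# policy under cdk/src/bootstrap/policies/. Task names come from the patch text.
set -eu

export MISE_EXPERIMENTAL=1

# Hypothetical helper: with DRY_RUN=1, print the step instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'would run: %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the three steps named in the skill text.
rebootstrap_and_deploy() {
  run mise //cdk:bootstrap:generate                  # regenerate the scoped policies
  run mise //cdk:bootstrap                           # re-bootstrap with the updated policies
  run mise //cdk:deploy -- --require-approval never  # redeploy unattended
}
```

Sourcing the file and calling `DRY_RUN=1 rebootstrap_and_deploy` prints the three commands without executing them, which may be a reasonable way to confirm the sequence before touching an account. Without `DRY_RUN`, the steps run in order and, because of `set -e`, the script stops at the first one that fails.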