aws-samples · krokoko · Jun 23, 2026
@@ -28,18 +28,21 @@ Before any deployment action, verify:
    export MISE_EXPERIMENTAL=1
    mise run build
    ```
-   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails.
+   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails. Note: a passing build is noisy — it prints many `ERROR`/`WARN` and cdk-nag lines from test fixtures. Trust the **exit code (0 = pass)**, not the log volume.
 
-2. **Docker is running** — Required for CDK asset bundling
-3. **AWS credentials are configured** — `aws sts get-caller-identity`
+2. **Docker is running** — Required for CDK asset bundling.
+3. **Build host architecture** — The agent image targets `linux/arm64` (AgentCore is Graviton). On an **x86_64** host without QEMU/binfmt, the deploy fails partway with `exec /bin/sh: exec format error`. Register emulation once with `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or deploy from a native arm64 host (Graviton / Apple Silicon). Skip on arm64 hosts.
+4. **AWS credentials are configured** — `aws sts get-caller-identity` (confirm it's the intended account/region).
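
The architecture check in item 3 can be sketched as a small pre-flight script. This is an illustrative sketch only: `needs_arm64_emulation` is a hypothetical helper name, not something shipped in the repo, and the script only prints the suggested command rather than running Docker.

```shell
# Hypothetical helper (not part of the repo): returns 0 when the given
# machine architecture is NOT arm64, i.e. emulation must be registered.
needs_arm64_emulation() {
  case "$1" in
    aarch64|arm64) return 1 ;;  # native arm64 (Graviton / Apple Silicon)
    *) return 0 ;;              # anything else, e.g. x86_64
  esac
}

# uname -m reports the host machine architecture.
if needs_arm64_emulation "$(uname -m)"; then
  echo "Non-arm64 host: register emulation once before deploying:"
  echo "  docker run --privileged --rm tonistiigi/binfmt --install arm64"
else
  echo "Native arm64 host: no emulation needed."
fi
```

Keeping the decision in a function that takes the architecture as an argument makes it easy to check both branches without changing hosts.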
 
 ## Deploy Workflow
 
 ```bash
 export MISE_EXPERIMENTAL=1
-mise //cdk:deploy
+mise //cdk:deploy -- --require-approval never
 ```
 
+`--require-approval never` lets the deploy run unattended. **In a non-interactive shell (CI, agent, script) it's required** — without it, `cdk deploy` hangs forever on the IAM/security-group approval prompt. Drop the flag if you're deploying interactively and want to review those changes.
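
One way to apply the rule above is to choose the flag based on whether a terminal is attached. The following is a minimal sketch under stated assumptions: `deploy_extra_args` is a hypothetical helper, a terminal on stdin is treated as "interactive", and the script only prints the command it would run.

```shell
# Hypothetical helper: print the extra arguments for `mise //cdk:deploy`.
# $1 is "tty" when a human is at the terminal; any other value means unattended.
deploy_extra_args() {
  if [ "$1" = "tty" ]; then
    printf '%s' ""                             # interactive: keep the approval prompt
  else
    printf '%s' "-- --require-approval never"  # unattended: skip the prompt
  fi
}

# [ -t 0 ] is true when stdin is attached to a terminal.
if [ -t 0 ]; then mode="tty"; else mode="non-tty"; fi
echo "mise //cdk:deploy $(deploy_extra_args "$mode")"
```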
+
 After successful deployment, retrieve and display stack outputs:
 ```bash
 aws cloudformation describe-stacks --stack-name backgroundagent-dev \
@@ -66,6 +69,8 @@ export MISE_EXPERIMENTAL=1
 mise //cdk:destroy
 ```
 
+**Teardown can stall in `DELETE_FAILED`** on a security group / private subnet: AgentCore injects service-managed (Hyperplane) ENIs into the VPC, and AWS reclaims them **asynchronously (~20–40 min)** after the runtime is gone. Wait for the ENIs to clear, then retry `mise //cdk:destroy`. Do **not** force-delete past the stuck VPC resources (`--deletion-mode FORCE_DELETE_STACK` / retaining them) — that orphans the VPC, and VPCs are quota-capped per Region. Also note: a first-create failure leaves the stack in `ROLLBACK_COMPLETE`, which can't be updated — destroy and redeploy fresh.
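
A rough way to tell whether it is time to retry is to count the network interfaces still present in the VPC. This is a sketch, not the project's tooling: `count_vpc_enis` and `teardown_action` are hypothetical helpers, `VPC_ID` is a placeholder you must supply, and counting every ENI in the VPC (rather than only the service-managed ones) is a simplifying assumption.

```shell
# Hypothetical helper: count ENIs remaining in the given VPC.
# Simplification: counts all ENIs, not only service-managed ones.
count_vpc_enis() {
  aws ec2 describe-network-interfaces \
    --filters "Name=vpc-id,Values=$1" \
    --query 'length(NetworkInterfaces)' --output text
}

# Hypothetical helper: map a remaining-ENI count to the next step.
teardown_action() {
  if [ "$1" -eq 0 ]; then
    echo "retry-destroy"   # ENIs are gone: rerun `mise //cdk:destroy`
  else
    echo "wait"            # ENIs still present: check again later
  fi
}

# Only query AWS when a VPC ID has been provided.
if [ -n "${VPC_ID:-}" ]; then
  teardown_action "$(count_vpc_enis "$VPC_ID")"
fi
```

Run it with `VPC_ID` set to the stack's VPC and repeat every few minutes until it prints `retry-destroy`.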
+
 ## Synth Workflow
 
 ```bash
@@ -78,19 +83,15 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem
 ## Post-Deployment
 
 After a successful deploy, remind the user to:
-- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
-- Onboard repositories via Blueprint constructs if needed
-- Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
+- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment.
+- Onboard a repository. `bgagent repo onboard <owner/repo>` is a runtime operation (no redeploy) that works when the repo can use the **platform/default-blueprint** setup — the default GitHub token secret, an already-granted model, and the default egress allowlist. A repo that needs its **own** config — a per-repo GitHub token, a model not yet granted to the runtime, custom egress domains, Cedar HITL policies, or system-prompt overrides — needs a dedicated CDK `Blueprint` construct and a redeploy (with the correct permissions). See the `onboard-repo` skill for both paths.
+- **Verify readiness before submitting a task:** `bgagent platform doctor` smoke-checks the API, Cognito, GitHub token, Bedrock model access, and onboarded repos — confirm everything is green first.
+- (Lower-level alternative) raw API smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`.
 
-## Least-Privilege Deployment
+## Least-Privilege Bootstrap (the default)
 
-By default, CDK bootstrap grants `AdministratorAccess` to the CloudFormation execution role. For production or security-sensitive accounts, re-bootstrap with a scoped execution policy:
+`mise //cdk:bootstrap` provisions a **custom least-privilege** CloudFormation execution role by default — NOT `AdministratorAccess` (ADR-002). It deploys `cdk/bootstrap/bootstrap-template.yaml`, which creates scoped `IaCRole-ABCA-*` managed policies (Infrastructure / Application / Observability) generated from `cdk/src/bootstrap/policies/`.
 
-```bash
-cdk bootstrap aws://ACCOUNT/REGION \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Infrastructure" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Application" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Observability"
-```
+A consequence worth knowing when you **add a new resource type or a new feature on an existing resource**: the scoped role must allow the IAM action CloudFormation will call, or the deploy rolls back with `AccessDenied` on that action (e.g. `s3:PutBucketVersioning`, `lambda:TagResource`). The fix is to add the action to the relevant policy in `cdk/src/bootstrap/policies/`, regenerate (`mise //cdk:bootstrap:generate`), re-bootstrap, and redeploy. The policy source and the `DEPLOYMENT_ROLES.md` golden doc are kept in sync by tests.
 
-See `docs/design/DEPLOYMENT_ROLES.md` in the repo root for the complete least-privilege IAM policies, trust policy, runtime role inventory, and iterative tightening recommendations.
+See `docs/design/DEPLOYMENT_ROLES.md` for the complete IAM policies, trust policy, runtime role inventory, and tightening recommendations.
@@ -10,6 +10,8 @@ allowed-tools:
 
 Check the current state of the ABCA platform and report a concise status summary.
 
+> **Running the CLI:** the `node cli/lib/bin/bgagent.js …` checks below need `node` on `PATH`; in a non-interactive or mise-managed shell, prefix with `mise exec --`.
+
 ## Checks to Run
 
 Run these in parallel where possible:

@@ -13,6 +13,8 @@ argument-hint: <repo> [description]
 
 You are helping the user submit a well-crafted coding task to the ABCA platform. Good prompts are critical — the agent works autonomously without asking clarifying questions.
 
+> **Running the CLI:** examples below call `node cli/lib/bin/bgagent.js …` from the repo root. In a **non-interactive or mise-managed shell** `node` may not be on `PATH` (`command not found`) — prefix with `mise exec --` (e.g. `mise exec -- node cli/lib/bin/bgagent.js submit …`), or use a global `bgagent` if installed. If `cli/lib/bin/bgagent.js` is missing, run `mise run build` first.
+
 **Quick mode:** If the user provided a repo and description inline (e.g. "submit task to owner/repo: fix the login bug"), auto-detect the task type from the description and skip to Step 5. Infer the type:
 - PR number or "review PR" → `--review-pr`
 - "iterate on PR" or "fix PR feedback" → `--pr`

@@ -12,6 +12,8 @@ description: >-
 
 You are diagnosing an issue with the ABCA platform. Follow a systematic approach: gather symptoms, check the most common causes, and apply targeted fixes.
 
+> **Running the CLI:** commands below call `node cli/lib/bin/bgagent.js …`. In a non-interactive or mise-managed shell `node` may not be on `PATH` — prefix with `mise exec --`. Ironically, `node: command not found` is itself a common symptom (the shell hasn't activated mise); that's a missing prefix, not a broken install.
+
 ## Step 1: Identify the Problem Category
 
 Determine which area the issue falls into:
@@ -71,8 +73,9 @@ aws cognito-idp admin-get-user \
 
 ## Task Submission Issues (422 / 400)
 
-**"Repository not onboarded" (422):**
-- The repo needs a Blueprint construct. Use the `onboard-repo` skill.
+**"Repository not onboarded" / `REPO_NOT_ONBOARDED` (422):**
+- The repo isn't registered. Fastest fix: `bgagent repo onboard <owner/repo>` (operator path — writes the RepoTable record at runtime, no redeploy). A CDK Blueprint is only needed for declarative config. Use the `onboard-repo` skill for details.
+- Also confirm the `owner/repo` matches **exactly** what you pass to `bgagent submit --repo`.
 
 **"GUARDRAIL_BLOCKED" (400):**
 - Task description triggered Bedrock Guardrails content screening
@@ -108,8 +111,9 @@ node cli/lib/bin/bgagent.js events <TASK_ID> --output json
 - Common: repo build/test commands not documented in CLAUDE.md
 
 **403 "not authorized to perform bedrock:InvokeModelWithResponseStream":**
-- The Blueprint specifies a model that the runtime IAM role doesn't have permissions for
-- Fix: add `grantInvoke` for the model and its cross-region inference profile in `cdk/src/stacks/agent.ts`, then redeploy
+- The repo's `model_id` is a model the runtime IAM role wasn't granted. The runtime only has `grantInvoke` for the models in the stack's configured set (Sonnet 4.6, Opus 4, Haiku 4.5 by default).
+- **Quick fix:** point the repo at an already-granted model — `bgagent repo onboard <owner/repo> --model us.anthropic.claude-sonnet-4-6` (no redeploy).
+- **To add a new model to the runtime:** grant it in the stack and redeploy. The model set is the shared list in `cdk/src/constructs/bedrock-models.ts`; add the model via the `bedrockModels` CDK context (`cdk.json`) so both the AgentCore and ECS backends grant it (#433). Adding a model also requires **account-level Bedrock access** for it (separate from IAM; see the next entry).
 
 **Model not enabled / "not available on your Bedrock deployment" (often immediate failure, few turns, zero or near-zero tokens):**
 - **IAM is necessary but not sufficient.** The AgentCore role may already have `bedrock:InvokeModel*`, but the **account** must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html): Marketplace subscription flow on first serverless use (with `aws-marketplace:Subscribe` / `ViewSubscriptions` where needed), Anthropic **first-time use** details (`PutUseCaseForModelAccess` or the console model catalog), and a valid payment method for Marketplace-backed models.