diff --git a/docs/abca-plugin/skills/deploy/SKILL.md b/docs/abca-plugin/skills/deploy/SKILL.md
index af7550d0..09d65513 100644
--- a/docs/abca-plugin/skills/deploy/SKILL.md
+++ b/docs/abca-plugin/skills/deploy/SKILL.md
@@ -28,18 +28,21 @@ Before any deployment action, verify:
    export MISE_EXPERIMENTAL=1
    mise run build
    ```
-   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails.
+   This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails. Note: a passing build is noisy — it prints many `ERROR`/`WARN` and cdk-nag lines from test fixtures. Trust the **exit code (0 = pass)**, not the log volume.
 
-2. **Docker is running** — Required for CDK asset bundling
-3. **AWS credentials are configured** — `aws sts get-caller-identity`
+2. **Docker is running** — Required for CDK asset bundling.
+3. **Build host architecture** — The agent image targets `linux/arm64` (AgentCore is Graviton). On an **x86_64** host without QEMU/binfmt, the deploy fails partway with `exec /bin/sh: exec format error`. Register emulation once with `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or deploy from a native arm64 host (Graviton / Apple Silicon). Skip on arm64 hosts.
+4. **AWS credentials are configured** — `aws sts get-caller-identity` (confirm it's the intended account/region).
 
 ## Deploy Workflow
 
 ```bash
 export MISE_EXPERIMENTAL=1
-mise //cdk:deploy
+mise //cdk:deploy -- --require-approval never
 ```
 
+`--require-approval never` lets the deploy run unattended. **In a non-interactive shell (CI, agent, script) it's required** — without it, `cdk deploy` hangs forever on the IAM/security-group approval prompt. Drop the flag if you're deploying interactively and want to review those changes.
+
 After successful deployment, retrieve and display stack outputs:
 ```bash
 aws cloudformation describe-stacks --stack-name backgroundagent-dev \
@@ -66,6 +69,8 @@ export MISE_EXPERIMENTAL=1
 mise //cdk:destroy
 ```
 
+**Teardown can stall in `DELETE_FAILED`** on a security group / private subnet: AgentCore injects service-managed (Hyperplane) ENIs into the VPC, and AWS reclaims them **asynchronously (~20–40 min)** after the runtime is gone. Wait for the ENIs to clear, then retry `mise //cdk:destroy`. Do **not** force-delete past the stuck VPC resources (`--deletion-mode FORCE_DELETE_STACK` / retaining them) — that orphans the VPC, and VPCs are quota-capped per Region. Also note: a first-create failure leaves the stack in `ROLLBACK_COMPLETE`, which can't be updated — destroy and redeploy fresh.
+
 ## Synth Workflow
 
 ```bash
@@ -78,19 +83,15 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem
 ## Post-Deployment
 
 After a successful deploy, remind the user to:
-- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
-- Onboard repositories via Blueprint constructs if needed
-- Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
+- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment.
+- Onboard a repository. `bgagent repo onboard ` is a runtime operation (no redeploy) that works when the repo can use the **platform/default-blueprint** setup — the default GitHub token secret, an already-granted model, and the default egress allowlist. A repo that needs its **own** config — a per-repo GitHub token, a model not yet granted to the runtime, custom egress domains, Cedar HITL policies, or system-prompt overrides — needs a dedicated CDK `Blueprint` construct and a redeploy (with the correct permissions). See the `onboard-repo` skill for both paths.
+- **Verify readiness before submitting a task:** `bgagent platform doctor` smoke-checks the API, Cognito, GitHub token, Bedrock model access, and onboarded repos — confirm everything is green first.
+- (Lower-level alternative) raw API smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`.
 
-## Least-Privilege Deployment
+## Least-Privilege Bootstrap (the default)
 
-By default, CDK bootstrap grants `AdministratorAccess` to the CloudFormation execution role. For production or security-sensitive accounts, re-bootstrap with a scoped execution policy:
+`mise //cdk:bootstrap` provisions a **custom least-privilege** CloudFormation execution role by default — NOT `AdministratorAccess` (ADR-002). It deploys `cdk/bootstrap/bootstrap-template.yaml`, which creates scoped `IaCRole-ABCA-*` managed policies (Infrastructure / Application / Observability) generated from `cdk/src/bootstrap/policies/`.
 
-```bash
-cdk bootstrap aws://ACCOUNT/REGION \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Infrastructure" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Application" \
-  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Observability"
-```
+A consequence worth knowing when you **add a new resource type or a new feature on an existing resource**: the scoped role must allow the IAM action CloudFormation will call, or the deploy rolls back with `AccessDenied` on that action (e.g. `s3:PutBucketVersioning`, `lambda:TagResource`). The fix is to add the action to the relevant policy in `cdk/src/bootstrap/policies/`, regenerate (`mise //cdk:bootstrap:generate`), re-bootstrap, and redeploy. The policy source and the `DEPLOYMENT_ROLES.md` golden doc are kept in sync by tests.
 
-See `docs/design/DEPLOYMENT_ROLES.md` in the repo root for the complete least-privilege IAM policies, trust policy, runtime role inventory, and iterative tightening recommendations.
+See `docs/design/DEPLOYMENT_ROLES.md` for the complete IAM policies, trust policy, runtime role inventory, and tightening recommendations.
diff --git a/docs/abca-plugin/skills/status/SKILL.md b/docs/abca-plugin/skills/status/SKILL.md
index 9c13c4b4..3ac24af3 100644
--- a/docs/abca-plugin/skills/status/SKILL.md
+++ b/docs/abca-plugin/skills/status/SKILL.md
@@ -10,6 +10,8 @@ allowed-tools:
 
 Check the current state of the ABCA platform and report a concise status summary.
 
+> **Running the CLI:** the `node cli/lib/bin/bgagent.js …` checks below need `node` on `PATH`; in a non-interactive or mise-managed shell, prefix with `mise exec --`.
+
 ## Checks to Run
 
 Run these in parallel where possible:
diff --git a/docs/abca-plugin/skills/submit-task/SKILL.md b/docs/abca-plugin/skills/submit-task/SKILL.md
index 342685b4..793d78a1 100644
--- a/docs/abca-plugin/skills/submit-task/SKILL.md
+++ b/docs/abca-plugin/skills/submit-task/SKILL.md
@@ -13,6 +13,8 @@ argument-hint: [description]
 
 You are helping the user submit a well-crafted coding task to the ABCA platform. Good prompts are critical — the agent works autonomously without asking clarifying questions.
 
+> **Running the CLI:** examples below call `node cli/lib/bin/bgagent.js …` from the repo root. In a **non-interactive or mise-managed shell** `node` may not be on `PATH` (`command not found`) — prefix with `mise exec --` (e.g. `mise exec -- node cli/lib/bin/bgagent.js submit …`), or use a global `bgagent` if installed. If `cli/lib/bin/bgagent.js` is missing, run `mise run build` first.
+
 **Quick mode:** If the user provided a repo and description inline (e.g. "submit task to owner/repo: fix the login bug"), auto-detect the task type from the description and skip to Step 5. Infer the type:
 - PR number or "review PR" → `--review-pr`
 - "iterate on PR" or "fix PR feedback" → `--pr`
diff --git a/docs/abca-plugin/skills/troubleshoot/SKILL.md b/docs/abca-plugin/skills/troubleshoot/SKILL.md
index 4afc15fa..38d3e058 100644
--- a/docs/abca-plugin/skills/troubleshoot/SKILL.md
+++ b/docs/abca-plugin/skills/troubleshoot/SKILL.md
@@ -12,6 +12,8 @@ description: >-
 
 You are diagnosing an issue with the ABCA platform. Follow a systematic approach: gather symptoms, check the most common causes, and apply targeted fixes.
 
+> **Running the CLI:** commands below call `node cli/lib/bin/bgagent.js …`. In a non-interactive or mise-managed shell `node` may not be on `PATH` — prefix with `mise exec --`. Ironically, `node: command not found` is itself a common symptom (the shell hasn't activated mise); that's a missing prefix, not a broken install.
+
 ## Step 1: Identify the Problem Category
 
 Determine which area the issue falls into:
@@ -71,8 +73,9 @@ aws cognito-idp admin-get-user \
 
 ## Task Submission Issues (422 / 400)
 
-**"Repository not onboarded" (422):**
-- The repo needs a Blueprint construct. Use the `onboard-repo` skill.
+**"Repository not onboarded" / `REPO_NOT_ONBOARDED` (422):**
+- The repo isn't registered. Fastest fix: `bgagent repo onboard ` (operator path — writes the RepoTable record at runtime, no redeploy). A CDK Blueprint is only needed for declarative config. Use the `onboard-repo` skill for details.
+- Also confirm the `owner/repo` matches **exactly** what you pass to `bgagent submit --repo`.
 
 **"GUARDRAIL_BLOCKED" (400):**
 - Task description triggered Bedrock Guardrails content screening
@@ -108,8 +111,9 @@ node cli/lib/bin/bgagent.js events --output json
 - Common: repo build/test commands not documented in CLAUDE.md
 
 **403 "not authorized to perform bedrock:InvokeModelWithResponseStream":**
-- The Blueprint specifies a model that the runtime IAM role doesn't have permissions for
-- Fix: add `grantInvoke` for the model and its cross-region inference profile in `cdk/src/stacks/agent.ts`, then redeploy
+- The repo's `model_id` is a model the runtime IAM role wasn't granted. The runtime only has `grantInvoke` for the models in the stack's configured set (Sonnet 4.6, Opus 4, Haiku 4.5 by default).
+- **Quick fix:** point the repo at an already-granted model — `bgagent repo onboard --model us.anthropic.claude-sonnet-4-6` (no redeploy).
+- **To add a new model to the runtime:** grant it in the stack and redeploy. The model set is the shared list in `cdk/src/constructs/bedrock-models.ts` — add the model via the `bedrockModels` CDK context (`cdk.json`) so both the AgentCore and ECS backends grant it (#433). Adding a model also requires **account-level Bedrock access** for it (separate from IAM — see the next row).
 
 **Model not enabled / "not available on your Bedrock deployment" (often immediate failure, few turns, zero or near-zero tokens):**
 - **IAM is necessary but not sufficient.** The AgentCore role may already have `bedrock:InvokeModel*`, but the **account** must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html): Marketplace subscription flow on first serverless use (with `aws-marketplace:Subscribe` / `ViewSubscriptions` where needed), Anthropic **first-time use** details (`PutUseCaseForModelAccess` or the console model catalog), and a valid payment method for Marketplace-backed models.
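
Reviewer note on the deploy skill's new "Build host architecture" prerequisite: the check it describes could be scripted roughly as below. This is only a sketch under stated assumptions. `arm64_emulation_needed` is a hypothetical helper name that does not exist in the repo, and the script looks only at `uname -m`; it does not detect whether binfmt emulation has already been registered, so on an x86_64 host it always prints the reminder.

```shell
#!/usr/bin/env bash
# Sketch: decide whether this build host needs arm64 emulation before deploying.
# `arm64_emulation_needed` is a hypothetical helper, not part of the ABCA repo.
arm64_emulation_needed() {
  # Take the architecture as an optional argument so the logic can be tested;
  # fall back to the real host architecture reported by `uname -m`.
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    aarch64|arm64) return 1 ;;  # native arm64 (Graviton / Apple Silicon): skip
    *)             return 0 ;;  # anything else (e.g. x86_64): emulation needed
  esac
}

if arm64_emulation_needed; then
  echo "non-arm64 host: register emulation once before deploying:"
  echo "  docker run --privileged --rm tonistiigi/binfmt --install arm64"
else
  echo "native arm64 host: no emulation needed"
fi
```

The script only prints the command from the skill text rather than running it, since `--privileged` containers are something an operator may want to approve explicitly.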