Merged
33 changes: 17 additions & 16 deletions docs/abca-plugin/skills/deploy/SKILL.md
@@ -28,18 +28,21 @@ Before any deployment action, verify:
export MISE_EXPERIMENTAL=1
mise run build
```
This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails.
This runs agent quality checks, CDK compilation + tests, CLI build, and docs build. Do NOT deploy if the build fails. Note: a passing build is noisy — it prints many `ERROR`/`WARN` and cdk-nag lines from test fixtures. Trust the **exit code (0 = pass)**, not the log volume.

2. **Docker is running** — Required for CDK asset bundling
3. **AWS credentials are configured** — `aws sts get-caller-identity`
2. **Docker is running** — Required for CDK asset bundling.
3. **Build host architecture** — The agent image targets `linux/arm64` (AgentCore is Graviton). On an **x86_64** host without QEMU/binfmt, the deploy fails partway with `exec /bin/sh: exec format error`. Register emulation once with `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or deploy from a native arm64 host (Graviton / Apple Silicon). Skip on arm64 hosts.
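4. **AWS credentials are configured**: run `aws sts get-caller-identity` and confirm it reports the intended account. The command does not print a Region, so check that separately (for example with `aws configure get region`).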
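
The checks above can be chained into one preflight step. The following is a minimal sketch, not part of the skill itself: the function names `needs_arm64_emulation` and `preflight` and the `build.log` file are hypothetical, and it assumes `mise`, `docker`, and `aws` are already on `PATH`. It judges the build by exit status only, as recommended above.

```bash
# Preflight sketch. Function names and build.log are hypothetical.

# Returns 0 (true) when the machine string, as printed by `uname -m`,
# is an x86_64 host that needs arm64 emulation for the linux/arm64 image.
needs_arm64_emulation() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Runs the checks in order and stops at the first hard failure.
preflight() {
  export MISE_EXPERIMENTAL=1
  # Judge the build by its exit status only; log noise is expected.
  if ! mise run build >build.log 2>&1; then
    echo "build failed (see build.log); do not deploy" >&2
    return 1
  fi
  # `docker info` exits non-zero when the daemon is unreachable.
  docker info >/dev/null 2>&1 || { echo "Docker is not running" >&2; return 1; }
  if needs_arm64_emulation "$(uname -m)"; then
    echo "x86_64 host: register arm64 emulation before deploying" >&2
  fi
  # Prints the account ID so it can be compared with the intended one.
  aws sts get-caller-identity --query Account --output text
}
```

Nothing runs until `preflight` is called, so the file can be sourced safely from another script.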

## Deploy Workflow

```bash
export MISE_EXPERIMENTAL=1
mise //cdk:deploy
mise //cdk:deploy -- --require-approval never
```

`--require-approval never` lets the deploy run unattended. **In a non-interactive shell (CI, agent, script) it's required** — without it, `cdk deploy` hangs forever on the IAM/security-group approval prompt. Drop the flag if you're deploying interactively and want to review those changes.
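
One way to apply this rule automatically is to add the flag only when nobody is at the terminal. This is a sketch under assumptions: the helper names are hypothetical, and treating a set `CI` variable as "non-interactive" is a common convention rather than something this skill defines.

```bash
# Echoes the extra arguments for `mise //cdk:deploy`.
# $1 is "yes" when a human is at the terminal, anything else otherwise.
deploy_extra_args() {
  if [ "$1" = "yes" ]; then
    echo ""
  else
    echo "-- --require-approval never"
  fi
}

# Treats the shell as interactive only when stdin is a terminal and CI is unset.
is_interactive() {
  if [ -t 0 ] && [ -z "${CI:-}" ]; then echo yes; else echo no; fi
}

run_deploy() {
  export MISE_EXPERIMENTAL=1
  # Word splitting is intentional: the helper prints separate arguments.
  # shellcheck disable=SC2046
  mise //cdk:deploy $(deploy_extra_args "$(is_interactive)")
}
```

An interactive run keeps the approval prompt, so IAM and security-group changes can still be reviewed by hand.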

After successful deployment, retrieve and display stack outputs:
```bash
aws cloudformation describe-stacks --stack-name backgroundagent-dev \
@@ -66,6 +69,8 @@ export MISE_EXPERIMENTAL=1
mise //cdk:destroy
```

**Teardown can stall in `DELETE_FAILED`** on a security group / private subnet: AgentCore injects service-managed (Hyperplane) ENIs into the VPC, and AWS reclaims them **asynchronously (~20–40 min)** after the runtime is gone. Wait for the ENIs to clear, then retry `mise //cdk:destroy`. Do **not** force-delete past the stuck VPC resources (`--deletion-mode FORCE_DELETE_STACK` / retaining them) — that orphans the VPC, and VPCs are quota-capped per Region. Also note: a first-create failure leaves the stack in `ROLLBACK_COMPLETE`, which can't be updated — destroy and redeploy fresh.
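
The wait-then-retry step can be scripted rather than watched by hand. The sketch below assumes the leftover ENIs are visible through `aws ec2 describe-network-interfaces` filtered by VPC; `VPC_ID` is a placeholder you must set, and the function names are hypothetical.

```bash
# Polls a command that prints a count until it prints 0.
# Usage: wait_for_zero <max_attempts> <sleep_seconds> <command...>
# Returns 0 once the count is 0, 1 if attempts run out, 2 if the command fails.
wait_for_zero() {
  wfz_attempts="$1"
  wfz_delay="$2"
  shift 2
  wfz_i=0
  while [ "$wfz_i" -lt "$wfz_attempts" ]; do
    wfz_count="$("$@")" || return 2
    [ "$wfz_count" = "0" ] && return 0
    sleep "$wfz_delay"
    wfz_i=$((wfz_i + 1))
  done
  return 1
}

# Counts network interfaces still present in the VPC (VPC_ID is a placeholder).
count_enis() {
  aws ec2 describe-network-interfaces \
    --filters "Name=vpc-id,Values=${VPC_ID}" \
    --query 'length(NetworkInterfaces)' --output text
}

retry_destroy() {
  export MISE_EXPERIMENTAL=1
  # 60 attempts at 60 s each covers the ~20-40 minute window with margin.
  wait_for_zero 60 60 count_enis && mise //cdk:destroy
}
```

Because the loop gives up with a non-zero status instead of forcing deletion, a slow reclaim never leads to an orphaned VPC.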

## Synth Workflow

```bash
@@ -78,19 +83,15 @@ Output goes to `cdk/cdk.out/`. Useful for reviewing generated CloudFormation tem
## Post-Deployment

After a successful deploy, remind the user to:
- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
- Onboard repositories via Blueprint constructs if needed
- Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
- Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment.
- Onboard a repository. `bgagent repo onboard <owner/repo>` is a runtime operation (no redeploy) that works when the repo can use the **platform/default-blueprint** setup — the default GitHub token secret, an already-granted model, and the default egress allowlist. A repo that needs its **own** config — a per-repo GitHub token, a model not yet granted to the runtime, custom egress domains, Cedar HITL policies, or system-prompt overrides — needs a dedicated CDK `Blueprint` construct and a redeploy (with the correct permissions). See the `onboard-repo` skill for both paths.
- **Verify readiness before submitting a task:** `bgagent platform doctor` smoke-checks the API, Cognito, GitHub token, Bedrock model access, and onboarded repos — confirm everything is green first.
- (Lower-level alternative) raw API smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`.
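
For the raw API smoke test, checking the HTTP status is usually more telling than reading the body. This is a sketch only: the function names are hypothetical, the mapping of 401/403 to a token problem is an assumption, and `TOKEN` and `API_URL` are expected to be set as in the `curl` example above.

```bash
# Maps an HTTP status code to a short verdict.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    401|403) echo "auth problem: check the token" ;;
    *) echo "unexpected status $1" ;;
  esac
}

smoke_test() {
  # -o /dev/null discards the body; -w prints only the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: $TOKEN" "$API_URL/tasks")"
  classify_status "$status"
}
```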

## Least-Privilege Deployment
## Least-Privilege Bootstrap (the default)

By default, CDK bootstrap grants `AdministratorAccess` to the CloudFormation execution role. For production or security-sensitive accounts, re-bootstrap with a scoped execution policy:
`mise //cdk:bootstrap` provisions a **custom least-privilege** CloudFormation execution role by default — NOT `AdministratorAccess` (ADR-002). It deploys `cdk/bootstrap/bootstrap-template.yaml`, which creates scoped `IaCRole-ABCA-*` managed policies (Infrastructure / Application / Observability) generated from `cdk/src/bootstrap/policies/`.

```bash
cdk bootstrap aws://ACCOUNT/REGION \
--cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Infrastructure" \
--cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Application" \
--cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Observability"
```
A consequence worth knowing when you **add a new resource type or a new feature on an existing resource**: the scoped role must allow the IAM action CloudFormation will call, or the deploy rolls back with `AccessDenied` on that action (e.g. `s3:PutBucketVersioning`, `lambda:TagResource`). The fix is to add the action to the relevant policy in `cdk/src/bootstrap/policies/`, regenerate (`mise //cdk:bootstrap:generate`), re-bootstrap, and redeploy. The policy source and the `DEPLOYMENT_ROLES.md` golden doc are kept in sync by tests.
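
The recovery loop described above can be sketched as follows. The helper names are hypothetical, and the parsing assumes the usual AWS wording `is not authorized to perform: <service>:<Action>`; a message phrased differently yields empty output.

```bash
# Prints the IAM action named in an AccessDenied message, or nothing.
denied_action() {
  printf '%s\n' "$1" |
    sed -n 's/.*not authorized to perform: \([A-Za-z0-9-]*:[A-Za-z0-9*]*\).*/\1/p'
}

# Run after adding the action to the relevant policy
# under cdk/src/bootstrap/policies/.
apply_policy_change() {
  export MISE_EXPERIMENTAL=1
  mise //cdk:bootstrap:generate && mise //cdk:bootstrap && mise //cdk:deploy
}
```

Chaining the three commands with `&&` stops the sequence at the first failure, so a broken regenerate step never reaches the deploy.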

See `docs/design/DEPLOYMENT_ROLES.md` in the repo root for the complete least-privilege IAM policies, trust policy, runtime role inventory, and iterative tightening recommendations.
See `docs/design/DEPLOYMENT_ROLES.md` for the complete IAM policies, trust policy, runtime role inventory, and tightening recommendations.
2 changes: 2 additions & 0 deletions docs/abca-plugin/skills/status/SKILL.md
@@ -10,6 +10,8 @@ allowed-tools:

Check the current state of the ABCA platform and report a concise status summary.

> **Running the CLI:** the `node cli/lib/bin/bgagent.js …` checks below need `node` on `PATH`; in a non-interactive or mise-managed shell, prefix with `mise exec --`.

## Checks to Run

Run these in parallel where possible:
2 changes: 2 additions & 0 deletions docs/abca-plugin/skills/submit-task/SKILL.md
@@ -13,6 +13,8 @@ argument-hint: <repo> [description]

You are helping the user submit a well-crafted coding task to the ABCA platform. Good prompts are critical — the agent works autonomously without asking clarifying questions.

> **Running the CLI:** examples below call `node cli/lib/bin/bgagent.js …` from the repo root. In a **non-interactive or mise-managed shell** `node` may not be on `PATH` (`command not found`) — prefix with `mise exec --` (e.g. `mise exec -- node cli/lib/bin/bgagent.js submit …`), or use a global `bgagent` if installed. If `cli/lib/bin/bgagent.js` is missing, run `mise run build` first.
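
A small wrapper can choose the prefix automatically. This is a sketch, not part of the CLI: `bgagent_prefix` and the `NODE_BIN` override are hypothetical names, and it assumes you run it from the repo root after `mise run build`.

```bash
# Prints the prefix needed to run node-based commands in this shell.
# NODE_BIN is a hypothetical override so the lookup can be exercised.
bgagent_prefix() {
  if command -v "${NODE_BIN:-node}" >/dev/null 2>&1; then
    echo ""
  else
    echo "mise exec --"
  fi
}

# Hypothetical wrapper: runs the built CLI with or without the mise prefix.
bgagent() {
  # Word splitting of the prefix is intentional.
  # shellcheck disable=SC2046
  $(bgagent_prefix) node cli/lib/bin/bgagent.js "$@"
}
```

With this in place, `bgagent submit …` behaves the same in an interactive terminal and in a script where mise has not been activated.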

**Quick mode:** If the user provided a repo and description inline (e.g. "submit task to owner/repo: fix the login bug"), auto-detect the task type from the description and skip to Step 5. Infer the type:
- PR number or "review PR" → `--review-pr`
- "iterate on PR" or "fix PR feedback" → `--pr`
12 changes: 8 additions & 4 deletions docs/abca-plugin/skills/troubleshoot/SKILL.md
@@ -12,6 +12,8 @@ description: >-

You are diagnosing an issue with the ABCA platform. Follow a systematic approach: gather symptoms, check the most common causes, and apply targeted fixes.

> **Running the CLI:** commands below call `node cli/lib/bin/bgagent.js …`. In a non-interactive or mise-managed shell `node` may not be on `PATH` — prefix with `mise exec --`. Ironically, `node: command not found` is itself a common symptom (the shell hasn't activated mise); that's a missing prefix, not a broken install.

## Step 1: Identify the Problem Category

Determine which area the issue falls into:
@@ -71,8 +73,9 @@ aws cognito-idp admin-get-user \

## Task Submission Issues (422 / 400)

**"Repository not onboarded" (422):**
- The repo needs a Blueprint construct. Use the `onboard-repo` skill.
**"Repository not onboarded" / `REPO_NOT_ONBOARDED` (422):**
- The repo isn't registered. Fastest fix: `bgagent repo onboard <owner/repo>` (operator path — writes the RepoTable record at runtime, no redeploy). A CDK Blueprint is only needed for declarative config. Use the `onboard-repo` skill for details.
- Also confirm the `owner/repo` matches **exactly** what you pass to `bgagent submit --repo`.

**"GUARDRAIL_BLOCKED" (400):**
- Task description triggered Bedrock Guardrails content screening
@@ -108,8 +111,9 @@ node cli/lib/bin/bgagent.js events <TASK_ID> --output json
- Common: repo build/test commands not documented in CLAUDE.md

**403 "not authorized to perform bedrock:InvokeModelWithResponseStream":**
- The Blueprint specifies a model that the runtime IAM role doesn't have permissions for
- Fix: add `grantInvoke` for the model and its cross-region inference profile in `cdk/src/stacks/agent.ts`, then redeploy
- The repo's `model_id` is a model the runtime IAM role wasn't granted. The runtime only has `grantInvoke` for the models in the stack's configured set (Sonnet 4.6, Opus 4, Haiku 4.5 by default).
- **Quick fix:** point the repo at an already-granted model — `bgagent repo onboard <owner/repo> --model us.anthropic.claude-sonnet-4-6` (no redeploy).
- **To add a new model to the runtime:** grant it in the stack and redeploy. The model set is the shared list in `cdk/src/constructs/bedrock-models.ts`; add the model via the `bedrockModels` CDK context (`cdk.json`) so that both the AgentCore and ECS backends grant it (#433). Adding a model also requires **account-level Bedrock access** for it, which is separate from IAM (see the next entry).

**Model not enabled / "not available on your Bedrock deployment" (often immediate failure, few turns, zero or near-zero tokens):**
- **IAM is necessary but not sufficient.** The AgentCore role may already have `bedrock:InvokeModel*`, but the **account** must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html): Marketplace subscription flow on first serverless use (with `aws-marketplace:Subscribe` / `ViewSubscriptions` where needed), Anthropic **first-time use** details (`PutUseCaseForModelAccess` or the console model catalog), and a valid payment method for Marketplace-backed models.