fix: harden injected sidecars for PSA restricted compliance #411
WentingWu666666 wants to merge 6 commits into
The CNPG-I sidecar-injector adds two containers to every DocumentDB cluster pod (documentdb-gateway and otel-collector). Neither carried a SecurityContext that meets the Kubernetes Pod Security Admission (PSA) "restricted" profile: the gateway only set RunAsUser/RunAsGroup and the OTel collector had no SecurityContext at all. On namespaces labeled pod-security.kubernetes.io/enforce=restricted (GKE Autopilot, OpenShift, AKS Azure Policy baseline, CIS-benchmark clusters), the API server rejects these pods, so the cluster never comes up even though the operator reports successful reconciliation.

Apply a shared hardenedSecurityContext() helper to both injected sidecars, setting RunAsNonRoot, Privileged=false, AllowPrivilegeEscalation=false, Capabilities.Drop=[ALL] and SeccompProfile=RuntimeDefault per container, mirroring how CloudNativePG hardens its own containers. readOnlyRootFilesystem is intentionally omitted (not required by PSA restricted and unverified on the upstream images). Adds a unit test guarding the required fields.

Fixes documentdb#387

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
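A minimal sketch of what such a helper might look like. To stay self-contained it uses simplified stand-in types rather than the real `k8s.io/api/core/v1` structs, and `boolPtr` is a hypothetical convenience function; only the field names and values come from the commit message above.

```go
package main

import "fmt"

// Simplified stand-ins for the corev1 types (assumption: the real plugin
// would use k8s.io/api/core/v1 instead of these local structs).
type Capabilities struct{ Drop []string }

type SeccompProfile struct{ Type string }

type SecurityContext struct {
	RunAsNonRoot             *bool
	Privileged               *bool
	AllowPrivilegeEscalation *bool
	Capabilities             *Capabilities
	SeccompProfile           *SeccompProfile
}

// boolPtr is a hypothetical helper that returns a pointer to b.
func boolPtr(b bool) *bool { return &b }

// hardenedSecurityContext builds a new context on every call, so the two
// sidecars never share pointers to the same underlying values.
func hardenedSecurityContext() *SecurityContext {
	return &SecurityContext{
		RunAsNonRoot:             boolPtr(true),
		Privileged:               boolPtr(false),
		AllowPrivilegeEscalation: boolPtr(false),
		Capabilities:             &Capabilities{Drop: []string{"ALL"}},
		SeccompProfile:           &SeccompProfile{Type: "RuntimeDefault"},
	}
}

func main() {
	sc := hardenedSecurityContext()
	// Prints: true false false [ALL] RuntimeDefault
	fmt.Println(*sc.RunAsNonRoot, *sc.Privileged, *sc.AllowPrivilegeEscalation,
		sc.Capabilities.Drop, sc.SeccompProfile.Type)
}
```

Returning a fresh struct per call is a design choice in this sketch: a later per-container tweak on one sidecar cannot leak into the other.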
The e2e workflow's pull_request path filter omitted operator/cnpg-plugins/**, so changes to the CNPG-I sidecar-injector (the code that builds the runtime DocumentDB cluster pods and whose image the e2e pipeline rebuilds from source) did not trigger the suite. Add operator/cnpg-plugins/** to the filter so sidecar-injector changes (such as the PSA securityContext hardening in this PR) are covered.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
🤖 Auto-triaged by documentdb-triage-tool.
Applied:
Reasoning: component from path globs (ci, test); effort from diff stats (80+5 LOC, 3 files); LLM failed: Invalid response body while trying to fetch https://api.anthropic.com/v1/messages: Premature close
If a label is wrong, remove it manually and ping
Label every per-spec DocumentDB namespace with pod-security.kubernetes.io/enforce=restricted (plus warn/audit) in the shared CreateLabeledNamespace fixture, so the whole e2e suite validates that runtime cluster pods (the CNPG pods and the gateway / otel-collector sidecars injected by the CNPG-I plugin) are admitted under the strictest Pod Security Admission profile. This mirrors the GA target platforms (GKE Autopilot, OpenShift, AKS Azure Policy baseline) and guards the documentdb#387 regression suite-wide rather than in a single bespoke scenario.

Add an explicit securityContext assertion to the lifecycle deploy smoke spec so a non-compliant injected container fails with a precise message naming the container and field, instead of an opaque CNPG pod-creation admission error.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
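The label merge described above can be sketched as follows. The three `pod-security.kubernetes.io/*` keys are the standard PSA namespace labels and `psaRestrictedLabels` is the name used in the fixture; `namespaceLabels` and the `example.test/area` ownership key are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// psaRestrictedLabels returns the enforce, warn and audit PSA mode labels,
// all pinned to the "restricted" profile.
func psaRestrictedLabels() map[string]string {
	return map[string]string{
		"pod-security.kubernetes.io/enforce": "restricted",
		"pod-security.kubernetes.io/warn":    "restricted",
		"pod-security.kubernetes.io/audit":   "restricted",
	}
}

// namespaceLabels (hypothetical name) copies the ownership labels and then
// overlays the PSA labels, leaving the caller's map untouched.
func namespaceLabels(ownership map[string]string) map[string]string {
	labels := make(map[string]string, len(ownership)+3)
	for k, v := range ownership {
		labels[k] = v
	}
	for k, v := range psaRestrictedLabels() {
		labels[k] = v
	}
	return labels
}

func main() {
	// "example.test/area" is an invented ownership label for the demo.
	labels := namespaceLabels(map[string]string{"example.test/area": "lifecycle"})
	// Prints: 4 restricted
	fmt.Println(len(labels), labels["pod-security.kubernetes.io/enforce"])
}
```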
Pull request overview
This PR hardens the CNPG-I sidecar-injector’s injected containers (documentdb-gateway and otel-collector) to satisfy Kubernetes Pod Security Admission (PSA) restricted requirements, and adds regression tests to prevent future drift.
Changes:
- Introduces a shared `hardenedSecurityContext()` and applies it to both injected sidecars in the CNPG-I plugin.
- Adds unit + e2e assertions to verify injected sidecars carry the PSA-restricted-required `securityContext` fields.
- Updates the E2E workflow path filters so changes under `operator/cnpg-plugins/**` trigger the E2E job.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go` | Applies a PSA-restricted-compliant container security context to both injected sidecars via a shared helper. |
| `operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle_test.go` | Adds a unit test to ensure the hardened security context contains required PSA fields. |
| `test/e2e/pkg/e2eutils/fixtures/fixtures.go` | Stamps PSA “restricted” namespace labels for per-spec test namespaces to exercise admission in e2e. |
| `test/e2e/tests/lifecycle/deploy_test.go` | Adds an e2e regression assertion that injected sidecars have the required container securityContext fields. |
| `.github/workflows/test-e2e.yml` | Ensures E2E runs when CNPG plugin code changes. |
     func CreateLabeledNamespace(ctx context.Context, c client.Client, name, area string) error {
    +	labels := ownershipLabels(FixturePerSpec, area)
    +	for k, v := range psaRestrictedLabels() {
    +		labels[k] = v
    +	}
     	ns := &corev1.Namespace{
     		ObjectMeta: metav1.ObjectMeta{
     			Name: name,
    -			Labels: ownershipLabels(FixturePerSpec, area),
    +			Labels: labels,
     		},
Address code-review feedback on the PSA hardening:

- The shared hardenedSecurityContext() no longer pins RunAsUser/RunAsGroup. PSA "restricted" only requires runAsNonRoot=true, not a specific UID. The documentdb-gateway still runs as UID/GID 1000 (the user its image expects) via a small gatewaySecurityContext() wrapper, while the third-party otel-collector image keeps its own baked-in non-root user (UID 10001) instead of being forced to 1000.
- Extract the otel-collector container construction into newOtelCollectorSidecar() and add injection-layer unit tests asserting that BOTH injected sidecars carry the PSA-restricted securityContext (gateway with UID 1000, otel-collector without a forced UID). This closes the gap where the otel sidecar's wiring was not covered by any test; the e2e suite only exercises it when monitoring is enabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
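The wrapper layering described in the first bullet might look roughly like this. The `SecurityContext` struct is a trimmed local stand-in (only a subset of fields is shown, and the real code would use the corev1 type); the function names follow the commit message, but their bodies here are an assumption.

```go
package main

import "fmt"

// Trimmed stand-in for corev1.SecurityContext; the remaining restricted
// fields (capabilities, seccomp profile, ...) are omitted for brevity.
type SecurityContext struct {
	RunAsNonRoot             *bool
	AllowPrivilegeEscalation *bool
	RunAsUser                *int64
	RunAsGroup               *int64
}

// hardenedSecurityContext is the shared base: it demands a non-root user
// but deliberately leaves RunAsUser/RunAsGroup unset.
func hardenedSecurityContext() *SecurityContext {
	nonRoot, escalate := true, false
	return &SecurityContext{
		RunAsNonRoot:             &nonRoot,
		AllowPrivilegeEscalation: &escalate,
	}
}

// gatewaySecurityContext layers the UID/GID the gateway image expects on
// top of a fresh copy of the shared base.
func gatewaySecurityContext() *SecurityContext {
	sc := hardenedSecurityContext()
	uid, gid := int64(1000), int64(1000)
	sc.RunAsUser, sc.RunAsGroup = &uid, &gid
	return sc
}

func main() {
	gw, otel := gatewaySecurityContext(), hardenedSecurityContext()
	// Prints: 1000 1000 true
	fmt.Println(*gw.RunAsUser, *gw.RunAsGroup, otel.RunAsUser == nil)
}
```

Because the otel-collector context leaves `RunAsUser` nil, the container runs as whatever non-root user its image declares, and `runAsNonRoot: true` still makes the kubelet refuse to start it as root.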
Extract the inline PSA-restricted securityContext check from the lifecycle deploy spec into a shared assertions helper, AssertInjectedSidecarsPSARestricted, and wire it into both restore specs (recovery.backup CSI snapshot and recovery.persistentVolume). This extends the documentdb#387 regression coverage to backup/restore: a recovery cluster gets the same CNPG-I-injected sidecars as a fresh deploy and lands in a restricted-labeled namespace, so its pods must also carry the hardened context.

The helper errors when no injected sidecar is found, so the assertion cannot pass vacuously.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
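A rough sketch of the non-vacuous check, not the actual `AssertInjectedSidecarsPSARestricted` implementation. The types are flattened local stand-ins for the corev1 structs, and `checkInjectedSidecars` and `contains` are hypothetical names. The seccomp check accepts only `RuntimeDefault` because that is what this PR sets.

```go
package main

import (
	"errors"
	"fmt"
)

// Flattened stand-ins for the corev1 container and securityContext types.
type SecurityContext struct {
	RunAsNonRoot, AllowPrivilegeEscalation *bool
	CapabilitiesDrop                       []string
	SeccompProfileType                     string
}

type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

// Names of the containers injected by the CNPG-I plugin.
var injectedSidecars = map[string]bool{"documentdb-gateway": true, "otel-collector": true}

func contains(items []string, want string) bool {
	for _, it := range items {
		if it == want {
			return true
		}
	}
	return false
}

// checkInjectedSidecars reports the first non-compliant field by container
// name, and fails if the pod contains no injected sidecar at all.
func checkInjectedSidecars(containers []Container) error {
	found := 0
	for _, c := range containers {
		if !injectedSidecars[c.Name] {
			continue
		}
		found++
		sc := c.SecurityContext
		switch {
		case sc == nil:
			return fmt.Errorf("container %q: securityContext is nil", c.Name)
		case sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot:
			return fmt.Errorf("container %q: runAsNonRoot must be true", c.Name)
		case sc.AllowPrivilegeEscalation == nil || *sc.AllowPrivilegeEscalation:
			return fmt.Errorf("container %q: allowPrivilegeEscalation must be false", c.Name)
		case !contains(sc.CapabilitiesDrop, "ALL"):
			return fmt.Errorf("container %q: capabilities.drop must include ALL", c.Name)
		case sc.SeccompProfileType != "RuntimeDefault":
			return fmt.Errorf("container %q: seccompProfile.type must be RuntimeDefault", c.Name)
		}
	}
	if found == 0 {
		return errors.New("no injected sidecar found; refusing to pass vacuously")
	}
	return nil
}

func main() {
	yes, no := true, false
	hardened := &SecurityContext{RunAsNonRoot: &yes, AllowPrivilegeEscalation: &no,
		CapabilitiesDrop: []string{"ALL"}, SeccompProfileType: "RuntimeDefault"}
	pod := []Container{{Name: "postgres"}, {Name: "documentdb-gateway", SecurityContext: hardened}}
	fmt.Println(checkInjectedSidecars(pod))                                // <nil>
	fmt.Println(checkInjectedSidecars([]Container{{Name: "postgres"}}) != nil) // true
}
```

Counting matched sidecars rather than requiring both names keeps the check usable when monitoring is disabled and only the gateway is injected.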
fixtures_test.go was committed un-indented (no leading tabs) and was not gofmt-clean. Earlier additions matched that broken style; this normalizes the whole file to standard gofmt indentation so the PSA label assertions and the surrounding tests are correctly indented.

Whitespace-only change (git diff -w is empty); no logic affected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Summary
Fixes #387. The CNPG-I sidecar-injector injects two containers into every DocumentDB cluster pod (`documentdb-gateway` and `otel-collector`), but neither carried a `SecurityContext` that satisfies the Kubernetes Pod Security Admission (PSA) `restricted` profile:

- `documentdb-gateway` set only `RunAsUser`/`RunAsGroup`.
- `otel-collector` had no `SecurityContext` at all.

PSA requires `allowPrivilegeEscalation: false`, `capabilities.drop: [ALL]`, `seccompProfile`, and `runAsNonRoot`. `allowPrivilegeEscalation` and `capabilities` exist only at the container level, so pod-level settings cannot satisfy those checks; `seccompProfile` and `runAsNonRoot` may also be set at the pod level, but this PR sets all four per container. On namespaces labeled `pod-security.kubernetes.io/enforce=restricted` (GKE Autopilot, OpenShift, AKS Azure Policy baseline, CIS-benchmark clusters) the API server rejects every cluster pod, so the DocumentDB cluster never comes up, even though the operator reports successful reconciliation.

Changes
- Add a shared `hardenedSecurityContext()` helper and apply it to both injected sidecars. It sets, per container:
  - `runAsNonRoot: true`, `runAsUser`/`runAsGroup: 1000`
  - `privileged: false`, `allowPrivilegeEscalation: false`
  - `capabilities.drop: [ALL]`
  - `seccompProfile.type: RuntimeDefault`
- Add a unit test guarding every required `restricted` field, so this can't regress silently.

This mirrors how CloudNativePG hardens its own built-in containers (`pkg/specs` `GetSecurityContext`) and how the CNPG barman-cloud plugin hardens its injected sidecar.

Design note: `readOnlyRootFilesystem`
Intentionally omitted. It is not required by the PSA `restricted` profile (confirmed by the admission error in #387, which flags only `allowPrivilegeEscalation`, `capabilities`, and `seccompProfile`), and the upstream gateway / OTel collector images haven't been verified to run on a read-only root filesystem. It can be added later with an `emptyDir` scratch mount once validated, without affecting admission compliance.

Testing
- `go build ./...`, `go vet ./...`, `gofmt -l`: clean
- `go test ./...` (sidecar-injector module): passing, including the new `TestHardenedSecurityContext_PSARestricted`

Out of scope
- `spec.gatewaySecurityContext` / `spec.otelCollectorSecurityContext` (can follow up if desired)