fix: harden injected sidecars for PSA restricted compliance #411

Open
WentingWu666666 wants to merge 6 commits into
documentdb:mainfrom
WentingWu666666:developer/wentingwu/psa-sidecar-securitycontext

@WentingWu666666

Collaborator

Summary

Fixes #387 — the CNPG-I sidecar-injector injects two containers into every DocumentDB cluster pod (documentdb-gateway and otel-collector), but neither carried a SecurityContext that satisfies the Kubernetes Pod Security Admission (PSA) restricted profile:

  • documentdb-gateway set only RunAsUser / RunAsGroup.
  • otel-collector had no SecurityContext at all.

The PSA restricted profile requires allowPrivilegeEscalation: false, capabilities.drop: [ALL], a seccompProfile, and runAsNonRoot: true for every container. The first two exist only at the container level, so pod-level settings cannot satisfy them; seccompProfile and runAsNonRoot may be inherited from the pod-level securityContext, but setting them per container keeps each injected sidecar compliant on its own. On namespaces labeled pod-security.kubernetes.io/enforce=restricted (GKE Autopilot, OpenShift, AKS Azure Policy baseline, CIS-benchmark clusters) the API server rejects every cluster pod, so the DocumentDB cluster never comes up, even though the operator reports successful reconciliation.

Changes

  • Add a shared hardenedSecurityContext() helper and apply it to both injected sidecars. It sets, per container:
    • runAsNonRoot: true, runAsUser/runAsGroup: 1000
    • privileged: false, allowPrivilegeEscalation: false
    • capabilities.drop: [ALL]
    • seccompProfile.type: RuntimeDefault
  • Add a unit test asserting the helper carries every PSA-restricted field, so this can't regress silently.

This mirrors how CloudNativePG hardens its own built-in containers (pkg/specs GetSecurityContext) and how the CNPG barman-cloud plugin hardens its injected sidecar.

Design note: readOnlyRootFilesystem

Intentionally omitted. It is not required by the PSA restricted profile (confirmed by the admission error in #387, which flags only allowPrivilegeEscalation, capabilities, and seccompProfile), and the upstream gateway / OTel collector images haven't been verified to run on a read-only root filesystem. It can be added later with an emptyDir scratch mount once validated, without affecting admission compliance.

Testing

  • go build ./..., go vet ./..., gofmt -l — clean
  • go test ./... (sidecar-injector module) — passing, including the new TestHardenedSecurityContext_PSARestricted

Out of scope

wentingwu000 and others added 2 commits June 24, 2026 13:01
The CNPG-I sidecar-injector adds two containers to every DocumentDB
cluster pod (documentdb-gateway and otel-collector). Neither carried a
SecurityContext that meets the Kubernetes Pod Security Admission (PSA)
"restricted" profile: the gateway only set RunAsUser/RunAsGroup and the
OTel collector had no SecurityContext at all.

On namespaces labeled pod-security.kubernetes.io/enforce=restricted
(GKE Autopilot, OpenShift, AKS Azure Policy baseline, CIS-benchmark
clusters), the API server rejects these pods, so the cluster never comes
up even though the operator reports successful reconciliation.

Apply a shared hardenedSecurityContext() helper to both injected
sidecars setting RunAsNonRoot, Privileged=false,
AllowPrivilegeEscalation=false, Capabilities.Drop=[ALL] and
SeccompProfile=RuntimeDefault per-container, mirroring how CloudNativePG
hardens its own containers. readOnlyRootFilesystem is intentionally
omitted (not required by PSA restricted and unverified on the upstream
images). Adds a unit test guarding the required fields.

Fixes documentdb#387

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
The e2e workflow's pull_request path filter omitted
operator/cnpg-plugins/**, so changes to the CNPG-I sidecar-injector —
the code that builds the runtime DocumentDB cluster pods and whose image
the e2e pipeline rebuilds from source — did not trigger the suite.

Add operator/cnpg-plugins/** to the filter so sidecar-injector changes
(such as the PSA securityContext hardening in this PR) are covered.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
@documentdb-triage-tool (bot) added the bug, CI/CD, and test labels Jun 24, 2026
@documentdb-triage-tool

🤖 Auto-triaged by documentdb-triage-tool.

Applied: CI/CD, test, bug
Project fields suggested: Component ci · Priority P2 · Effort M · Status In Progress
Confidence: 0.30 (deterministic)

Reasoning

component from path globs (ci, test); effort from diff stats (80+5 LOC, 3 files); LLM failed: Invalid response body while trying to fetch https://api.anthropic.com/v1/messages: Premature close

If a label is wrong, remove it manually and ping @patty-chow so the rules can be tuned. The bot will not re-label items that already have component labels.

Label every per-spec DocumentDB namespace with
pod-security.kubernetes.io/enforce=restricted (plus warn/audit) in the
shared CreateLabeledNamespace fixture, so the whole e2e suite validates
that runtime cluster pods — the CNPG pods and the gateway / otel-collector
sidecars injected by the CNPG-I plugin — are admitted under the strictest
Pod Security Admission profile. This mirrors the GA target platforms
(GKE Autopilot, OpenShift, AKS Azure Policy baseline) and guards the documentdb#387
regression suite-wide rather than in a single bespoke scenario.

Add an explicit securityContext assertion to the lifecycle deploy smoke
spec so a non-compliant injected container fails with a precise message
naming the container and field, instead of an opaque CNPG pod-creation
admission error.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
@WentingWu666666 WentingWu666666 marked this pull request as ready for review June 24, 2026 19:23
Copilot AI review requested due to automatic review settings June 24, 2026 19:23

Copilot AI left a comment

Contributor

Pull request overview

This PR hardens the CNPG-I sidecar-injector’s injected containers (documentdb-gateway and otel-collector) to satisfy Kubernetes Pod Security Admission (PSA) restricted requirements, and adds regression tests to prevent future drift.

Changes:

  • Introduces a shared hardenedSecurityContext() and applies it to both injected sidecars in the CNPG-I plugin.
  • Adds unit + e2e assertions to verify injected sidecars carry the PSA-restricted-required securityContext fields.
  • Updates the E2E workflow path filters so changes under operator/cnpg-plugins/** trigger the E2E job.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go | Applies a PSA-restricted-compliant container security context to both injected sidecars via a shared helper. |
| operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle_test.go | Adds a unit test to ensure the hardened security context contains required PSA fields. |
| test/e2e/pkg/e2eutils/fixtures/fixtures.go | Stamps PSA “restricted” namespace labels for per-spec test namespaces to exercise admission in e2e. |
| test/e2e/tests/lifecycle/deploy_test.go | Adds an e2e regression assertion that injected sidecars have the required container securityContext fields. |
| .github/workflows/test-e2e.yml | Ensures E2E runs when CNPG plugin code changes. |

Comment on lines 315 to 324
 func CreateLabeledNamespace(ctx context.Context, c client.Client, name, area string) error {
+    labels := ownershipLabels(FixturePerSpec, area)
+    for k, v := range psaRestrictedLabels() {
+        labels[k] = v
+    }
     ns := &corev1.Namespace{
         ObjectMeta: metav1.ObjectMeta{
             Name: name,
-            Labels: ownershipLabels(FixturePerSpec, area),
+            Labels: labels,
         },
wentingwu000 and others added 3 commits June 24, 2026 15:45
Address code-review feedback on the PSA hardening:

- The shared hardenedSecurityContext() no longer pins RunAsUser/RunAsGroup.
  PSA "restricted" only requires runAsNonRoot=true, not a specific UID.
  The documentdb-gateway still runs as UID/GID 1000 (the user its image
  expects) via a small gatewaySecurityContext() wrapper, while the
  third-party otel-collector image keeps its own baked-in non-root user
  (UID 10001) instead of being forced to 1000.

- Extract the otel-collector container construction into
  newOtelCollectorSidecar() and add injection-layer unit tests asserting
  that BOTH injected sidecars carry the PSA-restricted securityContext
  (gateway with UID 1000, otel-collector without a forced UID). This
  closes the gap where the otel sidecar's wiring was not covered by any
  test — the e2e suite only exercises it when monitoring is enabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Extract the inline PSA-restricted securityContext check from the
lifecycle deploy spec into a shared assertions helper,
AssertInjectedSidecarsPSARestricted, and wire it into both restore
specs (recovery.backup CSI snapshot and recovery.persistentVolume).

This extends the documentdb#387 regression coverage to backup/restore: a
recovery cluster gets the same CNPG-I-injected sidecars as a fresh
deploy and lands in a restricted-labeled namespace, so its pods must
also carry the hardened context. The helper errors when no injected
sidecar is found, so the assertion cannot pass vacuously.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
fixtures_test.go was committed un-indented (no leading tabs) and was
not gofmt-clean. Earlier additions matched that broken style; this
normalizes the whole file to standard gofmt indentation so the PSA
label assertions and the surrounding tests are correctly indented.

Whitespace-only change (git diff -w is empty); no logic affected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Labels

bug, CI/CD, test

Development

Successfully merging this pull request may close these issues.

[GA blocker] Runtime cluster pods are not Pod Security Admission "restricted" compliant

3 participants