Skip to content

feat(resources): isolate gateway and OTel collector memory from PostgreSQL#409

Open
xgerman wants to merge 4 commits into
documentdb:mainfrom
xgerman:feat/sidecar-resource-isolation
Open

feat(resources): isolate gateway and OTel collector memory from PostgreSQL#409
xgerman wants to merge 4 commits into
documentdb:mainfrom
xgerman:feat/sidecar-resource-isolation

Conversation

@xgerman

@xgerman xgerman commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

What

Treats spec.resource.memory as the total pod memory envelope and carves sidecar memory out of it, so a gateway or OTel-collector memory leak is OOM-isolated to its own container instead of crowding out PostgreSQL.

For a pod with spec.resource.memory: 16Gi:

Container Memory (request == limit) Source
documentdb-gateway min(18.75% × envelope, 32Gi) = 3Gi carved out (Guaranteed)
otel-collector (when monitoring.enabled) request 48Mi / limit 128Mi carved out
postgres envelope − gateway − otel = ~13Gi remainder

PostgreSQL's memory-aware GUCs (shared_buffers, etc.) are recomputed from the carved value, not the full envelope.

Why

The gateway sidecar (and the OTel collector) previously ran BestEffort (no limits). The gateway's async runtime also spawns one worker thread per visible CPU, so on large nodes a leak or thread blow-up could create pod/node memory pressure that starves the co-located PostgreSQL. Giving each sidecar a bounded, isolated footprint contains that blast radius. Defaults follow the production sizing model (18.75% of the envelope, capped at 32Gi).

What's included

  • API: spec.resource.{gateway,database,otel} ComponentResources overrides (explicit wins over the auto-split).
  • Carve-out: ComputeResourceSplit (pure, unit-tested against production SKU rows) wired into the CNPG cluster spec; PostgreSQL GUCs recomputed from the carved memory; sidecar resources reconciled on the existing sync path (so existing clusters re-split on next reconcile).
  • Plugin: the sidecar-injector sets Resources on the gateway and otel containers and adds GOMEMLIMIT (≈80% of the otel limit).
  • OTel: a memory_limiter processor (cgroup-aware limit_percentage: 80) is added first in the pipeline.
  • Helm: operator-level defaults (operator.sidecarResources.*) wired to env vars; helm-unittest coverage.
  • Docs: CHANGELOG behavioral-change entry + user docs; design doc under docs/designs/.

Configuration

spec:
  resource:
    memory: "16Gi"          # total pod envelope
    gateway:  { memory: "3Gi", cpu: "1" }      # optional explicit override
    otel:     { memory: "128Mi" }              # optional
    database: { memory: "12Gi" }               # optional

Fleet-wide defaults via Helm: operator.sidecarResources.{gatewayMemoryFraction,gatewayMemoryCap,otelMemoryRequest,otelMemoryLimit,otelCpuRequest,gatewayCpuLimit}.

Rollout

Ships default-on. Existing clusters adopt the split — and take a one-time rolling restart to recompute PostgreSQL memory — on their next reconcile. Called out in the CHANGELOG.

Testing

  • Unit: split math against production rows (M20/M50/M60/M200-capped, monitoring carve-out, overrides), plugin param round-trip + GOMEMLIMIT, OTel memory_limiter config. go test ./... (operator + plugin) green.
  • Helm: helm unittest (83 passing) incl. the new operator env assertions.
  • E2E (new tests/resources area): stood up a kind cluster with the images built from this branch and asserted the live container split:
    • monitoring off → gateway 192Mi, postgres 832Mi, no collector
    • monitoring on → gateway 192Mi, otel 48Mi/128Mi + GOMEMLIMIT, postgres 704Mi
    • Result: 2/2 specs passed.

Notes

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

German Eichberger and others added 3 commits June 23, 2026 16:10
…tants, split)

Introduce the contract for splitting the pod memory envelope across the
PostgreSQL, gateway, and OTel collector containers:
- spec.resource.{gateway,database,otel} ComponentResources overrides
- operator-level env/defaults (18.75% gateway fraction, 32Gi cap, otel 48Mi/128Mi)
- ComputeResourceSplit pure function + production-row unit tests

Part of the sidecar resource isolation work (Phase 0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
…reSQL

Treat spec.resource.memory as the total pod envelope and carve sidecar
memory out of it so a gateway or collector leak is OOM-isolated to its own
container instead of crowding out PostgreSQL:

- gateway: requests==limits == min(18.75% x memory, 32Gi) (production model)
- otel collector (when monitoring enabled): request 48Mi / limit 128Mi, plus
  a memory_limiter processor (limit_percentage 80) and GOMEMLIMIT (80% of limit)
- PostgreSQL gets the remainder; memory-aware GUCs recomputed from it

All three components are independently overridable via
spec.resource.{gateway,database,otel}; fleet defaults via operator Helm values
(gatewayMemoryFraction, gatewayMemoryCap, otel*). Sidecar resources are plumbed
through the sidecar-injector plugin and reconciled on the existing sync path, so
existing clusters adopt the split (one rolling restart) on next reconcile.

Adds unit tests (split math against production rows, plugin params, otel config)
and an e2e 'resources' area asserting the live container split.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
The OTel collector requires a valid pipeline (an exporter) to start; without
one it crash-loops and the cluster never becomes healthy. Configure a
prometheus exporter on port 9464 (avoiding CNPG's built-in metrics port 9187)
so the monitoring-enabled e2e spec exercises a realistic, healthy three-way
container split.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
Copilot AI review requested due to automatic review settings June 23, 2026 23:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces sidecar resource isolation for DocumentDB pods by treating spec.resource.memory as a total pod memory envelope, carving out bounded memory for the gateway and (optionally) the OTel collector, and assigning the remainder to PostgreSQL so that leaks are OOM-contained to the leaking sidecar container and PostgreSQL tuning is computed from the carved database memory.

Changes:

  • Add a resource-splitting model (ComputeResourceSplit) plus new CRD/API fields spec.resource.{gateway,database,otel} to override per-container sizing.
  • Plumb the resolved resources into the CNPG sidecar-injector plugin (gateway + OTel), including GOMEMLIMIT for the collector and an OTel memory_limiter processor.
  • Add Helm defaults/env wiring, unit + helm-unittest coverage, and new E2E “resources” area tests validating the live container split.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/e2e/tests/resources/sidecar_resources_test.go New E2E specs asserting live gateway/otel/postgres memory split with monitoring on/off.
test/e2e/tests/resources/resources_suite_test.go New Ginkgo suite bootstrap for the E2E “resources” area.
test/e2e/tests/resources/helpers_test.go Area helpers to create clusters from templates and inspect pod container resources/env.
test/e2e/manifests/mixins/sidecar_resources.yaml.template New manifest mixin to set envelope memory + monitoring exporter for resources tests.
test/e2e/labels.go Adds the new resources area label.
operator/src/internal/utils/constants.go Adds env var names and plugin param keys for sidecar resource isolation defaults.
operator/src/internal/otel/config.go Wires memory_limiter into the metrics pipeline processor order.
operator/src/internal/otel/config_test.go Validates embedded memory_limiter config and pipeline processor ordering.
operator/src/internal/otel/base_config.yaml Adds memory_limiter processor to the static collector config.
operator/src/internal/cnpg/resource_split.go New split algorithm + env-driven defaults for gateway/otel carve-out and DB remainder.
operator/src/internal/cnpg/resource_split_test.go Unit tests validating split math, monitoring carve-out, and override precedence.
operator/src/internal/cnpg/cnpg_sync.go Syncs additional plugin parameters for sidecar resources (add/update/remove).
operator/src/internal/cnpg/cnpg_cluster.go Applies carved postgres resources, passes sidecar resource params to plugin, recomputes PG GUCs from carved DB memory.
operator/src/internal/cnpg/cnpg_cluster_test.go Adds tests asserting carved resources + plugin params + recomputed PG parameters.
operator/src/config/crd/bases/documentdb.io_dbs.yaml CRD schema updates adding resource.gateway/database/otel component resource stanzas.
operator/src/api/preview/zz_generated.deepcopy.go Generated deepcopy updates for new API fields.
operator/src/api/preview/documentdb_types.go Adds API types/docs for per-component resource overrides.
operator/documentdb-helm-chart/values.yaml Adds operator-level sidecarResources defaults.
operator/documentdb-helm-chart/tests/09_operator_deployment_test.yaml Adds helm-unittest coverage for default sidecar env var wiring (partial).
operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml Wires Helm values into operator env vars for split defaults.
operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml Helm chart CRD copy updated to match generated CRD schema.
operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go Sets container Resources for gateway/otel and adds GOMEMLIMIT for collector.
operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle_test.go Adds unit test verifying injected resources + GOMEMLIMIT from plugin params.
operator/cnpg-plugins/sidecar-injector/internal/config/config.go Adds parsing/validation/serialization of new resource plugin parameters.
operator/cnpg-plugins/sidecar-injector/internal/config/config_test.go Adds tests for parameter parsing and round-trip serialization of new fields.
docs/operator-public-documentation/postgresql-tuning.md Documents envelope semantics, sidecar carve-out, overrides, and Helm defaults.
docs/designs/vertical-autoscaling-design.md New design doc covering vertical autoscaling roadmap and the carve-out as phase 0.
CHANGELOG.md Adds behavioral-change note describing the new default-on split and restart implications.
Files not reviewed (1)
  • operator/src/api/preview/zz_generated.deepcopy.go: Generated file

Comment on lines +280 to +286
// Memory and CPU above describe the TOTAL pod envelope. When the gateway
// (and, with monitoring enabled, the OTel collector) sidecars run in the
// same pod, the operator carves their memory out of this envelope and gives
// PostgreSQL the remainder, recomputing PostgreSQL's memory-aware parameters
// from that reduced value. The optional per-component overrides below let you
// size each container independently; an explicit override always wins over the
// automatic carve-out.
Comment on lines +29 to 31
memory: "8Gi" # Total pod memory envelope
cpu: "4" # Pod CPU limit (Guaranteed QoS)
```
Comment on lines +212 to +226
func parseFloatOr(value, fallback string) float64 {
s := value
if s == "" {
s = fallback
}
f, err := strconv.ParseFloat(s, 64)
if err != nil || f <= 0 {
// Re-parse the fallback as a last resort.
if ff, ferr := strconv.ParseFloat(fallback, 64); ferr == nil {
return ff
}
return 0
}
return f
}
Comment on lines +58 to +69
func DefaultSplitConfig() SplitConfig {
frac := parseFloatOr(os.Getenv(util.GATEWAY_MEMORY_FRACTION_ENV), util.DEFAULT_GATEWAY_MEMORY_FRACTION)
capBytes := parseQuantityOrZero(envOr(util.GATEWAY_MEMORY_CAP_ENV, util.DEFAULT_GATEWAY_MEMORY_CAP))
return SplitConfig{
GatewayMemoryFraction: frac,
GatewayMemoryCapBytes: capBytes,
GatewayCPULimit: os.Getenv(util.GATEWAY_CPU_LIMIT_ENV),
OTelMemoryRequest: envOr(util.OTEL_MEMORY_REQUEST_ENV, util.DEFAULT_OTEL_MEMORY_REQUEST),
OTelMemoryLimit: envOr(util.OTEL_MEMORY_LIMIT_ENV, util.DEFAULT_OTEL_MEMORY_LIMIT),
OTelCPURequest: envOr(util.OTEL_CPU_REQUEST_ENV, util.DEFAULT_OTEL_CPU_REQUEST),
}
}
Comment on lines 218 to +226
result[gatewayImagePullPolicyParameter] = string(config.GatewayImagePullPolicy)
result[gatewayMemoryRequestParameter] = config.GatewayMemoryRequest
result[gatewayMemoryLimitParameter] = config.GatewayMemoryLimit
result[gatewayCPURequestParameter] = config.GatewayCPURequest
result[gatewayCPULimitParameter] = config.GatewayCPULimit
result[documentDbCredentialSecretParameter] = config.DocumentDbCredentialSecret
result[otelMemoryRequestParameter] = config.OTelMemoryRequest
result[otelMemoryLimitParameter] = config.OTelMemoryLimit
result[otelCPURequestParameter] = config.OTelCPURequest
Comment on lines +130 to +147
- it: should set default sidecar resource isolation env vars
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: DOCUMENTDB_GATEWAY_MEMORY_FRACTION
value: "0.1875"
- contains:
path: spec.template.spec.containers[0].env
content:
name: DOCUMENTDB_GATEWAY_MEMORY_CAP
value: "32Gi"
- contains:
path: spec.template.spec.containers[0].env
content:
name: DOCUMENTDB_OTEL_MEMORY_LIMIT
value: "128Mi"

@documentdb-triage-tool

Copy link
Copy Markdown

🤖 Auto-triaged by documentdb-triage-tool.

Applied: go, test, documentation, enhancement
Project fields suggested: Component controllers · Priority P2 · Effort XL · Status Needs Review
Confidence: 0.88 (mixed)

Reasoning

component from path globs (controllers, test, docs, api, manifests); effort from diff stats (2054+132 LOC, 28 files); LLM: Cross-component feature (API types, controller reconciliation logic, sidecar injector plugin, Helm chart, OTel config, and docs) that introduces a new resource-splitting model with schema changes and behavioral impact on all existing clusters.

If a label is wrong, remove it manually and ping @patty-chow so the rules can be tuned. The bot will not re-label items that already have component labels.

@documentdb-triage-tool documentdb-triage-tool Bot added documentation Improvements or additions to documentation enhancement New feature or request go Pull requests that update go code test labels Jun 24, 2026
… containers)

Treat spec.resource.memory and spec.resource.cpu as an OPTIONAL pod envelope.
For each dimension:
- envelope set  -> operator carves it (sidecars reserved, PostgreSQL remainder)
- envelope omitted, gateway+database both set -> effective envelope = their sum
- envelope omitted, nothing set -> dimension left unmanaged (no limits)
- envelope omitted, partially specified -> rejected by the validating webhook

CPU is now carved symmetrically with memory (sidecar cpu reservations are
subtracted from the envelope so resolved container CPUs sum to the envelope
instead of exceeding it).

Adds cnpg.ValidateResources (wired into the DocumentDB validating webhook) which
rejects partial-without-envelope configs and configurations where sidecar
reservations leave nothing for PostgreSQL or explicit values exceed the envelope.
Adds unit tests (split + validation), webhook tests, and an e2e spec asserting
the envelope is derived from per-container memory when omitted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
@xgerman

xgerman commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Update: the pod resource envelope is now optional

Refined the model so the envelope (spec.resource.memory / spec.resource.cpu) no longer has to be specified — it can be derived as the sum of the per-container values. For each dimension independently:

Configuration Behavior
Envelope set Operator carves it (sidecars reserved, PostgreSQL gets the remainder)
Envelope omitted, gateway + database both set Effective envelope = sum of the containers (OTel uses its default)
Envelope omitted, nothing set Dimension left unmanaged (no limits) — backward compatible
Envelope omitted, partially specified Rejected by the validating webhook (gateway fraction / PG remainder can't be derived)
Envelope set but reservations exhaust it, or explicit values exceed it Rejected by the validating webhook

Also made CPU carve symmetrically with memory (sidecar CPU reservations are subtracted from the envelope so the resolved container CPUs sum to the envelope rather than exceeding it).

New cnpg.ValidateResources is wired into the DocumentDB validating webhook.

Tests added/updated: split + validation unit tests, webhook admission tests (accept/reject paths), and a new e2e spec asserting the envelope is derived from per-container memory when omitted. The full tests/resources e2e area (3/3 specs) passes against images built from this branch (incl. the new envelope-omitted case: gateway 256Mi + database 1Gi explicit, no top-level envelope → containers sized exactly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request go Pull requests that update go code test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants