feat(resources): isolate gateway and OTel collector memory from PostgreSQL#409
feat(resources): isolate gateway and OTel collector memory from PostgreSQL#409xgerman wants to merge 4 commits into
Conversation
…tants, split)
Introduce the contract for splitting the pod memory envelope across the
PostgreSQL, gateway, and OTel collector containers:
- spec.resource.{gateway,database,otel} ComponentResources overrides
- operator-level env/defaults (18.75% gateway fraction, 32Gi cap, otel 48Mi/128Mi)
- ComputeResourceSplit pure function + production-row unit tests
Part of the sidecar resource isolation work (Phase 0).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
…reSQL
Treat spec.resource.memory as the total pod envelope and carve sidecar
memory out of it so a gateway or collector leak is OOM-isolated to its own
container instead of crowding out PostgreSQL:
- gateway: requests==limits == min(18.75% x memory, 32Gi) (production model)
- otel collector (when monitoring enabled): request 48Mi / limit 128Mi, plus
a memory_limiter processor (limit_percentage 80) and GOMEMLIMIT (80% of limit)
- PostgreSQL gets the remainder; memory-aware GUCs recomputed from it
All three components are independently overridable via
spec.resource.{gateway,database,otel}; fleet defaults via operator Helm values
(gatewayMemoryFraction, gatewayMemoryCap, otel*). Sidecar resources are plumbed
through the sidecar-injector plugin and reconciled on the existing sync path, so
existing clusters adopt the split (one rolling restart) on next reconcile.
Adds unit tests (split math against production rows, plugin params, otel config)
and an e2e 'resources' area asserting the live container split.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
The OTel collector requires a valid pipeline (an exporter) to start; without one it crash-loops and the cluster never becomes healthy. Configure a prometheus exporter on port 9464 (avoiding CNPG's built-in metrics port 9187) so the monitoring-enabled e2e spec exercises a realistic, healthy three-way container split. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: German Eichberger <geeichbe@microsoft.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces sidecar resource isolation for DocumentDB pods by treating spec.resource.memory as a total pod memory envelope, carving out bounded memory for the gateway and (optionally) the OTel collector, and assigning the remainder to PostgreSQL so that leaks are OOM-contained to the leaking sidecar container and PostgreSQL tuning is computed from the carved database memory.
Changes:
- Add a resource-splitting model (
ComputeResourceSplit) plus new CRD/API fieldsspec.resource.{gateway,database,otel}to override per-container sizing. - Plumb the resolved resources into the CNPG sidecar-injector plugin (gateway + OTel), including
GOMEMLIMITfor the collector and an OTelmemory_limiterprocessor. - Add Helm defaults/env wiring, unit + helm-unittest coverage, and new E2E “resources” area tests validating the live container split.
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/tests/resources/sidecar_resources_test.go | New E2E specs asserting live gateway/otel/postgres memory split with monitoring on/off. |
| test/e2e/tests/resources/resources_suite_test.go | New Ginkgo suite bootstrap for the E2E “resources” area. |
| test/e2e/tests/resources/helpers_test.go | Area helpers to create clusters from templates and inspect pod container resources/env. |
| test/e2e/manifests/mixins/sidecar_resources.yaml.template | New manifest mixin to set envelope memory + monitoring exporter for resources tests. |
| test/e2e/labels.go | Adds the new resources area label. |
| operator/src/internal/utils/constants.go | Adds env var names and plugin param keys for sidecar resource isolation defaults. |
| operator/src/internal/otel/config.go | Wires memory_limiter into the metrics pipeline processor order. |
| operator/src/internal/otel/config_test.go | Validates embedded memory_limiter config and pipeline processor ordering. |
| operator/src/internal/otel/base_config.yaml | Adds memory_limiter processor to the static collector config. |
| operator/src/internal/cnpg/resource_split.go | New split algorithm + env-driven defaults for gateway/otel carve-out and DB remainder. |
| operator/src/internal/cnpg/resource_split_test.go | Unit tests validating split math, monitoring carve-out, and override precedence. |
| operator/src/internal/cnpg/cnpg_sync.go | Syncs additional plugin parameters for sidecar resources (add/update/remove). |
| operator/src/internal/cnpg/cnpg_cluster.go | Applies carved postgres resources, passes sidecar resource params to plugin, recomputes PG GUCs from carved DB memory. |
| operator/src/internal/cnpg/cnpg_cluster_test.go | Adds tests asserting carved resources + plugin params + recomputed PG parameters. |
| operator/src/config/crd/bases/documentdb.io_dbs.yaml | CRD schema updates adding resource.gateway/database/otel component resource stanzas. |
| operator/src/api/preview/zz_generated.deepcopy.go | Generated deepcopy updates for new API fields. |
| operator/src/api/preview/documentdb_types.go | Adds API types/docs for per-component resource overrides. |
| operator/documentdb-helm-chart/values.yaml | Adds operator-level sidecarResources defaults. |
| operator/documentdb-helm-chart/tests/09_operator_deployment_test.yaml | Adds helm-unittest coverage for default sidecar env var wiring (partial). |
| operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml | Wires Helm values into operator env vars for split defaults. |
| operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml | Helm chart CRD copy updated to match generated CRD schema. |
| operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go | Sets container Resources for gateway/otel and adds GOMEMLIMIT for collector. |
| operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle_test.go | Adds unit test verifying injected resources + GOMEMLIMIT from plugin params. |
| operator/cnpg-plugins/sidecar-injector/internal/config/config.go | Adds parsing/validation/serialization of new resource plugin parameters. |
| operator/cnpg-plugins/sidecar-injector/internal/config/config_test.go | Adds tests for parameter parsing and round-trip serialization of new fields. |
| docs/operator-public-documentation/postgresql-tuning.md | Documents envelope semantics, sidecar carve-out, overrides, and Helm defaults. |
| docs/designs/vertical-autoscaling-design.md | New design doc covering vertical autoscaling roadmap and the carve-out as phase 0. |
| CHANGELOG.md | Adds behavioral-change note describing the new default-on split and restart implications. |
Files not reviewed (1)
- operator/src/api/preview/zz_generated.deepcopy.go: Generated file
| // Memory and CPU above describe the TOTAL pod envelope. When the gateway | ||
| // (and, with monitoring enabled, the OTel collector) sidecars run in the | ||
| // same pod, the operator carves their memory out of this envelope and gives | ||
| // PostgreSQL the remainder, recomputing PostgreSQL's memory-aware parameters | ||
| // from that reduced value. The optional per-component overrides below let you | ||
| // size each container independently; an explicit override always wins over the | ||
| // automatic carve-out. |
| memory: "8Gi" # Total pod memory envelope | ||
| cpu: "4" # Pod CPU limit (Guaranteed QoS) | ||
| ``` |
| func parseFloatOr(value, fallback string) float64 { | ||
| s := value | ||
| if s == "" { | ||
| s = fallback | ||
| } | ||
| f, err := strconv.ParseFloat(s, 64) | ||
| if err != nil || f <= 0 { | ||
| // Re-parse the fallback as a last resort. | ||
| if ff, ferr := strconv.ParseFloat(fallback, 64); ferr == nil { | ||
| return ff | ||
| } | ||
| return 0 | ||
| } | ||
| return f | ||
| } |
| func DefaultSplitConfig() SplitConfig { | ||
| frac := parseFloatOr(os.Getenv(util.GATEWAY_MEMORY_FRACTION_ENV), util.DEFAULT_GATEWAY_MEMORY_FRACTION) | ||
| capBytes := parseQuantityOrZero(envOr(util.GATEWAY_MEMORY_CAP_ENV, util.DEFAULT_GATEWAY_MEMORY_CAP)) | ||
| return SplitConfig{ | ||
| GatewayMemoryFraction: frac, | ||
| GatewayMemoryCapBytes: capBytes, | ||
| GatewayCPULimit: os.Getenv(util.GATEWAY_CPU_LIMIT_ENV), | ||
| OTelMemoryRequest: envOr(util.OTEL_MEMORY_REQUEST_ENV, util.DEFAULT_OTEL_MEMORY_REQUEST), | ||
| OTelMemoryLimit: envOr(util.OTEL_MEMORY_LIMIT_ENV, util.DEFAULT_OTEL_MEMORY_LIMIT), | ||
| OTelCPURequest: envOr(util.OTEL_CPU_REQUEST_ENV, util.DEFAULT_OTEL_CPU_REQUEST), | ||
| } | ||
| } |
| result[gatewayImagePullPolicyParameter] = string(config.GatewayImagePullPolicy) | ||
| result[gatewayMemoryRequestParameter] = config.GatewayMemoryRequest | ||
| result[gatewayMemoryLimitParameter] = config.GatewayMemoryLimit | ||
| result[gatewayCPURequestParameter] = config.GatewayCPURequest | ||
| result[gatewayCPULimitParameter] = config.GatewayCPULimit | ||
| result[documentDbCredentialSecretParameter] = config.DocumentDbCredentialSecret | ||
| result[otelMemoryRequestParameter] = config.OTelMemoryRequest | ||
| result[otelMemoryLimitParameter] = config.OTelMemoryLimit | ||
| result[otelCPURequestParameter] = config.OTelCPURequest |
| - it: should set default sidecar resource isolation env vars | ||
| asserts: | ||
| - contains: | ||
| path: spec.template.spec.containers[0].env | ||
| content: | ||
| name: DOCUMENTDB_GATEWAY_MEMORY_FRACTION | ||
| value: "0.1875" | ||
| - contains: | ||
| path: spec.template.spec.containers[0].env | ||
| content: | ||
| name: DOCUMENTDB_GATEWAY_MEMORY_CAP | ||
| value: "32Gi" | ||
| - contains: | ||
| path: spec.template.spec.containers[0].env | ||
| content: | ||
| name: DOCUMENTDB_OTEL_MEMORY_LIMIT | ||
| value: "128Mi" | ||
|
|
|
🤖 Auto-triaged by documentdb-triage-tool. Applied: Reasoningcomponent from path globs (controllers, test, docs, api, manifests); effort from diff stats (2054+132 LOC, 28 files); LLM: Cross-component feature (API types, controller reconciliation logic, sidecar injector plugin, Helm chart, OTel config, and docs) that introduces a new resource-splitting model with schema changes and behavioral impact on all existing clusters. If a label is wrong, remove it manually and ping |
… containers) Treat spec.resource.memory and spec.resource.cpu as an OPTIONAL pod envelope. For each dimension: - envelope set -> operator carves it (sidecars reserved, PostgreSQL remainder) - envelope omitted, gateway+database both set -> effective envelope = their sum - envelope omitted, nothing set -> dimension left unmanaged (no limits) - envelope omitted, partially specified -> rejected by the validating webhook CPU is now carved symmetrically with memory (sidecar cpu reservations are subtracted from the envelope so resolved container CPUs sum to the envelope instead of exceeding it). Adds cnpg.ValidateResources (wired into the DocumentDB validating webhook) which rejects partial-without-envelope configs and configurations where sidecar reservations leave nothing for PostgreSQL or explicit values exceed the envelope. Adds unit tests (split + validation), webhook tests, and an e2e spec asserting the envelope is derived from per-container memory when omitted. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: German Eichberger <geeichbe@microsoft.com>
Update: the pod resource envelope is now optionalRefined the model so the envelope (
Also made CPU carve symmetrically with memory (sidecar CPU reservations are subtracted from the envelope so the resolved container CPUs sum to the envelope rather than exceeding it). New Tests added/updated: split + validation unit tests, webhook admission tests (accept/reject paths), and a new e2e spec asserting the envelope is derived from per-container memory when omitted. The full |
What
Treats
spec.resource.memoryas the total pod memory envelope and carves sidecar memory out of it, so a gateway or OTel-collector memory leak is OOM-isolated to its own container instead of crowding out PostgreSQL.For a pod with
spec.resource.memory: 16Gi:documentdb-gatewaymin(18.75% × envelope, 32Gi)= 3Giotel-collector(whenmonitoring.enabled)postgresPostgreSQL's memory-aware GUCs (
shared_buffers, etc.) are recomputed from the carved value, not the full envelope.Why
The gateway sidecar (and the OTel collector) previously ran BestEffort (no limits). The gateway's async runtime also spawns one worker thread per visible CPU, so on large nodes a leak or thread blow-up could create pod/node memory pressure that starves the co-located PostgreSQL. Giving each sidecar a bounded, isolated footprint contains that blast radius. Defaults follow the production sizing model (18.75% of the envelope, capped at 32Gi).
What's included
spec.resource.{gateway,database,otel}ComponentResourcesoverrides (explicit wins over the auto-split).ComputeResourceSplit(pure, unit-tested against production SKU rows) wired into the CNPG cluster spec; PostgreSQL GUCs recomputed from the carved memory; sidecar resources reconciled on the existing sync path (so existing clusters re-split on next reconcile).Resourceson the gateway and otel containers and addsGOMEMLIMIT(≈80% of the otel limit).memory_limiterprocessor (cgroup-awarelimit_percentage: 80) is added first in the pipeline.operator.sidecarResources.*) wired to env vars; helm-unittest coverage.docs/designs/.Configuration
Fleet-wide defaults via Helm:
operator.sidecarResources.{gatewayMemoryFraction,gatewayMemoryCap,otelMemoryRequest,otelMemoryLimit,otelCpuRequest,gatewayCpuLimit}.Rollout
Ships default-on. Existing clusters adopt the split — and take a one-time rolling restart to recompute PostgreSQL memory — on their next reconcile. Called out in the CHANGELOG.
Testing
GOMEMLIMIT, OTelmemory_limiterconfig.go test ./...(operator + plugin) green.helm unittest(83 passing) incl. the new operator env assertions.tests/resourcesarea): stood up a kind cluster with the images built from this branch and asserted the live container split:192Mi, postgres832Mi, no collector192Mi, otel48Mi/128Mi+GOMEMLIMIT, postgres704MiNotes
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com