Add OpenTelemetry observability to custom background tasks#812
2chanhaeng wants to merge 34 commits into
Generalize Fedify's enqueue-and-process-later pattern, previously limited to outgoing activity delivery, to arbitrary application-defined background jobs. `Federation` and `FederationBuilder` gain `defineTask()` (via the new `TaskRegistry` interface), and `Context` gains `enqueueTask()`/`enqueueTaskMany()`. Each task carries a Standard Schema that infers the payload type and validates it both at enqueue time and at dequeue time, guarding against schema drift across deployments. Payloads are serialized with devalue so that `Date`, `Map`, `Set`, `URL`, `bigint`, circular references, and Activity Vocabulary objects round-trip faithfully across every message queue backend. Failed handlers retry with exponential backoff by default, configurable per task or federation-wide, and tasks can be isolated onto a dedicated queue or fall back to the outbox queue. The payload codec is implemented twice on purpose: `codec.ts` as a class (`TaskCodec`) and `codec-fn.ts` as standalone utility functions, each with its own tests. Only the class is wired into the runtime; the functional variant is kept temporarily so the team can compare the two styles and decide which reads better before one is removed. fedify-dev#206 fedify-dev#797 Assisted-by: Claude Code:claude-opus-4-8
Split the monolithic `install` task into `install:deno` and `install:pnpm`, with `codegen` as an explicit dependency, so each runtime's setup can be run on its own. `test:deno` now depends on `install:deno` instead of `prepare`, since Deno runs the TypeScript sources directly and does not need the build step. Update AGENTS.md to match: document `mise run prepare`/`prepare-each` for building, `check-each` and `test-each` for scoping work to specific packages, and add a section directing agents to consult `mise tasks`. Assisted-by: Claude Code:claude-opus-4-8
The `install:deno` task runs `scripts/install.ts`, which `deno cache`s each workspace member's export entry points; those include the generated `packages/vocab/src/vocab.ts`. `install:pnpm` likewise expects the generated sources to be present. Both therefore require `codegen` to have run first. Previously `codegen` sat alongside `install:deno` and `install:pnpm` in the `install` task's `depends` list, which `mise` runs in parallel, so the cache step could start before `vocab.ts` was generated. Move `codegen` into each subtask's own `depends` so it is ordered before them; `mise` dedupes the shared dependency to a single run. As a result `install:deno` and `install:pnpm` are now correct when invoked on their own, not only as part of `install`. Also correct the `install:pnpm` description, which said "for Deno". Assisted-by: Claude Code:claude-opus-4-8
`isPlainObject` in the task codec only accepted objects whose prototype is exactly `Object.prototype`, so an object made with `Object.create(null)` was treated as a non-plain leaf. Any vocab object nested inside such an object was therefore left as its parked holder instead of being revived, even though devalue round-trips null-prototype objects without throwing. Accept a `null` prototype as well, and add a regression test that round-trips a vocab object nested in an `Object.create(null)` object. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-opus-4-8
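A minimal sketch of the widened check described above. The name `isPlainObject` comes from the commit message, but the body here is an illustration written for this note, not the code in *codec.ts*.

```typescript
// Illustrative re-implementation; the real helper lives in the task codec.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto: unknown = Object.getPrototypeOf(value);
  // Accept ordinary object literals and Object.create(null) objects alike.
  return proto === Object.prototype || proto === null;
}

const bare: Record<string, unknown> = Object.create(null);
bare.actor = "https://example.com/users/alice";

const checks = {
  literal: isPlainObject({ a: 1 }),
  nullPrototype: isPlainObject(bare),
  array: isPlainObject([]),
  map: isPlainObject(new Map()),
  date: isPlainObject(new Date(0)),
};
```

With the `proto === null` branch removed, `checks.nullPrototype` would be `false`, which is the case the regression test pins.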
When a custom task handler throws and the queue does not own retries, the error path computes the elapsed time from `message.started` to feed the retry policy. `message.started` is normally a valid ISO instant set at enqueue time, but a corrupted or drifted queue could hand back an invalid string, in which case `Temporal.Instant.from()` threw out of the error-handling block. That masked the original handler error and aborted the retry, silently dropping the task. Wrap the parse in a try-catch, fall back to a zero elapsed time, and log the offending value. A regression test drives a message with a malformed `started` through a throwing handler and asserts the retry is still enqueued. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-opus-4-8
`FederationBuilderImpl.taskDefinitions` was a plain object, so the duplicate check `name in this.taskDefinitions` and the lookups `this.taskDefinitions[taskName]` consulted the prototype chain. Task names are arbitrary user-supplied strings, so a name such as "constructor", "toString", or "__proto__" was wrongly reported as already defined and resolved to an inherited method on lookup. Switch the registry to a `Map`, which is immune to prototype keys by construction and avoids the clone footgun where a later spread or `Object.assign` would silently reintroduce the prototype. Sibling registries stay plain objects since they are keyed by controlled values (type-id URLs). Add a regression test covering names that collide with `Object.prototype`. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-opus-4-8
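The prototype-chain pitfall can be reproduced in a few lines. This is a sketch with hypothetical stand-ins (`Definition`, `plainRegistry`, `mapRegistry`), not Fedify's actual registry types.

```typescript
// Hypothetical stand-ins for the registry; not Fedify's actual classes.
interface Definition {
  readonly name: string;
}

const plainRegistry: Record<string, Definition> = {};
const mapRegistry = new Map<string, Definition>();

// `in` walks the prototype chain, so inherited members look "already defined".
const plainSeesConstructor = "constructor" in plainRegistry;
const plainSeesToString = "toString" in plainRegistry;
// A lookup resolves to the inherited method rather than a task definition.
const inherited: unknown = plainRegistry["toString"];

// Map#has only reports keys that were explicitly set.
const mapSeesConstructor = mapRegistry.has("constructor");
mapRegistry.set("__proto__", { name: "__proto__" });
const mapStoresProto = mapRegistry.get("__proto__")?.name;
```

Both registries are empty when the `in` and `has` checks run, yet only the plain object reports `"constructor"` as present.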
`#revive` mapped every node through all five class revivers, allocating five promises per node and resolving them with `Array.fromAsync` before picking the first truthy result. The class filters are mutually exclusive, so it now finds the single matching reviver and runs only that one, cutting the per-node work to a single promise. This keeps the existing behaviour (cycles, repeated references, and Map/Set/Array/plain-object/null-prototype containers all still round-trip, as the codec tests assert) and folds the rationale for two declined suggestions into a comment: the walked tree is devalue's throwaway parse output, so there is no external identity to preserve and nothing to clone lazily; and a recursion-depth cap is moot because this pass recurses with `await` (unwinding the stack each level) while devalue's own recursive `stringify`/`parse` is the binding limit on nesting and would overflow first. fedify-dev#803 (comment) fedify-dev#803 (comment) Assisted-by: Claude Code:claude-opus-4-8
A task may route to its own queue via `defineTask(name, { queue })`, and
`resolveTaskQueue()` enqueues its messages there, but
`_startQueueInternal()` only listened on the four federation-wide queues
(inbox, outbox, fanout, task). A task queue that was none of those got
no worker, so its messages were never processed even while
`startQueue()` was running.
Collect the distinct dedicated queue instances from the task registry
and start a worker for each, treating them as part of the "task"
selector. Dedupe against the standard queues and against task queues
already started on an earlier call so no instance is listened on twice,
and let a deployment whose only queues are per-task ones still start:
the early return no longer bails out when a dedicated task queue exists.
fedify-dev#803 (comment)
Assisted-by: Claude Code:claude-opus-4-8
@dahlia, the maintainer, picked `TaskCodec` in [a comment](fedify-dev#803 (comment)) because it carries the loader state on the instance. Therefore remove *codec-fn.ts*. `TaskCodecLoaders` moves to *codec.ts* because `TaskCodec` uses it.
The task payload schema validates on both sides of the queue: at enqueue time and again at dequeue time. The wire therefore carries the validated *output*, which the same schema must re-accept as input, so transforming schemas (e.g., Zod's .transform()) whose output differs in shape from their input cannot round-trip. This constraint was neither documented nor tested; state it in the manual and the schema option's JSDoc, and pin it with a regression test. fedify-dev#803 Assisted-by: Claude Code:claude-fable-5
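The round-trip constraint can be seen with a toy transforming validator. This is a simplified sketch: `validateCount` is hypothetical and throws on failure, whereas a real Standard Schema validator returns a result object.

```typescript
// Hypothetical transforming validator: accepts a string of digits, outputs a number.
function validateCount(input: unknown): number {
  if (typeof input !== "string" || !/^\d+$/.test(input)) {
    throw new TypeError("expected a string of digits");
  }
  return Number(input);
}

// Enqueue side: the input is validated and the *output* goes on the wire.
const wireValue = validateCount("42");

// Dequeue side: the same validator now sees the output, not the original input.
let dequeueError: unknown;
try {
  validateCount(wireValue);
} catch (error) {
  dequeueError = error;
}
```

The enqueue succeeds, but the dequeue-side validation rejects the number the enqueue side produced, which is the failure mode the manual now documents.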
resolveTaskQueue() returns the fallback queue even for a task name with no registered definition, so enqueuing a handle created by a different federation instance silently succeeded and the worker later dropped the message with only a warning. The task API's contract is to fail fast at the enqueue call site (it already validates the payload there), so check the registry before resolving a queue and throw a TypeError instead. fedify-dev#803 Assisted-by: Claude Code:claude-fable-5
Small follow-ups from review:
- Document that tasks must be defined before startQueue() (or the
first request); workers for dedicated per-task queues are only
registered when the queue machinery starts, so a queue defined
later never gets a worker.
- Return early from the enqueue path when no payloads are given,
instead of reaching enqueueMany()/Promise.all with an empty
batch, whose backend behavior is undefined.
- Rename #enqueueSingular to #encodeTaskMessage; it encodes and
builds a TaskMessage but does not enqueue anything.
- Fix a comment typo in the codec.
fedify-dev#803
Assisted-by: Claude Code:claude-fable-5
The revival dispatch pulled init/set out of a heterogeneous tuple list, losing the correlation between each tuple's filter and its init/set node type, which forced two @ts-ignore suppressions at the call site. Such suppressions hide any future error on those lines, so they are unfit for a permanent implementation. Each entry is now built by a generic classReviver() factory whose single type parameter ties the filter to its init/set, letting the compiler check the calls it previously could not. Also bind the recursive reviver to one inner closure per decode pass instead of allocating a fresh closure on every dispatch. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
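A reduced sketch of the factory idea, assuming simplified types: `Seen`, `Revive`, and the `classReviver()` name follow the snippets quoted in the review below, but the signatures and the copy-only revivers here are assumptions made for illustration (the real codec revives vocab holders rather than copying containers).

```typescript
type Seen = Map<object, unknown>;
type Revive = (node: unknown) => Promise<unknown>;
type ClassReviver = (
  seen: Seen,
  revive: Revive,
  node: object,
) => Promise<unknown> | undefined;

// One type parameter links the filter's narrowed type to init and fill,
// so the compiler checks each call without suppressions.
function classReviver<T extends object>(
  filter: (node: object) => node is T,
  init: () => T,
  fill: (revive: Revive, node: T, target: T) => Promise<void>,
): ClassReviver {
  return (seen, revive, node) => {
    if (!filter(node)) return undefined;
    const target = init();
    seen.set(node, target); // register first so cycles resolve to the copy
    return fill(revive, node, target).then(() => target);
  };
}

const revivers: ClassReviver[] = [
  classReviver(
    (node: object): node is unknown[] => Array.isArray(node),
    (): unknown[] => [],
    async (revive, node, arr) => {
      for (const item of node) arr.push(await revive(item));
    },
  ),
  classReviver(
    (node: object): node is Set<unknown> => node instanceof Set,
    () => new Set<unknown>(),
    async (revive, node, set) => {
      for (const item of Array.from(node)) set.add(await revive(item));
    },
  ),
];

// Bind the recursive reviver once per decode pass.
function makeRevive(seen: Seen): Revive {
  const inner: Revive = async (node) => {
    if (node === null || typeof node !== "object") return node;
    if (seen.has(node)) return seen.get(node);
    for (const reviver of revivers) {
      const out = reviver(seen, inner, node);
      if (out !== undefined) return await out;
    }
    return node; // not a container this sketch handles
  };
  return inner;
}
```

Because `T` is inferred from the filter and `init`, passing a `fill` callback written for a different container type is a compile-time error instead of something a suppression comment would hide.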
The __contextData phantom field binds a TaskDefinition handle to its federation's context data type, but as a string-keyed property it leaked into user-facing docs and IDE completions despite its @internal tag. Replace it with a module-private unique symbol key: no value exists at runtime, the marker disappears from completions, and cross-federation handle rejection still type-checks, now guarded by a regression test. Also replace the tasks barrel's wildcard re-export of task.ts with explicit named exports of the six types its consumers actually use, so nothing new falls through the barrel unnoticed. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
Automated reviewers keep proposing a fixed recursion depth cap (~100) in TaskCodec's #revive to guard against stack overflow from deeply nested payloads. The concern does not apply: the revive traversal suspends at an await on every level, so nesting depth consumes heap (promise chains) rather than native stack, and a structure deep enough to threaten the stack would fail inside devalue.parse() before #revive ever ran. A cap would only reject legitimate payloads. Add a regression test that round-trips a payload nested 1,000 levels deep—an order of magnitude above any proposed cap—through alternating objects and arrays down to a vocab leaf, so introducing such a cap now fails the suite. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
The enqueue guard only checked that the handle's task name existed in the local registry, so once two federation instances defined the same task name, a handle from the other instance slipped through: the local context encoded the payload under the schema carried by the foreign handle while the worker decoded it under the local definition's schema. A payload the local schema would have rejected at enqueue thus landed in the queue anyway, only to be dropped at decode time—defeating the fail-fast purpose the guard exists for. defineTask() now stores the exact handle object it returns alongside the internal definition, and enqueueTask()/enqueueTaskMany() compare that handle by identity. Handles still work on every federation built from the same builder, since build() shares the stored definitions. The cross-federation regression test now covers the same-name case in addition to the undefined-name case. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
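The difference between a name check and an identity check, sketched with a hypothetical `TaskRegistrySketch` class. The method names and error messages are assumptions for illustration; Fedify's builder stores more than the handle.

```typescript
interface TaskHandle {
  readonly name: string;
}

// Hypothetical registry that remembers the exact handle object it returned.
class TaskRegistrySketch {
  private readonly handles = new Map<string, TaskHandle>();

  defineTask(name: string): TaskHandle {
    if (this.handles.has(name)) {
      throw new TypeError(`Task "${name}" is already defined.`);
    }
    const handle: TaskHandle = { name };
    this.handles.set(name, handle); // keep the exact object we hand out
    return handle;
  }

  assertOwnHandle(handle: TaskHandle): void {
    // Compare by identity, not by name, so a same-named foreign handle fails.
    if (this.handles.get(handle.name) !== handle) {
      throw new TypeError(`Task "${handle.name}" was not defined here.`);
    }
  }
}

const local = new TaskRegistrySketch();
const foreign = new TaskRegistrySketch();
const localHandle = local.defineTask("sendDigest");
const foreignHandle = foreign.defineTask("sendDigest");
```

Here `local.assertOwnHandle(localHandle)` passes, while `local.assertOwnHandle(foreignHandle)` throws even though both handles carry the name `"sendDigest"`.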
MockContext.enqueueTask() invoked the handler with the raw input, while production enqueueTask() validates the payload against the task schema and hands the validated output to the handler. Tests written against @fedify/testing therefore accepted payloads that production rejects at enqueue, and observed the raw input rather than the coerced or normalized value a transforming schema produces—masking integration bugs the mock exists to surface. The mock now runs the registered schema's Standard Schema validator before invoking the handler, throwing the same TypeError production throws on failure and passing the validated output through. enqueueTaskMany() inherits this since it delegates to enqueueTask(). Added tests covering a rejected payload, a coercing schema whose validated output reaches the handler, and per-item validation in the batch path. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
The @standard-schema/spec import is shared by the fedify and testing packages, so it belongs at the workspace level rather than being declared per package. The root deno.json already lists it and workspace members inherit the root import map, making the copy in the fedify package's deno.json redundant; drop it. The pnpm side already sources the version from the catalog in pnpm-workspace.yaml, with each package.json referencing it as "catalog:". Assisted-by: Claude Code:claude-fable-5
Every other enqueue path (inbox, outbox, fanout, forwarding) calls _startQueueInternal() right before enqueuing unless manuallyStartQueue is set, but #enqueueTasks did not. An application that only uses the custom task API never sends an activity, so with the default configuration its first enqueueTask() accepted the message while no worker ever listened: tasks piled up in the queue unprocessed until startQueue() was called explicitly or an activity happened to be sent. Add the same guard to #enqueueTasks, plus a regression test asserting that the first enqueue starts the task worker exactly once and that a second enqueue does not start another listener. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
Production's enqueueTaskMany() validates and encodes every payload with Promise.all() before enqueuing anything, so a batch with one invalid item rejects with no effect. The mock looped enqueueTask() per item instead, invoking handlers for earlier payloads before a later one failed validation—tests could observe a partial processing state that cannot occur in production. Split the definition lookup and the schema validation out of enqueueTask() into helpers, and make enqueueTaskMany() validate the whole batch up front, running handlers only once every payload has passed. The existing batch-validation test now pins that no handler runs at all when the batch rejects. fedify-dev#803 (comment) fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
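The all-or-nothing batch behavior can be sketched synchronously. `enqueueManySketch` and `requireString` are hypothetical names, and the throwing validator is a simplification; production validates with `Promise.all()` as described above.

```typescript
type Validate<T> = (input: unknown) => T; // throws TypeError when invalid

function enqueueManySketch<T>(
  validate: Validate<T>,
  handler: (payload: T) => void,
  payloads: readonly unknown[],
): void {
  // Validate the whole batch first; a throw here leaves no side effects.
  const validated = payloads.map((payload) => validate(payload));
  // Handlers run only once every payload has passed.
  for (const payload of validated) handler(payload);
}

const requireString: Validate<string> = (input) => {
  if (typeof input !== "string") throw new TypeError("expected a string");
  return input;
};

const handled: string[] = [];
let batchError: unknown;
try {
  enqueueManySketch(requireString, (p) => { handled.push(p); }, ["a", "b", 3]);
} catch (error) {
  batchError = error;
}
```

A per-item loop that validated and handled each payload in turn would have pushed `"a"` and `"b"` before failing on `3`; validating up front leaves `handled` empty.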
Production compares the registered handle by identity (14313a1), so passing a handle from another federation instance throws even when both instances define the same task name. The mock looked definitions up by name only, and defineTask() did not keep the handle it returned, so an identity check was impossible: tests could pass with a handle the real federation rejects. Store the returned handle with the definition and require the enqueued handle to be that very object, with the same error message production uses. A regression test defines the same task name on two mock federations and asserts the foreign handle is rejected without running any handler. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-fable-5
The custom background task APIs added on this branch were annotated with `@since 2.3.0`, but the release that will include them is not yet decided. Replace those tags with the placeholder 2.x.x so the documentation does not promise a specific version prematurely. Affected APIs: Context.enqueueTask and enqueueTaskMany, the taskRetryPolicy and taskQueueResolution federation options, the task queue option, TaskMessage, and the task definition types (TaskHandler, TaskDefinitionOptions, TaskDefinition, TaskRegistry, and TaskEnqueueOptions). Assisted-by: Claude Code:claude-opus-4-8
Local review of the custom background task PR flagged several
documentation-level problems; no runtime behavior is affected:
- The tasks manual claimed the API ships in Fedify 2.3.0, while the
new APIs' JSDoc had already moved to `@since 2.x.x` because the
containing release is undecided. Align the manual with the JSDoc.
- The manual also claimed a per-task queue defined after the queue
machinery starts "never gets a worker." Without
`manuallyStartQueue`, the next request or enqueue starts the
worker, so soften the claim to match the implementation.
- A comment in codec.test.ts described an instance-level `#seen` map
that does not exist—each `deserialize()` call builds its own
per-decode map—and the first test's title claimed a fresh instance
per operation while the tests share one module-level codec.
Correct both.
- Fix grammar errors in the new *AGENTS.md* paragraph about
`mise tasks`.
fedify-dev#803
Assisted-by: Claude Code:claude-fable-5
Assisted-by: Codex:gpt-5-5
The custom task API's producer side—Context.enqueueTask() and
enqueueTaskMany()—had no direct coverage at the middleware layer. Add
tests that drive a real ContextImpl against a recording queue and assert
on what it enqueues:
- enqueueTask() builds a well-formed task message (type, taskName,
baseUrl, attempt, UUID id, parseable started instant, trace context)
and round-trips a vocab payload through the codec as JSON-LD.
- enqueueTaskMany() routes a multi-item batch through enqueueMany(),
preserving order and forwarding delay/orderingKey, while a
single-item batch uses enqueue() instead.
- When the queue lacks enqueueMany(), the batch falls back to
concurrent single enqueues—verified with a rendezvous queue that
blocks until both are in flight—still preserving order and options.
- An invalid payload anywhere in the batch rejects with a schema
TypeError and enqueues nothing.
To avoid duplicating fixtures, the MockQueue and Standard Schema test
helpers that tasks.test.ts defined inline move to testing/tasks.ts
(re-exported by testing/mod.ts); both suites now import the single
implementation, and the fixture-usage allowlist covers the new file.
fedify-dev#803
Assisted-by: Claude Code:claude-opus-4-8
The array reviver in TaskCodec restored elements with `arr.push(...await Array.fromAsync(node, revive))`. Spreading the revived elements into a single call hits the engine's argument-count limit, so a large enough array throws `RangeError: Maximum call stack size exceeded` during decode. Since the worker drops decode failures without retry, an otherwise-valid payload that enqueued fine is silently lost on the dequeue side. Replace the spread with a per-item loop append, matching the existing Map and Set revivers, so revival no longer depends on the array length. Add a regression test that round-trips a 200,000-element array. fedify-dev#803 (comment) Assisted-by: Claude Code:claude-opus-4-8
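A small sketch of the per-item append. `appendAll` is a hypothetical helper name; the point is only that a loop passes one argument per call, so it does not depend on the engine's argument-count limit the way `push(...items)` does.

```typescript
// Append items one at a time instead of spreading them into a single call.
function appendAll<T>(target: T[], items: readonly T[]): void {
  for (const item of items) target.push(item);
}

// Same size as the array in the regression test described above.
const big = Array.from({ length: 200_000 }, (_, i) => i);
const out: number[] = [];
appendAll(out, big);
```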
Three documentation points raised on the task API review, plus a
regression test backing the Temporal claim:
- The payload codec round-trips devalue's built-in Temporal types
with no extra code, but the supported-payload list omitted them.
List `Temporal` (with `Temporal.Instant` / `Temporal.Duration`
examples) and add a serialize/deserialize round-trip test so the
documented support stays covered.
- The vocab import example used the compatibility path
`@fedify/fedify/vocab`. Switch it to `@fedify/vocab`, matching the
surrounding docs and the current package boundary so copied code
does not bind to a path slated for removal.
- Task payloads now cross durable queue storage and can hold arbitrary
application data. Add a trust-boundary security note to the queue
isolation section: treat the backend and payloads as internal
trusted storage, pass identifiers the worker resolves rather than
long-lived secrets, and use a dedicated task queue with
`taskQueueResolution: "strict"` when isolation is required.
fedify-dev#803 (comment)
fedify-dev#803 (comment)
fedify-dev#803 (comment)
Assisted-by: Claude Code:claude-opus-4-8
Context.enqueueTask() and enqueueTaskMany() now accept a
deduplicationKey requesting at-most-once enqueue for tasks that share
it (new TaskEnqueueOptions.deduplicationKey).
Resolution follows the queue and key-value store capabilities:
- A queue declaring the new MessageQueue.nativeDeduplication owns the
check; the key is forwarded through the new
MessageQueueEnqueueOptions.deduplicationKey.
- Otherwise Fedify applies a best-effort guard through the optional
KvStore.cas primitive under a new taskDeduplication key prefix,
tunable with the new FederationOptions.taskDeduplicationTtl and
taskDeduplicationFallback options.
For enqueueTaskMany(), a single key governs the whole batch. A native
queue that does not implement enqueueMany() cannot express batch-level
at-most-once with a per-message key, so such a multi-item enqueue is
rejected with a TypeError instead of silently leaking duplicates.
Configuration errors that are decidable without a payload (a native
queue lacking enqueueMany, or a closed fallback without cas) are
checked before payloads are validated and encoded, so they reject
before any user schema runs or any key is reserved.
fedify-dev#798
Assisted-by: Claude Code:claude-opus-4-8
The #enqueueTasks and #encodeTaskMessage methods made ContextImpl oversized, so move the handle validation, deduplication planning, payload encoding, and queue dispatch into a new tasks/enqueue.ts module. ContextImpl now delegates to enqueueTasks(), passing only the small slice of itself (federation, codec, origin, data) the pipeline needs. Pull the shared task-test helpers (the schema factory, stock schemas, base federation options, and the recording MockQueue) into a new testing/mq-tasks.ts module, and split the enqueue-specific cases out of tasks.test.ts into enqueue.test.ts. Teach the fixture-usage check to expand glob patterns in its allowlist so the whole testing/ directory is covered by a single entry instead of one path per file. Assisted-by: Claude Code:claude-opus-4-8
Two branches both touched the task testing utilities and diverged: one split MockQueue and the shared schemas/options out into mq-tasks.ts, while the other kept evolving them in tasks.ts. After rebasing the common edits, consolidate everything back into a single tasks.ts and drop the now-redundant mq-tasks.ts. Assisted-by: Claude Code:claude-opus-4-8
The key-value deduplication path reserved a marker before dispatching to the queue but never undid it when the dispatch failed. A transient backend failure therefore left the marker behind, so the retry was silently deduplicated against a task that had never reached the queue. The cas claim now stores a unique token instead of a bare `true`, and a failed dispatch conditionally clears it (cas succeeds only while the stored value is still our token). The conditional clear keeps a stale rollback from deleting a marker that another concurrent enqueue has already re-claimed. A rollback that itself fails is logged and swallowed so the original enqueue error still reaches the caller. The enqueueMany requirement for deduplicated multi-item batches now keys on whether deduplication is actually applied—a native queue or the cas fallback—rather than on nativeDeduplication alone. Under the "open" fallback (no native dedup, no cas) no marker is taken, so the batch fans out without deduplication instead of throwing. ParallelMessageQueue likewise rejects a deduplicated batch when the wrapped queue lacks enqueueMany, since fanning out cannot carry one key atomically. fedify-dev#798 Assisted-by: Claude Code:claude-opus-4-8
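The token-based claim and conditional rollback, sketched against a hypothetical synchronous in-memory store. `MemoryCasStore` and `enqueueOnce` are illustrative names, not Fedify APIs; the real `KvStore.cas` is asynchronous, and the logged-and-swallowed rollback failure is omitted here.

```typescript
// Hypothetical in-memory store with a synchronous compare-and-swap.
class MemoryCasStore {
  private readonly data = new Map<string, string>();

  cas(key: string, expected: string | undefined, next: string | undefined): boolean {
    if (this.data.get(key) !== expected) return false;
    if (next === undefined) this.data.delete(key);
    else this.data.set(key, next);
    return true;
  }

  has(key: string): boolean {
    return this.data.has(key);
  }
}

function enqueueOnce(
  store: MemoryCasStore,
  key: string,
  token: string,
  dispatch: () => void,
): "enqueued" | "deduplicated" {
  // Claim the key with our own token instead of a bare `true`.
  if (!store.cas(key, undefined, token)) return "deduplicated";
  try {
    dispatch();
  } catch (error) {
    // Clear the marker only while it still holds our token.
    store.cas(key, token, undefined);
    throw error;
  }
  return "enqueued";
}

const store = new MemoryCasStore();
let firstError: unknown;
try {
  enqueueOnce(store, "digest", "token-1", () => {
    throw new Error("queue unavailable");
  });
} catch (error) {
  firstError = error;
}
const markerAfterFailure = store.has("digest");
const retry = enqueueOnce(store, "digest", "token-2", () => {});
const duplicate = enqueueOnce(store, "digest", "token-3", () => {});
```

Without the rollback, `retry` would come back as `"deduplicated"` against a task that never reached the queue.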
Layer task-specific telemetry onto the custom background task dispatch path, reusing the queue-task metric pattern and mirroring the existing `http_signatures.failure_reason` enum in metrics.ts. Each dequeued task now runs in a `fedify.task` span that inherits the enqueue site's trace context and carries `fedify.task.name`, `fedify.task.attempt`, and, on a terminal failure, `fedify.task.failure_reason`. The `fedify.queue.task.*` metrics report task runs under the new `"task"` role with the task name and, on failure, a bounded `fedify.task.failure_reason`. To tell the failure reasons apart, `#listenTaskMessage` splits the former `decode()` call into its deserialize and validate phases and returns the decision point that failed: `deserialization`, `validation`, `unknown_task`, or `handler`. A swallowed abort is reported as a graceful interruption, not a failure. The reported `fedify.queue.backend` reflects the resolved queue so it stays accurate under the outbox fallback. Public surface: `QueueTaskRole` gains `"task"`, `QueueTaskCommonAttributes` gains `taskName`, and a new `QueueTaskFailureReason` type plus an optional trailing `failureReason` parameter on `recordQueueTaskOutcome()` carry the reason. `TaskCodec` exposes an instance `validate()` wrapper so the dispatch site can split decoding without importing the class. fedify-dev#799 Assisted-by: Claude Code:claude-opus-4-8
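A sketch of how a dispatcher can return the decision point that failed. The four reason values come from the commit message; everything else (`TaskDefinitionSketch`, `runTask`, `JSON.parse` standing in for the devalue-based deserializer, and the order of the checks) is an assumption made for illustration.

```typescript
type QueueTaskFailureReason =
  | "deserialization"
  | "validation"
  | "unknown_task"
  | "handler";

interface TaskDefinitionSketch {
  validate(value: unknown): unknown; // throws when the payload is invalid
  handler(payload: unknown): void;
}

// Returns the failed decision point, or undefined on success.
function runTask(
  registry: Map<string, TaskDefinitionSketch>,
  taskName: string,
  raw: string,
): QueueTaskFailureReason | undefined {
  const definition = registry.get(taskName);
  if (definition === undefined) return "unknown_task";
  let value: unknown;
  try {
    value = JSON.parse(raw); // stand-in for the codec's deserialize phase
  } catch {
    return "deserialization";
  }
  let payload: unknown;
  try {
    payload = definition.validate(value); // separate validate phase
  } catch {
    return "validation";
  }
  try {
    definition.handler(payload);
  } catch {
    return "handler";
  }
  return undefined;
}

const greet: TaskDefinitionSketch = {
  validate(value) {
    if (typeof value !== "string") throw new TypeError("expected a string");
    return value;
  },
  handler(payload) {
    if (payload === "boom") throw new Error("handler failed");
  },
};
const registry = new Map<string, TaskDefinitionSketch>();
registry.set("greet", greet);
```

Keeping deserialize and validate as separate `try` blocks is what lets the caller tell the two failure reasons apart; a single combined decode call could only report one.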
Code Review
This pull request introduces a custom background task API to Fedify, allowing developers to define, enqueue, and process arbitrary background jobs with type-safe payload validation via Standard Schema. The implementation supports robust serialization of complex types and Activity Vocabulary objects using devalue, customizable retry policies, queue routing, best-effort or native deduplication, and OpenTelemetry instrumentation. Feedback on the changes highlights a compatibility issue with Node.js 20 due to the use of Array.fromAsync in codec.ts, suggesting standard for...of loops instead, and recommends implementing a recursion depth limit during deserialization to prevent potential Denial of Service (DoS) attacks from deeply nested payloads.
    classReviver(
      isInstanceOf(Array),
      (): unknown[] => [],
      async (revive, node, arr) => {
        for (const item of await Array.fromAsync(node, revive)) arr.push(item);
      },
    ),
Using Array.fromAsync introduces compatibility issues with Node.js 20 (which is an active LTS version of Node.js) as it was introduced in ES2024 and is only natively supported in Node.js 22+. To maintain compatibility across all target environments (including Node.js 20), replace it with a standard for...of loop with await.
    classReviver(
      isInstanceOf(Array),
      (): unknown[] => [],
      async (revive, node, arr) => {
        for (const item of node) {
          arr.push(await revive(item));
        }
      },
    ),

      isInstanceOf(Set),
      () => new Set<unknown>(),
      async (revive, node, set) => {
        for (const v of await Array.fromAsync(node, revive)) set.add(v);
      },
    ),
Using Array.fromAsync introduces compatibility issues with Node.js 20 (which is an active LTS version of Node.js) as it was introduced in ES2024 and is only natively supported in Node.js 22+. To maintain compatibility across all target environments (including Node.js 20), replace it with a standard for...of loop with await.
    -   isInstanceOf(Set),
    -   () => new Set<unknown>(),
    -   async (revive, node, set) => {
    -     for (const v of await Array.fromAsync(node, revive)) set.add(v);
    -   },
    - ),
    + classReviver(
    +   isInstanceOf(Set),
    +   () => new Set<unknown>(),
    +   async (revive, node, set) => {
    +     for (const v of node) {
    +       set.add(await revive(v));
    +     }
    +   },
    + ),
    #revive = (seen: Seen): Revive => {
      const inner: Revive = async (node) => {
        if (node === null || typeof node !== "object") return node;
        if (seen.has(node)) return seen.get(node);
        for (const reviver of this.#classRevivers) {
          const out = reviver(seen, inner, node);
          if (out !== undefined) return await out;
        }
        // devalue can handle non-container objects.
        return node;
      };
      return inner;
    };
To prevent stack overflow or resource exhaustion from maliciously crafted deep JSON payloads, implement a depth limit in recursive traversal functions. The limit should be high enough for legitimate data (e.g., 1000) but low enough to prevent DoS attacks.
    #revive = (seen: Seen, maxDepth = 1000): Revive => {
      const inner = async (node: unknown, depth = 0): Promise<unknown> => {
        if (node === null || typeof node !== "object") return node;
        if (seen.has(node)) return seen.get(node);
        if (depth >= maxDepth) {
          throw new TypeError("Maximum depth limit exceeded during deserialization.");
        }
        const nextRevive = (n: unknown) => inner(n, depth + 1);
        for (const reviver of this.#classRevivers) {
          const out = reviver(seen, nextRevive, node);
          if (out !== undefined) return await out;
        }
        // devalue can handle non-container objects.
        return node;
      };
      return inner;
    };

References
- To prevent stack overflow from maliciously crafted deep JSON, implement a depth limit in recursive traversal functions. The limit should be high enough for legitimate data but low enough to prevent DoS attacks.
Resolves #799, the third and final sub-issue of #206 (custom background tasks). Once this lands, #206 is fully resolved.
Background
The core task API (#797/#803) shipped task dispatch behavior and structured logging, but the task worker carries no span and no metrics: of the message variants handled in `processQueuedTask`, every other branch (fanout/outbox/inbox) is dispatched with instrumentation, but `task` is not.

This PR closes that gap by layering task-specific telemetry onto the decision points the core already established. It reuses the queue-task metric pattern introduced in #759 and mirrors the existing `http_signatures.failure_reason` enum in metrics.ts. It changes no drop/retry behavior: telemetry is observed, never enforced.

What changes
Span
Each dequeued task now runs inside a `fedify.task` consumer span. The name is namespaced under `fedify.` rather than `activitypub.` because tasks are not part of ActivityPub, paralleling the existing `activitypub.inbox`/`outbox`/`fanout` spans. The span:
- Carries `fedify.task.name` and `fedify.task.attempt` (the zero-based attempt number).
- Carries `fedify.task.failure_reason` and sets its status to `ERROR` on a terminal failure, so trace backends surface failed tasks without re-deriving the reason from logs.

Failure attribution
`#listenTaskMessage` now returns the failure reason (or `undefined` on success) so the span/metric wrapper can attribute it. To distinguish a deserialization failure from a validation failure, the former combined `codec.decode(...)` call is split into its existing `deserialize` then `validate` phases. This is behavior-preserving (`decode` is literally `validate(schema, await deserialize(raw))`), and `TaskCodec` gains a thin instance `validate()` wrapper so the dispatch site can split the two phases without importing the class.

The four bounded `fedify.task.failure_reason` values map one-to-one to the worker's dispatch decision points:
- `deserialization`: the wire payload could not be deserialized.
- `validation`: the deserialized payload failed schema validation.
- `unknown_task`: the task name has no registered handler.
- `handler`: the registered handler threw.

A worker shutdown is the one exception: an interrupted attempt is reported as an `aborted` outcome with no `fedify.task.failure_reason`, never as a `handler` failure.

Metrics surface
Tasks reuse the `fedify.queue.task.*` metric family under a new `task` role:
- `QueueTaskRole` gains `"task"`.
- `QueueTaskCommonAttributes` gains `taskName`, emitted as `fedify.task.name`.
- A new `QueueTaskFailureReason` type, mirroring `HttpSignatureMetricFailureReason`.
- `recordQueueTaskOutcome()` gains an optional trailing `failureReason` parameter (non-breaking); it is emitted as `fedify.task.failure_reason` only on a `failed` result.
- `recordQueueTaskEnqueued` records `role: "task"` at both the enqueue site (after a genuine dispatch, never on a dedup skip or a failed enqueue) and the retry re-enqueue site.
- `fedify.queue.backend` reports the resolved queue (the one actually used after routing, which may be the outbox queue under the fallback mode), so the metric stays accurate regardless of routing.

Cardinality
Bounded by construction: task names are a registered, known-at-startup set (never derived from message content), and `failure_reason` is a four-value bounded enum. Combined cardinality is `taskName × |failure_reason| × queue.backend`, within OTel attribute safety. The process-local `in_flight` UpDownCounter omits `fedify.task.name` so its series stays drained.

Out of scope
- … `taskName` (would risk unbounded cardinality).
- … `QueueTaskFailureReason` set: explicitly open to later refinement as long as it stays a small bounded set.

Tests
packages/fedify/src/federation/tasks/tasks.test.ts gains a telemetry block with one assertion per acceptance criterion, using `TestSpanExporter`/`createTestTracerProvider`/`createTestMeterProvider` from `@fedify/fixture`. Coverage:
- A `fedify.task` span exists with `fedify.task.name` and `fedify.task.attempt`.
- … `fedify.task.failure_reason`.
- `fedify.queue.backend` reflects the resolved queue, including the outbox fallback.
- `recordQueueTaskEnqueued`/`recordQueueTaskOutcome` carry `role: "task"`.

Verified across Deno, Node.js, and Bun.
Documentation
… `fedify.task` span row, the `task` value added to the `fedify.queue.role` enumeration, a widened `failed`-result definition covering acked task drops, and the `fedify.task.name`/`fedify.task.attempt`/`fedify.task.failure_reason` attribute rows.

AI disclosure
Assisted-by: Claude Code:claude-opus-4-8