Parallelize startup and harden playback/export A/V sync by richiemcilroy · Pull Request #1893 · CapSoftware/Cap

richiemcilroy · 2026-06-06T23:28:24Z

Parallelize editor startup (mic/system-audio load + GPU layer warmup overlap decoder creation)
- Replace audio-timing-repair log-scraping with structured AudioGapSummary metadata (recorder → AudioMeta → editor)
- Harden audio mixer against offset overflow (i128 intermediates) and skip decode work for sub-fallback frames
- Synthesize a default timeline for un-edited recordings (decoded media duration) so raw/CLI exports render
- Gapless export audio on zero-frame fallback; fragmented-recording remux on finalize/export
- Tighten the audio-ready handshake timeout to prevent unsynced playback

Greptile Summary

This PR hardens A/V sync across playback and export through several coordinated changes: structured AudioGapSummary metadata replaces log-scraping for startup-offset repair, GPU layer warmup is parallelised with decoder creation, the video clock is now stamped only after the audio device fires its first callback, and a default timeline is synthesised from encoded frame counts at recording stop so un-edited recordings are immediately exportable.

A/V sync: start = Instant::now() moved after AudioPlayback::spawn() returns, which blocks on the first device callback via a new channel handshake; initial latency offset switched from initial_compensation_secs (2× safety multiplier) to raw initial_output_latency_secs; silent audio frame substituted for None renders during export to prevent gaps.
AudioGapSummary pipeline: structured startup_overlap_trimmed_ms / startup_overlap_drops counters are accumulated in AudioGapTracker, stored in a OnceLock on FinishedOutputPipeline, written into AudioMeta.gap_summary, and consumed by audio_timing_repair_offset in the editor—replacing fragile log-scraping while keeping a legacy fallback for old recordings.
Overflow hardening: audio mixer uses i128 intermediates and usize::try_from instead of isize and wrapping_add_signed to prevent index overflow on large negative track offsets.

Confidence Score: 4/5

Safe to merge; the changes are well-tested and correctness-focused, with one open concern around startup latency for very long recordings.

The audio timing-repair rewrite is thorough and well-covered by new unit tests. The start = Instant::now() move correctly ties video to the audio device's first callback, fixing the sync regression. The one remaining concern is that AudioPlayback::spawn() has no upper-bound timeout for the PrerenderedAudioBuffer::new() pre-render phase: for a multi-hour recording the polling loop can silently stall for tens of seconds before playback begins.

crates/editor/src/playback.rs — the spawn() polling loop should have an outer timeout covering the pre-render phase, not just AUDIO_FIRST_CALLBACK_TIMEOUT which starts after stream.play().

Important Files Changed

Filename	Overview
crates/editor/src/playback.rs	Moves `start = Instant::now()` after `audio_playback.spawn()` to synchronize A/V start; adds first-callback handshake with `AUDIO_FIRST_CALLBACK_TIMEOUT`; hardens `prefetch_duration` and `duration` against invalid timelines; changes initial latency from `initial_compensation_secs` (2× multiplier) to raw `initial_output_latency_secs`.
crates/editor/src/editor_instance.rs	Replaces log-scraping A/V timing repair with structured `AudioGapSummary` metadata; adds `LegacyAudioTimingRepair` fallback for old recordings; parallelises audio I/O via `spawn_audio_load`; removes 5 s hardcoded fallback duration; minor redundant guard in `audio_timing_repair_offset`.
crates/recording/src/output_pipeline/core.rs	Adds `AudioGapSummary` struct and per-event startup/total accounting to `AudioGapTracker`; passes `frame_count` to `record_overlap`; stores summary in `OnceLock` on `OutputPipeline` and surfaces it in `FinishedOutputPipeline`.
crates/recording/src/studio_recording.rs	Synthesizes `TimelineConfiguration` and `ClipConfiguration` from encoded frame_count/fps at recording stop; propagates `gap_summary` into `AudioMeta`; persists populated project config so un-edited recordings have a valid timeline for export/playback.
crates/audio/src/renderer.rs	Replaces i32/isize arithmetic with i128 intermediates and `usize::try_from` to prevent overflow on large offsets; fixes stereo base-index calculation; adds regression test for negative offset (leading-silence) behaviour.
crates/export/src/mp4.rs	Emits a silent audio frame instead of skipping on zero-frame fallback, ensuring gapless export when the audio renderer returns None for a given time slot.
crates/editor/src/editor.rs	Splits `start_renderer_layers_creation` into start + `finish_renderer_layers_creation` so GPU layer warmup overlaps decoder/segment creation; removes now-unused `total_frames` field.
crates/recording/src/feeds/microphone.rs	Adds `normalize_pending_timestamps` to correct stale startup burst timestamps before they are enqueued, preventing phantom overlap events that would otherwise inflate the gap summary.
crates/rendering/src/decoder/avassetreader.rs	Computes `minimum_fallback_frame` from pending requests and skips decode work for frames below that bound, reducing unnecessary decode overhead during seek.

Comments Outside Diff (1)

crates/editor/src/editor_instance.rs, line 176-180 (link)

Hardcoded 5.0 s fallback creates incorrect timeline for any recording not ~5 s long

When both Video::new and get_video_duration_fallback fail (e.g., partially written or corrupt display file), the synthesized TimelineSegment uses end: 5.0. A 30-second recording whose display file is unreadable would export with a 5-second timeline, silently truncating content. Since this fallback is exercised for real on first open (before recovery or remux), the project config is then written with the wrong end value and persisted. Returning None from the filter in the MultipleSegments arm (same as the explicit duration <= 0.0 guard just below) and warning the user would be safer than silently synthesising an arbitrary duration.

Prompt To Fix With AI

This is a comment left during a code review.
Path: crates/editor/src/editor_instance.rs
Line: 176-180

Comment:
**Hardcoded 5.0 s fallback creates incorrect timeline for any recording not ~5 s long**

When both `Video::new` and `get_video_duration_fallback` fail (e.g., partially written or corrupt display file), the synthesized `TimelineSegment` uses `end: 5.0`. A 30-second recording whose display file is unreadable would export with a 5-second timeline, silently truncating content. Since this fallback is exercised for real on first open (before recovery or remux), the project config is then written with the wrong `end` value and persisted. Returning `None` from the filter in the `MultipleSegments` arm (same as the explicit `duration <= 0.0` guard just below) and warning the user would be safer than silently synthesising an arbitrary duration.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/editor/src/playback.rs:1285-1356
**Pre-render phase has no upper-bound timeout in `spawn()`**

`AudioPlayback::spawn()` polls `ready_rx` every 50 ms and only exits when the audio thread sends a value or `stop_rx` fires. The audio thread sends `ready_tx` only _after_ `PrerenderedAudioBuffer::new()` completes **and** the first device callback arrives. `PrerenderedAudioBuffer::new()` is O(recording length), so for a multi-hour recording on modest hardware it can take well over 10–20 s. During that entire window the polling loop spins with no way out except user cancellation.

`AUDIO_FIRST_CALLBACK_TIMEOUT` does not help here: it only starts timing after `stream.play()`, which is called _after_ pre-render finishes. The net effect is that pressing play on a long recording can silently stall for tens of seconds before video begins—trading the previous async desync for a synchronous startup freeze.

### Issue 2 of 2
crates/editor/src/editor_instance.rs:759-762
The `overlap_dropped_frames < MIN_STALE_STARTUP_DROPS` guard is unreachable: `startup_overlap_drops ≤ overlap_dropped_frames` by construction, so if the `startup_overlap_drops` check passes (`>= 3`), `overlap_dropped_frames >= 3` is guaranteed. Removing the redundant condition avoids misleading future readers into thinking total drops could prevent repair when startup drops are sufficient.

```suggestion
    if summary.startup_overlap_drops < MIN_STALE_STARTUP_DROPS
        || !(MIN_STALE_STARTUP_TRIMMED_MS..=MAX_STALE_STARTUP_REPAIR_MS)
            .contains(&summary.startup_overlap_trimmed_ms)
```

_{Reviews (3): Last reviewed commit: "Add test for negative timing offset cons..." | Re-trigger Greptile}

polarityinc · 2026-06-06T23:28:42Z

Paragon Review Skipped

Hi @richiemcilroy! Your Polarity credit balance is insufficient to complete this review.

Please visit https://app.paragon.run to finish your review.

tembo · 2026-06-06T23:33:03Z

        self.total_overlap_trimmed = self.total_overlap_trimmed.saturating_add(overlap);
        if dropped_whole_frame {
            self.overlap_dropped_frames += 1;
+            if frame_count <= STARTUP_OVERLAP_DROP_FRAME_LIMIT {


Minor off-by-one risk: frame_count looks 0-based (incremented after processing), so <= STARTUP_OVERLAP_DROP_FRAME_LIMIT counts 4 frames when the limit is 3. If you meant “first 3 frames”, consider switching to < (or renaming the constant/param to make the indexing explicit).

Suggested change

if frame_count <= STARTUP_OVERLAP_DROP_FRAME_LIMIT {

if frame_count < STARTUP_OVERLAP_DROP_FRAME_LIMIT {

self.startup_overlap_drops += 1;

}

tembo · 2026-06-06T23:33:04Z

+        return 0.0;
+    };
+
+    if summary.startup_overlap_drops < MIN_STALE_STARTUP_DROPS


audio_timing_repair_offset applies total_overlap_trimmed_ms (whole recording). If there are any whole-frame overlap drops outside the startup window, this could over-shift audio. One cheap guard is to only apply when all drops were in startup.

Suggested change

if summary.startup_overlap_drops < MIN_STALE_STARTUP_DROPS

if summary.startup_overlap_drops < MIN_STALE_STARTUP_DROPS

|| summary.overlap_dropped_frames < MIN_STALE_STARTUP_DROPS

|| summary.startup_overlap_drops != summary.overlap_dropped_frames

|| !(MIN_STALE_STARTUP_TRIMMED_MS..=MAX_STALE_STARTUP_REPAIR_MS)

.contains(&summary.total_overlap_trimmed_ms)

{

return 0.0;

}

greptile-apps · 2026-06-06T23:38:09Z

+fn audio_timing_repair_offset(summary: Option<&cap_project::AudioGapSummary>) -> f32 {
+    let Some(summary) = summary else {
+        return 0.0;
+    };
+
+    if summary.startup_overlap_drops < MIN_STALE_STARTUP_DROPS
+        || summary.overlap_dropped_frames < MIN_STALE_STARTUP_DROPS
+        || !(MIN_STALE_STARTUP_TRIMMED_MS..=MAX_STALE_STARTUP_REPAIR_MS)
+            .contains(&summary.total_overlap_trimmed_ms)
+    {
+        return 0.0;
+    }
+
+    -(summary.total_overlap_trimmed_ms as f32 / 1_000.0)


total_overlap_trimmed_ms may include mid-recording trims, overcorrecting silence insertion

audio_timing_repair_offset uses summary.total_overlap_trimmed_ms — which accumulates ALL overlap trims throughout the entire recording — as the correction amount. The startup_overlap_drops >= 3 guard confirms that a stale-startup burst occurred, but the silence inserted equals the full lifetime total, not just the startup portion. A recording that has, say, 200 ms of startup drops and 800 ms of unrelated mid-recording trims would insert 1 000 ms of leading silence, shifting audio 800 ms too late and creating new A/V desync instead of fixing it. Adding a startup_overlap_trimmed_ms field alongside startup_overlap_drops in AudioGapSummary would allow the correction to be scoped to the actual startup trim.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/editor/src/editor_instance.rs Line: 646-659 Comment: **`total_overlap_trimmed_ms` may include mid-recording trims, overcorrecting silence insertion** `audio_timing_repair_offset` uses `summary.total_overlap_trimmed_ms` — which accumulates ALL overlap trims throughout the entire recording — as the correction amount. The `startup_overlap_drops >= 3` guard confirms that a stale-startup burst occurred, but the silence inserted equals the full lifetime total, not just the startup portion. A recording that has, say, 200 ms of startup drops and 800 ms of unrelated mid-recording trims would insert 1 000 ms of leading silence, shifting audio 800 ms too late and creating new A/V desync instead of fixing it. Adding a `startup_overlap_trimmed_ms` field alongside `startup_overlap_drops` in `AudioGapSummary` would allow the correction to be scoped to the actual startup trim. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-06T23:38:10Z

+/// How long the caller waits for the audio thread's ready verdict. This MUST stay strictly larger
+/// than [`AUDIO_FIRST_CALLBACK_TIMEOUT`] plus the thread's setup time so the thread's verdict
+/// always wins the race. The setup includes a synchronous full pre-render of the recording, which
+/// scales with its length, hence the generous headroom. If the caller times out first it concludes
+/// "no audio" and stops feeding the playhead while the audio thread may still start a now-unsynced
+/// stream — the desync this guards against. The thread reports failures immediately, so this only
+/// bounds a genuinely wedged device and is never waited out in normal operation.
+const AUDIO_READY_TIMEOUT: Duration = Duration::from_secs(15);


Windows pre-render can exceed AUDIO_READY_TIMEOUT, causing UI freeze then unsynced audio

On Windows, create_stream_prerendered builds a PrerenderedAudioBuffer entirely in memory before the stream is created — this work is O(recording length). For a multi-hour recording on modest hardware, the pre-render can take well over 15 seconds. When ready_rx.recv_timeout(AUDIO_READY_TIMEOUT) fires first, spawn() returns false, the video starts playing with has_audio = false, and start = Instant::now() is set. The audio thread eventually finishes pre-rendering, builds and plays the stream, and calls handle.block_on(stop_rx.changed()) — by which point the video playhead has already advanced by the overshoot time, so audio plays unsynced for the rest of the session. The desync scenario described in the comment is live for Windows users on long recordings.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/editor/src/playback.rs Line: 1180-1187 Comment: **Windows pre-render can exceed `AUDIO_READY_TIMEOUT`, causing UI freeze then unsynced audio** On Windows, `create_stream_prerendered` builds a `PrerenderedAudioBuffer` entirely in memory before the stream is created — this work is O(recording length). For a multi-hour recording on modest hardware, the pre-render can take well over 15 seconds. When `ready_rx.recv_timeout(AUDIO_READY_TIMEOUT)` fires first, `spawn()` returns `false`, the video starts playing with `has_audio = false`, and `start = Instant::now()` is set. The audio thread eventually finishes pre-rendering, builds and plays the stream, and calls `handle.block_on(stop_rx.changed())` — by which point the video playhead has already advanced by the overshoot time, so audio plays unsynced for the rest of the session. The desync scenario described in the comment is live for Windows users on long recordings. How can I resolve this? If you propose a fix, please make it concise.

richiemcilroy · 2026-06-07T16:07:23Z

hey @greptileai please re-review the PR

richiemcilroy · 2026-06-07T16:30:04Z

hey @greptileai please re-review the PR

richiemcilroy added 22 commits June 7, 2026 00:26

feat(project): add AudioGapSummary to persisted audio metadata

00fe725

feat(recording): track audio gap summary in output pipeline

13a8834

feat(recording): persist gap summary and timeline on studio stop

c0aee5a

fix(recording): set gap_summary on recovered segment audio

d1b3fab

fix(recording): normalize stale pending microphone timestamps

72fad01

chore(recording): add duration flag to real-device test runner

f2755ea

fix(audio): support negative timing offsets in audio renderer

0f9b64e

feat(audio): add initial_output_latency_secs to latency corrector

8e05b0a

feat(editor): add audio track timing offset support

a105eca

refactor(editor): simplify renderer spawn and layer creation API

11f2a4e

feat(editor): parallelize startup and repair stale audio drift

93d270d

fix(editor): wait for audio callback before starting playback

b694f72

fix(desktop): set gap_summary when importing audio metadata

f30cc2b

perf(rendering): tighten initial decode fallback distance

769dc98

perf(enc-avfoundation): shorten keyframe interval to 750ms

3e30e64

fix(export): emit silent audio frames when render returns none

cc0c467

fix(desktop): prefer debug exporter binary in dev builds

4048a35

test(editor): add startup benchmark example

81a1139

test(editor): extend playback pipeline benchmark reporting

cc3b363

test(editor): add start frame option to playback benchmark

bced355

test(desktop): update display transport benchmark for renderer API

9ee1eae

Update tauri.ts

3d42742

tembo Bot reviewed Jun 6, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 6, 2026

View reviewed changes

richiemcilroy added 5 commits June 7, 2026 17:06

feat(recording): track startup overlap trim in gap tracker

bba7553

feat(project): add startup_overlap_trimmed_ms to audio gap summary

e6d280c

fix(recording): persist startup overlap trim in project metadata

479189a

fix(editor): use startup overlap trim for audio timing repair

58f72fc

fix(editor): leave timeline unset when display duration is unknown

3f096d5

richiemcilroy added 2 commits June 7, 2026 17:06

fix(editor): reject invalid timeline durations during playback

8468ecf

fix(editor): cancel audio ready wait when playback stops

fb23122

richiemcilroy added 2 commits June 7, 2026 17:21

Add legacy audio timing repair from log

0ace2b4

Add test for negative timing offset consistency

105ddce

richiemcilroy merged commit 2deaf26 into main Jun 7, 2026
20 of 21 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallelize startup and harden playback/export A/V sync#1893

Parallelize startup and harden playback/export A/V sync#1893
richiemcilroy merged 31 commits into
mainfrom
editor-performance

richiemcilroy commented Jun 6, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

polarityinc Bot commented Jun 6, 2026

Uh oh!

tembo Bot Jun 6, 2026

Uh oh!

tembo Bot Jun 6, 2026

Uh oh!

greptile-apps Bot Jun 6, 2026

Uh oh!

greptile-apps Bot Jun 6, 2026

Uh oh!

richiemcilroy commented Jun 7, 2026

Uh oh!

richiemcilroy commented Jun 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-            if frame_count <= STARTUP_OVERLAP_DROP_FRAME_LIMIT {
+            if frame_count < STARTUP_OVERLAP_DROP_FRAME_LIMIT {
+                self.startup_overlap_drops += 1;
+            }

Conversation

richiemcilroy commented Jun 6, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

polarityinc Bot commented Jun 6, 2026

Uh oh!

tembo Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

tembo Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

richiemcilroy commented Jun 7, 2026

Uh oh!

richiemcilroy commented Jun 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

richiemcilroy commented Jun 6, 2026 •

edited by greptile-apps Bot

Loading