AudioResampler 16kHz→24kHz outputs near-silence when fed 10ms frames (realtime framing) #679

## Versions
`@livekit/rtc-node` 0.13.29, linux x64 (also reproduced inside the `@livekit/agents` 1.4.4 STT pipeline, which constructs `AudioResampler(16000, 24000)` internally for realtime STT plugins).

## Behavior
Upsampling 16 kHz mono pcm16 speech to 24 kHz with `AudioResampler` destroys the signal when the input is pushed in **10 ms frames (160 samples)**, which is the framing `AudioStream` emits and therefore the realtime case. Larger pushes progressively recover the signal:

| Push size | Output RMS (input RMS 3770) | Output sample count |
|---|---|---|
| 160 samples (10 ms) | **34.6 — near-silence** | 177,600 (correct) |
| 1,600 samples (100 ms) | 2,549 — degraded | 177,600 |
| 16,000 samples (1 s) | 4,280 — healthy | 168,000 |

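The output sample count is correct in all cases: it always matches the 1.5× rate ratio for the samples actually pushed. The 1 s run yields 168,000 because the repro loop pushes only whole chunks (7 × 16,000 input samples). Only the content collapses. Buffer aliasing or reuse is not involved: copies of each output frame taken when `push()` returns are identical to reads of the same frame made later.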
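The sample counts in the table can be checked against the 1.5× rate ratio with a few lines of arithmetic. This is a sketch under stated assumptions: the input length of 118,400 samples is inferred from the 177,600-sample outputs (it is not stated in the report), and `expectedOutputSamples` is a hypothetical helper that mirrors the repro loop, which pushes only whole chunks.

```js
// Hypothetical helper, not part of @livekit/rtc-node: predicts how many output
// samples an ideal resampler would produce when only whole chunks are pushed.
function expectedOutputSamples(inputSamples, chunk, inRate = 16000, outRate = 24000) {
  const wholeChunks = Math.floor(inputSamples / chunk); // trailing partial chunk is dropped
  return Math.round((wholeChunks * chunk * outRate) / inRate);
}

// Assumed input length: 177,600 / 1.5 = 118,400 samples (about 7.4 s at 16 kHz).
const assumedInputSamples = 118400;
for (const chunk of [160, 1600, 16000]) {
  console.log(`chunk=${chunk}: expected outSamples=${expectedOutputSamples(assumedInputSamples, chunk)}`);
}
```

Under that assumption the helper gives 177,600, 177,600 and 168,000, which matches the table and suggests the resampler's bookkeeping is fine while its content is not.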

## Impact
Any pipeline that feeds live `AudioStream` frames (10 ms) through `AudioResampler(16000→24000)` — e.g. `@livekit/agents`' base STT class resampling for OpenAI realtime STT (24 kHz) — sends near-silence to the provider. Server-side VAD never triggers, so the result is zero transcription events with no errors anywhere: a fully silent failure.
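Because nothing in the pipeline raises an error, one way to surface this kind of failure is to compare signal levels on either side of the resampler. The following is a minimal sketch and not part of `@livekit/rtc-node` or `@livekit/agents`: `rms`, `looksCollapsed`, and the 0.1 threshold are hypothetical names and values chosen for illustration.

```js
// Root-mean-square level of a pcm16 buffer (Int16Array or plain array).
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Hypothetical check: flags output whose level fell far below the input level.
// minRatio = 0.1 is an arbitrary illustrative threshold, not a tuned value.
function looksCollapsed(inputRms, outputRms, minRatio = 0.1) {
  if (inputRms === 0) return false; // silent input: nothing to compare against
  return outputRms / inputRms < minRatio;
}

// Levels reported in this issue (input RMS 3770).
console.log(looksCollapsed(3770, 34.6)); // 10 ms pushes
console.log(looksCollapsed(3770, 2549)); // 100 ms pushes
console.log(looksCollapsed(3770, 4280)); // 1 s pushes
```

With the reported numbers only the 10 ms case falls below the threshold; catching the milder 100 ms degradation would need a tighter ratio, at the cost of more false alarms.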

## Repro
```js
import { AudioFrame, AudioResampler } from '@livekit/rtc-node';

// s16: Int16Array of 16kHz mono speech (we used ~7s of TTS audio)
// Root-mean-square level of a pcm16 buffer.
const rms = (a) => {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * a[i];
  return Math.sqrt(s / a.length);
};
console.log('input rms:', rms(s16));

// Same input and resampler settings for every run; only the push size changes.
for (const chunk of [160, 1600, 16000]) {
  const r = new AudioResampler(16000, 24000);
  const out = [];
  let n = 0;
  // Only whole chunks are pushed; a trailing partial chunk is dropped.
  for (let i = 0; i + chunk <= s16.length; i += chunk) {
    for (const f of r.push(new AudioFrame(s16.subarray(i, i + chunk), 16000, 1, chunk))) {
      out.push(Int16Array.from(f.data)); // copy the frame as soon as push() returns
      n += f.samplesPerChannel;
    }
  }
  // Concatenate the copied frames and measure the output level.
  const all = new Int16Array(n);
  let p = 0;
  for (const s of out) { all.set(s, p); p += s.length; }
  console.log(`chunk=${chunk}: outSamples=${n} rms=${rms(all).toFixed(1)}`);
}
```

## Workaround
Request the target rate from `AudioStream` directly (`new AudioStream(track, { sampleRate: 24000 })`) so WebRTC's internal resampler runs instead — output is correct and downstream transcription works.
