Versions
`@livekit/rtc-node` 0.13.29, Linux x64 (also reproduced inside the `@livekit/agents` 1.4.4 STT pipeline, which constructs `AudioResampler(16000, 24000)` internally for realtime STT plugins).
Behavior
Upsampling 16 kHz mono PCM16 speech to 24 kHz with `AudioResampler` destroys the signal when the input is pushed in 10 ms frames (160 samples). That is the framing `AudioStream` emits, so it is the realtime case. Larger pushes progressively recover:
| Push size | Output RMS (input RMS 3770) | Output sample count |
|---|---|---|
| 160 samples (10 ms) | 34.6 (near-silence) | 177,600 (correct) |
| 1,600 samples (100 ms) | 2,549 (degraded) | 177,600 |
| 16,000 samples (1 s) | 4,280 (healthy) | 168,000 |
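The output sample count is correct in all cases: it is 1.5× the number of input samples actually pushed. The 1 s run is shorter only because the repro loop skips the trailing partial chunk. Only the content collapses. This is not an aliasing or buffer-reuse artifact: copies of each output frame's data taken as soon as `push()` returns are identical to reads taken later.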
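As a sanity check on the sample counts in the table, the sketch below computes how many input samples the repro loop actually pushes for a given push size (full chunks only) and scales that by the 24000 / 16000 = 1.5 rate ratio. The input length of 118,400 samples (7.4 s) is inferred from the 177,600-sample output rather than stated in the measurements, and `expectedOutputSamples` is a hypothetical helper. It assumes the resampler emits exactly 1.5× what it receives with nothing held back, which is what the counts above show.

```javascript
// Hypothetical helper: output samples expected when `inputSamples` samples are
// pushed in full chunks of `chunk` samples (a trailing partial chunk is skipped,
// as in the repro loop) and resampled from inRate to outRate.
function expectedOutputSamples(inputSamples, chunk, inRate = 16000, outRate = 24000) {
  const pushed = Math.floor(inputSamples / chunk) * chunk;
  return (pushed * outRate) / inRate;
}

// Input length inferred from the table: 177,600 / 1.5 = 118,400 samples (7.4 s at 16 kHz).
const inferredInput = 118400;
for (const chunk of [160, 1600, 16000]) {
  console.log(chunk, expectedOutputSamples(inferredInput, chunk));
}
// 160 177600
// 1600 177600
// 16000 168000
```

With 1 s pushes only 7 × 16,000 = 112,000 samples go in, which accounts for the 168,000 figure.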
Impact
Any pipeline that feeds live `AudioStream` frames (10 ms) through `AudioResampler(16000, 24000)`, such as the `@livekit/agents` base STT class when it resamples for OpenAI realtime STT (24 kHz), sends near-silence to the provider. Server-side VAD never triggers, so the result is zero transcription events and no errors anywhere: a fully silent failure.
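Because nothing fails loudly, one cheap guard is to track the energy going into a resampler and coming out of it, and flag the stream when the output is far quieter than the input. The sketch below works on plain `Int16Array`s so it does not depend on any LiveKit API. `ResampleEnergyGuard`, the 0.1 ratio threshold, and the minimum input level are illustrative assumptions, not values from this report. In a pipeline one would call `addInput()` with each frame's `data` before `push()` and `addOutput()` with the `data` of every returned frame. It compares totals over a window rather than per frame, since output frames need not line up one-to-one with input frames. With the figures above, the 10 ms case gives 34.6 / 3770 ≈ 0.009, far below the threshold.

```javascript
// Hypothetical guard (not part of @livekit/rtc-node): accumulates the energy of
// audio entering and leaving a resampler and flags a collapse when the output
// RMS is much lower than the input RMS.
class ResampleEnergyGuard {
  constructor({ minRatio = 0.1, minInputRms = 100 } = {}) {
    this.minRatio = minRatio;       // output/input RMS below this is treated as collapsed
    this.minInputRms = minInputRms; // don't judge when the input itself is near-silent
    this.inSq = 0; this.inN = 0;
    this.outSq = 0; this.outN = 0;
  }

  static sumSquares(samples) {
    let s = 0;
    for (let i = 0; i < samples.length; i++) s += samples[i] * samples[i];
    return s;
  }

  addInput(samples) {
    this.inSq += ResampleEnergyGuard.sumSquares(samples);
    this.inN += samples.length;
  }

  addOutput(samples) {
    this.outSq += ResampleEnergyGuard.sumSquares(samples);
    this.outN += samples.length;
  }

  // RMS over everything seen so far, plus whether the output looks collapsed.
  check() {
    const inRms = this.inN ? Math.sqrt(this.inSq / this.inN) : 0;
    const outRms = this.outN ? Math.sqrt(this.outSq / this.outN) : 0;
    const collapsed = inRms >= this.minInputRms && outRms < this.minRatio * inRms;
    return { inRms, outRms, collapsed };
  }
}

// Synthetic example: a constant-level input versus a heavily attenuated output.
const guard = new ResampleEnergyGuard();
guard.addInput(new Int16Array(160).fill(3770));
guard.addOutput(new Int16Array(240).fill(35));
console.log(guard.check().collapsed); // true
```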
Repro
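```javascript
import { readFileSync } from 'node:fs';
import { AudioFrame, AudioResampler } from '@livekit/rtc-node';

// s16: Int16Array of 16 kHz mono speech (we used ~7 s of TTS audio).
// Loaded here from a headerless little-endian PCM16 file; the path is a placeholder.
const buf = readFileSync('speech_16k_mono.pcm');
const s16 = new Int16Array(buf.buffer, buf.byteOffset, buf.length >> 1);

const rms = (a) => { let s = 0; for (let i = 0; i < a.length; i++) s += a[i] * a[i]; return Math.sqrt(s / a.length); };
console.log('input rms:', rms(s16));

for (const chunk of [160, 1600, 16000]) {
  const r = new AudioResampler(16000, 24000); // fresh resampler per push size
  const out = []; let n = 0;
  // Push full chunks only (a trailing partial chunk is skipped); copy each output frame on return.
  for (let i = 0; i + chunk <= s16.length; i += chunk) {
    for (const f of r.push(new AudioFrame(s16.subarray(i, i + chunk), 16000, 1, chunk))) {
      out.push(Int16Array.from(f.data)); n += f.samplesPerChannel;
    }
  }
  // Concatenate the output frames and report total length and RMS.
  const all = new Int16Array(n); let p = 0;
  for (const s of out) { all.set(s, p); p += s.length; }
  console.log(`chunk=${chunk}: outSamples=${n} rms=${rms(all).toFixed(1)}`);
}
```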
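If no recorded speech is at hand, a synthetic stand-in for `s16` can be generated instead. The sketch below builds 7.4 s of 16 kHz mono PCM16 from a few summed sine tones under a slow amplitude envelope. `makeTestSignal`, the tone frequencies, and the scaling are hypothetical choices. It has not been checked whether tones reproduce the collapse exactly as speech does, so results from this input are indicative only.

```javascript
// Hypothetical synthetic stand-in for the repro's `s16`: 16 kHz mono PCM16
// built from summed sine tones under a slow 2 Hz amplitude envelope.
function makeTestSignal(seconds = 7.4, rate = 16000) {
  const n = Math.round(seconds * rate);
  const tones = [220, 440, 1000, 2500]; // Hz, all well below the 8 kHz Nyquist limit
  const out = new Int16Array(n);
  for (let i = 0; i < n; i++) {
    const t = i / rate;
    const env = 0.5 + 0.5 * Math.sin(2 * Math.PI * 2 * t); // ranges over 0..1
    let v = 0;
    for (const f of tones) v += Math.sin(2 * Math.PI * f * t);
    // |v| <= tones.length, so the peak stays at or below 8000 (inside int16 range).
    out[i] = Math.round((8000 / tones.length) * env * v);
  }
  return out;
}

const s16 = makeTestSignal();
console.log(s16.length); // 118400
```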
Workaround
Request the target rate from `AudioStream` directly with `new AudioStream(track, { sampleRate: 24000 })`, so that WebRTC's internal resampler does the conversion instead. The output is then correct and downstream transcription works.