Troubleshooting voice scenarios

This page covers the failure modes that show up most often when running voice scenarios. Each entry states the symptom, what is actually going wrong, and the minimum steps to fix it. If your failure is not listed here, check the voice agents feature file for the full behavioral contract.


Current failure modes

"ElevenLabs HTTP 401 quota_exceeded"

Symptom: The ElevenLabs adapter connection is rejected with HTTP 401 and the body contains "quota_exceeded".

Diagnosis: Your ElevenLabs account has exhausted its character quota for the current billing period. The API refuses all new Convai WebSocket connections until the quota resets or is topped up.

Fix: Top up your character balance at elevenlabs.io/app/usage. After topping up, the next scenario run will connect normally. If you are on the free tier, upgrade to a paid plan to remove the hard quota.
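To confirm the quota is actually exhausted before topping up, you can compare used and allowed characters on the account. The sketch below assumes the ElevenLabs REST endpoint `/v1/user/subscription` with its `character_count` and `character_limit` fields, and an `ELEVENLABS_API_KEY` environment variable; neither is stated on this page, so treat both as assumptions and check them against the ElevenLabs API reference.

```python
import json
import os
import urllib.request

# Assumed endpoint; verify against the ElevenLabs API reference.
SUBSCRIPTION_URL = "https://api.elevenlabs.io/v1/user/subscription"


def remaining_characters(subscription: dict) -> int:
    """Return the characters left in the current period, never below zero."""
    used = int(subscription.get("character_count", 0))
    limit = int(subscription.get("character_limit", 0))
    return max(limit - used, 0)


def fetch_subscription(api_key: str) -> dict:
    """Fetch subscription details for the account that owns api_key."""
    request = urllib.request.Request(
        SUBSCRIPTION_URL, headers={"xi-api-key": api_key}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Usage (makes a network call, so it is left commented out):
# info = fetch_subscription(os.environ["ELEVENLABS_API_KEY"])
# print(remaining_characters(info), "characters left")
```

A result of 0 is consistent with the quota_exceeded rejection described above.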


"Twilio HTTP 401 code 20003"

Symptom: Twilio rejects the request with HTTP 401 and error code 20003 ("Authenticate").

Diagnosis: Your Twilio auth token has been rotated (or never set correctly). Twilio returns code 20003 specifically when the auth token in the request does not match the current token on the account — not when the account SID is wrong.

Fix:
  1. Regenerate the auth token at console.twilio.com → Account → General settings → Auth token.
  2. Copy the new primary auth token.
  3. Update TWILIO_AUTH_TOKEN in your .env (python/.env for the Python SDK; the env your test runner loads for TypeScript).
  4. Re-run the scenario.
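
Before re-running a full scenario, the new token can be checked on its own. This is a minimal sketch, assuming the standard Twilio REST account resource (`/2010-04-01/Accounts/{sid}.json`) with HTTP basic auth and a `TWILIO_ACCOUNT_SID` variable alongside `TWILIO_AUTH_TOKEN`. The function names are hypothetical helpers and are not part of the Scenario SDK.

```python
import base64
import json
import urllib.error
import urllib.request


def build_account_request(account_sid: str, auth_token: str) -> urllib.request.Request:
    """Build a basic-auth GET for the account resource (no network call yet)."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}.json"
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {credentials}"})


def classify_response(status: int, body: str) -> str:
    """Map an HTTP status and JSON body to a short diagnosis."""
    try:
        code = json.loads(body).get("code")
    except (ValueError, AttributeError):
        code = None  # body was empty, not JSON, or not a JSON object
    if status == 200:
        return "ok"
    if status == 401 and code == 20003:
        return "authentication_failed"
    return "other_error"


def check_credentials(account_sid: str, auth_token: str) -> str:
    """Send the request and classify the outcome (performs a network call)."""
    request = build_account_request(account_sid, auth_token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return classify_response(response.status, response.read().decode())
    except urllib.error.HTTPError as error:
        return classify_response(error.code, error.read().decode())
```

Calling `check_credentials(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])` after `import os` should return "ok" once the updated token has been loaded.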

"VAD didn't fire"

Symptom: The voice scenario hangs waiting for a speech-start event, or the agent never receives the user's audio, or result.turns is empty.

Diagnosis: The adapter's voice-activity detection did not fire on the incoming audio. Two sub-causes:

  1. Native VAD missing: adapters such as TwilioAgentAdapter / twilioAgent have native_vad/nativeVad = false. In that case the SDK falls back to its own VAD on the incoming PCM16 audio stream (webrtcvad via the webrtcvad-wheels build in Python; a pure-JS RMS-energy + hysteresis detector in TypeScript, where a WASM webrtcvad backend is deferred). A warning is emitted once per process when the fallback activates:

    Adapter 'TwilioAgentAdapter' has no native VAD — using SDK-side webrtcvad. Accuracy may differ from native VAD.

  2. Aggressiveness too high: the WebRTCVadFallback (Python) / voice.WebRTCVadFallback (TypeScript) default aggressiveness is 2 (0 = least selective, 3 = most selective). At level 3, low-energy speech or TTS audio may be classified as silence.

Fix:
  • Python: webrtcvad-wheels is a base dependency of langwatch-scenario (see pyproject.toml). If it is missing from your environment, run pip install webrtcvad-wheels. See scenario.voice.vad.WebRTCVadFallback.

  • TypeScript: a pure-JS RMS-energy + hysteresis VAD ships with @langwatch/scenario as voice.WebRTCVadFallback — no extra install needed. Accuracy may differ from a native webrtcvad build; a WASM webrtcvad backend is deferred (see javascript/src/voice/vad.ts).

  • Lowering aggressiveness is not yet exposed as an adapter constructor parameter in either SDK; track this in a follow-up issue.

  • For adapters with native VAD (Pipecat, ElevenLabs, Gemini Live), a missing speech-start event usually means the bot-side VAD threshold is set too aggressively. Consult your bot framework's VAD configuration.
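
To see why a more selective detector can miss quiet speech, here is an illustrative RMS-energy detector with hysteresis over PCM16 frames. It is a sketch of the general technique, not the SDK's implementation. The thresholds, the `hang_frames` parameter, and every function name are hypothetical.

```python
import math
import struct


def frame_rms(pcm16: bytes) -> float:
    """Root-mean-square amplitude of a little-endian PCM16 frame."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech(frames, start_threshold=1000.0, stop_threshold=500.0, hang_frames=3):
    """Return (frame_index, event) pairs for speech_start and speech_end.

    Hysteresis: speech starts when RMS reaches start_threshold and ends only
    after hang_frames consecutive frames fall below the lower stop_threshold.
    """
    events = []
    speaking = False
    quiet_run = 0
    for index, frame in enumerate(frames):
        rms = frame_rms(frame)
        if not speaking:
            if rms >= start_threshold:
                speaking = True
                quiet_run = 0
                events.append((index, "speech_start"))
        elif rms < stop_threshold:
            quiet_run += 1
            if quiet_run >= hang_frames:
                speaking = False
                events.append((index, "speech_end"))
        else:
            quiet_run = 0
    return events


def square_frame(amplitude: int, samples: int = 320) -> bytes:
    """Synthetic test frame: a square wave whose RMS equals amplitude."""
    return struct.pack(f"<{samples}h", *([amplitude, -amplitude] * (samples // 2)))


# Quiet speech (RMS 800) never crosses the default start threshold of 1000,
# which mirrors the "classified as silence" failure described above.
quiet = [square_frame(800)] * 5
print(detect_speech(quiet))
print(detect_speech(quiet, start_threshold=600.0))
```

Lowering the start threshold plays the same role here that lowering aggressiveness would play for webrtcvad: the detector becomes less selective and fires on quieter audio.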


"ffmpeg not found for live playback"

Symptom: Live audio playback fails with an error resembling ffmpeg not found or imageio_ffmpeg.get_ffmpeg_exe() failed, or the scenario exits without playing audio during a live demo run.

Diagnosis: The Scenario SDK uses ffmpeg for live PCM → speaker playback and for transcoding recordings to compressed formats (.mp3 / .ogg / .flac).

  • Python bundles its own ffmpeg binary via imageio-ffmpeg (imageio_ffmpeg.get_ffmpeg_exe()). If imageio-ffmpeg is missing or its binary path is not resolvable, playback silently degrades (a DEBUG-level log) — but a missing dependency can also surface as an import error.
  • TypeScript uses the system ffmpeg on your PATH. WAV is written natively (no ffmpeg needed); only compressed-format transcoding and live playback require ffmpeg to be installed.

Fix:
  • Python: imageio-ffmpeg is a base dependency of langwatch-scenario; reinstall it if it is missing:

    pip install imageio-ffmpeg

  • TypeScript: install ffmpeg with your system package manager and confirm it is on your PATH.

Then re-run the scenario. The SDK picks up the binary automatically (Python via imageio_ffmpeg.get_ffmpeg_exe(); TypeScript via the ffmpeg on PATH).
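
A quick way to see which ffmpeg binaries are visible from your environment is sketched below. It checks the system PATH with `shutil.which` and, if imageio-ffmpeg is importable, asks it for its bundled binary. The `locate_ffmpeg` helper is hypothetical and is not part of the SDK.

```python
import shutil


def locate_ffmpeg() -> dict:
    """Report the system ffmpeg (PATH) and the imageio-ffmpeg bundled binary."""
    found = {"system": shutil.which("ffmpeg"), "bundled": None}
    try:
        import imageio_ffmpeg  # optional: only present if imageio-ffmpeg is installed

        found["bundled"] = imageio_ffmpeg.get_ffmpeg_exe()
    except (ImportError, RuntimeError):
        # ImportError: package missing; RuntimeError: no usable binary found.
        pass
    return found


for source, path in locate_ffmpeg().items():
    print(f"{source}: {path or 'not found'}")
```

For the Python SDK the "bundled" entry is the one that matters; for TypeScript it is the "system" entry.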


"Demo recording is empty"

Symptom: After a scenario run, the saved recording directory exists but the audio file is empty, manifest.json reports zero segments, or no audio plays back.

Diagnosis: The adapter's audio path was not wired correctly — audio chunks were never appended to the internal VoiceRecording buffer, so save_segments() (Python) / saveSegments() (TypeScript) wrote an empty recording. Common causes:

  • The adapter's on_audio_chunk callback was not registered, or was registered after the scenario started streaming audio.
  • The adapter connected but the bot never sent audio (check bot-side logs).
  • The scenario completed in fewer turns than expected, leaving a zero-length buffer.

Fix:
  1. Check recordings/<demo>/manifest.json (written by save_segments() / saveSegments()):

    cat recordings/<demo>/manifest.json

    Look at "segments" — a count of 0 means no audio was captured. A count > 0 with an empty file means the segment files are missing.

  2. Ensure the adapter is passed to scenario.run() before audio starts flowing. The adapter's connect() must complete before the bot begins transmitting.

  3. Check the bot-side logs to confirm it is sending audio frames. The adapter can only record what it receives.

  4. If using a custom adapter subclass, verify the recording buffer is appended inside the audio-receive loop — self._recording.append(chunk) in Python, or the equivalent append in your receiveAudio() override (subclass voice.VoiceAgentAdapter) in TypeScript.
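
The manifest check in step 1 can be scripted. This sketch assumes that "segments" in manifest.json is either a count or a list of segment entries, and that segment audio files sit next to the manifest. This page does not confirm either detail, so adjust the code to the manifest you actually see. The function names and return codes are hypothetical.

```python
import json
from pathlib import Path


def segment_count(manifest: dict) -> int:
    """Accept "segments" as either a number or a list (schema is assumed)."""
    segments = manifest.get("segments", 0)
    return segments if isinstance(segments, int) else len(segments)


def diagnose_recording(recording_dir) -> str:
    """Classify a recording directory into one of four short diagnoses."""
    directory = Path(recording_dir)
    manifest_path = directory / "manifest.json"
    if not manifest_path.is_file():
        return "manifest_missing"
    if segment_count(json.loads(manifest_path.read_text())) == 0:
        return "no_audio_captured"
    # Assumption: every other file in the directory is a segment audio file.
    audio_files = [
        path
        for path in directory.iterdir()
        if path.is_file() and path.name != "manifest.json"
    ]
    if not any(path.stat().st_size > 0 for path in audio_files):
        return "segment_files_missing_or_empty"
    return "ok"
```

For example, `diagnose_recording("recordings/<demo>")` returning "no_audio_captured" points at steps 2 to 4 (wiring or bot-side audio), while "segment_files_missing_or_empty" points at the files on disk.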


"receiveAudio timed out" (hosted ElevenLabs)

Symptom: A hosted elevenLabsAgent scenario fails with ElevenLabsAgentAdapter: receiveAudio timed out.

Diagnosis: You almost certainly scripted more than one user() turn. The hosted ElevenLabs Conversational AI transport is server-VAD-driven and supports only a single greeting-led exchange. A scripted second user() turn does not re-engage the server's turn-taking, so the following agent() waits for a response that never arrives and times out.

Fix:
  • Use the single greeting-led shape: agent() → user("...") → agent() → judge(). Lead with agent() so the on-connect greeting (first_message) drains before your user audio hits the wire.
  • For genuine multi-turn voice (more than one user→agent exchange), switch to a composable adapter — ElevenLabsVoiceAgent, pipecatAgent, Gemini Live, or OpenAI Realtime — which own turn-taking in-process. See the multi-turn recipe.
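
The shape rule above can be checked mechanically before a run. The sketch below models a script as a plain list of step names ("agent", "user", "judge"). That representation and the `hosted_shape_problems` helper are hypothetical and do not use any Scenario SDK API.

```python
def hosted_shape_problems(steps: list) -> list:
    """Return reasons a script does not fit the single greeting-led shape."""
    problems = []
    user_turns = steps.count("user")
    if user_turns > 1:
        problems.append(
            f"{user_turns} user() turns scripted; the hosted transport supports one"
        )
    if user_turns >= 1 and "agent" not in steps[: steps.index("user")]:
        problems.append(
            "no agent() before the first user(); the greeting may not drain first"
        )
    return problems


# The recommended shape passes; a two-exchange script is flagged.
print(hosted_shape_problems(["agent", "user", "agent", "judge"]))
print(hosted_shape_problems(["agent", "user", "agent", "user", "agent", "judge"]))
```

An empty list means the script matches the greeting-led shape; any entry is a reason to restructure the script or move to a composable adapter.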

"Audio duration mismatch" / "non-continuous audio input" warning

This warning is emitted by the ElevenLabs server, not the Scenario SDK. It is benign: it reflects that a scripted voice turn sends a discrete speech chunk plus a short silence pad rather than the continuous microphone stream ElevenLabs' VAD expects. It does not by itself indicate an SDK bug, and the single greeting-led exchange completes normally despite it. If you also see receiveAudio timed out, the cause is the multi-turn limitation above, not this warning.


Historical fixes

Resolved in prior versions
  • Gemini Live: agent reply is ~60 bytes on turn 2+. Fixed in commit 760a464 (PR #355). Gemini Live fires an interrupted event at the start of every agent turn, not just on actual barge-ins. The adapter used to pass these through as spurious empty-interrupt turns between agent replies, which truncated the second agent message to the interrupt header bytes (~60 bytes) instead of the full reply. The adapter now filters these no-op interrupts before they reach the timeline, so turn 2+ audio accumulates correctly. If you are on a build older than commit 760a464, upgrade.