Testing Voice Agents
Scenario tests voice agents end-to-end over real audio: it synthesizes a caller, speaks to your agent through its real transport, and judges the conversation. It’s the same scenario.run() and the same JudgeAgent you use for text; only the medium changes.

The highlights
Real-audio, multi-turn calls
A synthesized user actually speaks to your agent and reacts to its replies, turn after turn.
Audio effects
Inject background noise, phone-quality codec degradation, or custom WAV clips to test robustness.
Interruptions
Script the user barging in mid-reply, with native barge-in or a VAD-driven fallback depending on the adapter.
Latency metrics
Time-to-first-byte and p50/p95 per turn, so you catch a slow agent before users do.
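
To make the latency highlight concrete, here is a minimal sketch of how p50 and p95 could be derived from per-turn time-to-first-byte values. It does not use Scenario's API: `TurnTiming`, `percentile`, and `summarize` are hypothetical names introduced for illustration, and the nearest-rank method is one common percentile definition, not necessarily the one Scenario reports.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnTiming:
    """Hypothetical per-turn record (not a Scenario type); timestamps in milliseconds."""
    user_audio_end_ms: float    # when the simulated caller stopped speaking
    agent_first_byte_ms: float  # when the first byte of agent audio arrived

    @property
    def ttfb_ms(self) -> float:
        # Time-to-first-byte: the gap the caller experiences as silence.
        return self.agent_first_byte_ms - self.user_audio_end_ms


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(turns: list[TurnTiming]) -> dict[str, float]:
    """Collapse per-turn TTFB values into the two headline numbers."""
    ttfb = [turn.ttfb_ms for turn in turns]
    return {"p50_ms": percentile(ttfb, 50), "p95_ms": percentile(ttfb, 95)}


# Made-up timings for a five-turn call, purely for illustration.
turns = [
    TurnTiming(1000, 1320),
    TurnTiming(5000, 5410),
    TurnTiming(9000, 9290),
    TurnTiming(14000, 15200),
    TurnTiming(20000, 20350),
]
stats = summarize(turns)
print(stats)  # {'p50_ms': 350, 'p95_ms': 1200}
```

With only a handful of turns, p95 collapses to the slowest turn, which is why a single stalled reply (the 1200 ms turn here) dominates the tail while the median stays low.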
Works with your stack
One API, every major voice transport. Pick the adapter that matches what you’ve already deployed:

| Transport | Adapter |
|---|---|
| Pipecat / Twilio Media Streams | PipecatAgentAdapter |
| ElevenLabs hosted ConvAI | ElevenLabsAgentAdapter |
| OpenAI Realtime | OpenAIRealtimeAgentAdapter |
| Gemini Live | GeminiLiveAgentAdapter |
| Twilio phone number (PSTN) | TwilioAgentAdapter |
| Text-only agent (no transport yet) | ComposableVoiceAgent |

Full constructors, the per-adapter capability matrix, and guidance on choosing an adapter live in the Scenario docs.
Get it running
New to Scenario? Start with Getting Started for the basics (API key, writing your agent’s call() adapter), then come back for the voice specifics.
Point at your agent, add a voice user and a judge
Swap BOT_WS_URL for your running bot (or use the adapter for your transport from the table above). The user simulator speaks with a real voice; the judge scores the call against your criteria. Examples are available in Python and TypeScript.
Run it
Run your usual test command (pytest or vitest). With LANGWATCH_API_KEY set, the run streams to the Simulations dashboard with full audio playback and per-segment transcripts, and writes recordings/<scenario>/full.wav so you can listen back.
Dig into the full docs
Voice quick start
The complete five-minute walkthrough with a worked example per adapter.
Audio effects & interruptions
Background noise, codec degradation, custom WAVs, and barge-in recipes.
Capability matrix
Exactly which features each adapter supports.
Runnable examples
Demos per adapter and use case on GitHub.