Fastest path: install the /scenarios skill, then run "/scenarios add voice testing to my agent" in your coding assistant. It detects your transport, picks the matching adapter, and wires it to your deployed agent. Prefer to do it by hand? Follow the steps below.

Testing Voice Agents

Scenario tests voice agents end-to-end over real audio: it synthesizes a caller, speaks to your agent through its real transport, and judges the conversation. It's the same scenario.run() and the same JudgeAgent you use for text; only the medium changes.

The highlights

Real-audio, multi-turn calls

A synthesized user actually speaks to your agent and reacts to its replies, turn after turn.

Audio effects

Inject background noise, phone-quality codec degradation, or custom WAV clips to test robustness.

Interruptions

Script the user barging in mid-reply. Each adapter uses native barge-in where its transport supports it and falls back to VAD-driven interruption otherwise.

Latency metrics

Time-to-first-byte for every turn, summarized as p50/p95 latency, so you catch a slow agent before your users do.
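To make the Audio effects highlight above concrete, here is a minimal, generic sketch of two common degradations: mixing background noise in at a target signal-to-noise ratio, and a phone-style 8-bit mu-law round trip. It is not Scenario's implementation, and the names `mix_at_snr` and `mulaw_roundtrip` are hypothetical. It assumes mono float samples in [-1.0, 1.0].

```python
import math


def rms(samples):
    """Root-mean-square level of float samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` under `speech` so that 20*log10(rms(speech) / rms(added noise)) == snr_db.

    The noise is looped or trimmed to the speech length. The result is not clipped,
    so normalize or clip it before converting to integer PCM.
    """
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(looped)
    if noise_rms == 0:
        return list(speech)
    target_rms = rms(speech) / (10 ** (snr_db / 20))  # desired noise level
    gain = target_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, looped)]


MU = 255  # mu-law constant used by G.711 in North America and Japan


def mulaw_roundtrip(x):
    """Compress one sample with continuous mu-law, quantize to 8 bits, then expand it.

    This mimics the precision loss of 8-bit mu-law telephony; it is not a bit-exact
    G.711 codec, which uses a segmented approximation of this curve.
    """
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    code = round((compressed + 1) / 2 * 255)  # one of 256 levels: 0..255
    decoded = code / 255 * 2 - 1
    return math.copysign(((1 + MU) ** abs(decoded) - 1) / MU, decoded)
```

Lower `snr_db` values mean louder noise; a test suite might sweep a few levels to find where the agent's speech recognition starts to fail.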
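The Interruptions highlight above mentions a VAD-driven fallback. As a rough illustration of what "VAD-driven" means, here is a simple energy-based voice activity detector that reports when speech starts. It is a generic sketch, not how any Scenario adapter actually detects barge-in; the function name `detect_speech_onset` and its defaults (20 ms frames, a -40 dBFS threshold, 3 consecutive loud frames) are hypothetical.

```python
import math


def frame_rms_dbfs(frame):
    """RMS level of one frame in dB relative to full scale (1.0)."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq)  # equals 20*log10(rms)


def detect_speech_onset(samples, sample_rate=16000, frame_ms=20,
                        threshold_dbfs=-40.0, min_frames=3):
    """Return the onset time in seconds of the first sustained loud run, or None.

    Requiring `min_frames` consecutive frames above the threshold keeps short
    clicks and pops from counting as the user starting to speak.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_rms_dbfs(samples[start:start + frame_len]) > threshold_dbfs:
            run += 1
            if run >= min_frames:
                # Report the start of the first frame in the run, not the last.
                onset = start - (min_frames - 1) * frame_len
                return onset / sample_rate
        else:
            run = 0
    return None
```

In a barge-in fallback, a detector like this would watch the caller's audio while the agent is talking and cut the agent's playback once it returns a time.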
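For the Latency metrics highlight above, here is a small sketch of how p50/p95 summarize per-turn time-to-first-byte values. It uses the nearest-rank definition of a percentile; Scenario's own reporting may interpolate differently, and the function names and sample values are hypothetical, not real measurements.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize_ttfb(ttfb_ms):
    """Summarize per-turn time-to-first-byte values (milliseconds)."""
    return {"p50": percentile(ttfb_ms, 50), "p95": percentile(ttfb_ms, 95)}


# Hypothetical TTFB values for a 10-turn call, with one slow turn.
print(summarize_ttfb([420, 380, 510, 1250, 460, 390, 470, 440, 400, 480]))
```

With only ten turns, p95 is simply the slowest turn (1250 ms here) while p50 stays at 440 ms. That is why the tail percentile catches a single stalled reply that the median hides.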

Works with your stack

One API, every major voice transport — pick the adapter that matches what you’ve already deployed:
Transport: Adapter
Pipecat / Twilio Media Streams: PipecatAgentAdapter
ElevenLabs hosted ConvAI: ElevenLabsAgentAdapter
OpenAI Realtime: OpenAIRealtimeAgentAdapter
Gemini Live: GeminiLiveAgentAdapter
Twilio phone number (PSTN): TwilioAgentAdapter
Text-only agent (no transport yet): ComposableVoiceAgent
Full constructors, the per-adapter capability matrix, and guidance on choosing an adapter live in the Scenario docs.

Get it running

New to Scenario? Start with Getting Started for the basics (API key, writing your agent’s call() adapter), then come back for the voice specifics.

1. Install Scenario

pip install langwatch-scenario

2. Point at your agent, add a voice user and a judge

Set BOT_WS_URL to your running bot's WebSocket URL, or swap in the adapter for your transport from the table above. The user simulator speaks with a real voice; the judge scores the call against your criteria.
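import pytest
import scenario

scenario.configure(default_model="openai/gpt-5-mini")

BOT_WS_URL = "ws://localhost:8765/stream"  # replace with your deployed bot's URL


# scenario.run() is a coroutine, so it must be awaited inside an async test.
# This assumes the pytest-asyncio plugin provides the asyncio marker.
@pytest.mark.asyncio
async def test_billing_inquiry():
    result = await scenario.run(
        name="billing inquiry",
        description="An upset customer calls about a duplicate charge.",
        agents=[
            # Talks to the bot over its real transport (a Pipecat WebSocket here)
            scenario.PipecatAgentAdapter(url=BOT_WS_URL),
            # Simulated caller with a synthesized voice over a degraded phone line
            scenario.UserSimulatorAgent(
                voice="elevenlabs/EXAVITQu4vr4xnSDxMaL",
                audio_effects=[scenario.effects.phone_quality()],
            ),
            # Scores the finished call against these criteria
            scenario.JudgeAgent(criteria=[
                "Acknowledged the frustration before logistics",
                "Verified identity before any account action",
            ]),
        ],
        # The agent speaks first, the user replies, the conversation proceeds
        # for five turns, and then the judge decides
        script=[scenario.agent(), scenario.user(), scenario.proceed(turns=5), scenario.judge()],
    )
    assert result.success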

3. Run it

Run your usual test command (pytest or vitest). With LANGWATCH_API_KEY set, the run streams to the Simulations dashboard with full audio playback and per-segment transcripts, and writes recordings/<scenario>/full.wav so you can listen back.
Voice scenarios are slower than text: TTS, transport round-trips, and multiple turns add up to 30 to 120 seconds per run, so give your test runner a generous timeout.
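To sanity-check the recordings/<scenario>/full.wav file mentioned above without opening an audio player, a small standard-library helper can report its length and peak level. This is a sketch under assumptions: it assumes the recording is a standard PCM WAV that Python's `wave` module can read, and `inspect_wav` is a hypothetical helper, not part of Scenario.

```python
import array
import sys
import wave


def inspect_wav(source):
    """Return channels, sample rate, duration, and 16-bit peak for a WAV path or file object."""
    with wave.open(source, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        info = {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": n_frames / rate,
            "peak": None,  # only computed for 16-bit PCM below
        }
        if wf.getsampwidth() == 2:  # 16-bit PCM, the common case for speech
            samples = array.array("h")
            samples.frombytes(wf.readframes(n_frames))
            if sys.byteorder == "big":  # WAV sample data is little-endian
                samples.byteswap()
            info["peak"] = max((abs(s) for s in samples), default=0)
    return info
```

Call it with the recording's path, substituting your scenario's directory for `<scenario>`. A duration far shorter than the expected call, or a peak of 0, suggests the agent's audio never made it into the recording.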
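If your runner has no per-test timeout configured, one option is to bound the awaited call itself with `asyncio.wait_for`, so a hung transport fails with a clear message instead of stalling the suite. The helper name `run_with_timeout` and the 180-second default are hypothetical choices. If you use the pytest-timeout plugin, `@pytest.mark.timeout(180)` on the test is a runner-level alternative.

```python
import asyncio


async def run_with_timeout(coro, seconds=180.0):
    """Await `coro`, failing the test if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner coroutine at this point
        raise AssertionError(f"voice scenario exceeded {seconds:.0f}s") from None
```

Inside the async test above, this would look like `result = await run_with_timeout(scenario.run(...), seconds=180)`.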

Dig into the full docs

Voice quick start

The complete five-minute walkthrough with a worked example per adapter.

Audio effects & interruptions

Background noise, codec degradation, custom WAVs, and barge-in recipes.

Capability matrix

Exactly which features each adapter supports.

Runnable examples

Demos per adapter and use case on GitHub.