
Observability

This recipe documents the hooks and result fields that let you instrument a voice scenario for offline review, performance measurement, and debugging without modifying your agent.

Pattern

Hooks

Register on_audio_chunk and on_voice_event callbacks on scenario.run():

import scenario
from scenario.voice import AudioChunk, VoiceEvent
 
audio_chunks: list[AudioChunk] = []
voice_events: list[VoiceEvent] = []
 
def on_audio_chunk(chunk: AudioChunk) -> None:
    audio_chunks.append(chunk)
 
def on_voice_event(event: VoiceEvent) -> None:
    print(f"[voice_event] {event.type} @ {event.time:.3f}s")
    voice_events.append(event)
 
result = await scenario.run(
    name="my_scenario",
    # ... remaining scenario arguments elided ...
    on_audio_chunk=on_audio_chunk,
    on_voice_event=on_voice_event,
)

on_audio_chunk fires for every raw PCM16 audio chunk that flows through the pipeline. on_voice_event fires for higher-level events such as turn start, turn end, and VAD detection; each event carries a wall-clock timestamp in event.time.
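Once the run has finished, the collected voice_events list can be summarized offline. The following is a minimal sketch, not part of the library: StubVoiceEvent is a hypothetical stand-in that assumes only the .type and .time attributes used above, and the event type names are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StubVoiceEvent:
    """Hypothetical stand-in for VoiceEvent; only .type and .time are assumed."""
    type: str
    time: float


def summarize_events(events):
    """Return (count per event type, seconds between first and last event)."""
    counts = Counter(event.type for event in events)
    if not events:
        return counts, 0.0
    times = [event.time for event in events]
    return counts, max(times) - min(times)


# Hypothetical event type names, for illustration only.
events = [
    StubVoiceEvent("turn_start", 0.0),
    StubVoiceEvent("turn_end", 1.5),
    StubVoiceEvent("turn_start", 2.0),
    StubVoiceEvent("turn_end", 3.25),
]
counts, span = summarize_events(events)
print(dict(counts), f"{span:.3f}s")
```

In a real run, the same function could be called on the voice_events list filled by the on_voice_event callback.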

Saving the conversation audio

After the run, call result.audio.save(path) to write the conversation audio to disk. result.audio can be None, so check it first:

if result.audio is not None:
    result.audio.save("recordings/my_scenario.wav")
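This recipe does not say whether save() creates missing parent directories. A defensive sketch, assuming only that the audio object exposes save(path), creates the directory first. The save_recording helper and StubAudio class are hypothetical and exist only to make the example runnable.

```python
import tempfile
from pathlib import Path


def save_recording(audio, path):
    """Create the parent directory, then delegate to audio.save(path)."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    audio.save(str(target))
    return target


class StubAudio:
    """Hypothetical stand-in for result.audio; only .save(path) is assumed."""

    def save(self, path):
        Path(path).write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    saved = save_recording(StubAudio(), Path(tmp) / "recordings" / "my_scenario.wav")
    saved_ok = saved.exists()  # checked while the temporary directory still exists
print(saved_ok)
```

With a real result, the call would be save_recording(result.audio, "recordings/my_scenario.wav") inside the same None check.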

Latency metrics

result.latency exposes a LatencyMetrics object with the following fields:

| Field | Description |
| --- | --- |
| time_to_first_byte | Wall-clock seconds from when the user turn ended to when the first byte of agent audio arrived. |
| avg_response_time | Mean agent response time across all turns. |
| p50_response_time | Median agent response time across all turns. |
| p95_response_time | 95th-percentile agent response time across all turns. |
| interrupt_response_time | Time from interrupt signal to agent resume (populated only when scenario.interrupt() was used). |

if result.latency is not None:
    print(f"TTFB: {result.latency.time_to_first_byte:.3f}s")
    print(f"avg:  {result.latency.avg_response_time:.3f}s")
    print(f"p50:  {result.latency.p50_response_time:.3f}s")
    print(f"p95:  {result.latency.p95_response_time:.3f}s")
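These fields can also be checked against a latency budget in a test. The sketch below is illustrative only: StubLatency is a hypothetical stand-in that reuses the documented field names, check_latency_budget is not a library function, the budget values are arbitrary, and it assumes an unpopulated interrupt_response_time is None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubLatency:
    """Hypothetical stand-in for LatencyMetrics, using the documented field names."""
    time_to_first_byte: float
    avg_response_time: float
    p50_response_time: float
    p95_response_time: float
    interrupt_response_time: Optional[float] = None  # assumed None when unpopulated


def check_latency_budget(latency, budgets):
    """Return a list of human-readable violations; empty means within budget."""
    violations = []
    for field, limit in budgets.items():
        value = getattr(latency, field, None)
        if value is None:
            continue  # field not populated, e.g. no interrupt was used
        if value > limit:
            violations.append(f"{field}: {value:.3f}s exceeds {limit:.3f}s")
    return violations


# Budget values are arbitrary examples, not recommendations.
budgets = {
    "time_to_first_byte": 1.0,
    "p95_response_time": 2.0,
    "interrupt_response_time": 0.5,
}
latency = StubLatency(0.42, 0.9, 0.8, 2.4)
violations = check_latency_budget(latency, budgets)
print(violations)
```

Returning the violations as a list, rather than asserting on the first one, lets a single test report every exceeded budget at once.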

Worked example

observability.py wires the on_audio_chunk and on_voice_event callbacks, prints chunk and event counts after the run, asserts that time_to_first_byte > 0, and fails with a hard assertion if result.latency is None. It is the canonical reference for the full observability surface.

See also