# Scenario: Agent Testing with Simulation-Based Workflows

> Test AI agents with simulation-based testing. LLM-powered user simulators validate agent behavior, tool calling, and multi-turn conversations in LangGraph, CrewAI, and Pydantic AI.

## Docs

- [Agent Integration](/scenario/agent-integration.md): Learn how to connect your existing AI agents to the Scenario testing framework
- [Community & Support](/scenario/community-support.md): Join the Scenario community to connect with other users, get help, and explore the agent development ecosystem
- [Visualizing Simulations](/scenario/visualizations.md): The LangWatch Simulations visualizer enables inspection and analysis of agent test results. The interface provides detailed views of simulation runs, helping you debug agent behavior and collaborate with your team.
- [Blackbox Testing](/scenario/testing-guides/blackbox-testing.md): Blackbox testing evaluates software functionality through external interfaces without examining internal implementation. [Learn more about blackbox testing](https://martinfowler.com/bliki/BlackBoxTesting.html).
- [Testing Customer Support Agents](/scenario/testing-guides/customer-support-agent.md): Validating multi-agent coordination and intelligent escalation
- [Test Fixtures](/scenario/testing-guides/fixtures.md): Fixtures are static assets (images, audio clips, JSON payloads) loaded at runtime by scenario tests to ensure **repeatable, offline-friendly** execution.
- [Mocking External Dependencies](/scenario/testing-guides/mocks.md): Mocks are simulated implementations of external dependencies (APIs, databases, tools, or services) that scenario tests use to ensure **deterministic, offline-friendly** execution.
- [Testing SQL Agents](/scenario/testing-guides/sql-agent.md): Validating SQL generation and safe database access
- [Testing Tool Calls in Scenarios](/scenario/testing-guides/tool-calling.md): Tool calls are essential to modern agent workflows. This guide covers verifying tool usage, asserting on tool call behavior, and mocking tool responses for robust, deterministic tests.
- [What is Scenario?](/scenario/scenario.md): Scenario is an Agent Testing Framework based on simulations
- [Getting Started: Your First Scenario](/scenario/introduction/getting-started.md): Install the Scenario SDK and a test runner (pytest or vitest)
- [Simulation-Based Testing for AI Agents](/scenario/introduction/simulation-based-testing.md): Traditional evaluation methods designed for static, single-turn LLMs cannot adequately test agents. Agents are stateful, dynamic systems that make decisions over time, recover from errors, and adapt to new information. Building robust, autonomous agents requires evaluation with the same rigor used to design their architecture.
- [Testing Remote Agents](/scenario/examples/testing-remote-agents.md): Adapters for HTTP-deployed agents
- [JSON Response Pattern](/scenario/examples/testing-remote-agents/json.md): Test [agents](/agent-integration) that return complete JSON responses via HTTP POST requests. This is the most common pattern for REST APIs.
- [Server-Sent Events (SSE) Pattern](/scenario/examples/testing-remote-agents/sse.md): Test [agents](/agent-integration) that stream responses using the Server-Sent Events format. SSE uses `data:` prefixed lines and ends with `data: [DONE]`.
- [Stateful Pattern with Thread ID](/scenario/examples/testing-remote-agents/stateful.md): Test [agents](/agent-integration) that maintain conversation history server-side using thread identifiers. The [adapter](/agent-integration) sends only the latest message plus a thread ID, and the server looks up the full history.
- [Streaming Response Pattern](/scenario/examples/testing-remote-agents/streaming.md): Test [agents](/agent-integration) that stream responses in chunks using HTTP chunked transfer encoding. This provides progressive output as the LLM generates text.
- [Audio → Audio Testing](/scenario/examples/multimodal/audio-to-audio.md): Test agents that listen to audio input and reply with audio responses. This pattern is ideal for voice assistants, conversational AI, and any agent that needs to respond in a natural spoken voice.
- [Audio → Text Testing](/scenario/examples/multimodal/audio-to-text.md): Test agents that receive audio files (like WAV or MP3) and respond with textual answers. This pattern is ideal for transcription services, audio-based Q&A, or any agent that processes voice input but responds in text.
- [Multimodal File Analysis](/scenario/examples/multimodal/multimodal-files.md): This page demonstrates how to write Scenario tests where the user provides **files** (PDF, CSV, etc.) as part of the conversation and the agent must parse and respond appropriately.
- [Multimodal Image Generation (Coming Soon)](/scenario/examples/multimodal/multimodal-image-generation.md): This page will demonstrate how to build Scenario tests where the agent generates **images from text prompts** and the judge evaluates both the prompt quality and the generated image quality.
- [Multimodal Image Analysis](/scenario/examples/multimodal/multimodal-images.md): Use Case
- [Multimodal Use Cases – Overview](/scenario/examples/multimodal/overview.md): Many modern agents must process more than just text. Scenario supports tests where your agent receives **images**, **files**, **audio**, and other modalities – individually or combined.
- [Testing Voice Agents](/scenario/examples/multimodal/testing-voice-agents.md): Scenario lets you write **end-to-end tests** for agents that listen to audio, think, and respond with either text *or* audio.
- [Voice-to-Voice Conversation Testing](/scenario/examples/multimodal/voice-to-voice.md): Test complete voice conversations where both the user simulator *and* the agent speak over multiple turns. This pattern is perfect for testing complex dialogues, multi-turn reasoning, and natural conversation flow.
- [Domain-Driven TDD](/scenario/best-practices/domain-driven-tdd.md): Using scenarios to drive implementation
- [The Agent Testing Pyramid](/scenario/best-practices/the-agent-testing-pyramid.md): Ever since we started putting tools in the hands of LLMs, we have faced a persistent challenge: how do we systematically ensure that our agents really work? How do evals fit into the picture? How do we know an agent is reliable enough?
- [The Vibe-Eval Loop: TDD for Agents](/scenario/best-practices/the-vibe-eval-loop.md): Most teams today build AI agents on **vibes alone**. This works for quick POCs, but makes it very hard to keep evolving past the initial demo stage. The biggest challenge is capturing all the edge cases identified along the way, fixing them, and proving that the agent actually improved afterwards.
- [Cache](/scenario/basics/cache.md): Make your scenario tests deterministic and faster by caching LLM calls and other non-deterministic operations
- [CI/CD Integration](/scenario/basics/ci-cd-integration.md): Automate your Scenario tests in CI/CD pipelines to catch regressions early and maintain AI agent quality across your team. This guide shows you how to set up GitHub Actions for both TypeScript and Python.
- [Core Concepts](/scenario/basics/concepts.md): Scenario uses simulation testing to validate AI agents. This methodology simulates different situations and user interactions, then evaluates responses against defined criteria or custom assertions.
- [Configuration](/scenario/basics/configuration.md): Scenario supports flexible configuration via several different methods. Not all methods are available in every implementation; the page includes a table showing which methods your implementation supports.
- [Custom Clients](/scenario/basics/custom-clients.md): Advanced configuration for custom LLM clients and parameters
- [Debug Mode](/scenario/basics/debug-mode.md): Step through scenarios interactively
- [Judge Agent](/scenario/basics/judge-agent.md): The **Judge Agent** is an LLM-powered evaluator that automatically determines whether your agent under test meets defined success criteria. Instead of writing complex assertion logic, you describe what success looks like in natural language, and the judge evaluates each conversation turn to decide whether to continue, succeed, or fail the test.
- [No-Code Scenario Builder](/scenario/basics/no-code-scenario.md): The visual scenario builder enables product managers, QA teams, and domain experts to create and refine test scenarios alongside engineers. Whether you use AI generation or build scenarios manually, everyone can contribute to improving agent quality without writing code.
- [Scripted Simulations: Precise Flow Control](/scenario/basics/scripted-simulations.md): Automatic simulations provide powerful testing capabilities. For scenarios requiring precise control, use scripted simulations to orchestrate exactly how conversations unfold, when evaluations occur, and what custom logic runs at each step.
- [Test Runner Integration](/scenario/basics/test-runner-integration.md): Scenario supports seamless integration with popular test runners in both JavaScript/TypeScript and Python. Please refer to the section for your language below.
- [User Simulator Agent](/scenario/basics/user-simulator.md): The **User Simulator Agent** is an LLM-powered agent that simulates realistic user behavior during scenario tests. Instead of writing scripted user messages, you describe the user's context and goals, and the simulator generates natural, contextually appropriate messages that drive the conversation forward.
- [Writing Effective Scenarios](/scenario/basics/writing-scenarios.md): Every scenario test follows the same basic pattern
- [Inngest AgentKit Integration](/scenario/agent-integration/agentkit.md): Learn how to integrate AgentKit agents with the Scenario testing framework
- [Agno Integration](/scenario/agent-integration/agno.md): Learn how to integrate Agno agents with the Scenario testing framework
- [CrewAI Integration](/scenario/agent-integration/crewai.md): Learn how to integrate CrewAI crews with the Scenario testing framework
- [Google ADK Integration](/scenario/agent-integration/google-adk.md): Learn how to integrate Google ADK agents with the Scenario testing framework
- [HTTPS Integration](/scenario/agent-integration/https.md): Learn how to test agents deployed as HTTP/HTTPS services
- [LangGraph Integration](/scenario/agent-integration/langgraph.md): Learn how to integrate LangGraph agents with the Scenario testing framework
- [LiteLLM Integration](/scenario/agent-integration/litellm.md): Learn how to integrate LiteLLM agents with the Scenario testing framework
- [Mastra Integration](/scenario/agent-integration/mastra.md): Learn how to integrate Mastra agents with the Scenario testing framework
- [OpenAI Integration](/scenario/agent-integration/openai.md): Learn how to integrate OpenAI agents with the Scenario testing framework
- [Pydantic AI Integration](/scenario/agent-integration/pydantic-ai.md): Learn how to integrate Pydantic AI agents with the Scenario testing framework
- [Vercel AI SDK Integration](/scenario/agent-integration/vercel-ai.md): Learn how to integrate Vercel AI SDK agents with the Scenario testing framework
- [Async-Native Parallelism](/scenario/advanced/async-native-parallelism.md): When to prefer `scenario.arun` over the default threaded run
- [Custom Clients](/scenario/advanced/custom-clients.md): Advanced configuration for custom LLM clients and parameters
- [Custom Judge](/scenario/advanced/custom-judge.md): The built-in `JudgeAgent` handles most evaluation needs out of the box. But sometimes you need more control: a domain-specific prompt, a different LLM provider, or a custom agent framework for evaluation.
- [Custom Observability](/scenario/advanced/custom-observability.md): Control which spans Scenario exports
- [How Judging Works](/scenario/advanced/how-judging-works.md): Inside the built-in judge
- [Red Teaming](/scenario/advanced/red-teaming.md): `RedTeamAgent` is a drop-in replacement for `UserSimulatorAgent` that runs structured adversarial attacks against your agent. It plugs into the same `scenario.run()` loop, judges, and CI pipeline.
- [Quick Start](/scenario/advanced/red-teaming/quick-start.md): Generate a multi-turn adversarial test for your agent and run it as part of your normal test suite. Takes about five minutes.
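The SSE pattern entry above gives the only wire-format detail in this index: events arrive as `data:`-prefixed lines and the stream terminates with `data: [DONE]`. As a minimal sketch of what a test adapter must do with that format, here is an illustrative parser; the function name and input shape are assumptions for this example, not part of the Scenario SDK:

```python
def collect_sse_text(body: str) -> str:
    """Join the text chunks of an SSE response body.

    Assumes the format described in the SSE pattern above: each event is
    a line prefixed with `data:`, and `data: [DONE]` ends the stream.
    Illustrative helper only; not a Scenario SDK API.
    """
    chunks: list[str] = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # explicit end-of-stream marker
        chunks.append(payload)
    return "".join(chunks)


# Two chunks followed by the terminator; later data lines are ignored.
print(collect_sse_text("data: Hel\ndata: lo!\n\ndata: [DONE]\n"))  # → Hello!
```

A real adapter would read these lines incrementally from the HTTP response rather than from a complete string, but the `data:` / `[DONE]` handling is the same.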