Introducing Scenario: Use an Agent to Test Your Agent

Rogerio Chaves
Apr 8, 2025

Scenario is now live on GitHub — check it out and give it a ⭐️
Agent development today is stuck in a loop. You tweak a prompt, run a pretend conversation, hope it behaves better… and repeat. For every minor improvement, you risk breaking something else. It’s slow, manual, and fragile.
We’ve been there. So we built Scenario.
Scenario is an open-source testing library that uses an AI agent to test your AI agent, eliminating the tedious back-and-forth of manual testing and making sure your agent works in real-world scenarios — not just in your head.
The Problem: Agent Development Is Broken
If you've built an AI agent, you know the cycle:
Change a prompt ✍️
Talk to your agent pretending to be a user 💬
See if it works better 🤔
Repeat dozens of times 😫
This manual testing approach is time-consuming, non-repeatable, driven by vibes alone, and doesn't scale as your agent grows more complex. Without a way to automatically verify that each enhancement doesn't break existing functionality, agents become fragile and unpredictable. As a result, teams stop touching the code and the prompt, and progress stalls.
How Scenario works
Scenario transforms agent testing by simulating how real users interact with your agent. Instead of writing rigid input-output tests that break with every prompt change, you define high-level scenarios and success criteria. Our testing agent then dynamically converses with your agent until goals are met or failures are detected.
For example, here's how you would test a complex code assistant or website builder agent, like Cursor or Lovable:
def test_generate_landing_page():
    scenario = Scenario(
        name="build landing page",
        user_goal="Generate a landing page with pricing table and hero section",
        success_criteria=[
            "Includes a pricing table",
            "Includes a hero section",
            "Outputs valid HTML",
        ],
    )
    run_scenario(scenario)
As you run it, you'll see the testing agent talking to your agent as if it were a real user. It's mind-blowing.
Why You'll Love It
Natural Interactions: The testing agent simulates real user behavior with natural variation
End-to-End Testing: Tests the entire agent experience, not just isolated components, allowing for refactoring with confidence
Fast Feedback Loop: Quickly identify when changes break existing functionality
Debug Mode: Watch conversations unfold in slow motion and intervene for debugging
Feels Like Standard Testing: Integrates seamlessly with pytest and your existing CI/CD pipeline
Deterministic When Needed: Caching options for reproducible tests during development
Real-World Examples
Scenario excels at testing complex agent interactions:
Support bots: Ensure your agent handles common support requests correctly across long, multi-turn conversations (see the sketch after this list)
Travel bookings: Test that your agent collects all required information and helps the user book a hotel, flight, and car across various scenarios
Coaching agents: Verify that the agent can coach the user through a wide range of topics and conversations
Swarm of agents: Test that all the agents coordinate to complete multiple complex tasks
Building assistants: Validate that your agent generates working code or rules and modifies them as user requests accumulate
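To make the support-bot case concrete, here is a minimal sketch that reuses the same Scenario and run_scenario shape from the landing-page example above; the scenario name, user goal, and success criteria below are illustrative, not taken from a real test suite:
def test_refund_request():
    # Illustrative scenario: the goal and criteria are made up for this example
    scenario = Scenario(
        name="refund for double charge",
        user_goal="Get a refund for a subscription that was charged twice this month",
        success_criteria=[
            "Asks for the order or invoice number before promising a refund",
            "Explains the refund policy accurately",
            "Confirms the refund was initiated, or clearly escalates if it cannot do it",
        ],
    )
    run_scenario(scenario)
The testing agent then plays the customer across multiple turns, so you catch failures that only show up deep into the conversation.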
Install Scenario with pip:
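pip install langwatch-scenario  # assumed package name; the GitHub README has the authoritative install command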
Create your first scenario, and you're ready to test your agent with a single pytest command:
pytest test_my_agent.py -v
More Than Just Testing
Scenario helps you:
Document behavior: Scenarios serve as living documentation of how your agent should behave
Prevent regressions: Catch unintended side effects of prompt or model changes
Evolve with confidence: Add new features knowing existing functionality remains intact
Case Study: The Lovable Clone
For the launch of Scenario we wrote our own Lovable clone. If you're not familiar, Lovable is a product that lets you create websites from a prompt. This is very powerful, but building a website is complex: for it to work well you need a very good prompt, and then you have to let the agent "run free" and decide which tools to use.
Scenario really helped us develop this Lovable clone. In this video, I show how it works and how we can extend it further while keeping it fully tested with Scenario:
You can check out the full example in the GitHub examples folder too.
Try It Today
Visit GitHub for full examples:
🔁 Multi-turn tool-using agents
🐞 Debug mode
⚡ Parallel testing
Scenario is your agent's test suite. Use an agent to test your agent — and ship with confidence.

