Scenario MCP: Automatic Agent Test Generation inside your editor

Aryan

Nov 25, 2025

Many teams aren't writing tests for their agents; they just manually send the same messages over and over to see if anything breaks. Scenarios are how you stress-test your agent so it doesn't break in the hands of real users.

What’s still surprisingly hard is actually testing these agents end-to-end in a way that’s repeatable and trustworthy, instead of relying on one-off demos or manual checks.

In this video you can see how to quick-start proper testing with the Scenario MCP and let AI write the tests for you.

tl;dr

  • Your preferred coding agent can now write tests (scenarios) for you via the LangWatch MCP, making your agents ready for the real world

  • Describe your scenario, trigger the MCP command, and Scenario writes a complete test file for you. No boilerplate, no custom test language, no extra setup.

To do this with your agent:

Step 1: Get the LangWatch MCP

Add the snippet below to your MCP client configuration (Cursor, Claude Code, or any other MCP-enabled client). Find more details at: https://docs.langwatch.ai/integration/mcp

{
  "mcpServers": {
    "langwatch": {
      "command": "npx",
      "args": [
        "-y",
        "@langwatch/mcp-server"
      ]
    }
  }
}

Step 2: Prompt your favorite agent to write the tests for you.

Example prompt:
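Something along these lines works (illustrative only; the tool name matches the example below, so swap in your own agent and tools):

"Use the LangWatch Scenario MCP to write pytest scenario tests for my customer support agent. Simulate a user asking for a recap of the troubleshooting steps that were discussed, check that the agent calls the get_conversation_summary tool, and add judge criteria for a clear recap and confirmed next steps."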

Here is an example of the kind of scenario this produces: it checks for a tool call and also defines judge criteria. You can be more specific in your prompt to get more targeted scenarios for your agent.

import pytest

import scenario


@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_conversation_summary_request(agent_adapter):
    """Explicit summary requests should call the conversation summary tool."""

    def verify_summary_call(state: scenario.ScenarioState) -> bool:
        args = _require_tool_call(state, "get_conversation_summary")
        assert "conversation_context" in args, "summary tool must include context reference"
        return True

    result = await scenario.run(
        name="conversation summary follow-up",
        description="Customer wants a recap of troubleshooting steps that were discussed.",
        agents=[
            agent_adapter,
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent provides a clear recap",
                    "Agent confirms next steps and resources",
                ]
            ),
        ],
        script=[
            scenario.user("Thanks for explaining the dispute process earlier."),
            scenario.agent(),
            scenario.user(
                "Before we wrap, can you summarize everything we covered so I don't miss a step?"
            ),
            scenario.agent(),
            verify_summary_call,
            scenario.judge(),
        ],
    )

    assert result.success, result.reasoning
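The test above relies on an agent_adapter fixture and a _require_tool_call helper that the generated file would normally define alongside it. A minimal sketch of what they might look like is below; the stubbed agent reply, the model name, and the way tool calls are read off the scenario state are assumptions, so check the Scenario docs for the exact ScenarioState API.

import json

import pytest

import scenario

# Example only: the simulated user and the judge need an LLM to run on.
scenario.configure(default_model="openai/gpt-4o-mini")


class SupportAgentAdapter(scenario.AgentAdapter):
    """Bridges your real agent into Scenario's turn-by-turn simulation."""

    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Replace this stub with a call into your actual agent; returning the
        # reply as a plain string (or OpenAI-style messages) is enough.
        return "Here is a recap of the troubleshooting steps we covered..."


@pytest.fixture
def agent_adapter() -> SupportAgentAdapter:
    return SupportAgentAdapter()


def _require_tool_call(state: scenario.ScenarioState, tool_name: str) -> dict:
    """Return the arguments of the most recent call to `tool_name`, or fail."""
    # Assumes the conversation is exposed as OpenAI-style messages on
    # `state.messages`; adjust to whatever accessors ScenarioState documents.
    for message in reversed(state.messages):
        for tool_call in message.get("tool_calls") or []:
            if tool_call["function"]["name"] == tool_name:
                return json.loads(tool_call["function"]["arguments"] or "{}")
    raise AssertionError(f"expected the agent to call {tool_name}")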

You can then run the scenario tests the agent generated for you with pytest, just like any other test suite, and see exactly where your agent is failing.

That’s it! Check out the official Scenario docs to learn more, and see the LangWatch MCP documentation for more details on the MCP.

Ship agents with confidence, not crossed fingers

Get up and running with LangWatch in as little as 5 minutes.
