- Set up agent testing with Scenario to test agent behavior through user simulations and edge cases
- Automatically instrument your code with LangWatch tracing for any framework (OpenAI, Agno, Mastra, DSPy, and more) (see the sketch after this list)
- Create and manage prompts using LangWatch's prompt management system
- Set up evaluations to test and monitor your LLM outputs
- Add labels, metadata, and custom tracking following LangWatch best practices
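For example, asking the assistant to instrument an OpenAI-based pipeline tends to produce code along these lines. This is a minimal sketch, assuming the Python SDK's `langwatch.setup()`, `@langwatch.trace()` decorator, and `autotrack_openai_calls()` helper; the function name, model, and prompts are placeholders.

```python
import langwatch
from openai import OpenAI

# Assumes LANGWATCH_API_KEY is set in the environment.
langwatch.setup()

client = OpenAI()


@langwatch.trace()  # group everything inside this function into one trace
def answer_question(question: str) -> str:
    # Record every OpenAI call made with this client on the current trace.
    langwatch.get_current_trace().autotrack_openai_calls(client)

    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "You are a helpful support agent."},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content
```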
Instead of manually reading docs and writing boilerplate code, just ask your AI assistant to instrument your codebase with LangWatch, and it will do it for you.
Refer to your editor’s MCP documentation for the specific configuration file location.
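As an illustration, an MCP configuration for an editor like Cursor usually looks something like the snippet below. The file location (`.cursor/mcp.json`), the server package name, and the API key value are placeholders; copy the exact values from the LangWatch MCP installation instructions.

```json
{
  "mcpServers": {
    "langwatch": {
      "command": "npx",
      "args": ["-y", "@langwatch/mcp-server"],
      "env": {
        "LANGWATCH_API_KEY": "your-langwatch-api-key"
      }
    }
  }
}
```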
2. Start using it
Open your AI assistant chat (e.g., Cmd/Ctrl + I in Cursor, or Cmd/Ctrl + Shift + P > “Claude Code: Open Chat” in Claude Code) and ask it to help with LangWatch tasks.
Simply ask your AI assistant to write scenario tests for your agents:
"Write a scenario test that checks the agent calls the summarization tool when requested"
The AI assistant will:
- Fetch the Scenario documentation and best practices
- Create test files with proper imports and setup
- Write scenario scripts that simulate user interactions
- Add verification logic to check agent behavior
- Include judge criteria to evaluate conversation quality
Here's an example scenario test that checks for tool calls and includes criteria validation:
```python
import pytest
import scenario


@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_conversation_summary_request(agent_adapter):
    """Explicit summary requests should call the conversation summary tool."""

    def verify_summary_call(state: scenario.ScenarioState) -> bool:
        # _require_tool_call is a project-local helper that asserts the tool
        # was called and returns its arguments.
        args = _require_tool_call(state, "get_conversation_summary")
        assert "conversation_context" in args, "summary tool must include context reference"
        return True

    result = await scenario.run(
        name="conversation summary follow-up",
        description="Customer wants a recap of troubleshooting steps that were discussed.",
        agents=[
            agent_adapter,
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent provides a clear recap",
                    "Agent confirms next steps and resources",
                ]
            ),
        ],
        script=[
            scenario.user("Thanks for explaining the dispute process earlier."),
            scenario.agent(),
            scenario.user(
                "Before we wrap, can you summarize everything we covered so I don't miss a step?"
            ),
            scenario.agent(),
            verify_summary_call,
            scenario.judge(),
        ],
    )

    assert result.success, result.reasoning
```
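To run a generated test locally, you would typically install the Scenario package (published on PyPI as `langwatch-scenario` and imported as `scenario`) and invoke pytest, for example `pytest -m agent_test` to select only tests carrying the `agent_test` marker.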
The LangWatch MCP automatically handles fetching the right documentation, understanding your agent’s framework, and generating tests that follow Scenario best practices.
The LangWatch MCP can even help AI agents instrument themselves while they are being built, enabling self-improving systems that track and debug their own behavior.