- Set up agent testing with Scenario to test agent behavior through user simulations and edge cases
- Automatically instrument your code with LangWatch tracing for any framework (OpenAI, Agno, Mastra, DSPy, and more)
- Set up evaluations to test and monitor your LLM outputs
- Search and inspect traces from your LangWatch project directly in your editor
- Query analytics to understand performance trends, costs, and error rates
- Manage prompts — list, create, update, and version prompts without leaving your IDE
## Setup

### Get your API key

Go to your LangWatch project Settings page and copy your API key. The API key is required for observability and prompt tools. Documentation tools work without it.

### Configure your MCP
- Claude Code
- Copilot
- Cursor
- ChatGPT
- Claude Chat
- Other
Add the MCP server with the `claude mcp add` command, or add it manually to your `~/.claude.json`. See the Claude Code MCP documentation for more details.

## Configuration
| Environment Variable | CLI Argument | Description |
|---|---|---|
| `LANGWATCH_API_KEY` | `--apiKey` | API key for authentication |
| `LANGWATCH_ENDPOINT` | `--endpoint` | API endpoint (default: `https://app.langwatch.ai`) |
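For manual setup, an MCP server entry in a client config file such as `~/.claude.json` generally looks like the sketch below. The package name `@langwatch/mcp-server` and the exact arguments are assumptions here, not confirmed values; copy the exact command from the LangWatch setup page or the Claude Code MCP documentation.

```json
{
  "mcpServers": {
    "langwatch": {
      "command": "npx",
      "args": ["-y", "@langwatch/mcp-server"],
      "env": {
        "LANGWATCH_API_KEY": "your-api-key"
      }
    }
  }
}
```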
## Two Modes

The MCP server runs in two modes:

- **Local (stdio)**: The default. Runs as a subprocess of your coding assistant (Claude Code, Copilot, Cursor). The API key is set via the `--apiKey` flag or the `LANGWATCH_API_KEY` env var.
- **Remote (HTTP/SSE)**: For web-based assistants (ChatGPT, Claude Chat). Hosted at `https://mcp.langwatch.ai`. The API key is sent as `Authorization: Bearer <key>` per session; each user brings their own key.
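Since the remote transport is plain HTTP, each session simply carries the key as a standard Bearer token. A stdlib-only sketch of the header construction (no specific request path is implied; the endpoint is the hosted URL above):

```python
# Remote (HTTP/SSE) mode: every session authenticates with its own key,
# sent as a standard Bearer token in the Authorization header.
MCP_ENDPOINT = "https://mcp.langwatch.ai"

def auth_headers(api_key: str) -> dict[str, str]:
    # Build the per-session headers carried on each request
    return {"Authorization": f"Bearer {api_key}"}

print(auth_headers("sk-lw-example")["Authorization"])
```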
## Usage Examples

### Write Agent Tests with Scenario

Ask your AI assistant to write scenario tests for your agents. It will:

- Fetch the Scenario documentation and best practices
- Create test files with proper imports and setup
- Write scenario scripts that simulate user interactions
- Add verification logic to check agent behavior
- Include judge criteria to evaluate conversation quality
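The steps above follow a common pattern: script some user turns, verify agent behavior, then judge the conversation. A framework-agnostic sketch of that pattern (the Scenario library automates the user simulation and judging; `refund_agent` is a hypothetical stand-in for your agent):

```python
# Sketch of a scenario-style test: simulate user turns, verify behavior,
# then apply judge-like criteria to the transcript.
def refund_agent(message: str, history: list) -> str:
    # Hypothetical agent under test: acknowledges refund requests
    if "refund" in message.lower():
        return "I can help with that refund. Could you share your order number?"
    return "How can I help you today?"

def run_scenario(agent, user_turns):
    history = []
    for turn in user_turns:  # simulated user interaction
        reply = agent(turn, history)
        history.append({"user": turn, "agent": reply})
    return history

transcript = run_scenario(refund_agent, ["Hi!", "I want a refund"])

# Verification logic: the agent should address the refund in its last reply
assert "refund" in transcript[-1]["agent"].lower()

# Judge-style criterion: the agent should ask a clarifying question
assert transcript[-1]["agent"].rstrip().endswith("?")
```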
### Instrument Your Code with LangWatch

Ask your AI assistant to add LangWatch tracking to your existing code. It will:

- Fetch the relevant LangWatch documentation for your framework
- Add the necessary imports and setup code
- Wrap your functions with `@langwatch.trace()` decorators
- Configure automatic tracking for your LLM calls
- Add labels and metadata following best practices
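With the real SDK, `@langwatch.trace()` handles all of this for you. The stdlib-only sketch below only shows the shape of what such a decorator records (function name, inputs, output, timing), so you can see what instrumentation adds to a call:

```python
import functools
import time

TRACES: list[dict] = []  # stand-in for the LangWatch backend

def trace(func):
    """Minimal sketch of a trace decorator: records inputs, output, and timing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACES.append({
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })
        return result
    return wrapper

@trace
def answer(question: str) -> str:
    # Stand-in for an LLM call
    return f"Answer to: {question}"

answer("What is LangWatch?")
```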
### Set Up Evaluations

Ask your AI assistant to set up evaluation code for your LLM outputs. It will:

- Fetch the relevant LangWatch evaluation documentation
- Create evaluation notebooks or scripts with proper setup
- Add evaluation metrics and criteria for your use case
- Include code to run evaluations following Evaluating via Code
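The general shape of such an evaluation script is: score each output with a metric, then aggregate. A minimal sketch of that loop, where the `exact_match` metric and the two-row dataset are made up for illustration:

```python
# Sketch of an evaluation loop: score each output with a metric, aggregate.
def exact_match(output: str, expected: str) -> bool:
    # Illustrative metric; real evaluations often use LLM judges or similarity
    return output.strip().lower() == expected.strip().lower()

dataset = [
    {"output": "Paris", "expected": "paris"},
    {"output": "Berlin", "expected": "Madrid"},
]

results = [exact_match(row["output"], row["expected"]) for row in dataset]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")
```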
### Search and Debug Traces

Ask your AI assistant to find and analyze traces from your project. It uses `search_traces` to find matching traces and `get_trace` to drill into individual ones. Traces are returned as AI-readable digests by default, showing the full span hierarchy with timing, inputs, outputs, and errors.
### Query Analytics

Ask about performance trends, costs, and usage patterns. Your assistant calls `discover_schema` to understand available metrics and filters, then uses `get_analytics` to query timeseries data.
### Manage Prompts

Ask your AI assistant to list, inspect, create, or version prompts directly from your editor.

## Advanced: Self-Building AI Agents

The LangWatch MCP can help AI agents automatically instrument themselves while being built, enabling self-improving systems that track and debug their own behavior.

## MCP Tools Reference

The MCP server provides 10 tools organized into three categories. Your AI assistant automatically chooses the right tools based on your request.

### Documentation
| Tool | Description |
|---|---|
| `fetch_langwatch_docs` | Fetch LangWatch integration docs |
| `fetch_scenario_docs` | Fetch Scenario agent testing docs |
### Observability (requires API key)

| Tool | Description |
|---|---|
| `discover_schema` | Explore available filter fields, metrics, aggregation types, and group-by options |
| `search_traces` | Search traces with filters, text query, and date range. Returns AI-readable digests by default |
| `get_trace` | Get full trace details by ID with span hierarchy, evaluations, and metadata |
| `get_analytics` | Query timeseries analytics (costs, latency, token usage, etc.) |
### Prompts (requires API key)

| Tool | Description |
|---|---|
| `list_prompts` | List all prompts in the project |
| `get_prompt` | Get a prompt with messages, model config, and version history |
| `create_prompt` | Create a new prompt with messages and model configuration |
| `update_prompt` | Update a prompt or create a new version |
## Tool Details

### discover_schema

Discover available filter fields, metrics, aggregation types, and group-by options for LangWatch queries. Call this before using `search_traces` or `get_analytics` to understand available options.

Parameters:

- `category` (required): One of `"filters"`, `"metrics"`, `"aggregations"`, `"groups"`, or `"all"`
### search_traces

Search traces with filters, text query, and date range. Returns AI-readable trace digests by default.

Parameters:

- `query` (optional): Text search query
- `startDate` (optional): Start date, as an ISO string or a relative value like `"24h"`, `"7d"`, `"30d"`. Default: 24h ago
- `endDate` (optional): End date, ISO string or relative. Default: now
- `filters` (optional): Filter object (e.g. `{"metadata.labels": ["production"]}`)
- `pageSize` (optional): Results per page (default: 25, max: 1000)
- `scrollId` (optional): Pagination token from a previous search
- `format` (optional): `"digest"` (default, AI-readable) or `"json"` (full raw data)
### get_trace

Get full details of a single trace by ID. Returns an AI-readable trace digest by default.

Parameters:

- `traceId` (required): The trace ID to retrieve
- `format` (optional): `"digest"` (default, AI-readable) or `"json"` (full raw data)
### get_analytics

Query analytics timeseries from LangWatch. Metrics use the `"category.name"` format (e.g. `"performance.completion_time"`).

Parameters:

- `metric` (required): Metric in `"category.name"` format (e.g. `"metadata.trace_id"`, `"performance.total_cost"`)
- `aggregation` (optional): `avg`, `sum`, `min`, `max`, `median`, `p90`, `p95`, `p99`, `cardinality`, `terms`. Default: `avg`
- `startDate` (optional): Start date, ISO string or relative. Default: 7 days ago
- `endDate` (optional): End date. Default: now
- `groupBy` (optional): Group results by field
- `filters` (optional): Filters to apply
- `timeZone` (optional): Timezone. Default: UTC
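For example, a question like "p95 completion time per model over the last week" maps onto arguments of this shape (shown as a Python dict; `metadata.model` as a group-by field is an assumption here, so confirm the real field name via `discover_schema`):

```python
# Example get_analytics arguments: p95 completion time, grouped, last 7 days
analytics_args = {
    "metric": "performance.completion_time",
    "aggregation": "p95",
    "startDate": "7d",
    "groupBy": "metadata.model",  # assumed field; verify via discover_schema
    "timeZone": "UTC",
}

# Metrics follow the "category.name" format
category, name = analytics_args["metric"].split(".", 1)
print(category, name)
```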
### list_prompts

List all prompts configured in the LangWatch project. No parameters required.

### get_prompt

Get a specific prompt by ID or handle, including messages, model config, and version history.

Parameters:

- `idOrHandle` (required): Prompt ID or handle
- `version` (optional): Specific version number (default: latest)
### create_prompt

Create a new prompt in the LangWatch project.

Parameters:

- `name` (required): Prompt name
- `messages` (required): Array of `{role, content}` messages
- `model` (required): Model name (e.g. `"gpt-4o"`, `"claude-sonnet-4-5-20250929"`)
- `modelProvider` (required): Provider name (e.g. `"openai"`, `"anthropic"`)
- `handle` (optional): URL-friendly handle
- `description` (optional): Prompt description
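Putting the parameters together, a full `create_prompt` argument object looks like the sketch below (shown as a Python dict; the prompt name, handle, and message contents are illustrative, not real project data):

```python
# Example create_prompt arguments
create_args = {
    "name": "Support Greeting",
    "handle": "support-greeting",  # optional, URL-friendly
    "messages": [
        {"role": "system", "content": "You are a helpful support agent."},
        {"role": "user", "content": "Customer question goes here."},
    ],
    "model": "gpt-4o",
    "modelProvider": "openai",
    "description": "Greeting prompt for the support bot",
}

# The four required fields must all be present
required = {"name", "messages", "model", "modelProvider"}
assert required <= create_args.keys()
```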
### update_prompt

Update an existing prompt or create a new version.

Parameters:

- `idOrHandle` (required): Prompt ID or handle to update
- `messages` (optional): Updated messages array
- `model` (optional): Updated model name
- `modelProvider` (optional): Updated provider
- `createVersion` (optional): If `true`, creates a new version instead of updating in place
- `commitMessage` (optional): Commit message for the change
### fetch_langwatch_docs

Fetches LangWatch documentation pages to understand how to implement features.

Parameters:

- `url` (optional): The full URL of a specific doc page. If not provided, fetches the docs index.
### fetch_scenario_docs

Fetches Scenario documentation pages to understand how to write agent tests.

Parameters:

- `url` (optional): The full URL of a specific doc page. If not provided, fetches the docs index.
Your AI assistant will automatically choose the right tools based on your request. You don’t need to call these tools manually.