The LangWatch MCP Server gives your AI coding assistant (Cursor, Claude Code, Codex, etc.) full access to all LangWatch and Scenario documentation and features via the Model Context Protocol.
  • Set up agent testing with Scenario to test agent behavior through user simulations and edge cases
  • Automatically instrument your code with LangWatch tracing for any framework (OpenAI, Agno, Mastra, DSPy, and more)
  • Set up evaluations to test and monitor your LLM outputs
  • Search and inspect traces from your LangWatch project directly in your editor
  • Query analytics to understand performance trends, costs, and error rates
  • Manage prompts — list, create, update, and version prompts without leaving your IDE
Instead of manually reading docs and writing boilerplate code, just ask your AI assistant to instrument your codebase with LangWatch, and it will do it for you.

Setup

1. Get your API key

Go to your LangWatch project Settings page and copy your API key. The API key is required for observability and prompt tools. Documentation tools work without it.
2. Configure your MCP

For Claude Code, run this command to add the MCP server:
claude mcp add langwatch -- npx -y @langwatch/mcp-server --apiKey your-api-key-here
Or add it manually to your ~/.claude.json:
{
  "mcpServers": {
    "langwatch": {
      "command": "npx",
      "args": ["-y", "@langwatch/mcp-server"],
      "env": {
        "LANGWATCH_API_KEY": "your-api-key-here"
      }
    }
  }
}
See the Claude Code MCP documentation for more details.
3. Start using it

Open your AI assistant chat (e.g., Cmd/Ctrl + I in Cursor, or Cmd/Ctrl + Shift + P > “Claude Code: Open Chat” in Claude Code) and ask it to help with LangWatch tasks.

Configuration

| Environment Variable | CLI Argument | Description |
| --- | --- | --- |
| LANGWATCH_API_KEY | --apiKey | API key for authentication |
| LANGWATCH_ENDPOINT | --endpoint | API endpoint (default: https://app.langwatch.ai) |
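
Both settings can be passed through the `env` block of an MCP client config. As a minimal sketch, the entry could be generated as follows. The helper name `build_mcp_config` is hypothetical; only the variable names, the default endpoint, and the `npx` invocation come from this page.

```python
from typing import Optional

DEFAULT_ENDPOINT = "https://app.langwatch.ai"  # documented default endpoint


def build_mcp_config(api_key: str, endpoint: Optional[str] = None) -> dict:
    """Build an mcpServers entry for the LangWatch MCP server.

    Hypothetical helper for illustration; the config shape mirrors the
    ~/.claude.json example shown in the Setup section.
    """
    env = {"LANGWATCH_API_KEY": api_key}
    # Only set the endpoint when it differs from the documented default.
    if endpoint and endpoint != DEFAULT_ENDPOINT:
        env["LANGWATCH_ENDPOINT"] = endpoint
    return {
        "mcpServers": {
            "langwatch": {
                "command": "npx",
                "args": ["-y", "@langwatch/mcp-server"],
                "env": env,
            }
        }
    }
```

Serializing the result with `json.dumps(..., indent=2)` yields a block you could paste into your client's config file.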

Two Modes

The MCP server runs in two modes:
  • Local (stdio): Default. Runs as a subprocess of your coding assistant (Claude Code, Copilot, Cursor). API key set via --apiKey flag or LANGWATCH_API_KEY env var.
  • Remote (HTTP/SSE): For web-based assistants (ChatGPT, Claude Chat). Hosted at https://mcp.langwatch.ai. API key sent as Authorization: Bearer <key> per session — each user brings their own key.
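
For remote mode, the only client-side requirement described here is the `Authorization` header. A small sketch of how a client might build it, assuming a key is supplied (the function name is hypothetical, and rejecting an empty key is a choice made for this example rather than documented server behavior):

```python
REMOTE_MCP_URL = "https://mcp.langwatch.ai"  # hosted endpoint named above


def remote_auth_headers(api_key: str) -> dict:
    """Return the per-session Authorization header for the remote server."""
    key = api_key.strip()
    if not key:
        # Assumption for this sketch: refuse to build an empty Bearer header.
        raise ValueError("an API key is needed to build the Authorization header")
    return {"Authorization": f"Bearer {key}"}
```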

Usage Examples

Write Agent Tests with Scenario

Simply ask your AI assistant to write scenario tests for your agents:
"Write a scenario test that checks the agent calls the summarization tool when requested"
The AI assistant will:
  1. Fetch the Scenario documentation and best practices
  2. Create test files with proper imports and setup
  3. Write scenario scripts that simulate user interactions
  4. Add verification logic to check agent behavior
  5. Include judge criteria to evaluate conversation quality
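Example scenario test that checks for tool calls and includes judge criteria:
import pytest
import scenario

# `agent_adapter` is assumed to be a pytest fixture and `_require_tool_call`
# a helper, both defined elsewhere in your test suite.
@pytest.mark.agent_test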
@pytest.mark.asyncio
async def test_conversation_summary_request(agent_adapter):
    """Explicit summary requests should call the conversation summary tool."""

    def verify_summary_call(state: scenario.ScenarioState) -> bool:
        args = _require_tool_call(state, "get_conversation_summary")
        assert "conversation_context" in args, "summary tool must include context reference"
        return True

    result = await scenario.run(
        name="conversation summary follow-up",
        description="Customer wants a recap of troubleshooting steps that were discussed.",
        agents=[
            agent_adapter,
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent provides a clear recap",
                    "Agent confirms next steps and resources",
                ]
            ),
        ],
        script=[
            scenario.user("Thanks for explaining the dispute process earlier."),
            scenario.agent(),
            scenario.user(
                "Before we wrap, can you summarize everything we covered so I don't miss a step?"
            ),
            scenario.agent(),
            verify_summary_call,
            scenario.judge(),
        ],
    )

    assert result.success, result.reasoning
The LangWatch MCP automatically handles fetching the right documentation, understanding your agent’s framework, and generating tests that follow Scenario best practices.
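
The example above relies on a helper, `_require_tool_call`, that this page does not define. One possible shape for it is sketched below, assuming the conversation is available as OpenAI-style message dictionaries. The function takes that list directly rather than a `scenario.ScenarioState`, because the attributes of that class are not shown here; the name and message layout are assumptions for illustration.

```python
import json


def require_tool_call(messages: list, tool_name: str) -> dict:
    """Return the parsed arguments of the most recent call to `tool_name`.

    Hypothetical helper: assumes assistant messages carry a `tool_calls`
    list whose entries have `function.name` and a JSON string in
    `function.arguments`.
    """
    # Walk the conversation backwards so the latest matching call wins.
    for message in reversed(messages):
        for call in reversed(message.get("tool_calls") or []):
            function = call.get("function", {})
            if function.get("name") == tool_name:
                return json.loads(function.get("arguments") or "{}")
    raise AssertionError(f"expected a call to tool {tool_name!r}, found none")
```

Raising `AssertionError` when no call is found makes the scenario step fail with a readable message instead of silently returning an empty result.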

Instrument Your Code with LangWatch

Simply ask your AI assistant to add LangWatch tracking to your existing code:
"Please instrument my code with LangWatch"
The AI assistant will:
  1. Fetch the relevant LangWatch documentation for your framework
  2. Add the necessary imports and setup code
  3. Wrap your functions with @langwatch.trace() decorators
  4. Configure automatic tracking for your LLM calls
  5. Add labels and metadata following best practices
Example transformation:
Before:
from openai import OpenAI

client = OpenAI()

def chat(message: str):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content
After (automatically added by AI assistant):
from openai import OpenAI
import langwatch

client = OpenAI()
langwatch.setup()

@langwatch.trace()
def chat(message: str):
    langwatch.get_current_trace().autotrack_openai_calls(client)
    langwatch.get_current_trace().update(
        metadata={"labels": ["document_parsing"]}
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

Set Up Evaluations

Ask your AI assistant to set up evaluation code for your LLM outputs:
"Create a notebook to evaluate the faithfulness of my RAG pipeline using LangWatch's Evaluating via Code guide"
The AI assistant will:
  1. Fetch the relevant LangWatch evaluation documentation
  2. Create evaluation notebooks or scripts with proper setup
  3. Add evaluation metrics and criteria for your use case
  4. Include code to run evaluations following Evaluating via Code

Search and Debug Traces

Ask your AI assistant to find and analyze traces from your project:
"Search for traces with errors in the last 24 hours"
The AI assistant will use search_traces to find matching traces and get_trace to drill into individual ones. Traces are returned as AI-readable digests by default, showing the full span hierarchy with timing, inputs, outputs, and errors.

Query Analytics

Ask about performance trends, costs, and usage patterns:
"Show me the total LLM cost for the last 7 days"
The assistant starts with discover_schema to understand available metrics and filters, then uses get_analytics to query timeseries data.

Manage Prompts

Ask your AI assistant to work with prompts:
"List all prompts in my LangWatch project"
The AI assistant will guide you through creating, versioning, and using prompts from LangWatch’s Prompt Management.

Advanced: Self-Building AI Agents

The LangWatch MCP can also help AI agents instrument themselves automatically while they are being built. An agent that adds its own tracing can then use the trace and analytics tools to inspect and debug its own behavior.

MCP Tools Reference

The MCP server provides 10 tools organized into three categories. Your AI assistant automatically chooses the right tools based on your request.

Documentation

| Tool | Description |
| --- | --- |
| fetch_langwatch_docs | Fetch LangWatch integration docs |
| fetch_scenario_docs | Fetch Scenario agent testing docs |

Observability (requires API key)

| Tool | Description |
| --- | --- |
| discover_schema | Explore available filter fields, metrics, aggregation types, and group-by options |
| search_traces | Search traces with filters, text query, and date range. Returns AI-readable digests by default |
| get_trace | Get full trace details by ID with span hierarchy, evaluations, and metadata |
| get_analytics | Query timeseries analytics (costs, latency, token usage, etc.) |

Prompts (requires API key)

| Tool | Description |
| --- | --- |
| list_prompts | List all prompts in the project |
| get_prompt | Get a prompt with messages, model config, and version history |
| create_prompt | Create a new prompt with messages and model configuration |
| update_prompt | Update a prompt or create a new version |

Tool Details

discover_schema

Discover available filter fields, metrics, aggregation types, and group-by options for LangWatch queries. Call this before using search_traces or get_analytics to understand available options. Parameters:
  • category (required): One of "filters", "metrics", "aggregations", "groups", or "all"

search_traces

Search traces with filters, text query, and date range. Returns AI-readable trace digests by default. Parameters:
  • query (optional): Text search query
  • startDate (optional): Start date — ISO string or relative like "24h", "7d", "30d". Default: 24h ago
  • endDate (optional): End date — ISO string or relative. Default: now
  • filters (optional): Filter object (e.g. {"metadata.labels": ["production"]})
  • pageSize (optional): Results per page (default: 25, max: 1000)
  • scrollId (optional): Pagination token from previous search
  • format (optional): "digest" (default, AI-readable) or "json" (full raw data)
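
Your assistant issues these calls for you, but it can help to see what a request might look like. MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for `search_traces` from the parameters listed above; the helper name and the client-side validation are illustrative assumptions, not part of the server.

```python
from typing import Optional

MAX_PAGE_SIZE = 1000  # documented maximum for pageSize


def search_traces_request(
    request_id: int,
    query: Optional[str] = None,
    start_date: str = "24h",
    filters: Optional[dict] = None,
    page_size: int = 25,
    fmt: str = "digest",
) -> dict:
    """Build a JSON-RPC `tools/call` request for the search_traces tool."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"pageSize must be between 1 and {MAX_PAGE_SIZE}")
    if fmt not in ("digest", "json"):
        raise ValueError('format must be "digest" or "json"')

    arguments = {"startDate": start_date, "pageSize": page_size, "format": fmt}
    # Optional parameters are only sent when the caller provides them.
    if query:
        arguments["query"] = query
    if filters:
        arguments["filters"] = filters

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_traces", "arguments": arguments},
    }
```

For example, `search_traces_request(1, filters={"metadata.labels": ["production"]})` describes a search over the last 24 hours restricted to traces labeled `production`.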

get_trace

Get full details of a single trace by ID. Returns AI-readable trace digest by default. Parameters:
  • traceId (required): The trace ID to retrieve
  • format (optional): "digest" (default, AI-readable) or "json" (full raw data)

get_analytics

Query analytics timeseries from LangWatch. Metrics use "category.name" format (e.g., "performance.completion_time"). Parameters:
  • metric (required): Metric in "category.name" format (e.g., "metadata.trace_id", "performance.total_cost")
  • aggregation (optional): avg, sum, min, max, median, p90, p95, p99, cardinality, terms. Default: avg
  • startDate (optional): Start date — ISO string or relative. Default: 7 days ago
  • endDate (optional): End date. Default: now
  • groupBy (optional): Group results by field
  • filters (optional): Filters to apply
  • timeZone (optional): Timezone. Default: UTC
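
As a rough sketch of how these parameters fit together, the function below assembles an argument object for `get_analytics` and checks the two constraints stated above: the `"category.name"` metric format and the list of aggregations. The function name is hypothetical, and the relative date `"7d"` in the usage note follows the relative format shown for `search_traces`.

```python
from typing import Optional

ALLOWED_AGGREGATIONS = {
    "avg", "sum", "min", "max", "median",
    "p90", "p95", "p99", "cardinality", "terms",
}


def get_analytics_arguments(
    metric: str,
    aggregation: str = "avg",
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    group_by: Optional[str] = None,
    filters: Optional[dict] = None,
    time_zone: Optional[str] = None,
) -> dict:
    """Assemble arguments for get_analytics, omitting unset optionals."""
    category, sep, name = metric.partition(".")
    if not (sep and category and name):
        raise ValueError('metric must use the "category.name" format')
    if aggregation not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"unsupported aggregation: {aggregation}")

    arguments = {"metric": metric, "aggregation": aggregation}
    optional = {
        "startDate": start_date,
        "endDate": end_date,
        "groupBy": group_by,
        "filters": filters,
        "timeZone": time_zone,
    }
    # Leave out anything unset so the server-side defaults apply.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments
```

A request such as "total LLM cost for the last 7 days" might translate to `get_analytics_arguments("performance.total_cost", aggregation="sum", start_date="7d")`.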

list_prompts

List all prompts configured in the LangWatch project. No parameters required.

get_prompt

Get a specific prompt by ID or handle, including messages, model config, and version history. Parameters:
  • idOrHandle (required): Prompt ID or handle
  • version (optional): Specific version number (default: latest)

create_prompt

Create a new prompt in the LangWatch project. Parameters:
  • name (required): Prompt name
  • messages (required): Array of {role, content} messages
  • model (required): Model name (e.g., "gpt-4o", "claude-sonnet-4-5-20250929")
  • modelProvider (required): Provider name (e.g., "openai", "anthropic")
  • handle (optional): URL-friendly handle
  • description (optional): Prompt description
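
To make the shape of the `messages` array concrete, here is a minimal sketch that assembles `create_prompt` arguments and checks that each message has the `role` and `content` keys described above. The helper name is hypothetical, and the check is a client-side convenience rather than documented server behavior.

```python
from typing import Optional


def create_prompt_arguments(
    name: str,
    messages: list,
    model: str,
    model_provider: str,
    handle: Optional[str] = None,
    description: Optional[str] = None,
) -> dict:
    """Assemble arguments for create_prompt from the documented parameters."""
    for index, message in enumerate(messages):
        missing = {"role", "content"} - set(message)
        if missing:
            raise ValueError(f"message {index} is missing keys: {sorted(missing)}")

    arguments = {
        "name": name,
        "messages": messages,
        "model": model,
        "modelProvider": model_provider,
    }
    # Optional fields are only included when provided.
    if handle is not None:
        arguments["handle"] = handle
    if description is not None:
        arguments["description"] = description
    return arguments
```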

update_prompt

Update an existing prompt or create a new version. Parameters:
  • idOrHandle (required): Prompt ID or handle to update
  • messages (optional): Updated messages array
  • model (optional): Updated model name
  • modelProvider (optional): Updated provider
  • createVersion (optional): If true, creates a new version instead of updating in place
  • commitMessage (optional): Commit message for the change

fetch_langwatch_docs

Fetches LangWatch documentation pages to understand how to implement features. Parameters:
  • url (optional): The full URL of a specific doc page. If not provided, fetches the docs index.

fetch_scenario_docs

Fetches Scenario documentation pages to understand how to write agent tests. Parameters:
  • url (optional): The full URL of a specific doc page. If not provided, fetches the docs index.

Your AI assistant will automatically choose the right tools based on your request. You don’t need to call these tools manually.