Getting to value with LangWatch, faster than ever - how to migrate from Langfuse to LangWatch with Skills.

Manouk Draisma
Mar 26, 2026
Getting started with LangWatch was never the hard part. You'd sign up, grab your API key, add two lines of SDK setup and a decorator, and traces would start flowing. That part was already quick. The hard part was everything after.
Thinking about quality and setting up evaluations. Writing scenario simulations. Migrating from Langfuse or LangSmith without spending a week unwiring your existing integration. That's where teams would slow down, not because the tools weren't there, but because there were real decisions to make and real setup to do before you got to the useful part. And here's the bigger takeaway: teams were ready to move from just "logs" to a proper eval platform where they could collaborate with non-technical colleagues on the quality of their agents. They were ready to move from just "evals" to agent evals for multi-turn agentic systems. We call them agent simulations.
So how do we get our customers to the value LangWatch offers, faster?
Skills and MCP changed this. The time between "I want evals" and "I have a complete platform where everything connects together from Traces, to evals, to simulations to prompt management and datasets, all in one place" is now measured in minutes. Same for simulations. Same for migrations. That's what I want to talk about today.
Evals used to be a days-long project
Getting from traces to a working evaluation suite involved: deciding what criteria to test, expressing those criteria precisely, wiring up your dataset, building a golden set from production traces, writing the experiment loop, debugging execution, and iterating until something useful came out.
Experienced AI and data teams found it clicked faster; working with datasets came naturally to them. For teams of engineers moving into AI, it often took longer, or got pushed to next sprint indefinitely.
One quote we received yesterday: Great launch with Skills. I’ve implemented AI agents across multiple companies, and every single time evals come up, it’s the same story: “yeah, next sprint.” Fast forward four months—still nothing. I tried Skills with a couple of those teams, and it’s the first time I’ve seen evals actually click.
Now you install the LangWatch MCP server once: claude mcp add langwatch -- npx -y @langwatch/mcp-server --apiKey your-api-key-here
And ask: set up evals for my agent
Claude reads your codebase and git history. It derives specific, binary pass/fail criteria from what it finds, not generic quality scores, but things like "did the agent respond in the correct language?" or "were all required data points included?" It builds the Jupyter experiment notebook, runs it, fixes any errors, and loops until the evaluation is green.
You go from zero evals to a running eval suite in one session. The decisions that used to take hours, what to test and how to express it, are made by Claude based on what your agent is actually supposed to do. Your team still has to think about what good and bad look like, but this speeds things up enormously.
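To make "binary pass/fail criteria" concrete, here's a minimal sketch of the kind of checks involved. These helper functions are illustrative stand-ins, not the LangWatch API; real criteria would typically use a language detector or an LLM judge rather than keyword heuristics.

```python
def responded_in_correct_language(output: str, expected: str) -> bool:
    # Toy heuristic: counts Dutch function words. A real check would use
    # a language-detection library or an LLM judge.
    dutch_markers = {"de", "het", "een", "niet", "je"}
    is_dutch = len(set(output.lower().split()) & dutch_markers) >= 2
    return is_dutch if expected == "nl" else not is_dutch


def includes_required_fields(output: str, required: list[str]) -> bool:
    # Binary pass/fail: every required data point must appear in the answer.
    return all(field.lower() in output.lower() for field in required)


answer = "Your order #123 ships Friday from Amsterdam."
print(includes_required_fields(answer, ["#123", "Friday"]))  # → True
print(responded_in_correct_language(answer, "en"))           # → True
```

The point of keeping each criterion binary is that a failing row tells you exactly what broke, instead of a composite quality score drifting by a few points.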
Agent Simulations: the same jump
Multi-turn scenario simulations were even more work to think about and set up — three actors (your agent, a user simulator, a judge), a scenario script, evaluation criteria, the test harness. Meaningful setup before a single simulation ran.
set up scenario simulations for my agent
Claude reads your agent, understands its purpose, and scaffolds the full simulation setup: user simulator, judge agent, scenario scripts with realistic test cases, evaluation criteria. It runs them, catches failures, and iterates.
What used to be a full engineering task is now a single conversation.
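In spirit, the scaffold is just three cooperating actors in a loop. Here's a stripped-down sketch with stub actors in place of real LLM calls; this is not the LangWatch simulation API itself, just the shape of what gets generated:

```python
def run_simulation(agent, user_simulator, judge, max_turns=4):
    """Drive a multi-turn conversation, then hand the transcript to the judge."""
    history = []
    for _ in range(max_turns):
        user_msg = user_simulator(history)
        if user_msg is None:  # the simulated user got what they needed
            break
        history.append(("user", user_msg))
        history.append(("agent", agent(user_msg)))
    return judge(history)


# Stub actors; in practice each of these would be an LLM call.
def agent(msg):
    return "You can reset your password from the account settings page."

def user_simulator(history):
    return None if history else "How do I reset my password?"

def judge(history):
    return {"pass": any("reset" in m for _, m in history),
            "turns": len(history) // 2}


print(run_simulation(agent, user_simulator, judge))  # → {'pass': True, 'turns': 1}
```

The value of the Skill is that Claude writes the realistic versions of all three actors for your specific agent, rather than you hand-building this harness.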
Migrating from Langfuse or LangSmith
This is personally my favourite, and also the one that now comes up every single week. If you're on Langfuse or LangSmith, the migration used to mean: remove the old SDK, add LangWatch, rewrite your trace decorators, re-create your eval datasets, re-learn the dashboard. After the first demo with our team the reaction was always "yes, we need this, we want this." But then they had to plan it into their sprints, and it took time; sometimes enough work that teams would stay on a platform they'd outgrown rather than deal with the move.
Now:
migrate my existing Langfuse instrumentation to LangWatch
Claude reads your current setup — Langfuse decorators, LangSmith callbacks, whatever integration you have — understands the structure, and rewrites it for LangWatch. For eval datasets and experiment configs, the evaluations skill reads what you have and rebuilds it in LangWatch's format. The migration that used to take a sprint now fits inside a single session, and customers come out the other side with the complete tooling they need to step up their eval game.
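To make the mechanical part concrete, here's a toy sketch of the kind of rewrite involved (a plain string transform, not the Skill itself), assuming Langfuse's `@observe()` decorator maps onto LangWatch's `@langwatch.trace()`:

```python
import re

LANGFUSE_SNIPPET = """\
from langfuse.decorators import observe

@observe()
def handle_query(question):
    ...
"""

def rewrite_to_langwatch(source: str) -> str:
    # Swap the import for LangWatch's setup, then swap every decorator.
    source = source.replace(
        "from langfuse.decorators import observe",
        "import langwatch\nlangwatch.setup()",
    )
    return re.sub(r"@observe\(\)", "@langwatch.trace()", source)


print(rewrite_to_langwatch(LANGFUSE_SNIPPET))
```

Claude's version of this is semantic rather than textual — it reads the call structure, carries over span names and metadata, and handles the cases a regex can't — but the shape of the change is the same.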
Six skills, the full surface
We wrote six skills covering everything you'd need:
Instrument — tracing setup, with automatic framework detection.
Evaluate — derives criteria from your codebase, builds and runs the experiment notebook.
Simulate — multi-turn scenario testing with user simulator and judge agent.
Prompts — prompt versioning, testing, and rollback.
Analytics — production trace queries for latency, failures, behavioral drift.
Level up — reads your current LangWatch setup and tells you the next most valuable thing to add.
Each skill is pre-loaded with your API key and MCP config. Your coding agent has everything it needs from the first paste.
The MCP path: for everyone on your team
One more thing worth calling out. The MCP path isn't just for engineers doing setup — it's how non-technical teammates get visibility into agent quality without learning a new dashboard.
Drop this config into Claude Desktop:
{
  "mcpServers": {
    "langwatch": {
      "command": "npx",
      "args": ["-y", "@langwatch/mcp-server"],
      "env": { "LANGWATCH_API_KEY": "your-api-key-here" }
    }
  }
}
Now a PM, domain expert, or CEO can ask Claude things like "what were the top failing traces from our agent today?" or "compare eval scores between the last two runs" — without opening LangWatch at all. The people who understand what the agent should be doing get direct visibility into whether it's doing it.
The point: faster time to value
Tracing was always fast to set up. What Skills and MCP change is the depth you can reach quickly: evaluations, simulations, migrations, team-wide visibility. All the things that used to require real engineering time now fit inside a conversation.
Time to value, now.
Get started
claude mcp add langwatch -- npx -y @langwatch/mcp-server --apiKey your-api-key-here
Full skills directory: langwatch.ai/docs/skills/directory
Free account. Open source. Self-host if you need it.
Or book a demo to get your team started.
Day 4 of LangWatch Skills week. Earlier this week: instrumenting agents, multimodal evaluations, and how non-engineers are running agent simulations.

