Langfuse vs LangSmith: which one to pick in 2026
Langfuse vs LangSmith in 2026 mostly comes down to four things: self-hosting, licensing, LangChain coupling, and pricing shape. I checked every claim below against both products' docs and pricing pages in July 2026.
Rogerio Chaves · July 3, 2026 · Article
You have an LLM app in production, or close to it, and two names on the shortlist for tracing, evals, and prompt management. Langfuse is an open-source platform you can run on your own infrastructure or use as a managed cloud. LangSmith is the proprietary platform from LangChain, the company behind the framework. Feature lists will not settle the choice: over the past two years both added traces, datasets, LLM-as-a-judge evals, human annotation, prompt versioning, playgrounds, and OpenTelemetry endpoints. What separates them is how you can run them, what that costs at your volume, and how much the LangChain ecosystem means to your stack.
The short version: a hard requirement to self-host for free settles it in Langfuse's favor, because self-hosted LangSmith is an Enterprise add-on. Building everything on LangChain or LangGraph tilts it toward LangSmith, which the same team builds and which bundles agent deployment. Everyone else is trading off licensing, pricing shape, and ecosystem fit, so that is what the rest of this post walks through.
Langfuse vs LangSmith: the comparison table
I checked every claim here on July 3, 2026 against the linked docs and pricing pages. Both products ship weekly, so treat it as a dated snapshot.
| | Langfuse | LangSmith |
|---|---|---|
| Source code | Open source, MIT except the ee folders | Proprietary platform; SDKs are MIT |
| Self-hosting | Free with all product features, Docker Compose or Helm | Enterprise add-on with a license key, on Kubernetes |
| Managed cloud regions | EU (Ireland), US (Oregon), Japan, plus a HIPAA region | US, EU, and APAC on GCP, plus US on AWS |
| LangChain coupling | Independent company; integrates via LangChain callbacks | Built by LangChain; env-var tracing for LangChain apps, SDKs and OTel for the rest |
| OpenTelemetry | OTLP endpoint, GenAI semantic conventions, OpenLLMetry and OpenInference mapping | OTLP endpoint with regional variants |
| Evals | LLM-as-a-judge, code checks, annotation queues, datasets and experiments, online and offline | LLM-as-a-judge, code, human, and pairwise evaluators; datasets and experiments; online evals with sampling |
| Prompt management | Versions with deployment labels, client-side caching, playground | Versions with tags, Playground with AI assistance, public Prompt Hub |
| Pricing model | Usage units (each trace, observation, or score is one unit): Hobby free, Core $29, Pro $199, Enterprise $2,499 per month | Seats plus traces: Developer free, Plus $39 per seat; $2.50 per 1k base traces, $5.00 per 1k extended |
| Free tier | 50k units per month on cloud; unlimited when self-hosted | 5k base traces per month, 1 seat |
| Enterprise and compliance | SOC 2 Type II, ISO 27001, HIPAA BAA; SSO in the MIT core, SCIM and audit logs commercial | SOC 2 Type II, HIPAA BAA on Enterprise; custom SSO, RBAC, and hybrid or self-hosted deployment on Enterprise |
Self-hosting and licensing, the widest gap
Langfuse's repository is MIT licensed except the ee folders. Since June 2025 that MIT core covers the entire product: managed LLM-as-a-judge evaluators, annotation queues, prompt experiments, and the playground all moved from the commercial license to MIT. What stays commercial targets enterprise platform teams: SCIM provisioning, audit logs, data retention policies, project-level RBAC. Regular SSO ships in the MIT core.
Running it yourself is docker compose up on a VM for a test drive, or the Helm chart for production, with Postgres, ClickHouse, Redis, and S3 underneath. Those four stateful services are the real cost of free self-hosting: somebody on your team gets to operate ClickHouse.
LangSmith has no free self-hosted story. In the docs' own words, self-hosted LangSmith is "an add-on to the Enterprise plan designed for our largest, most security-conscious customers". It runs on Kubernetes, and you need a license key from sales even to trial it. The SDKs are open; the platform is closed. Where this bites is timing: on Langfuse, "our data cannot leave the VPC" is an afternoon of setup, while on LangSmith it is a procurement cycle.
Neither one requires LangChain
The most repeated worry about LangSmith is coupling, and the docs are plain about it: LangSmith "works with many frameworks and providers", listing OpenAI, Anthropic, CrewAI, Vercel AI SDK, and Pydantic AI, and there is a standard OTLP endpoint for anything OpenTelemetry can reach. The coupling is really a convenience gradient. A LangChain or LangGraph app sets LANGSMITH_TRACING=true plus an API key, and every step shows up with no code changes. Everything else goes through SDK decorators or OTel configuration, which is the same amount of work any vendor asks for.
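For a LangChain or LangGraph app, the zero-code path described above amounts to two environment variables set before the app starts. Here is a minimal sketch; the helper name enable_langsmith_tracing is mine, and LANGSMITH_API_KEY is the variable name I am assuming for "an API key", so check it against the current LangSmith docs.

```python
import os
from typing import MutableMapping


def enable_langsmith_tracing(
    api_key: str,
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Set the two variables that switch on tracing for a LangChain app.

    Hypothetical helper for illustration. It only writes environment
    variables; it does not import or call any LangSmith code.
    """
    env["LANGSMITH_TRACING"] = "true"
    # Assumption: the API key is read from LANGSMITH_API_KEY.
    env["LANGSMITH_API_KEY"] = api_key
```

Call it once at process start, before any chain runs. In most deployments you would set the same two variables in the container or shell environment instead of in code.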
Langfuse sits one step further from any framework. LangChain support comes through a callback handler you pass into your chain config. Meanwhile, the OTLP endpoint accepts the OpenTelemetry GenAI semantic conventions plus attributes from OpenLLMetry and OpenInference instrumentation. Teams that expect to swap frameworks, or never adopted one, tend to value that neutrality. Teams all-in on LangGraph get more from LangSmith's zero-config integration and its bundled agent deployment, which includes a dev deployment on the Plus plan.
Evals and prompt management have converged
Two years ago a LangSmith vs Langfuse comparison would have listed real feature gaps here; today the core matches almost line for line. Both cover datasets of test cases, experiment runs to compare versions, LLM-as-a-judge scoring, code-based checks, human review, and online evaluators that score a sample of production traffic. Prompt management converged the same way: versioned prompts behind labels or tags, a playground to iterate in, and APIs to fetch prompts at runtime.
The differences live at the edges. LangSmith adds pairwise evaluators that judge two outputs head to head, an AI assistant inside its playground, and a public Prompt Hub for sharing. Langfuse answers with deployment labels resolved through client-side caching, so fetching a prompt does not add a network hop to every request, and with all of it, annotation queues included, available in the free self-hosted build. If the eval feature list is what you are deciding on, honestly, either will do; run one real workflow through both free tiers and see which one your reviewers actually open in week two.
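To make the client-side caching point concrete, here is a small sketch of the general idea: a time-to-live cache in front of a prompt fetch, keyed by prompt name and deployment label. This is not the Langfuse SDK's implementation. PromptCache, the fetch callback, the 60-second TTL, and the "production" default label are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """Illustrative TTL cache keyed by (prompt name, deployment label)."""

    def __init__(
        self,
        fetch: Callable[[str, str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch        # hypothetical network call: (name, label) -> prompt text
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, name: str, label: str = "production") -> str:
        key = (name, label)
        now = self._clock()
        hit = self._entries.get(key)
        # Serve from memory while the entry is fresh: no network hop.
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Missing or stale: fetch once and remember when we did.
        text = self._fetch(name, label)
        self._entries[key] = (now, text)
        return text
```

With a cache like this, only the first request in each TTL window pays for the fetch; every other request reads the prompt from process memory.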
Langfuse vs LangSmith pricing: two different shapes
Langfuse Cloud bills units: every trace, observation, and score counts as one, regardless of size. One user request that produces a trace, four spans, and two judge scores is seven units. So naive Langfuse vs LangSmith price comparisons mislead in both directions: the 50k free units work out to roughly 7,100 requests (50,000 ÷ 7) for an agent shaped like that. Also worth knowing: scores that Langfuse features create, LLM-as-a-judge among them, count as billable units too, so turning up online evals raises the bill in proportion. Cloud tiers: Hobby is free with 50k units, 30 days of data access, and two users; Core is $29 per month, Pro $199, Enterprise $2,499, each including 100k units with $8 per additional 100k.
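The unit arithmetic is easy to get wrong in your head, so here is a rough sketch of it. The function names are hypothetical, and I am assuming overage is pro-rated linearly at $8 per 100k units; if it is billed in whole 100k blocks, round up instead.

```python
def units_per_request(spans: int, scores: int) -> int:
    """One trace, plus each span and each score, billed as one unit apiece."""
    return 1 + spans + scores


def requests_covered(included_units: int, spans: int, scores: int) -> int:
    """Whole requests that fit inside a unit allowance."""
    return included_units // units_per_request(spans, scores)


def langfuse_cloud_monthly_usd(
    units: int,
    base_fee_usd: float,
    included_units: int = 100_000,
    overage_usd_per_100k: float = 8.0,
) -> float:
    """Plan fee plus overage. Assumption: overage is pro-rated linearly."""
    extra_units = max(0, units - included_units)
    return base_fee_usd + overage_usd_per_100k * extra_units / 100_000


# The agent from the text: one trace, four spans, two judge scores.
print(units_per_request(spans=4, scores=2))         # 7
print(requests_covered(50_000, spans=4, scores=2))  # 7142
print(langfuse_cloud_monthly_usd(1_000_000, 29.0))  # 101.0
```

Under those assumptions, a million units a month on Core is $29 plus nine extra blocks of 100k at $8, or $101.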
LangSmith bills seats plus traces. Developer is free with one seat and 5k base traces a month; Plus is $39 per seat with 10k included. A base trace costs $2.50 per 1k and lives 14 days, while extended retention costs $5.00 per 1k and lives 400 days. The surprise sits in between: a base trace upgrades to extended retention automatically when feedback lands on it, when a run rule matches it, or when it enters an annotation queue. Online evaluation attaches feedback by design, so monitoring-heavy projects drift toward the higher rate. You can cap extended-retention upgrades per month in workspace settings, but automations pause once you hit the cap.
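Here is a back-of-the-envelope sketch of how the extended-retention drift changes a Plus bill. The function name is hypothetical, and the billing mechanics are my simplification: I assume one workspace-wide 10k allowance that offsets base-rate traces only, and that an upgraded trace is billed $5.00 per 1k in total. Check the pricing page before relying on the exact numbers.

```python
def langsmith_plus_monthly_usd(
    seats: int,
    traces: int,
    extended_share: float,
    seat_usd: float = 39.0,
    included_traces: int = 10_000,
    base_usd_per_1k: float = 2.50,
    extended_usd_per_1k: float = 5.00,
) -> float:
    """Estimate a monthly Plus bill under the simplifying assumptions above."""
    if not 0.0 <= extended_share <= 1.0:
        raise ValueError("extended_share must be between 0 and 1")
    # Traces that picked up feedback, matched a run rule, or entered a queue.
    extended = round(traces * extended_share)
    base = traces - extended
    # Assumption: the included allowance offsets base-rate traces only.
    billable_base = max(0, base - included_traces)
    return (
        seats * seat_usd
        + base_usd_per_1k * billable_base / 1_000
        + extended_usd_per_1k * extended / 1_000
    )


# Three seats, 100k traces a month, first with no upgrades, then with half upgraded.
print(langsmith_plus_monthly_usd(3, 100_000, extended_share=0.0))  # 342.0
print(langsmith_plus_monthly_usd(3, 100_000, extended_share=0.5))  # 467.0
```

Same traffic, same seats: in this model, scoring half the traces with online evals moves the bill from $342 to $467.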
Self-hosting resets this whole calculation. A self-hosted Langfuse at any volume costs whatever Postgres, ClickHouse, Redis, and S3 cost you to run, plus the time of whoever runs them. For a two-person team the $29 cloud tier beats operating ClickHouse; somewhere well past the Pro tier the math starts favoring your own cluster, if the ops capacity exists.
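To put a rough number on "somewhere well past the Pro tier", here is a sketch of the break-even calculation. The function name is hypothetical, the self-hosting figure in the example is made up, and I assume Pro overage is pro-rated at $8 per 100k units. Your own monthly figure should include infrastructure and the staff time to run it.

```python
def breakeven_units(
    self_host_monthly_usd: float,
    plan_fee_usd: float = 199.0,
    included_units: int = 100_000,
    overage_usd_per_100k: float = 8.0,
) -> int:
    """Monthly units at which the cloud bill reaches the self-hosting cost.

    Returns 0 when self-hosting is already no more expensive than the plan fee.
    """
    if self_host_monthly_usd <= plan_fee_usd:
        return 0
    extra_usd = self_host_monthly_usd - plan_fee_usd
    return included_units + int(extra_usd / overage_usd_per_100k * 100_000)


# Hypothetical: $1,000 a month all-in to run Postgres, ClickHouse, Redis, and S3.
print(breakeven_units(1_000.0))  # 10112500
```

In that made-up scenario the Pro plan stays cheaper until roughly ten million units a month, which is why ops capacity, not list price, usually decides this.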
When Langfuse is the right pick
- Compliance, data residency, or a security review demands self-hosting, or soon will. You get the full product under MIT on your own infrastructure.
- Open source is policy, or you want the ability to read and patch the platform you depend on.
- Your stack is OpenTelemetry-first or framework-diverse, and you want the observability layer independent of any framework vendor.
- You need EU or Japan data residency with SOC 2 Type II, ISO 27001, and a HIPAA option, available from the self-serve Pro tier rather than an enterprise contract.
- Trace volume is high and ops capacity exists: the self-host escape hatch caps your worst-case bill.
When LangSmith is the right pick
- LangChain or LangGraph is the center of your stack: tracing is two environment variables, and the platform grew up around exactly your framework's concepts.
- You want deployment and observability from one vendor. LangSmith bundles agent deployment, with a dev deployment included on the Plus plan.
- Pairwise evaluations or the shared Prompt Hub fit how your team already works.
- You are an enterprise that wants vendor-supported self-hosted or hybrid deployment and will pay for it. The option exists, it just sits behind the Enterprise plan.
- You prefer a managed product with no infrastructure to think about, and your volume sits comfortably inside the seats-plus-traces model.
Where LangWatch fits
We build LangWatch, so read this section knowing that. Both tools above assume the person running evals writes code, and for plenty of teams that holds. Where it breaks is cross-functional: the person who can tell a right answer from a subtly wrong one is often a domain expert or a PM, and handing them a Python SDK means the eval loop stalls. We built LangWatch around that split. Domain experts create and run evals from the UI while engineers keep code and CI control. Agents also run through simulations before release: full multi-turn conversations against your agent, instead of judging one response at a time. There is a feature-by-feature table on our comparison page if that gap sounds like yours. And if your evals genuinely live with engineers, either tool above will serve you well; the free tiers make trying both cheap.
Frequently asked questions
- Is Langfuse free?
- Yes, in two ways. The platform is MIT-licensed and free to self-host with all product features included, LLM-as-a-judge evaluators, annotation queues, prompt experiments, and the playground among them. Langfuse Cloud also has a free Hobby tier with 50k units per month and 30 days of data access; paid cloud plans start at $29 per month.
- Does LangSmith require LangChain?
- No. LangSmith traces any application through its Python and TypeScript SDKs or its OpenTelemetry endpoint, and the docs list OpenAI, Anthropic, CrewAI, Vercel AI SDK, and Pydantic AI integrations. LangChain and LangGraph apps do get the smoothest path: set LANGSMITH_TRACING=true plus an API key and traces flow with no code changes.
- Can I self-host LangSmith?
- Only on the Enterprise plan. Self-hosted LangSmith is an Enterprise add-on that runs in your Kubernetes cluster with a license key; there is no free self-hosted version. Langfuse, by contrast, can be self-hosted for free under the MIT license with Docker Compose or Helm.
- Langfuse vs LangSmith for production monitoring: which is better?
- Both run online evaluations on production traces and ship dashboards and alerting, so the core capability is comparable. The practical differences are retention and cost shape: LangSmith keeps base traces for 14 days unless they upgrade to 400-day extended retention (which feedback and annotation trigger automatically), while Langfuse Cloud data access ranges from 30 days on the free tier to 3 years on Pro, and a self-hosted Langfuse retains whatever you choose to store.
- Is LangSmith open source?
- The platform is proprietary; only the SDKs are MIT-licensed on GitHub. Langfuse's entire product is MIT-licensed, with the commercial license limited to enterprise security and platform features such as SCIM, audit logs, and data retention policies.