Langfuse vs LangSmith: LLM Observability Platform Head-to-Head
Two LLM tracing platforms, two pricing models, two philosophies about lock-in. We compared Langfuse and LangSmith on instrumentation, evals, alerting, framework coverage, and total cost at two real volumes.
Langfuse takes the overall win by a five-point margin on the strength of open-source self-hosting, framework-agnostic OpenTelemetry instrumentation, and a flat usage-based price that doesn't multiply by seat count. LangSmith wins decisively on native LangChain/LangGraph integration, production alerting, and evaluation tooling depth, and is the higher-scoring pick for teams already deep in the LangChain stack who want managed deployment without stitching tools together. For most other teams in 2026, especially mid-size ones or those with data-sovereignty requirements, Langfuse is the higher-scoring default.
Langfuse and LangSmith are the two dominant LLM observability platforms in 2026. Both trace LLM calls, manage prompts, run evaluations, and surface cost and latency per request. They solve the same core problem with different architectures and different commercial models, which means the right pick depends on stack, team size, and compliance posture rather than feature parity.
Every round below names the concrete procedure behind it. Pricing rounds use each vendor's published rate cards normalized against the same trace volumes. Capability rounds are scored against each vendor's official documentation and changelogs as of the test date. Integration rounds are scored against published SDK and framework lists.
| Test category | Winner | Result & method |
|---|---|---|
| Open source and self-hosting | Langfuse | Langfuse is MIT-licensed with self-hosting as a first-class path. As of June 2025 every core product feature (tracing, prompts, evals, playground, annotation queues) was moved to MIT, leaving only thin enterprise compliance features like SCIM, audit logs, project-RBAC, and UI customization commercial. LangSmith is proprietary closed-source SaaS, and while it offers a self-hosted option, it requires an Enterprise license. The gap is decisive for teams with data-sovereignty requirements. How we measured it: Audited each vendor's license, source availability, and self-hosting documentation, and checked which product features are gated behind a commercial license when self-hosted. |
| Framework coverage and instrumentation | Langfuse | Langfuse v3 is built on OpenTelemetry, so traces from different systems or frameworks can appear in the same view provided they're instrumented accordingly, with native integrations for LangChain, LlamaIndex, OpenAI SDK, Anthropic, Mistral, Ollama, and any model instrumented via OTEL. LangSmith's primary strength is vertical integration with the LangChain framework, and tracing is largely automatic for LangChain-based applications, but non-LangChain stacks require manual @traceable instrumentation. For teams using Pydantic AI, the Vercel AI SDK, or the OpenAI SDK directly, Langfuse is the only platform with native integrations. How we measured it: Compared published SDK and integration lists, then instrumented the same three test applications (a LangChain agent, a LlamaIndex RAG pipeline, and a raw OpenAI SDK script) on each platform, scoring on whether tracing worked with documented defaults and no custom code. |
| Production evaluation and annotation | LangSmith | LangSmith ships built-in support for exact match, code-based, and LLM-as-a-judge evaluators, with annotation queues, dataset comparison, and CI integration that together form a more complete evaluation stack. It can run both LLM-as-judge and rule-based evals on live production traffic today, while Langfuse's documentation confirms deterministic checks for online evaluation are on the roadmap but not in GA. Langfuse's evaluation surface is closing the gap quickly but lags on the round as of the test date. How we measured it: Compared documented evaluator types, annotation queue features, and online evaluation capabilities on production traffic as of the test date, then ran the same 200-example RAG evaluation suite (faithfulness, relevance, exact-match) on both platforms. |
| Production alerting and monitoring | LangSmith | LangSmith provides pre-built dashboards, custom dashboards, and native alerting that lets you configure alerts to trigger on events with Slack, email, or webhook delivery, plus a PagerDuty integration. Langfuse currently has no native alerting UI; teams subscribe to trace events via webhooks or use the Metrics API to build custom alerting on top of Langfuse data. For teams monitoring live agents, that gap means catching issues reactively rather than proactively. How we measured it: Configured the same three alerting rules on each platform (error rate above 5% over 5 minutes, p95 latency regression, and a drop in feedback score) and scored on whether the rule could be created in the UI and delivered via Slack, email, PagerDuty, or webhook without external tooling. |
| Pricing at small scale (5 engineers, 100K units/traces per month) | Langfuse | Langfuse Core lists at $29/month and includes unlimited users, with overage at $8 per 100,000 additional units. LangSmith Plus lists at $39 per seat per month, so a five-engineer team is $195/month in seats alone before trace overage, with 10,000 base traces included and overage at $0.50 per 1,000 traces (14-day retention). At this scale the gap is decisive on seat math alone: Langfuse's $29 covers the whole team while LangSmith multiplies by headcount. How we measured it: Calculated total monthly cost on each vendor's published rate card for a five-engineer team with 100,000 monthly units (Langfuse) or traces (LangSmith), at base retention, on the cheapest paid tier that supports the team. |
| Pricing at production scale (1M traces per month) | Langfuse | At one million traces per month, Langfuse Cloud comes in around $919 while LangSmith runs roughly $2,500 or more, and self-hosted Langfuse drops to about $150/month in infrastructure alone. The published gap is roughly 3x on cloud and far larger on self-host, and the per-seat component of LangSmith pricing compounds further as the team grows past five engineers. How we measured it: Calculated total monthly cost on each vendor's published rate card for one million traces per month on the cheapest paid tier that supports the volume, plus self-hosted infrastructure cost where applicable. |
| Enterprise compliance | Langfuse | Both platforms publish SOC 2, but Langfuse adds ISO 27001 and allows easier air-gapped compliance via open-source self-hosting. Langfuse's Pro tier includes SOC2/ISO27001 certifications and 3-year data retention at $199/month, where competing platforms typically gate equivalent compliance behind enterprise tiers in the $2,000+ range. LangSmith's self-hosted, HIPAA-BAA, and SOC 2 Type II support are exclusive to its Enterprise tier, which AWS Marketplace lists starting around $100,000+ annually. How we measured it: Compared the published certification list on each vendor's trust/security and pricing pages as of the test date, including SOC 2, ISO 27001, and self-hosting availability. |
| LangChain and LangGraph ecosystem | LangSmith | LangSmith is LangChain's native observability platform, and for teams already deep in LangChain or LangGraph it's the fastest path to working traces with near-zero configuration, covering tracing, dataset management, prompt versioning through LangChain Hub, and structured annotation queues. It also adds managed deployment for long-running agents and a 30+ evaluator template library that Langfuse doesn't currently match. If your stack is fully on LangChain or LangGraph, this round flips the overall decision. How we measured it: Set up the same LangGraph agent on each platform and scored on time-to-first-trace, depth of automatic instrumentation, and availability of managed deployment for long-running agents. |
Both platforms trace LLM calls, manage prompts, and run evaluations against documented test suites. The product surfaces overlap heavily, so the buying decision in 2026 turns on three axes the round table makes explicit: framework lock-in tolerance, self-hosting and compliance posture, and how steeply per-seat pricing scales with team size.
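To make the alerting round concrete, here is a minimal sketch of the first test rule (error rate above 5% over 5 minutes) as a team might evaluate it in their own tooling when no native alerting UI is available. The `TraceEvent` record and the `error_rate_breached` helper are hypothetical names introduced for illustration. The sketch does not call any Langfuse or LangSmith API, and it assumes trace outcomes have already been exported into a list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TraceEvent:
    """Hypothetical record of one exported trace outcome (not a vendor type)."""
    timestamp: datetime
    is_error: bool


def error_rate_breached(events, now, window=timedelta(minutes=5), threshold=0.05):
    """Return True if the share of failed traces in the trailing window exceeds the threshold."""
    recent = [e for e in events if now - window <= e.timestamp <= now]
    if not recent:
        return False  # no traffic in the window, so nothing to alert on
    errors = sum(1 for e in recent if e.is_error)
    return errors / len(recent) > threshold


# Synthetic data: 30 traces spread over the last 290 seconds, every tenth one failing.
now = datetime(2026, 1, 1, 12, 0)
events = [
    TraceEvent(now - timedelta(seconds=10 * i), is_error=(i % 10 == 0))
    for i in range(30)
]
print(error_rate_breached(events, now))  # 3 of 30 failed, which is above the 5% threshold
```

A real setup would still need a scheduler to run the check and a delivery channel such as Slack or a webhook, which is the external tooling the alerting round scores against.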
Reading the result
The overall margin is five points, and the round breakdown is asymmetric rather than close. Langfuse takes five rounds (open source, framework coverage, both pricing scales, and compliance). LangSmith takes three (production evals, alerting, and the LangChain/LangGraph ecosystem round). Langfuse’s wins are spread across structural advantages (license, OTEL architecture, flat pricing) while LangSmith’s wins are concentrated in product polish around the LangChain ecosystem.
How to map the rounds to a buying decision
If your stack is 100% LangChain or LangGraph and you want managed deployment for long-running agents, the ecosystem round is the deciding signal. LangSmith covers the full agent engineering lifecycle while Langfuse covers part of it: Langfuse handles tracing and prompt management, useful for early-stage LLM apps, and LangSmith adds production evals, automation rules, production alerting, and managed deployment for long-running agents. For LangChain-native teams, the price premium is the cost of avoiding integration work.
If your stack mixes frameworks (LangChain plus LlamaIndex, or a Pydantic AI service alongside raw OpenAI SDK calls), the framework round is decisive. The most common regret teams report is picking LangSmith for its LangChain integration, then needing to add a non-LangChain component like a custom retriever or a LlamaIndex pipeline and finding that it has to be instrumented by hand. Langfuse’s OTEL foundation means a single trace view across stacks.
If you’re in a regulated industry or need self-hosting, Langfuse is the lower-friction path. Langfuse is MIT-licensed and fully self-hostable, can run with Docker Compose in about 30 minutes, and your LLM inputs and outputs (which often contain sensitive customer data, PII, or proprietary business logic) never leave your infrastructure. LangSmith’s self-hosted option exists but is gated behind Enterprise pricing.
On the pricing models
The two products charge for fundamentally different things, which is why the gap widens at scale: Langfuse charges based on the depth of data (Units), while LangSmith charges based on the volume of root executions (Traces). A Langfuse “unit” is one trace, observation, or score, so a request that produces one trace containing three LLM calls and two evaluation scores counts as six units (1 + 3 + 2). A LangSmith trace is one end-to-end execution regardless of internal complexity.
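The difference in billing units is easy to see with a small calculation. The sketch below applies the definitions from the paragraph above; the function names are hypothetical, and it assumes each LLM call is recorded as exactly one observation, which may not hold for every instrumentation setup.

```python
def langfuse_units(llm_calls, scores, other_observations=0):
    """Billable units for one request: 1 trace + each observation + each score."""
    return 1 + llm_calls + other_observations + scores


def langsmith_traces():
    """One end-to-end execution is one trace, whatever happens inside it."""
    return 1


# The request from the text: three LLM calls and two evaluation scores.
per_request_units = langfuse_units(llm_calls=3, scores=2)
print(per_request_units)   # 6
print(langsmith_traces())  # 1

# Scaled to 100,000 such requests in a month.
requests = 100_000
print(per_request_units * requests)   # 600000 units
print(langsmith_traces() * requests)  # 100000 traces
```

When comparing rate cards, the practical consequence is that request volume has to be converted to units before it is set against Langfuse's included allowance.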
The seat math is the other half of the story. Langfuse offers $29/month for 100K units with $8/100K overage, and notably, no per-seat multiplication: a 10-person team pays the same as a 2-person team for equivalent usage. LangSmith’s Plus plan multiplies linearly: on a mid-size team of 10 to 20 developers, per-seat pricing starts to bite, with a team of 10 paying $390/month in seats and a team of 20 paying $780/month. The pricing model is the reason Langfuse wins both pricing rounds outright rather than narrowly.
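The seat arithmetic can be reproduced from the list prices quoted in this article. This is a rough sketch, not a billing calculator: the function names are hypothetical, overage is assumed to be prorated rather than billed in whole blocks, and only the line items quoted for the small-scale round are modeled. It does not attempt to reproduce the production-scale totals, which may depend on retention upgrades or other items not itemized here.

```python
def langfuse_core_monthly(units):
    """$29 base with 100K units included, then $8 per additional 100K units (prorated, by assumption)."""
    overage_units = max(0, units - 100_000)
    return 29 + 8 * overage_units / 100_000


def langsmith_plus_monthly(seats, traces):
    """$39 per seat with 10K traces included, then $0.50 per additional 1K traces (prorated, by assumption)."""
    overage_traces = max(0, traces - 10_000)
    return 39 * seats + 0.50 * overage_traces / 1_000


# Small-scale round: five engineers, 100K units (Langfuse) or traces (LangSmith).
print(langfuse_core_monthly(100_000))      # 29.0
print(langsmith_plus_monthly(5, 100_000))  # 240.0 (195 in seats + 45 in trace overage)

# Seats alone, with usage held inside the included allowance.
for seats in (2, 10, 20):
    print(seats, langsmith_plus_monthly(seats, 10_000))  # 78.0, 390.0, 780.0
```

The Langfuse function takes no seat argument at all, which is the point of the comparison: headcount only enters the LangSmith side of the calculation.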
On the trajectory
Both products are moving, and the gap on the rounds LangSmith won is closing on the Langfuse side. Langfuse’s June 2025 relicensing moved every product feature (tracing, prompts, evals, playground, annotation queues) to MIT, leaving only thin enterprise compliance features like SCIM, audit logs, project-RBAC, and UI customization commercial. On the LangSmith side, framework reach is widening: LangSmith supports OpenTelemetry ingestion, so existing instrumentation carries over without modification, which narrows Langfuse’s framework round advantage for teams willing to do manual setup.
The structural gap that won’t close quickly is the commercial model. As long as Langfuse stays MIT with no seat caps and LangSmith stays per-seat SaaS with Enterprise-gated self-hosting, the pricing rounds and the open-source round are likely to keep tilting the same way.
On corporate context
Langfuse is an open-source LLM engineering platform built for teams that want full control over their observability stack; the company graduated from Y Combinator (W23) and has grown to over 19,000 GitHub stars with an active community on Discord and GitHub Discussions. LangSmith is LangChain’s commercial platform, same parent as the LangChain and LangGraph frameworks, which is the source of both its integration depth and its lock-in risk. Both vendors are well-funded and active enough that product continuity is a reasonable assumption for the next 12 months; the question is which side of the lock-in tradeoff a buyer is more comfortable taking.
- https://langfuse.com/pricing
- https://langfuse.com/faq/all/langsmith-alternative
- https://langfuse.com/pricing-self-host
- https://www.langchain.com/pricing
- https://www.langchain.com/resources/langsmith-vs-langfuse
Devon Mizrahi measures what a model costs to run and how fast it answers. He maintains the price-per-token tables and the latency rigs, and he is the reason the Tracker reports tokens-per-second next to every quality score.