# Giving AI Teams Confidence in Every Release

**Built on open source & open standards, LangWatch is an LLM observability and evaluation platform.** Debug, evaluate & optimize your entire AI agent lifecycle with LangWatch.

---

## Introduction

Welcome to **LangWatch**, the all-in-one open-source LLMOps platform. LangWatch allows you to track, monitor, guardrail, and evaluate your LLM apps to measure quality and alert on issues.

- **For domain experts**: Easily sift through conversations, see discussed topics, annotate, and score messages in collaboration with developers.
- **For developers**: Debug, build datasets, prompt engineer, and run evaluations or DSPy experiments.
- **For the business**: Track conversation metrics, analytics, and costs; build dashboards; and integrate with your platform.

**Trusted by AI Startups, Agencies & Enterprises**

---

## LLM Observability

### Identify, Debug & Resolve Blindspots

- Built-in support for **OpenTelemetry**
- Full visibility into prompts, tool calls, and agents across major frameworks
- Fast debugging, smarter insights

#### Core Features

- **Trace every request** through your stack
- **Visualize** token usage, latency, and cost
- **Root-cause analysis** of complex prompt issues

### LLM Metrics Built for AI Teams

- **Prompt & response tracing**
- **Metadata-rich logs**
- **Latency & error tracking**
- **Token usage**
- **User journey mapping**

#### Visual Debugging

- **Trace LLM calls**: Inputs, outputs, latency, tokens, cost
- **Analyze performance** with control-flow visualization
- **Triggers & alerts**: Set conditions to flag anomalies or patterns
- **Framework agnostic**: Works with LangChain, DSPy, direct API calls, etc. (see the tracing sketch below)
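Because observability is built on OpenTelemetry, instrumenting an app amounts to emitting spans around each model call. Below is a minimal sketch using the standard OpenTelemetry Python SDK; the exporter endpoint, auth header, and `gen_ai.*` attribute keys are illustrative placeholders, so check the LangWatch docs for the exact values.

```python
# Minimal OpenTelemetry tracing sketch. Endpoint, header, and attribute
# names are illustrative placeholders, not confirmed LangWatch values.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://app.langwatch.ai/api/otel/v1/traces",  # placeholder
            headers={"Authorization": "Bearer <LANGWATCH_API_KEY>"},  # placeholder
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-llm-app")

def ask_llm(prompt: str) -> str:
    # Wrap each model call in a span so inputs, outputs, latency, and
    # token counts show up as trace attributes in the dashboard.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("gen_ai.prompt", prompt)
        completion = "..."  # call your model provider here
        span.set_attribute("gen_ai.completion", completion)
        return completion
```

Because the transport is plain OTLP, the same spans can ship to any OpenTelemetry-compatible backend, which is what makes the integration framework-agnostic.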
---

## LLM Evaluations

### Integrate Automated LLM Evaluations

- Offline and online evaluations
- LLM-as-a-Judge + code-based tests
- Detect hallucinations, measure quality, compare models

### LangWatch Evaluations Wizard

- **No-code or code-based**
- **Dataset creation** from production data, low scores, feedback
- **Annotate LLM output**
- **CI/CD pipeline integration**
- **Safety checks** (PII, prompt injection, toxicity)

### Evaluation Features

- **Online, offline, and custom evals** (a minimal code-based example follows this list)
- **Simulations for real-world testing**
- **Annotations as evaluations**
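At its smallest, a code-based offline eval is a parametrized test over a golden dataset, run locally or in CI. The sketch below is plain pytest with no LangWatch-specific API; `generate_answer` is a hypothetical stand-in for your LLM pipeline, and the containment check would typically be replaced by an LLM-as-a-Judge scorer for open-ended outputs.

```python
# Framework-agnostic offline eval sketch; `generate_answer` is a
# hypothetical stand-in for a real LLM pipeline.
import pytest

DATASET = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

def generate_answer(question: str) -> str:
    # Hypothetical: replace with a call to your LLM pipeline.
    return "Paris is the capital." if "France" in question else "2 + 2 = 4"

@pytest.mark.parametrize("case", DATASET, ids=lambda c: c["question"][:32])
def test_answer_contains_expected(case):
    # Exact containment works for closed-form answers; open-ended
    # outputs usually need an LLM-as-a-Judge scorer instead.
    answer = generate_answer(case["question"])
    assert case["expected"].lower() in answer.lower()
```

Running the same dataset through managed evaluation runs adds tracking over time, model comparison, and regression alerts on top of these raw assertions.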
---

## LLM Monitoring

- **Real-time monitoring**
- **Anomaly detection**
- **Customizable dashboards**
- **Alerting & reporting**

---

## Annotations & Labelling

- **Human-in-the-loop workflows**
- **Share findings, collaborate, and document**
- **Auto-build datasets from annotations**

---

## LLM Experimentations

### Prompt Optimization with DSPy

- **DSPy Optimizers** (e.g. MIPROv2; see the sketch after this section)
- **Drag-and-drop prompting**
- **Model agnostic**
- **DSPy Visualizer**

### Features

- **Prompt experimentation without production code**
- **A/B testing and versioning**
- **Insights and visual feedback**
- **Built for developers and domain experts**
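For code-first teams, the MIPROv2 optimization that the Studio exposes visually is the same optimizer available in DSPy itself. A condensed sketch against DSPy's public API follows; the two-example trainset and the containment metric are placeholders, since a real run needs a substantially larger dataset and a task-appropriate scorer.

```python
# Condensed DSPy + MIPROv2 sketch; trainset and metric are placeholders.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported provider

# The program to optimize: a simple chain-of-thought QA module.
qa = dspy.ChainOfThought("question -> answer")

# A toy trainset; a real optimization run wants far more examples.
trainset = [
    dspy.Example(question="What is the capital of France?",
                 answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?",
                 answer="Shakespeare").with_inputs("question"),
]

def contains_answer(example, prediction, trace=None):
    # Placeholder metric: does the prediction contain the gold answer?
    return example.answer.lower() in prediction.answer.lower()

# MIPROv2 proposes and scores instruction/demo candidates automatically.
optimizer = dspy.MIPROv2(metric=contains_answer, auto="light")
optimized_qa = optimizer.compile(qa, trainset=trainset)
```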
---

## LLM Guardrails

- **Real-time hallucination detection**
- **Sensitive data leak prevention**
- **Low-latency guardrail deployment**
- **Customizable guardrails as a managed service** (illustrated below)
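In application code, a guardrail is a check that gates a response (or a user input) before it is allowed through. The sketch below shows the shape of such a gate, with OpenAI's moderation endpoint standing in for the safety check; a managed guardrail service would replace it with hosted detectors for hallucinations, PII leaks, and the like.

```python
# Guardrail-as-a-gate sketch. OpenAI's moderation endpoint stands in
# for the safety check; a managed guardrail would take its place.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

class GuardrailBlocked(Exception):
    """Raised when content fails a safety check."""

def guarded_response(draft: str) -> str:
    # Screen the drafted answer before it ever reaches the user.
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=draft,
    )
    if result.results[0].flagged:
        raise GuardrailBlocked("response failed the moderation check")
    return draft
```

Keeping this check synchronous and in front of the user is why low latency matters: every guardrail call sits on the critical path of the response.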
---

## LLM Optimization

- **Integrated with DSPy**
- **Track experiments, visualize results**
- **Automatic prompt selection**
- **Run prompts, call APIs, no-code or full-code control**

### Use Cases

- RAG optimization
- Agent routing
- Categorization accuracy
- Structured vibe-checking

---

## Agentic Flow Testing (AI Agent Testing)

### Let AI Test AI — Automatically

- **Define a tester persona & success criteria**
- **Start a conversation** between the tester and the target agent
- **Get structured results**: verdict, transcript, safety flags

### Benefits

- Understand behaviors, not just bugs
- Continuous AI QA at scale
- CI/CD-ready (the control loop is sketched below)
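Under the hood, agentic flow testing is a conversation loop: a simulator plays the tester persona, the agent under test responds, and a judge scores the transcript against the success criteria. The following is a plain-Python illustration of that loop, not the Scenario library's actual API; `call_llm`, the persona, and the criteria are all made up for the example.

```python
# Plain-Python illustration of agent-tests-agent; this is NOT the
# Scenario library's API, just the control loop such a tool automates.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TESTER_PERSONA = (  # made-up persona for the example
    "You are an impatient customer trying to cancel a subscription. "
    "Push back once, then accept a clear resolution."
)
SUCCESS_CRITERIA = "The agent explains the cancellation steps and confirms."

def call_llm(system: str, messages: list[dict]) -> str:
    # Generic chat helper; any chat-completion provider works here.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system}, *messages],
    )
    return response.choices[0].message.content

def run_scenario(agent_system_prompt: str, max_turns: int = 4) -> dict:
    transcript: list[dict] = []
    for _ in range(max_turns):
        # Simulated tester speaks. (A real harness would flip roles so
        # the simulator sees the agent's replies as "user" messages.)
        user_msg = call_llm(TESTER_PERSONA, transcript)
        transcript.append({"role": "user", "content": user_msg})
        # Agent under test replies.
        agent_msg = call_llm(agent_system_prompt, transcript)
        transcript.append({"role": "assistant", "content": agent_msg})
    # Judge the full transcript against the success criteria.
    verdict = call_llm(
        f"Reply PASS or FAIL. Criteria: {SUCCESS_CRITERIA}", transcript
    )
    return {"verdict": verdict, "transcript": transcript}
```

Because the result is a structured verdict plus transcript, a run like this drops straight into CI as a pass/fail check.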
---

## LLM User Analytics

### Actionable Insights

- Measure engagement, satisfaction, and drop-offs
- Funnel analytics and product-fit insights
- Compare performance across models and prompts

### Metrics Tracked

- Latency, throughput, response time
- Quality and usage patterns
- Token usage and API costs
- Anomaly detection

### Dashboards

- **Executive**
- **Quality Monitoring**
- **Cost Management**
- **Custom Analytics Builder**

---

## Easy Integration

LangWatch integrates with:

- LangChain
- DSPy
- Vercel AI SDK
- LiteLLM
- OpenTelemetry
- LangFlow

### Model-Agnostic Support

- OpenAI
- Claude
- Azure OpenAI
- Gemini
- Hugging Face
- Groq

---

## Use Optimized LLM Flow as API

- Turn your LLM pipelines into **reliable APIs**

---

## Enterprise-Grade Controls

- **Self-hosted**, **Cloud**, or **Hybrid**
- **GDPR & ISO 27001 compliant**
- **Role-based access**
- **Bring your own models**

---

## Security & Compliance

### Deployment Options

- **Cloud**: Fully managed by LangWatch
- **Self-hosted**: Full control for the customer
- **Hybrid**: Cloud UI, data remains on customer infrastructure

### Security Features

- **Encryption** at rest (AES-256) and in transit (TLS 1.2+)
- **Key management**: AWS KMS
- **Access controls**: RBAC, MFA, SSO
- **Monitoring & incident response**: Real-time alerts, logging, response plans
- **Backups & DR**: Geo-redundant, encrypted, fast RTO/RPO

### Development Practices

- Code audits, static analysis, peer reviews
- Environment isolation
- Compliance: GDPR, SOC 2, ISO 27001
- Pen tests and vendor risk reviews

---

## Testimonials

> “LangWatch has brought us the next level observability and evaluations. The Optimization Studio brings the kind of progress we were hoping for as a partner.”
> — **Lane**, VP Engineering, GetGenetica (Flora AI)

> “LangWatch’s UI-based approach allowed us to experiment with prompts, hyperparameters, and LLMs without touching production code. When deeper customization was needed, the flexibility to dive into coding was a huge plus.”
> — **Malavika Suresh**, AI Researcher, PHWL.ai

> “I’ve seen a lot of LLMOps tools and LangWatch is solving a problem that everyone building with AI will have when going to production. The best part is their product is so easy to use.”
> — **Kjeld Oostra**, AI Architect, Entropical AI Agency

> “LangWatch didn’t just help us optimize our AI — it fundamentally changed how we work. Now everyone on our team – from engineers to coaching experts – can contribute to building a better AI coach.”
> — **David Nicol**, CTO, Productive Healthy Work Lives

---

LangWatch helps you monitor, evaluate, and optimize your LLM-powered applications: gain full visibility, improve performance, and ensure reliability in production.

## Giving AI teams confidence in every release

- [Blog](https://langwatch.ai/blog): Technical insights on LLM evaluation frameworks, latency profiling, prompt testing, and observability; engineer-focused content from LangWatch
- [Flexible, risk-free pricing](https://langwatch.ai/pricing): Explore flexible pricing for LangWatch, with plans for startups to enterprises building LLM apps with observability, evaluations, and security in mind
- [Create test suites for your LLMs in minutes](https://langwatch.ai/evaluations): Build structured eval pipelines for LLM outputs; track quality, detect regressions, and iterate fast with LangWatch Evaluations
- [Full visibility into your LLM application stack](https://langwatch.ai/observability): LLM observability for engineers; trace inputs and outputs, monitor latency and errors, and debug behavior across models and environments with LangWatch
- [Ship better prompts with DSPy and LangWatch](https://langwatch.ai/llm-optimizations): Optimize LLM app performance with LangWatch; debug latency, fine-tune prompts, reduce hallucinations, and iterate faster with eval-based feedback
- [Security & compliance](https://langwatch.ai/security): Enterprise-ready LLM security with LangWatch; SOC 2, ISO 27001, RBAC, redaction, and full audit trails for compliant AI deployments
- [Mitigate Gen AI risks with Guardrails](https://langwatch.ai/guardrails): Implement robust guardrails for your LLM apps; detect toxic outputs, prevent data leaks, and enforce safe, compliant behavior with LangWatch
- [🧪 Agentic Flow Testing](https://langwatch.ai/scenarios): Scenario automates end-to-end testing for LLM agents; simulate human interactions across defined flows, catch regressions, and validate goal completion
- [LangWatch vs LangSmith vs LangFuse](https://langwatch.ai/comparison): Compare LangWatch, LangSmith, and LangFuse to see how they differ in LLM observability, evaluations, guardrails, and production readiness
- [Actionable insights for your AI applications](https://langwatch.ai/analytics): Visualize how users interact with your LLM app; analyze usage patterns, drop-offs, and success rates to improve experience and outcomes

## Blog

- [Building an LLM Eval framework that actually works in practice](https://langwatch.ai/blog/building-an-llm-eval-franework-that-actually-works-in-practice): Evals are the thing of 2025, and building high-quality evals is one of the most impactful investments you can make
- [April Product Recap: Selene Integration, Eval Wizard Upgrades, Prompt Studio & More](https://langwatch.ai/blog/april-product-recap-selene-integration-eval-wizard-upgrades-prompt-studio-more): Selene by Atla, LLM evaluations, prompt versioning, structured output, the OpenTelemetry SDK, and ISO certification for LLMOps
- [LLM Monitoring & Evaluation for Real-World Production Use](https://langwatch.ai/blog/llm-monitoring-evaluation-for-real-world-production-use): Key challenges teams face when putting LLM-powered apps into production, and why continuous monitoring and evaluation are essential
- [Systematically Improving RAG Agents](https://langwatch.ai/blog/systematically-improving-rag-agents): Improving RAG agents: build a basic system, create evaluation data, run experiments
- [Introducing the Evaluations Wizard: your end-to-end workflow for LLM testing](https://langwatch.ai/blog/introducing-the-evaluations-wizard-your-end-to-end-workflow-for-llm-testing): Learn how to effectively evaluate and test LLMs with LangWatch's new Evaluations Wizard and improve your AI model performance
- [Function Calling vs. MCP: Why You Need Both, and How LangWatch Makes It Click](https://langwatch.ai/blog/function-calling-vs-mcp-why-you-need-both-and-how-langwatch-makes-it-click): What is MCP? What does MCP stand for? And what is function calling?
- [Why LLM Observability is Now Table Stakes](https://langwatch.ai/blog/why-llm-observability-is-now-table-stakes): The start of LLMOps: DevOps for generative AI
- [LangWatch vs. LangSmith vs. Braintrust vs. Langfuse: Choosing the Best LLM Evaluation & Monitoring Tool in 2025](https://langwatch.ai/blog/langwatch-vs-langsmith-vs-braintrust-vs-langfuse-choosing-the-best-llm-evaluation-monitoring-tool-in-2025): Compare LangWatch, LangSmith, Braintrust, and Langfuse in this 2025 guide to LLM evaluation and monitoring tools
- [Introducing Scenario: Use an Agent to Test Your Agent](https://langwatch.ai/blog/introducing-scenario-use-an-agent-to-test-your-agent): Scenario is an automated testing library for LLM agents that simulates real user interactions end-to-end
- [LLM evaluations at Swis for Dutch government projects by LangWatch](https://langwatch.ai/blog/llm-evaluations-at-swis-for-dutch-government-projects-by-langwatch): How do we objectively know if AI output is good? LLM evaluation reports & feedback loops
- [LangWatch and adesso join forces: Accelerating Secure LLM Adoption for Enterprises](https://langwatch.ai/blog/langwatch-and-adesso-join-forces-accelerating-secure-llm-adoption-for-enterprises): LangWatch partners with adesso to support enterprise companies with LLMOps
- [Why Your AI Team Needs an AI PM (Quality) Lead](https://langwatch.ai/blog/why-your-ai-team-needs-an-ai-pm-(quality)-lead): The best GenAI teams are now introducing a critical new role: the AI PM (Quality) Lead
- [LLMOps Is Still About People: How to Build AI Teams That Don't Implode](https://langwatch.ai/blog/llmops-is-still-about-people-how-to-build-ai-teams-that-don-t-implode): LLMs can do amazing things, but only if they understand context, and that context lives in the heads of domain experts
- [Tackling LLM Hallucinations with LangWatch: Why Monitoring and Evaluation Matter](https://langwatch.ai/blog/tackling-llm-hallucinations-with-langwatch-why-monitoring-and-evaluation-matter): What are LLM hallucinations? What causes them? How to monitor and evaluate LLM apps
- [What is Model Context Protocol (MCP)? And how's LangWatch involved?](https://langwatch.ai/blog/what-is-model-context-protocol-(mcp)-and-how-s-langwatch-involved): The Model Context Protocol is a new standard that lets AI agents easily connect to external tools and data sources
- [How PHWL.ai uses LLM Observability and Optimization to Improve AI Coaching with LangWatch](https://langwatch.ai/blog/how-phwl-ai-uses-llm-observability-and-optimization-to-improve-ai-coaching-with-langwatch): Improve your LLM performance with real-time observability and optimization
- [LangWatch.ai announces a €1M funding round to bring the power of evaluations and auto-optimizations to AI teams](https://langwatch.ai/blog/langwatch-ai-announcing-1m-funding-round-to-bring-the-power-of-evaluations-to-ai-teams): LangWatch's €1M pre-seed round was led by Passion Capital, with support from Volta Ventures and Antler
- [OpenAI, Anthropic, DeepSeek and other LLM providers keep dropping prices: Should you host your own model?](https://langwatch.ai/blog/openai-anthropic-deepseek-and-other-llm-providers-keep-dropping-prices-should-you-host-your-own-model)
- [7 Predictions for AI in 2025: A CTO's Perspective, by Rogerio Chaves](https://langwatch.ai/blog/7-predictions-for-ai-in-2025-a-cto-s-rogerio-chaves-perspective): AI is evolving at speed, and the 2025 landscape will be shaped by agents, multimodal data, and model efficiency
- [Customer Stories: HolidayHero AI start-up <> LangWatch](https://langwatch.ai/blog/holidayhero-customercase-with-langwatch-ai): LangWatch has been part of HolidayHero's LLM production environment for over two months, overseeing thousands of guest chats
- [LangWatch Optimization Studio: Built for AI Engineers, by AI Engineers](https://langwatch.ai/blog/langwatch-optimization-studio-built-for-ai-engineers-by-ai-engineers)
- [The power of MIPROv2 (DSPy) in a Low-Code environment with LangWatch's Optimization Studio](https://langwatch.ai/blog/the-power-of-miprov2-in-a-low-code-environment-with-langwatch-s-optimization-studio): Want to leverage the power of DSPy's MIPROv2 without diving into complex code? Enter LangWatch's Optimization Studio
- [What is Prompt Optimization? An Introduction to DSPy and Optimization Studio](https://langwatch.ai/blog/what-is-prompt-optimization-an-introduction-to-dspy-and-optimization-studio): LangWatch's Optimization Studio offers a more precise, scientific approach to prompt optimization
- [Deploying an OpenAI RAG Application to AWS Elastic Beanstalk](https://langwatch.ai/blog/deploying-openai-rag-application-to-aws-elasticbeanstalk): This tutorial guides you through building chatbots with Retrieval-Augmented Generation using OpenAI, Python, and FastAPI
- [The complete guide to TDD with LLMs](https://langwatch.ai/blog/tdd-with-llms): How can we test in a probabilistic environment? Test-Driven Development for LLMs
- [Data Flywheel: Using your production data to build better LLM products](https://langwatch.ai/blog/data-flywheel)
- [How Algomo reduced AI hallucinations with LangWatch](https://langwatch.ai/blog/customer-case-study-algomos-experience-with-langwatch): How Algomo increased the quality of their AI app with LangWatch
- [The AI Team: Integrating User and Domain Expert Feedback to Enhance LLM-Powered Applications](https://langwatch.ai/blog/the-ai-team-integrating-user-and-domain-expert-feedback-to-enhance-llm-powered-applications): Understand what the AI Team is and what its roles are
- [Unit Testing Your LLM: The Power of Datasets](https://langwatch.ai/blog/unit-testing-your-llm-the-power-of-datasets): Understand how to leverage datasets for LLM unit testing
- [Introducing DSPy Visualizer](https://langwatch.ai/blog/introducing-dspy-visualizer): What is DSPy? The DSPy Visualizer lets you log your DSPy training sessions and track their performance
- [New Dutch Startup, LangWatch, brings much-needed quality control to GenAI](https://langwatch.ai/blog/new-dutch-startup-langwatch-brings-much-needed-quality-control-to-genai): LangWatch, an innovative Amsterdam-based startup: meet the team
- [How to build a RAG application from scratch with the least possible AI Hallucinations](https://langwatch.ai/blog/how-to-build-a-rag-chatbot-from-scratch-with-the-least-possible-ai-hallucinations): Helping AI leaders create RAG chatbots with minimal hallucinations
- [Safeguarding Your First LLM-Powered Innovation: Essential Practices for Security](https://langwatch.ai/blog/safeguarding-your-first-llm-powered-innovation-essential-practices-for-security): The journey of launching your first LLM-powered product is filled with potential and challenges
- [LLM Reliability with Retrieval-Augmented Generation](https://langwatch.ai/blog/llm-reliability-with-retrieval-augmented-generation): Retrieval-Augmented Generation's popularity continues to surge, with various methods for implementing it successfully
- [What is User Analytics for LLMs, The Difference With Traditional Analytics, And Why is it Important?](https://langwatch.ai/blog/what-is-user-analytics-for-llms-the-difference-with-traditional-analytics-and-why-is-it-important): Discover how user analytics for LLMs can transform AI interactions by revealing user behavior
- [Unlocking the Potential of Large Language Models: LLMs Beyond the Hype](https://langwatch.ai/blog/unlocking-the-potential-of-large-language-models-the-llm-s-beyond-the-hype): Successfully integrating LLMs into your business requires careful monitoring and evaluation of options
- [The 8 Types of LLM Hallucinations](https://langwatch.ai/blog/the-8-types-of-llm-hallucinations): Delve into the challenges of LLM hallucinations and explore their types, causes, and effective mitigation strategies

_For more information, visit [LangWatch.ai](https://www.langwatch.ai) or contact our security team at **security@langwatch.ai**_