# Giving AI Teams Confidence in Every Release

**Built on open source & open standards, LangWatch is an LLM observability and evaluation platform.** Debug, evaluate & optimize your entire AI agent lifecycle with LangWatch.

---

## Introduction

Welcome to **LangWatch**, the all-in-one open-source LLMOps platform. LangWatch allows you to track, monitor, guardrail, and evaluate your LLM apps to measure quality and alert on issues.

- **For domain experts**: Easily sift through conversations, see discussed topics, annotate, and score messages in collaboration with developers.
- **For developers**: Debug, build datasets, prompt engineer, and run evaluations or DSPy experiments.
- **For the business**: Track conversation metrics, analytics, and costs; build dashboards; and integrate with your platform.

**Trusted by AI Startups, Agencies & Enterprises**

---

## LLM Observability

### Identify, Debug & Resolve Blindspots

- Built-in support for **OpenTelemetry**
- Full visibility into prompts, tool calls, and agents across major frameworks
- Fast debugging, smarter insights

#### Core Features

- **Trace every request** through your stack
- **Visualize** token usage, latency, and cost
- **Root-cause analysis** of complex prompt issues

### LLM Metrics Built for AI Teams

- **Prompt & response tracing**
- **Metadata-rich logs**
- **Latency & error tracking**
- **Token usage**
- **User journey mapping**

#### Visual Debugging

- **Trace LLM calls**: Inputs, outputs, latency, tokens, cost
- **Analyze performance** with control-flow visualization
- **Triggers & alerts**: Set conditions to flag anomalies or patterns
- **Framework agnostic**: Works with LangChain, DSPy, direct API calls, etc. (see the tracing sketch below)
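Because observability is built on OpenTelemetry, instrumenting an app amounts to emitting spans around each model call. Below is a minimal sketch using the standard OpenTelemetry Python SDK; the exporter endpoint, auth header, and `gen_ai.*` attribute keys are illustrative placeholders, so check the LangWatch docs for the exact values.

```python
# Minimal OpenTelemetry tracing sketch. Endpoint, header, and attribute
# names are illustrative placeholders, not confirmed LangWatch values.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://app.langwatch.ai/api/otel/v1/traces",  # placeholder
            headers={"Authorization": "Bearer <LANGWATCH_API_KEY>"},  # placeholder
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-llm-app")

def ask_llm(prompt: str) -> str:
    # Wrap each model call in a span so inputs, outputs, latency, and
    # token counts show up as trace attributes in the dashboard.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("gen_ai.prompt", prompt)
        completion = "..."  # call your model provider here
        span.set_attribute("gen_ai.completion", completion)
        return completion
```

Because the transport is plain OTLP, the same spans can ship to any OpenTelemetry-compatible backend, which is what makes the integration framework-agnostic.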
---

## LLM Evaluations

### Integrate Automated LLM Evaluations

- Offline and online evaluations
- LLM-as-a-Judge + code-based tests
- Detect hallucinations, measure quality, compare models

### LangWatch Evaluations Wizard

- **No-code or code-based**
- **Dataset creation** from production data, low scores, feedback
- **Annotate LLM output**
- **CI/CD pipeline integration**
- **Safety checks** (PII, prompt injection, toxicity)

### Evaluation Features

- **Online, offline, and custom evals** (a minimal code-based example follows this list)
- **Simulations for real-world testing**
- **Annotations as evaluations**
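At its smallest, a code-based offline eval is a parametrized test over a golden dataset, run locally or in CI. The sketch below is plain pytest with no LangWatch-specific API; `generate_answer` is a hypothetical stand-in for your LLM pipeline, and the containment check would typically be replaced by an LLM-as-a-Judge scorer for open-ended outputs.

```python
# Framework-agnostic offline eval sketch; `generate_answer` is a
# hypothetical stand-in for a real LLM pipeline.
import pytest

DATASET = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

def generate_answer(question: str) -> str:
    # Hypothetical: replace with a call to your LLM pipeline.
    return "Paris is the capital." if "France" in question else "2 + 2 = 4"

@pytest.mark.parametrize("case", DATASET, ids=lambda c: c["question"][:32])
def test_answer_contains_expected(case):
    # Exact containment works for closed-form answers; open-ended
    # outputs usually need an LLM-as-a-Judge scorer instead.
    answer = generate_answer(case["question"])
    assert case["expected"].lower() in answer.lower()
```

Running the same dataset through managed evaluation runs adds tracking over time, model comparison, and regression alerts on top of these raw assertions.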
---

## LLM Monitoring

- **Real-time monitoring**
- **Anomaly detection**
- **Customizable dashboards**
- **Alerting & reporting**

---

## Annotations & Labelling

- **Human-in-the-loop workflows**
- **Share findings, collaborate, and document**
- **Auto-build datasets from annotations**

---

## LLM Experimentations

### Prompt Optimization with DSPy

- **DSPy Optimizers** (e.g. MIPROv2; see the sketch after this section)
- **Drag-and-drop prompting**
- **Model agnostic**
- **DSPy Visualizer**

### Features

- **Prompt experimentation without production code**
- **A/B testing and versioning**
- **Insights and visual feedback**
- **Built for developers and domain experts**
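For code-first teams, the MIPROv2 optimization that the Studio exposes visually is the same optimizer available in DSPy itself. A condensed sketch against DSPy's public API follows; the two-example trainset and the containment metric are placeholders, since a real run needs a substantially larger dataset and a task-appropriate scorer.

```python
# Condensed DSPy + MIPROv2 sketch; trainset and metric are placeholders.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported provider

# The program to optimize: a simple chain-of-thought QA module.
qa = dspy.ChainOfThought("question -> answer")

# A toy trainset; a real optimization run wants far more examples.
trainset = [
    dspy.Example(question="What is the capital of France?",
                 answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?",
                 answer="Shakespeare").with_inputs("question"),
]

def contains_answer(example, prediction, trace=None):
    # Placeholder metric: does the prediction contain the gold answer?
    return example.answer.lower() in prediction.answer.lower()

# MIPROv2 proposes and scores instruction/demo candidates automatically.
optimizer = dspy.MIPROv2(metric=contains_answer, auto="light")
optimized_qa = optimizer.compile(qa, trainset=trainset)
```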
---

## LLM Guardrails

- **Real-time hallucination detection**
- **Sensitive data leak prevention**
- **Low-latency guardrail deployment**
- **Customizable guardrails as a managed service** (illustrated below)
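In application code, a guardrail is a check that gates a response (or a user input) before it is allowed through. The sketch below shows the shape of such a gate, with OpenAI's moderation endpoint standing in for the safety check; a managed guardrail service would replace it with hosted detectors for hallucinations, PII leaks, and the like.

```python
# Guardrail-as-a-gate sketch. OpenAI's moderation endpoint stands in
# for the safety check; a managed guardrail would take its place.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

class GuardrailBlocked(Exception):
    """Raised when content fails a safety check."""

def guarded_response(draft: str) -> str:
    # Screen the drafted answer before it ever reaches the user.
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=draft,
    )
    if result.results[0].flagged:
        raise GuardrailBlocked("response failed the moderation check")
    return draft
```

Keeping this check synchronous and in front of the user is why low latency matters: every guardrail call sits on the critical path of the response.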
---

## LLM Optimization

- **Integrated with DSPy**
- **Track experiments, visualize results**
- **Automatic prompt selection**
- **Run prompts, call APIs, no-code or full-code control**

### Use Cases

- RAG optimization
- Agent routing
- Categorization accuracy
- Structured vibe-checking

---

## Agentic Flow Testing (AI Agent Testing)

### Let AI Test AI — Automatically

- **Define a tester persona & success criteria**
- **Start a conversation** between the tester and the target agent
- **Get structured results**: verdict, transcript, safety flags

### Benefits

- Understand behaviors, not just bugs
- Continuous AI QA at scale
- CI/CD-ready (the control loop is sketched below)
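Under the hood, agentic flow testing is a conversation loop: a simulator plays the tester persona, the agent under test responds, and a judge scores the transcript against the success criteria. The following is a plain-Python illustration of that loop, not the Scenario library's actual API; `call_llm`, the persona, and the criteria are all made up for the example.

```python
# Plain-Python illustration of agent-tests-agent; this is NOT the
# Scenario library's API, just the control loop such a tool automates.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TESTER_PERSONA = (  # made-up persona for the example
    "You are an impatient customer trying to cancel a subscription. "
    "Push back once, then accept a clear resolution."
)
SUCCESS_CRITERIA = "The agent explains the cancellation steps and confirms."

def call_llm(system: str, messages: list[dict]) -> str:
    # Generic chat helper; any chat-completion provider works here.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system}, *messages],
    )
    return response.choices[0].message.content

def run_scenario(agent_system_prompt: str, max_turns: int = 4) -> dict:
    transcript: list[dict] = []
    for _ in range(max_turns):
        # Simulated tester speaks. (A real harness would flip roles so
        # the simulator sees the agent's replies as "user" messages.)
        user_msg = call_llm(TESTER_PERSONA, transcript)
        transcript.append({"role": "user", "content": user_msg})
        # Agent under test replies.
        agent_msg = call_llm(agent_system_prompt, transcript)
        transcript.append({"role": "assistant", "content": agent_msg})
    # Judge the full transcript against the success criteria.
    verdict = call_llm(
        f"Reply PASS or FAIL. Criteria: {SUCCESS_CRITERIA}", transcript
    )
    return {"verdict": verdict, "transcript": transcript}
```

Because the result is a structured verdict plus transcript, a run like this drops straight into CI as a pass/fail check.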
---

## LLM User Analytics

### Actionable Insights

- Measure engagement, satisfaction, and drop-offs
- Funnel analytics and product-fit insights
- Compare performance across models and prompts

### Metrics Tracked

- Latency, throughput, response time
- Quality and usage patterns
- Token usage and API costs
- Anomaly detection

### Dashboards

- **Executive**
- **Quality Monitoring**
- **Cost Management**
- **Custom Analytics Builder**

---

## Easy Integration

LangWatch integrates with:

- LangChain
- DSPy
- Vercel AI SDK
- LiteLLM
- OpenTelemetry
- LangFlow

### Model-Agnostic Support

- OpenAI
- Claude
- Azure OpenAI
- Gemini
- Hugging Face
- Groq

---

## Use Optimized LLM Flow as API

- Turn your LLM pipelines into **reliable APIs**

---

## Enterprise-Grade Controls

- **Self-hosted**, **Cloud**, or **Hybrid**
- **GDPR & ISO 27001 compliant**
- **Role-based access**
- **Bring your own models**

---

## Security & Compliance

### Deployment Options

- **Cloud**: Fully managed by LangWatch
- **Self-hosted**: Full control for the customer
- **Hybrid**: Cloud UI, data remains on customer infrastructure

### Security Features

- **Encryption** at rest (AES-256) and in transit (TLS 1.2+)
- **Key management**: AWS KMS
- **Access controls**: RBAC, MFA, SSO
- **Monitoring & incident response**: Real-time alerts, logging, response plans
- **Backups & DR**: Geo-redundant, encrypted, fast RTO/RPO

### Development Practices

- Code audits, static analysis, peer reviews
- Environment isolation
- Compliance: GDPR, SOC 2, ISO 27001
- Pen tests and vendor risk reviews

---

## Testimonials

> “LangWatch has brought us the next level observability and evaluations. The Optimization Studio brings the kind of progress we were hoping for as a partner.”
> — **Lane**, VP Engineering, GetGenetica (Flora AI)

> “LangWatch’s UI-based approach allowed us to experiment with prompts, hyperparameters, and LLMs without touching production code. When deeper customization was needed, the flexibility to dive into coding was a huge plus.”
> — **Malavika Suresh**, AI Researcher, PHWL.ai

> “I’ve seen a lot of LLMOps tools and LangWatch is solving a problem that everyone building with AI will have when going to production. The best part is their product is so easy to use.”
> — **Kjeld Oostra**, AI Architect, Entropical AI Agency

> “LangWatch didn’t just help us optimize our AI — it fundamentally changed how we work. Now everyone on our team – from engineers to coaching experts – can contribute to building a better AI coach.”
> — **David Nicol**, CTO, Productive Healthy Work Lives

---

LangWatch helps you monitor, evaluate, and optimize your LLM-powered applications: gain full visibility, improve performance, and ensure reliability in production.

## Giving AI teams confidence in every release

- [Blog](https://langwatch.ai/blog): Technical insights on LLM evaluation frameworks, latency profiling, prompt testing, and observability; engineer-focused content from LangWatch
- [Flexible, risk-free pricing](https://langwatch.ai/pricing): Explore flexible pricing for LangWatch, with plans for startups to enterprises building LLM apps with observability, evaluations, and security in mind
- [Create test suites for your LLMs in minutes](https://langwatch.ai/evaluations): Build structured eval pipelines for LLM outputs; track quality, detect regressions, and iterate fast with LangWatch Evaluations
- [Full visibility into your LLM application stack](https://langwatch.ai/observability): LLM observability for engineers; trace inputs and outputs, monitor latency and errors, and debug behavior across models and environments with LangWatch
- [Ship better prompts with DSPy and LangWatch](https://langwatch.ai/llm-optimizations): Optimize LLM app performance with LangWatch; debug latency, fine-tune prompts, reduce hallucinations, and iterate faster with eval-based feedback
- [Security & compliance](https://langwatch.ai/security): Enterprise-ready LLM security with LangWatch; SOC 2, ISO 27001, RBAC, redaction, and full audit trails for compliant AI deployments
- [Mitigate Gen AI risks with Guardrails](https://langwatch.ai/guardrails): Implement robust guardrails for your LLM apps; detect toxic outputs, prevent data leaks, and enforce safe, compliant behavior with LangWatch
- [🧪 Agentic Flow Testing](https://langwatch.ai/scenarios): Scenario automates end-to-end testing for LLM agents; simulate human interactions across defined flows, catch regressions, and validate goal completion
- [LangWatch vs LangSmith vs LangFuse](https://langwatch.ai/comparison): Compare LangWatch, LangSmith, and LangFuse to see how they differ in LLM observability, evaluations, guardrails, and production readiness
- [Actionable insights for your AI applications](https://langwatch.ai/analytics): Visualize how users interact with your LLM app; analyze usage patterns, drop-offs, and success rates to improve experience and outcomes

## Blog

- [Building an LLM Eval framework that actually works in practice](https://langwatch.ai/blog/building-an-llm-eval-franework-that-actually-works-in-practice): Evals are the thing of 2025, and building high-quality evals is one of the most impactful investments you can make
- [April Product Recap: Selene Integration, Eval Wizard Upgrades, Prompt Studio & More](https://langwatch.ai/blog/april-product-recap-selene-integration-eval-wizard-upgrades-prompt-studio-more): Selene by Atla, LLM evaluations, prompt versioning, structured output, the OpenTelemetry SDK, and ISO certification for LLMOps
- [LLM Monitoring & Evaluation for Real-World Production Use](https://langwatch.ai/blog/llm-monitoring-evaluation-for-real-world-production-use): Key challenges teams face when putting LLM-powered apps into production, and why continuous monitoring and evaluation are essential
- [Systematically Improving RAG Agents](https://langwatch.ai/blog/systematically-improving-rag-agents): Improving RAG agents: build a basic system, create evaluation data, run experiments
- [Introducing the Evaluations Wizard: your end-to-end workflow for LLM testing](https://langwatch.ai/blog/introducing-the-evaluations-wizard-your-end-to-end-workflow-for-llm-testing): Learn how to effectively evaluate and test LLMs with LangWatch's new Evaluations Wizard and improve your AI model performance
- [Function Calling vs. MCP: Why You Need Both, and How LangWatch Makes It Click](https://langwatch.ai/blog/function-calling-vs-mcp-why-you-need-both-and-how-langwatch-makes-it-click): What is MCP? What does MCP stand for? And what is function calling?
- [Why LLM Observability is Now Table Stakes](https://langwatch.ai/blog/why-llm-observability-is-now-table-stakes): The start of LLMOps: DevOps for generative AI
- [LangWatch vs. LangSmith vs. Braintrust vs. Langfuse: Choosing the Best LLM Evaluation & Monitoring Tool in 2025](https://langwatch.ai/blog/langwatch-vs-langsmith-vs-braintrust-vs-langfuse-choosing-the-best-llm-evaluation-monitoring-tool-in-2025): Compare LangWatch, LangSmith, Braintrust, and Langfuse in this 2025 guide to LLM evaluation and monitoring tools
- [Introducing Scenario: Use an Agent to Test Your Agent](https://langwatch.ai/blog/introducing-scenario-use-an-agent-to-test-your-agent): Scenario is an automated testing library for LLM agents that simulates real user interactions end-to-end
- [LLM evaluations at Swis for Dutch government projects by LangWatch](https://langwatch.ai/blog/llm-evaluations-at-swis-for-dutch-government-projects-by-langwatch): How do we objectively know if AI output is good? LLM evaluation reports & feedback loops
- [LangWatch and adesso join forces: Accelerating Secure LLM Adoption for Enterprises](https://langwatch.ai/blog/langwatch-and-adesso-join-forces-accelerating-secure-llm-adoption-for-enterprises): LangWatch partners with adesso to support enterprise companies with LLMOps
- [Why Your AI Team Needs an AI PM (Quality) Lead](https://langwatch.ai/blog/why-your-ai-team-needs-an-ai-pm-(quality)-lead): The best GenAI teams are now introducing a critical new role: the AI PM (Quality) Lead
- [LLMOps Is Still About People: How to Build AI Teams That Don't Implode](https://langwatch.ai/blog/llmops-is-still-about-people-how-to-build-ai-teams-that-don-t-implode): LLMs can do amazing things, but only if they understand context, and that context lives in the heads of domain experts
- [Tackling LLM Hallucinations with LangWatch: Why Monitoring and Evaluation Matter](https://langwatch.ai/blog/tackling-llm-hallucinations-with-langwatch-why-monitoring-and-evaluation-matter): What are LLM hallucinations? What causes them? How to monitor and evaluate LLM apps
- [What is Model Context Protocol (MCP)? And how's LangWatch involved?](https://langwatch.ai/blog/what-is-model-context-protocol-(mcp)-and-how-s-langwatch-involved): The Model Context Protocol is a new standard that lets AI agents easily connect to external tools and data sources
- [How PHWL.ai uses LLM Observability and Optimization to Improve AI Coaching with LangWatch](https://langwatch.ai/blog/how-phwl-ai-uses-llm-observability-and-optimization-to-improve-ai-coaching-with-langwatch): Improve your LLM performance with real-time observability and optimization
- [LangWatch.ai announces a €1M funding round to bring the power of evaluations and auto-optimizations to AI teams](https://langwatch.ai/blog/langwatch-ai-announcing-1m-funding-round-to-bring-the-power-of-evaluations-to-ai-teams): LangWatch's €1M pre-seed round was led by Passion Capital, with support from Volta Ventures and Antler
- [OpenAI, Anthropic, DeepSeek and other LLM providers keep dropping prices: Should you host your own model?](https://langwatch.ai/blog/openai-anthropic-deepseek-and-other-llm-providers-keep-dropping-prices-should-you-host-your-own-model)
- [7 Predictions for AI in 2025: A CTO's Perspective, by Rogerio Chaves](https://langwatch.ai/blog/7-predictions-for-ai-in-2025-a-cto-s-rogerio-chaves-perspective): AI is evolving at speed, and the 2025 landscape will be shaped by agents, multimodal data, and model efficiency
- [Customer Stories: HolidayHero AI start-up <> LangWatch](https://langwatch.ai/blog/holidayhero-customercase-with-langwatch-ai): LangWatch has been part of HolidayHero's LLM production environment for over two months, overseeing thousands of guest chats
- [LangWatch Optimization Studio: Built for AI Engineers, by AI Engineers](https://langwatch.ai/blog/langwatch-optimization-studio-built-for-ai-engineers-by-ai-engineers)
- [The power of MIPROv2 (DSPy) in a Low-Code environment with LangWatch's Optimization Studio](https://langwatch.ai/blog/the-power-of-miprov2-in-a-low-code-environment-with-langwatch-s-optimization-studio): Want to leverage the power of DSPy's MIPROv2 without diving into complex code? Enter LangWatch's Optimization Studio
- [What is Prompt Optimization? An Introduction to DSPy and Optimization Studio](https://langwatch.ai/blog/what-is-prompt-optimization-an-introduction-to-dspy-and-optimization-studio): LangWatch's Optimization Studio offers a more precise, scientific approach to prompt optimization
- [Deploying an OpenAI RAG Application to AWS Elastic Beanstalk](https://langwatch.ai/blog/deploying-openai-rag-application-to-aws-elasticbeanstalk): This tutorial guides you through building chatbots with Retrieval-Augmented Generation using OpenAI, Python, and FastAPI
- [The complete guide to TDD with LLMs](https://langwatch.ai/blog/tdd-with-llms): How can we test in a probabilistic environment? Test-Driven Development for LLMs
- [Data Flywheel: Using your production data to build better LLM products](https://langwatch.ai/blog/data-flywheel)
- [How Algomo reduced AI hallucinations with LangWatch](https://langwatch.ai/blog/customer-case-study-algomos-experience-with-langwatch): How Algomo increased the quality of their AI app with LangWatch
- [The AI Team: Integrating User and Domain Expert Feedback to Enhance LLM-Powered Applications](https://langwatch.ai/blog/the-ai-team-integrating-user-and-domain-expert-feedback-to-enhance-llm-powered-applications): Understand what the AI Team is and what its roles are
- [Unit Testing Your LLM: The Power of Datasets](https://langwatch.ai/blog/unit-testing-your-llm-the-power-of-datasets): Understand how to leverage datasets for LLM unit testing
- [Introducing DSPy Visualizer](https://langwatch.ai/blog/introducing-dspy-visualizer): What is DSPy? The DSPy Visualizer lets you log your DSPy training sessions and track their performance
- [New Dutch Startup, LangWatch, brings much-needed quality control to GenAI](https://langwatch.ai/blog/new-dutch-startup-langwatch-brings-much-needed-quality-control-to-genai): LangWatch, an innovative Amsterdam-based startup: meet the team
- [How to build a RAG application from scratch with the least possible AI Hallucinations](https://langwatch.ai/blog/how-to-build-a-rag-chatbot-from-scratch-with-the-least-possible-ai-hallucinations): Helping AI leaders create RAG chatbots with minimal hallucinations
- [Safeguarding Your First LLM-Powered Innovation: Essential Practices for Security](https://langwatch.ai/blog/safeguarding-your-first-llm-powered-innovation-essential-practices-for-security): The journey of launching your first LLM-powered product is filled with potential and challenges
- [LLM Reliability with Retrieval-Augmented Generation](https://langwatch.ai/blog/llm-reliability-with-retrieval-augmented-generation): Retrieval-Augmented Generation's popularity continues to surge, with various methods for implementing it successfully
- [What is User Analytics for LLMs, The Difference With Traditional Analytics, And Why is it Important?](https://langwatch.ai/blog/what-is-user-analytics-for-llms-the-difference-with-traditional-analytics-and-why-is-it-important): Discover how user analytics for LLMs can transform AI interactions by revealing user behavior
- [Unlocking the Potential of Large Language Models: LLMs Beyond the Hype](https://langwatch.ai/blog/unlocking-the-potential-of-large-language-models-the-llm-s-beyond-the-hype): Successfully integrating LLMs into your business requires careful monitoring and evaluation of options
- [The 8 Types of LLM Hallucinations](https://langwatch.ai/blog/the-8-types-of-llm-hallucinations): Delve into the challenges of LLM hallucinations and explore their types, causes, and effective mitigation strategies

_For more information, visit [LangWatch.ai](https://www.langwatch.ai) or contact our security team at **security@langwatch.ai**_