LangWatch.ai - Announcing our €1M funding round to bring the power of Evaluations and Auto-Optimizations to AI teams.

Manouk

Feb 25, 2025

Today, we are super excited to share some big news from LangWatch: we have closed a €1M pre-seed funding round led by Passion Capital, with great support from Volta Ventures and Antler. This marks a key step in our journey to change how AI-powered applications are built and maintained.

How LangWatch started

While experimenting with building various GenAI solutions, one thing quickly became clear: it was very hard to understand how users interacted with these systems and to share those issues with engineers. At the same time, our CTO, Rogerio Chaves, was building products at Booking.com, where the need to evaluate LLM-based applications became obvious as teams started using these models.

Tackling the unique challenges of LLMs

Anyone who’s worked with large language models (LLMs) knows that consistency isn’t a given. Unlike traditional software—where clicking a button always gives the same result—LLMs can produce many different answers to the same prompt. This unpredictable behavior poses a big challenge for AI teams aiming for reliability and efficiency.

Generative AI works in a non-deterministic way: factors like model type, parameters, data, context, or even how a question is asked can change output quality and performance. Issues such as hallucinations, inaccuracies, and safety risks can hurt user trust and damage a brand’s reputation. We’ve seen this in examples like DPD's swearing chatbot and Air Canada's lying chatbot.

Today, many organizations rely on manual, non-scalable methods that slow down their development cycles. They often only notice problems after deployment and then make reactive fixes. They lack a solid system to check if improvements are really happening as they update to a new model or make changes to their pipeline.

With new models launching every other week (see the recent launches of DeepSeek and Grok 3), new opportunities arise around better performance or lower costs. But how do you ensure that switching to a newer model keeps quality the same or better? And how do you know which prompts perform best for which use cases?

LangWatch’s solution sits between the model and application layers of the GenAI stack, bringing the best practices of test-driven development into AI workflows and solving the biggest pain points AI developers face today, whether that's understanding which prompts, examples, or models work best, or spending less time and money on LLM development. Along with our co-founder Rogerio and our amazing team, I’m thrilled to be on this mission to speed up how developers—and even non-technical team members—build the next generation of software.

LangWatch's LLMOps Solution

We've built our platform from the ground up, keeping in mind the needs of large AI organizations. This includes seamless self-hosted deployments, open-source availability, robust role-based access control, compliance, and enhanced collaboration features. Additionally, we provide custom dataset support and LangWatch-managed human evaluation for the last mile of AI deployment.

It's early days, but the customers we've been working closely with have been able to test, iterate, and ship more than 10x faster using the LangWatch platform. The product is anchored to these fundamental principles:

  • Automated Quality Checks (evaluations): Get real-time alerts when the LLM goes off the rails and spits out harmful or inaccurate output. And instead of spending up to a week testing each new model manually, our platform automatically runs quality control, ensuring consistent performance.

  • Continuous Improvement: With DSPy-based automatic optimizers, our platform doesn't just find issues; it helps your team continually improve your AI solutions (see the sketch after this list).

  • User-friendly interface: Whether you're an experienced AI engineer or a developer collaborating with domain experts, our intuitive interface allows you to comment in an email-like experience, seamlessly onboard experts, and automatically generate datasets from annotated feedback.

  • Developer-focused & Enterprise-ready: Open-source availability, model agnostic, keep running your own code, compliance certified (ISO), and ready to host anywhere on your side.
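
To make the Continuous Improvement point concrete, here is a minimal sketch of what a DSPy-based optimizer does in general. It illustrates the underlying technique rather than LangWatch's own integration; the model id, examples, and metric are placeholders.

```python
# Minimal DSPy optimization sketch (illustrative; model, data, and metric are placeholders)
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure any LLM provider; "openai/gpt-4o-mini" is just an example model id
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class QA(dspy.Signature):
    """Answer the question concisely."""
    question = dspy.InputField()
    answer = dspy.OutputField()

program = dspy.Predict(QA)

# A tiny labeled dataset; in practice this can come from annotated production traces
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
]

# The metric the optimizer tries to maximize
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# BootstrapFewShot searches for few-shot demonstrations that improve the metric,
# then compiles them into an improved version of the program
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=2)
optimized = optimizer.compile(program, trainset=trainset)

print(optimized(question="What is the capital of Italy?").answer)
```

The same loop scales beyond few-shot examples to prompts and whole pipelines: define a metric, provide labeled examples, and let the optimizer search for a configuration that scores better.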

Redefining evaluations

Evaluations are at the heart of our platform. Today, many teams manage evaluations in silos—whether it’s batch testing via CI/CD, vibe-checking outputs, optimizing prompts, or evaluating RAG quality. We believe it’s time to bring all these approaches together. Imagine an evaluations page that isn’t just for monitoring or batch runs, but one that encapsulates the entire evaluation process. Whether you’re setting up real-time evaluations, running a batch evaluation of your LLM pipeline, or building a custom evaluator, each step is designed around your workflow and pain points.
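
To make "building a custom evaluator" concrete, here is a minimal, hypothetical sketch: a crude faithfulness heuristic that could run in real time against production traces or in batch over a dataset. It illustrates the general idea only, not the LangWatch evaluator API; the function names and threshold are placeholders.

```python
# A hypothetical custom evaluator: a crude faithfulness heuristic that scores
# how much of an answer is grounded in the retrieved context.
# Generic illustration only, not the LangWatch evaluator API.
import string
from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    score: float
    details: str

def _tokens(text: str) -> list[str]:
    # Lowercase, split on whitespace, strip surrounding punctuation
    stripped = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in stripped if t]

def faithfulness_check(answer: str, contexts: list[str], threshold: float = 0.6) -> EvalResult:
    """Score = fraction of answer words that also appear in the retrieved contexts."""
    context_tokens = set(_tokens(" ".join(contexts)))
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return EvalResult(passed=False, score=0.0, details="empty answer")
    grounded = sum(1 for t in answer_tokens if t in context_tokens)
    score = grounded / len(answer_tokens)
    return EvalResult(passed=score >= threshold, score=score,
                      details=f"{grounded}/{len(answer_tokens)} words grounded")

# The same check can back a real-time guardrail alert or a batch run over a dataset
print(faithfulness_check(
    answer="Paris is the capital of France.",
    contexts=["The capital of France is Paris."],
))
```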

Our excitement about the future

While this funding milestone is a major achievement, it’s just the beginning. LangWatch is dedicated to helping AI teams build world-class solutions for the toughest challenges. We’re expanding our platform across the evaluation stack and optimization studio to support faster, more reliable, and scalable AI deployments.

We’re excited to create a truly unified LLM operations experience—one that grows with your needs and evolves as the AI landscape changes. Join us on this journey as we build the next generation of AI development tools, where every piece fits perfectly into the bigger picture. If our vision resonates with you, we’d love to have a chat.

We’re also hiring in several areas, including AI research, solution engineering, full-stack software engineering, and GTM. Join our team if you’re inspired by our vision.