April Product Recap: Selene Integration, Eval Wizard Upgrades, Prompt Studio & More

Manouk
May 5, 2025
April was one of our biggest months yet at LangWatch. We shipped powerful new features, launched major integrations, and doubled down on making LLM product development more reliable, secure, and flexible. Whether you're building eval pipelines, tuning prompts, or deploying agent workflows, there’s something new for you.
Let’s dive in. 👇

Atla Selene ❤️ LangWatch
New integration with the best LLM-as-a-Judge model on the market
We’re thrilled to announce a new integration with Atla’s Selene—a purpose-built evaluation model that outperforms models from OpenAI and Anthropic across 11 common benchmarks.
Now available directly inside LangWatch’s Evaluation Wizard, Selene makes it easy to run high-quality LLM-as-a-Judge evaluations, both offline and in real time.
💡 Why use Selene?
LLM-as-a-Judge evaluations use one model to evaluate another. Selene specializes in this task—scoring outputs on dimensions like helpfulness, accuracy, or relevance with more reliability than general-purpose models.
Try it now — it’s free for a limited time in the Evaluation Wizard. You can run Selene for:
Offline dataset evals
Live, real-time guardrails
Custom scoring metrics with tailored evaluation prompts
Evaluation Wizard: Bigger, Better, Custom
You all loved the Eval Wizard—and we listened.
This month, we shipped major upgrades to make evaluation easier and more powerful:
AI Dataset Generation – no data? No problem. Auto-generate evaluation datasets in seconds.
Custom Evals – use LangWatch workflows to build your own evaluators with full control.
AI-named Versions – every change to an evaluation run gets automatically named and versioned using AI, so you can track your iteration history at a glance.
Wizard → Workflow Jump – jump from a quick wizard flow to a full custom workflow when you’re ready for more complexity.
From zero to advanced evaluations, all in one place.
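At its core, an offline dataset eval like the wizard runs is just a loop: score each example, then aggregate into a report. Here’s a hedged sketch with a stand-in exact-match scorer; the function names, threshold, and report shape are assumptions for illustration, not LangWatch internals.

```python
from statistics import mean

def run_offline_eval(dataset, score_fn, pass_threshold=0.7):
    """Score every example in the dataset and aggregate into a summary report."""
    scores = [score_fn(ex["input"], ex["output"]) for ex in dataset]
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(s >= pass_threshold for s in scores) / len(scores),
        "failures": [ex for ex, s in zip(dataset, scores) if s < pass_threshold],
    }

# Stand-in scorer: exact match against an expected answer. A real eval would
# plug in an LLM judge (like Selene) or a custom metric here instead.
def exact_match(inp, out):
    return 1.0 if out.strip().lower() == inp["expected"].strip().lower() else 0.0

dataset = [
    {"input": {"question": "2+2?", "expected": "4"}, "output": "4"},
    {"input": {"question": "Capital of France?", "expected": "Paris"}, "output": "Lyon"},
]
report = run_offline_eval(dataset, exact_match)
print(report["pass_rate"])  # → 0.5
```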

Prompt Studio: Git for Prompts
Prompt iteration just got an upgrade.
Introducing Prompt Studio – a dedicated workspace for managing, testing, and versioning prompts like code.
What’s new:
Prompt Versioning – prompts are now first-class citizens, with their own version history, reusable across workflows.
API-first Access – manage and inject prompts directly from your codebase.
Collaborative UX – work side-by-side with your prompt engineers or non-technical teammates.
Quick Iteration – evaluate prompts on datasets, compare model outputs, tweak and roll back in seconds.
Perfect whether you're refining one prompt or scaling a prompt library for your team.

Agent Frameworks, Cookbooks & Comparisons
We built the same agent app in 7 different AI agent frameworks—from LangChain to AutoGen to CrewAI and more.
Why? So you can compare frameworks side-by-side with real code, working examples, and head-to-head evaluations.
Check out:
create-agent-app, an open-source starter repo
New cookbooks and tutorials for agent workflows
Guides on RAG + Eval best practices
Deep dives into vector vs hybrid search
Finetuning strategies for embedding models
We’re bringing clarity to the growing agent ecosystem.
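On the vector-vs-hybrid question: one common way hybrid search combines keyword and vector results is reciprocal rank fusion (RRF), which merges ranked lists without needing comparable scores. A minimal sketch (the doc ids and rankings below are made up for illustration):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with RRF: score = sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_c", "doc_b"]  # e.g. a BM25 keyword ranking
vector_hits  = ["doc_b", "doc_a", "doc_d"]  # e.g. an embedding-similarity ranking
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Documents that rank well in both lists (doc_a, doc_b) float to the top, which is exactly the behavior hybrid search is after.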
🔐 Enterprise & Security: ISO Certified, SSO, Redaction & More
LangWatch is now ISO 27001:2022 certified.
We’ve also shipped a series of enterprise-grade features:
Custom SSO – integrate with your identity provider of choice
Input Redaction – redact sensitive data from inputs while keeping metrics and alerts visible
Hybrid Deployment Docs – use LangWatch Cloud, but keep all sensitive LLM I/O on your side
Security, flexibility, and compliance—built in.
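The input-redaction idea can be sketched in a few lines: strip sensitive substrings before anything leaves your side, while keeping counts so metrics and alerts still work. These two regex patterns are illustrative only; real redaction (including LangWatch’s) needs much broader PII coverage.

```python
import re

# Illustrative patterns only; production redaction needs broader PII coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text):
    """Replace sensitive substrings with placeholders, keeping counts for metrics."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts

clean, counts = redact("Contact jane@example.com or +31 6 1234 5678")
print(clean)   # → Contact [EMAIL] or [PHONE]
print(counts)  # → {'EMAIL': 1, 'PHONE': 1}
```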
🎓 Join Our Webinar: Building LLM Evals You Can Trust
Want to dive deeper into building better evals? We’re hosting a live webinar covering:
Online vs Offline evaluation strategies
Using Selene and other LLM Judges
Best practices for prompt evaluation
Real-time evals as guardrails for production
⭐ Help Us Grow
If you're excited about what we're building, show us some love:
👉 Star the repo
👉 Follow us on X
👉 Join the convo on Discord
More updates coming soon. Until then—keep building.
—
LangWatch: Build with confidence.
Boost your LLM's performance today
Get up and running with LangWatch in as little as 10 minutes.