April Product Recap: Selene Integration, Eval Wizard Upgrades, Prompt Studio & More

Manouk Draisma

May 5, 2025

April was one of our biggest months yet at LangWatch. We shipped powerful new features, launched major integrations, and doubled down on making LLM product development more reliable, secure, and flexible. Whether you're building eval pipelines, tuning prompts, or deploying agent workflows, there’s something new for you.

Let’s dive in. 👇

Atla Selene ❤️ LangWatch

New integration with the best LLM-as-a-Judge model on the market

We’re thrilled to announce a new integration with Atla’s Selene, a purpose-built evaluation model that outperforms frontier models from OpenAI and Anthropic across 11 common benchmarks.

Now available directly inside LangWatch’s Evaluation Wizard, Selene makes it easy to run high-quality LLM-as-a-Judge evaluations, both offline and in real time.

💡 Why use Selene?
LLM-as-a-Judge evaluations use one model to evaluate another. Selene specializes in this task—scoring outputs on dimensions like helpfulness, accuracy, or relevance with more reliability than general-purpose models.
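If you’ve never run an LLM-as-a-Judge evaluation, the core pattern is small enough to sketch. Below is a minimal, illustrative example of one model grading another’s answer; the endpoint and model id are placeholders rather than Atla’s real values, and inside LangWatch the Evaluation Wizard wires all of this up for you:

```python
# Minimal LLM-as-a-Judge sketch (illustrative only): one model scores another
# model's answer on a single dimension. The base_url and model id below are
# placeholders, not Atla's real values; check their docs before running this.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-judge.ai/v1",  # placeholder endpoint
    api_key="YOUR_KEY",
)

JUDGE_PROMPT = """You are an impartial evaluator.
Question: {question}
Answer: {answer}
Score the answer's helpfulness from 1 (useless) to 5 (excellent).
Reply with the score only."""

def judge(question: str, answer: str) -> int:
    resp = client.chat.completions.create(
        model="selene",  # placeholder model id
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    return int(resp.choices[0].message.content.strip())

print(judge("What is LangWatch?", "An LLM ops platform for evals and observability."))
```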

Try it now — it’s free for a limited time in the Evaluation Wizard. You can run Selene for:

  • Offline dataset evals

  • Live, real-time guardrails

  • Custom scoring metrics with tailored evaluation prompts

▶️ See Selene in action →

Evaluation Wizard: Bigger, Better, Custom

You all loved the Eval Wizard—and we listened.

This month, we shipped major upgrades to make evaluation easier and more powerful:

  • AI Dataset Generation – no data? No problem. Auto-generate evaluation datasets in seconds.

  • Custom Evals – use LangWatch workflows to build your own evaluators with full control (see the sketch below).

  • AI-named Versions – every change to an evaluation run gets automatically named and versioned using AI, so you can track your iteration history at a glance.

  • Wizard → Workflow Jump – jump from a quick wizard flow to a full custom workflow when you’re ready for more complexity.

From zero to advanced evaluations, all in one place.
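To make Custom Evals concrete: an evaluator ultimately boils down to a function that takes a model output and returns a score plus a pass/fail verdict. Here’s a minimal sketch; the names and return type are illustrative assumptions, not the exact LangWatch workflow interface:

```python
# Illustrative only: the general shape of a custom evaluator. The exact
# LangWatch workflow interface differs; treat these names as assumptions.
from dataclasses import dataclass

@dataclass
class EvalResult:
    score: float   # normalized 0.0-1.0
    passed: bool
    details: str = ""

def keyword_evaluator(output: str, expected_keywords: list[str]) -> EvalResult:
    """Pass only if every expected keyword appears in the model output."""
    missing = [kw for kw in expected_keywords if kw.lower() not in output.lower()]
    score = 1.0 - len(missing) / max(len(expected_keywords), 1)
    return EvalResult(
        score=score,
        passed=not missing,
        details=f"missing: {missing}" if missing else "all keywords present",
    )

print(keyword_evaluator("LangWatch traces and evals", ["traces", "evals"]))
```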

Prompt Studio: Git for Prompts

Prompt iteration just got an upgrade.
Introducing Prompt Studio – a dedicated workspace for managing, testing, and versioning prompts like code.

What’s new:

  • Prompt Versioning – prompts are now first-class citizens, with their own version history, reusable across workflows.

  • API-first Access – manage and inject prompts directly from your codebase (see the sketch below).

  • Collaborative UX – work side-by-side with your prompt engineers or non-technical teammates.

  • Quick Iteration – evaluate prompts on datasets, compare model outputs, tweak and roll back in seconds.

Perfect whether you're refining one prompt or scaling a prompt library for your team.
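As a taste of the API-first workflow, here’s a minimal sketch of fetching a managed prompt at runtime instead of hardcoding it. The endpoint path, auth header, and response shape are assumptions for illustration; check the LangWatch API docs for the actual contract:

```python
# Illustrative sketch of API-first prompt access. The endpoint path, auth
# header, and response shape are assumptions; consult the LangWatch API docs.
import os
import requests

def get_prompt(prompt_id: str) -> str:
    resp = requests.get(
        f"https://app.langwatch.ai/api/prompts/{prompt_id}",  # assumed endpoint
        headers={"X-Auth-Token": os.environ["LANGWATCH_API_KEY"]},  # assumed header
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["prompt"]  # assumed response shape

# Hypothetical prompt id; the template is assumed to contain a {ticket} slot.
template = get_prompt("customer-support-triage")
print(template.format(ticket="My invoice is wrong"))
```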

Agent Frameworks, Cookbooks & Comparisons

We built the same agent app in seven different AI agent frameworks, from LangChain to AutoGen to CrewAI and more.

Why? So you can compare frameworks side-by-side with real code, working examples, and head-to-head evaluations.

Check out:

  • create-agent-app – open-source starter repo

  • New cookbooks and tutorials for agent workflows

We’re bringing clarity to the growing agent ecosystem.

🔐 Enterprise & Security: ISO Certified, SSO, Redaction & More

LangWatch is now ISO 27001:2022 certified.
We’ve also shipped a series of enterprise-grade features:

  • Custom SSO – integrate with your identity provider of choice

  • Input Redaction – redact sensitive data from inputs while keeping metrics and alerts visible

  • Hybrid Deployment Docs – use LangWatch Cloud, but keep all sensitive LLM I/O on your side

Security, flexibility, and compliance—built in.

🎓 Join Our Webinar: Building LLM Evals You Can Trust

Want to dive deeper into building better evals? We’re hosting a live webinar covering:

  • Online vs. offline evaluation strategies

  • Using Selene and other LLM judges

  • Best practices for prompt evaluation

  • Real-time evals as guardrails for production

🔗 Register here

⭐ Help Us Grow

If you're excited about what we're building, show us some love:

👉 Star the repo
👉 Follow us on X
👉 Join the convo on Discord

More updates coming soon. Until then—keep building.


LangWatch: Build with confidence.
