How PHWL.ai uses LLM Observability and Optimization to Improve AI Coaching with LangWatch

Manouk

Mar 14, 2025

From AI guesswork to a structured LLM Optimization framework

About PHWL: A smarter approach to AI Coaching

At PHWL.ai, the mission has always been clear: make high-quality business coaching accessible to everyone, not just executives. Founded by an experienced business coach with over 20 years of expertise, the company developed an AI-powered coaching app that provides professional guidance through voice and text interactions.

However, as they scaled, a major challenge emerged—ensuring that the AI truly behaved like a coach, asking insightful questions rather than delivering lectures. How could they achieve LLM observability and optimize AI-generated conversations at scale?

The struggle with LLM observability and optimization

David, the CTO at PHWL.ai, had spent years fine-tuning the app’s backend. Before integrating LangWatch, PHWL.ai relied on manual methods to evaluate AI performance. While they had a structured coaching model, monitoring the AI’s adherence to that model was cumbersome.

“Our biggest issue was tracking when the AI drifted from the coaching framework,” David explained. “We needed a way to measure where it failed and how often it hit the right coaching notes in real conversations.”

The team experimented with LangSmith and other tools but found their annotation and evaluation processes too rigid. PHWL.ai required a more flexible, intuitive system, one that allowed both technical and non-technical team members to collaborate seamlessly on improving AI responses.

Finding the right fit

That’s when PHWL.ai turned to LangWatch for LLM observability and optimization.

From the start, LangWatch’s annotation workflow changed everything. For the first time, PHWL.ai’s coaching experts could efficiently triage AI responses, categorize issues, and provide feedback—all in a single, streamlined interface. Instead of navigating complex logs, the team could quickly identify patterns and make data-driven improvements to their AI models.

“What stood out was how LangWatch facilitated real collaboration,” said Malavika, an AI researcher at PHWL.ai. “Not just our coaching experts but also our founder—who isn’t technical—could now directly review AI outputs, annotate problem areas, and pass them to our team for optimization. This closed the loop between real-world coaching expertise and AI refinement.”

Beyond annotation, LangWatch’s Optimization Studio became a key part of PHWL.ai’s workflow. Unlike their previous setup, which required coding every change manually, LangWatch’s UI-based approach allowed them to experiment with prompts, hyperparameters, and even different LLMs—all without touching production code. When deeper customization was required, however, PHWL.ai could still step in with its own code.

“For simpler iterations, LangWatch’s UI saves us a ton of time, literally days,” Malavika noted. “But when we need fine-grained control, we still have the ability to integrate our own logic and custom scripts without being locked into a rigid framework.” This balance between ease of use and technical flexibility made LangWatch an ideal solution.
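To make that concrete, here is a minimal sketch of the kind of custom check a team can script alongside a UI-driven workflow: a simple heuristic that flags replies that lecture rather than ask questions. The function name, heuristic, and threshold are illustrative assumptions for this article, not part of LangWatch’s API or PHWL.ai’s codebase.

```python
import re

def coaching_style_check(reply: str, min_question_ratio: float = 0.3) -> dict:
    """Flag replies that lecture instead of asking coaching questions.

    Purely illustrative heuristic; not part of LangWatch's API.
    """
    # Split the reply into rough sentences on terminal punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]
    questions = [s for s in sentences if s.endswith("?")]
    ratio = len(questions) / max(len(sentences), 1)
    return {
        "passed": ratio >= min_question_ratio,  # illustrative threshold
        "score": round(ratio, 2),
        "details": f"{len(questions)} of {len(sentences)} sentences are questions",
    }

# Example: a reply that acknowledges, then coaches with a question.
print(coaching_style_check(
    "That's a common challenge. What outcome would make this quarter a success for you?"
))
```

Checks like this can run in bulk over logged conversations, complementing the prompt and model experiments done in the UI.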

Measurable results

With LangWatch in place, PHWL.ai saw immediate improvements:

  • With LangWatch’s LLM optimization features, AI tuning time decreased by ...%, thanks to faster prompt iteration and testing.

  • Annotation efforts were cut in half, allowing the team to focus on refining coaching strategies rather than manually sifting through chat logs.

  • AI responses became significantly more aligned with coaching best practices, as LLM-as-a-judge evaluations and RAG faithfulness scores gave the team a concrete measure of how well each reply followed the coaching framework (a minimal sketch of this style of evaluation follows below).

These improvements meant that users received a more natural and effective coaching experience, strengthening PHWL.ai’s value proposition in a competitive market.
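For readers curious what an LLM-as-a-judge check for coaching adherence can look like, below is a minimal, generic sketch using the OpenAI Python SDK. The rubric, model choice, and function name are assumptions made for illustration; PHWL.ai runs its evaluations through LangWatch rather than a standalone script like this.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = (
    "You are evaluating an AI business coach. Rate the assistant reply on a "
    "1-5 scale for coaching adherence: does it ask insightful questions and "
    "guide the user, rather than lecture? "
    'Respond as JSON: {"score": <1-5>, "reason": "<one sentence>"}'
)

def judge_coaching_adherence(user_message: str, ai_reply: str) -> dict:
    """Score one coaching turn with an LLM judge (illustrative rubric)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"User: {user_message}\n\nAssistant: {ai_reply}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

Aggregated over real conversations, scores like these are what let a team quantify drift from the coaching framework instead of relying on spot checks.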

The future of AI Coaching

As PHWL.ai continues to expand, LangWatch remains a critical part of their AI development pipeline. With plans to integrate multiple LLMs and further refine coaching interactions, they see LangWatch as more than just a monitoring tool—it’s a cornerstone of their AI-driven coaching evolution.

“LangWatch didn’t just help us optimize our AI—it fundamentally changed how we work,” David shared. “Now, everyone on our team—from engineers to coaching experts—can contribute to building a better AI coach.”

David Nicol (CTO @ PHWL.ai)

Ready to improve your AI’s performance with LangWatch?

If you’re struggling to find the right tools for LLM observability, LLM optimization, or AI performance tracking, LangWatch provides the end-to-end observability, evaluation, and optimization platform you need.

Contact us today to see how LangWatch can help your AI deliver more reliable, effective, and human-like experiences.