How to evaluate an LLM when you don't have defined answers
Measure LLM performance with an LLM-as-a-judge when no ground-truth answers exist, enabling scalable AI agent evaluations.
For some AI applications, it's not possible to define a golden answer. This happens, for example, in creative tasks, where there is no single correct answer. In the video below, we show how to use LangWatch Experiments via the UI to evaluate a Business Coaching Agent: we don't have predefined answers, but we can use an LLM-as-a-judge to score the quality of the responses.
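If you want to experiment with the same idea in code, the core of an LLM-as-a-judge evaluation is a judge prompt with an explicit rubric, applied to each question/answer pair. Below is a minimal sketch using the OpenAI Python SDK; the judge model, rubric, and scoring format are illustrative assumptions and not the exact configuration shown in the video, where the LangWatch Experiments UI handles this without any code.

```python
# Minimal LLM-as-a-judge sketch using the OpenAI Python SDK.
# The rubric, model name, and 1-5 scoring scale are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are evaluating the answer of a business coaching agent.
There is no single correct answer; judge the response on its own merits.

Criteria:
- Actionability: does it give concrete, practical advice?
- Relevance: does it address the user's actual question?
- Clarity: is it well-structured and easy to follow?

Question:
{question}

Answer:
{answer}

Reply with a score from 1 (poor) to 5 (excellent) and a one-sentence
justification, in the format: "SCORE: <n> - <justification>".
"""

def judge_answer(question: str, answer: str) -> str:
    """Ask a judge model to grade an answer without a golden reference."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of judge model
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }
        ],
        temperature=0,  # keep scoring as deterministic as possible
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(judge_answer(
        "How do I prioritize tasks as a first-time founder?",
        "Start by listing everything on your plate, then apply an "
        "impact/effort matrix and commit to the top three items each week.",
    ))
```

In practice you would parse the score out of the judge's reply and aggregate it across a dataset of test questions; this aggregation and comparison across experiment runs is what the LangWatch Experiments UI in the video does for you.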