for index, row in experiment.loop(df.iterrows()):
    # your execution code here
    experiment.evaluate(
        "langevals/query_resolution",
        index=index,
        data={
            "conversation": row["conversation"],
        },
        settings={}
    )
This evaluator checks whether every user query in the conversation was resolved. It is useful for detecting cases where the bot doesn’t know how to answer or can’t help the user.
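The evaluator reads a `conversation` value for each dataset row. As a rough sketch of how such a column might be prepared, the snippet below builds rows from (role, content) pairs. The message shape used here (a list of dicts with `role` and `content` keys) is an assumption rather than something this page confirms, and `to_conversation`, `ALLOWED_ROLES`, and the sample dialogues are hypothetical names and data used only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: messages are dicts with "role" and "content" keys.
# Check the evaluator's input reference before relying on this shape.
ALLOWED_ROLES = {"user", "assistant", "system"}


def to_conversation(turns: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (role, content) pairs into a list of message dicts."""
    conversation = []
    for role, content in turns:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {role!r}")
        conversation.append({"role": role, "content": content})
    return conversation


# Hypothetical dataset rows: one where the query looks resolved,
# one where the bot declines to help.
rows = [
    {
        "conversation": to_conversation([
            ("user", "How do I reset my password?"),
            ("assistant", "Go to Settings > Security and choose Reset password."),
        ])
    },
    {
        "conversation": to_conversation([
            ("user", "Can I get a refund for last month?"),
            ("assistant", "Sorry, I can't help with that."),
        ])
    },
]
```

Rows shaped like this could then be loaded into the dataframe that the evaluation loop iterates over, so that `row["conversation"]` holds the full exchange for each entry.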