```python Experiment
import langwatch

# Load a dataset from LangWatch and start an experiment run
df = langwatch.datasets.get_dataset("dataset-id").to_pandas()
experiment = langwatch.experiment.init("my-experiment")

for index, row in experiment.loop(df.iterrows()):
    output = ...  # your execution code here, producing the answer to evaluate

    experiment.evaluate(
        "langevals/llm_score",
        index=index,
        data={
            "input": row["input"],
            "output": output,
            "contexts": row["contexts"],
        },
        settings={},
    )
```
[ { "status": "processed", "score": 123, "passed": true, "label": "<string>", "details": "<string>", "cost": { "currency": "<string>", "amount": 123 } } ]
Use an LLM as a judge with a custom prompt to produce a numeric score evaluation of the message.
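The `settings` object configures the judge. Its exact schema is not shown on this page, so the keys below (`model`, `prompt`) are illustrative assumptions rather than the confirmed `langevals/llm_score` schema. A minimal sketch of the evaluate call inside the loop above:

```python
# Sketch only: the settings keys below are assumptions,
# not the confirmed schema for langevals/llm_score.
experiment.evaluate(
    "langevals/llm_score",
    index=index,
    data={
        "input": row["input"],
        "output": output,
    },
    settings={
        "model": "openai/gpt-4o-mini",  # judge model (assumed key)
        "prompt": "Rate the answer's factual accuracy from 0 to 100.",  # custom judge prompt (assumed key)
    },
)
```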
API key for authentication
Optional trace ID to associate this evaluation with a trace
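For direct REST calls, the API key above goes in a request header and the trace ID can be included in the body. A minimal sketch, assuming the `X-Auth-Token` header name, the endpoint path, and a `trace_id` body field (all three are assumptions to verify against this reference):

```python
import requests

# Endpoint path, header name, and trace_id field are assumptions
# inferred from this page's parameters, not confirmed values.
response = requests.post(
    "https://app.langwatch.ai/api/evaluations/langevals/llm_score/evaluate",
    headers={"X-Auth-Token": "<your-api-key>"},
    json={
        "data": {"input": "...", "output": "...", "contexts": []},
        "settings": {},
        "trace_id": "<optional-trace-id>",  # associates this evaluation with a trace
    },
)
response.raise_for_status()
print(response.json())
```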
Successful evaluation

- `status`: one of `processed`, `skipped`, or `error`
- `score`: numeric score from the evaluation
- `passed`: whether the evaluation passed
- `label`: label assigned by the evaluation
- `details`: additional details about the evaluation
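Since a result can come back as `processed`, `skipped`, or `error`, callers should branch on `status` before reading `score`. A minimal sketch over the response shape shown above:

```python
# `results` is the parsed JSON list returned by the evaluation call.
for result in results:
    if result["status"] == "processed":
        print(f"score={result['score']} passed={result['passed']} label={result.get('label')}")
    elif result["status"] == "skipped":
        print(f"skipped: {result.get('details')}")
    else:  # status == "error"
        print(f"evaluation failed: {result.get('details')}")
```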