# Custom Scoring

> Send evaluation scores from your own custom logic to LangWatch for tracking and analysis.

Custom scoring lets you send evaluation results from your own code to LangWatch. This is useful when you have proprietary evaluation logic or domain-specific metrics, or when you want to integrate an existing evaluation system.

<Info>
  **When to use Custom Scoring:**

  * You have your own evaluation logic (deterministic or ML-based)
  * You're integrating an existing evaluation system
  * You need domain-specific metrics that aren't covered by built-in evaluators
  * You want to track any custom metric alongside your traces

  **See also:**

  * [Built-in Evaluators](/evaluations/evaluators/built-in-evaluators) - Use LangWatch's ready-made evaluators
  * [Saved Evaluators](/evaluations/evaluators/saved-evaluators) - Reuse configured evaluators across your project
</Info>

## How It Works

With custom scoring, you:

1. Run your own evaluation logic
2. Send the results (score, passed, label, details) to LangWatch
3. View results in traces, analytics, and dashboards

```
Your Code → Your Evaluation Logic → Score/Pass/Fail → LangWatch
                                                          ↓
                                              Traces, Analytics, Alerts
```

## Sending Custom Scores

### On a Trace/Span

Attach evaluation results to the current trace or span:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import langwatch

    @langwatch.span()
    def my_llm_step(user_input: str):
        output = my_llm(user_input)

        # Run your custom evaluation
        score = my_custom_evaluator(user_input, output)
        is_valid = score > 0.7

        # Send results to LangWatch
        langwatch.get_current_span().add_evaluation(
            name="my_custom_metric",
            passed=is_valid,
            score=score,
            details="Custom evaluation based on domain rules"
        )

        return output
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript theme={null}
    import { LangWatch } from "langwatch";

    const langwatch = new LangWatch();

    async function myLLMStep(userInput: string): Promise<string> {
      return await langwatch.trace({ name: "my-trace" }, async (span) => {
        const output = await myLLM(userInput);

        // Run your custom evaluation
        const score = myCustomEvaluator(userInput, output);
        const isValid = score > 0.7;

        // Send results to LangWatch
        span.addEvaluation({
          name: "my_custom_metric",
          passed: isValid,
          score: score,
          details: "Custom evaluation based on domain rules"
        });

        return output;
      });
    }
    ```
  </Tab>

  <Tab title="REST API">
    Send evaluation results directly via the collector API:

    ```bash theme={null}
    curl -X POST "https://app.langwatch.ai/api/collector" \
         -H "X-Auth-Token: $LANGWATCH_API_KEY" \
         -H "Content-Type: application/json" \
         -d @- <<EOF
    {
      "trace_id": "your-trace-id",
      "evaluations": [{
        "name": "my_custom_metric",
        "passed": true,
        "score": 0.85,
        "details": "Custom evaluation result"
      }]
    }
    EOF
    ```
  </Tab>
</Tabs>

### In Experiments

Log custom scores during batch evaluation:

<CodeGroup>
  ```python Python theme={null}
  import langwatch

  experiment = langwatch.experiment.init("my-experiment")

  for index, row in experiment.loop(df.iterrows()):
      output = my_llm(row["input"])

      # Run your custom evaluation
      score = my_custom_evaluator(row["input"], output, row["expected"])

      # Log the custom score
      experiment.log(
          name="my_custom_metric",
          index=index,
          data={"input": row["input"], "output": output},
          score=score,
          passed=score > 0.7,
          details="Custom domain-specific evaluation"
      )
  ```

  ```typescript TypeScript theme={null}
  import { LangWatch } from "langwatch";

  const langwatch = new LangWatch();
  const experiment = await langwatch.experiments.init("my-experiment");

  await experiment.run(
    dataset.entries.map((e) => e.entry),
    async ({ item, index }) => {
      const output = await myLLM(item.input);

      // Run your custom evaluation
      const score = myCustomEvaluator(item.input, output, item.expected);

      // Log the custom score
      experiment.log({
        name: "my_custom_metric",
        index,
        data: { input: item.input, output },
        score,
        passed: score > 0.7,
        details: "Custom domain-specific evaluation"
      });
    }
  );
  ```
</CodeGroup>

## Evaluation Result Fields

| Field     | Type    | Required | Description                                   |
| --------- | ------- | -------- | --------------------------------------------- |
| `name`    | string  | Yes      | Identifier for this evaluation (shows in UI)  |
| `passed`  | boolean | No       | Whether the evaluation passed                 |
| `score`   | number  | No       | Numeric score (typically 0-1)                 |
| `label`   | string  | No       | Category label (e.g., "positive", "negative") |
| `details` | string  | No       | Human-readable explanation                    |

<Note>
  At least one of `passed`, `score`, or `label` should be provided for meaningful results.
</Note>
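
Any combination of these fields can be sent in a single call. Below is a minimal sketch using the Python `add_evaluation` method shown earlier; the evaluation name and values are illustrative, and it assumes `add_evaluation` accepts a `label` argument matching the field table above:

```python theme={null}
import langwatch

# Illustrative example combining score, pass/fail, label, and details.
# At least one of passed, score, or label should be set for meaningful results.
langwatch.get_current_span().add_evaluation(
    name="tone_check",
    passed=True,
    score=0.92,
    label="positive",
    details="Response tone matched the requested style guidelines",
)
```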

## Example Use Cases

### Code Quality Check

```python theme={null}
def check_code_quality(generated_code: str) -> dict:
    # Your custom logic (here, check_syntax returns True when syntax errors are found)
    has_syntax_errors = check_syntax(generated_code)
    follows_style = check_style_guide(generated_code)

    score = 0.0
    if not has_syntax_errors:
        score += 0.5
    if follows_style:
        score += 0.5

    return {
        "passed": score >= 0.5,
        "score": score,
        "details": f"Syntax OK: {not has_syntax_errors}, Style OK: {follows_style}"
    }

# Use in your pipeline
result = check_code_quality(llm_output)
langwatch.get_current_span().add_evaluation(
    name="code_quality",
    **result
)
```

### Semantic Similarity

```python theme={null}
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_similarity(output: str, expected: str) -> float:
    embeddings = model.encode([output, expected])
    similarity = cosine_similarity([embeddings[0]], [embeddings[1]])[0][0]
    return float(similarity)

# Use in experiment
score = semantic_similarity(output, row["expected"])
experiment.log(
    name="semantic_similarity",
    index=index,
    data={"output": output, "expected": row["expected"]},
    score=score,
    passed=score > 0.8
)
```

### Business Rule Validation

```python theme={null}
def validate_response(response: str, context: dict) -> dict:
    issues = []

    # Check for required elements
    if context.get("require_disclaimer") and "disclaimer" not in response.lower():
        issues.append("Missing required disclaimer")

    # Check length constraints
    if len(response) > context.get("max_length", 1000):
        issues.append("Response too long")

    # Check for prohibited content
    for word in context.get("prohibited_words", []):
        if word.lower() in response.lower():
            issues.append(f"Contains prohibited word: {word}")

    return {
        "passed": len(issues) == 0,
        "score": 1.0 - (len(issues) * 0.2),
        "details": "; ".join(issues) if issues else "All checks passed"
    }
```
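
As with the code quality check above, the returned dict can be unpacked straight into `add_evaluation` on the current span. A brief sketch (the `context` values here are placeholders):

```python theme={null}
# Use in your pipeline (context values are placeholders)
result = validate_response(llm_output, {"require_disclaimer": True, "max_length": 500})
langwatch.get_current_span().add_evaluation(
    name="business_rules",
    **result
)
```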

## Combining with Built-in Evaluators

You can use custom scoring alongside built-in evaluators:

```python theme={null}
@langwatch.span()
def my_llm_step(user_input: str):
    output = my_llm(user_input)

    # Built-in evaluator
    langwatch.evaluation.evaluate(
        "presidio/pii_detection",
        name="PII Check",
        data={"output": output},
    )

    # Custom evaluation
    business_score = my_business_rules_check(output)
    langwatch.get_current_span().add_evaluation(
        name="business_rules",
        passed=business_score > 0.8,
        score=business_score,
    )

    return output
```

## Viewing Custom Scores

Custom scores appear in:

* **Trace Details** - Under the Evaluations section
* **Analytics Dashboard** - Filterable by evaluation name
* **Experiments** - In the results table alongside other evaluators

## Next Steps

<CardGroup cols={2}>
  <Card title="Built-in Evaluators" description="Use LangWatch's ready-made evaluators." icon="bolt" href="/evaluations/evaluators/built-in-evaluators" />

  <Card title="Saved Evaluators" description="Reuse configured evaluators across your project." icon="bookmark" href="/evaluations/evaluators/saved-evaluators" />

  <Card title="Experiments" description="Run batch evaluations with custom scoring." icon="flask" href="/evaluations/experiments/overview" />

  <Card title="Evaluations Overview" description="View and analyze your evaluation results." icon="chart-line" href="/evaluations/overview" />
</CardGroup>
