- Platform Experiments - Configure the experiment in LangWatch, then trigger it from CI/CD with a single line
- Experiments via SDK - Define the entire experiment in code and run it in CI/CD
| Approach | Best For |
|---|---|
| Platform Experiments | Non-technical team members can modify experiments; configuration lives in LangWatch |
| Experiments via SDK | Version control your experiment config; full flexibility in code |
Option 1: Platform Experiments
Configure your experiment once in the LangWatch Experiments UI, then trigger it from CI/CD.
Setup
1. Create your experiment in the Experiments via UI:
   - Add your dataset
   - Configure targets (prompts, models, or API endpoints)
   - Select evaluators
   - Run it once to verify it works
2. Get your experiment slug from the URL, or click the CI/CD button in the experiment toolbar.
3. Run it from CI/CD, as sketched below.
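The same step applies with either the Python or TypeScript SDK. The Python-flavoured sketch below only shows the shape of such a trigger script: trigger_platform_experiment is a placeholder rather than the actual SDK call (see the SDK reference for the real function), and the script assumes your API key reaches CI via the LANGWATCH_API_KEY environment variable.

```python
# trigger_experiment.py - hypothetical sketch; replace the stub with the real SDK call.
import os
import sys


def trigger_platform_experiment(slug: str):
    """Placeholder for the LangWatch SDK call that triggers a platform experiment by slug."""
    raise NotImplementedError("substitute the actual call from the SDK reference")


def main() -> int:
    # Fail fast if the API key secret was not wired into the CI job.
    if not os.environ.get("LANGWATCH_API_KEY"):
        print("LANGWATCH_API_KEY is not set", file=sys.stderr)
        return 2

    results = trigger_platform_experiment("my-experiment-slug")  # slug copied from the URL
    # print_summary() prints a CI-friendly summary and exits with code 1 if any evaluation failed.
    results.print_summary()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```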
GitHub Actions Example
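A workflow sketch, assuming the trigger script above is saved as trigger_experiment.py at the repository root and your API key is stored as a repository secret named LANGWATCH_API_KEY:

```yaml
# .github/workflows/langwatch-experiment.yml (hypothetical sketch)
name: LangWatch Experiment

on: [pull_request]

jobs:
  experiment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install langwatch
      - name: Trigger platform experiment
        run: python trigger_experiment.py
        env:
          LANGWATCH_API_KEY: ${{ secrets.LANGWATCH_API_KEY }}
```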
Options
Option 2: Experiments via SDK
Define your entire experiment in code. This gives you full flexibility and keeps your experiment configuration under version control.
Basic Example
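The same pattern works with either the Python or TypeScript SDK. The Python sketch below illustrates the overall shape only: the dataset, target, and evaluator are trivial stand-ins, and apart from the print_summary()/exit_on_failure behaviour documented under Results Summary below, none of the names are the real LangWatch API — check the SDK reference for the actual classes and methods.

```python
# scripts/run_evaluation.py - hypothetical, simplified sketch of an SDK-defined experiment.
import sys

# Dataset defined in code (it could equally be loaded from a file or from LangWatch).
DATASET = [
    {"input": "What does LangWatch do?", "expected_keyword": "observability"},
    {"input": "What kind of applications does it target?", "expected_keyword": "LLM"},
]


def target(prompt: str) -> str:
    """Stand-in for the prompt, model, or API endpoint under evaluation."""
    return "LangWatch provides observability and evaluation tooling for LLM applications."


def evaluate(row: dict, output: str) -> bool:
    """Stand-in evaluator; a real experiment would call a LangWatch evaluator here."""
    return row["expected_keyword"].lower() in output.lower()


def main() -> int:
    failures = 0
    for row in DATASET:
        output = target(row["input"])
        passed = evaluate(row, output)
        failures += 0 if passed else 1
        print(f"{'PASS' if passed else 'FAIL'}: {row['input']}")
    # With the real SDK you would call results.print_summary() instead, which
    # exits with code 1 on any failure unless exit_on_failure=False is passed.
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```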
GitHub Actions Example
scripts/run_evaluation.py contains your full experiment code.
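A workflow sketch for the SDK approach, assuming your API key is stored as a repository secret named LANGWATCH_API_KEY:

```yaml
# .github/workflows/langwatch-evaluation.yml (hypothetical sketch)
name: LangWatch Evaluation

on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install langwatch
      - name: Run experiment
        run: python scripts/run_evaluation.py
        env:
          LANGWATCH_API_KEY: ${{ secrets.LANGWATCH_API_KEY }}
```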
Comparing Multiple Configurations
SDK experiments shine when comparing different configurations: run the same dataset against several prompts or models and compare the results.
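A hypothetical sketch building on the run_evaluation.py example above: the dataset and evaluator are reused, and only the configuration (here, a system prompt) changes between runs. The configuration names and the parameterised target are illustrative.

```python
# scripts/compare_configs.py - hypothetical sketch reusing DATASET and evaluate()
# from the run_evaluation.py example above.
from run_evaluation import DATASET, evaluate

CONFIGS = {
    "prompt-v1": "You are a concise assistant.",
    "prompt-v2": "You are a detailed assistant.",
}


def target(system_prompt: str, user_input: str) -> str:
    """Stand-in pipeline, now parameterised by the configuration under test."""
    return f"{system_prompt} LangWatch provides observability for LLM applications."


for name, system_prompt in CONFIGS.items():
    results = [evaluate(row, target(system_prompt, row["input"])) for row in DATASET]
    print(f"{name}: {sum(results)}/{len(results)} rows passed")
```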
Results Summary
Both approaches output a CI-friendly summary. The print_summary() method:
- Outputs results in a structured format
- Returns exit code 1 if any evaluations failed (unless exit_on_failure=False)
- Provides a link to view detailed results in LangWatch
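A small illustration of the two modes; results here simply stands for the object returned by your experiment run:

```python
def report(results) -> None:
    # Default: structured summary, exit code 1 if any evaluation failed.
    results.print_summary()
    # To report without failing the build, pass exit_on_failure=False instead:
    # results.print_summary(exit_on_failure=False)
```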
CI Platform Examples
GitLab CI
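Either variant works here: trigger a platform experiment or run your SDK script. The sketch below shows the SDK-script variant and assumes LANGWATCH_API_KEY is configured as a masked CI/CD variable:

```yaml
# .gitlab-ci.yml (hypothetical sketch)
evaluate:
  image: python:3.11
  script:
    - pip install langwatch
    - python scripts/run_evaluation.py
```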
CircleCI
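As with GitLab, either the platform-experiment trigger or the SDK script can run here. The sketch below shows the SDK-script variant and assumes LANGWATCH_API_KEY is set as a project environment variable:

```yaml
# .circleci/config.yml (hypothetical sketch)
version: 2.1

jobs:
  evaluate:
    docker:
      - image: cimg/python:3.11
    steps:
      - checkout
      - run: pip install langwatch
      - run: python scripts/run_evaluation.py

workflows:
  evaluation:
    jobs:
      - evaluate
```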
Error Handling
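The pattern is the same with either SDK: let print_summary() handle evaluation failures through its exit code, and make unexpected crashes fail the job with a distinct code and a clear message. In the Python sketch below, run_experiment is a placeholder for your experiment entry point.

```python
# Hypothetical error-handling sketch for a CI entry point.
import sys
import traceback


def run_experiment():
    """Placeholder for your experiment entry point (e.g. the Basic Example sketch)."""
    raise RuntimeError("replace with your real experiment run")


def main() -> int:
    try:
        results = run_experiment()
    except Exception:
        # Infrastructure or configuration errors: fail loudly with a distinct exit code.
        print("Experiment crashed before producing results:", file=sys.stderr)
        traceback.print_exc()
        return 2

    # Evaluation failures are reported by print_summary(), which exits with code 1.
    results.print_summary()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```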