Every command supports --help, and list/get commands accept --format json. Run langwatch --help to see the full command tree.
Install
npx:
Authenticate
langwatch login is interactive by default: it asks where (cloud vs self-hosted) and how (AI tools vs project SDK), opens your browser for approval, and passes the credential back to the CLI automatically, with no copy-pasting of keys:
- Where do you want to log in?: LangWatch Cloud (app.langwatch.ai) or a self-hosted instance (custom URL).
- How do you want to use it?: three options:
  - AI tools, agentic flows: claude, codex, cursor, gemini, opencode. Mints an OAuth-style device session in ~/.langwatch/config.json (user-scoped) so langwatch claude etc. wrap any tool through your gateway.
  - Project, SDK API key: for langwatch sync, langwatch eval, and SDK auto-instrumentation. Mints a fresh project API key into $CWD/.env (project-scoped).
  - Both: runs both flows in sequence.
Storage discipline (where credentials land)
| Scope | Path | What lives there |
|---|---|---|
| Project | $CWD/.env | LANGWATCH_API_KEY, used by SDK consumers + langwatch sync, eval, prompt |
| User-global | ~/.langwatch/config.json (mode 0600) | Device session (access_token + refresh_token), control-plane URL, gateway URL, default org, used by langwatch claude/codex/cursor/gemini/opencode, langwatch whoami, langwatch request-increase |
The two scopes are independent: logging out of one (for example, langwatch logout-device clears ~/.langwatch/config.json) doesn’t touch the other.
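To make the mode 0600 requirement concrete, here is a minimal Python sketch of writing a credentials file that only its owner can read. It is not the CLI's actual implementation: `write_private_json` and `is_private` are hypothetical helper names, and the payload keys simply mirror the `access_token` and `refresh_token` fields named in the table above.

```python
import json
import os
import stat


def write_private_json(path, payload):
    """Write payload as JSON to path, readable and writable by the owner only."""
    # Create the file with mode 0600 from the start so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle, indent=2)
    # Tighten permissions again in case the file already existed with a looser mode.
    os.chmod(path, 0o600)


def is_private(path):
    """Return True when no group or other permission bits are set (POSIX only)."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o077) == 0
```

A check like `is_private` can be handy in a setup script to confirm that a user-global credentials file has not been loosened by a backup or sync tool.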
Self-hosted
The CLI picks up your self-hosted endpoint from any of these (highest priority first):

| Priority | Source | Use case |
|---|---|---|
| 1 | --endpoint <url> flag on langwatch login | one-shot login at a different host |
| 2 | LANGWATCH_ENDPOINT env var | CI, scripts |
| 3 | ~/.langwatch/config.json:control_plane_url | persisted from prior login (daily driver) |
| 4 | https://app.langwatch.ai | built-in cloud default |
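The priority order in the table can be sketched as a small resolver. This is an illustration of the lookup order only, not the CLI's own code: `resolve_endpoint` and its parameters are hypothetical names, while `LANGWATCH_ENDPOINT`, `control_plane_url`, and the cloud default come from the table.

```python
import json
import os
from pathlib import Path

DEFAULT_ENDPOINT = "https://app.langwatch.ai"


def resolve_endpoint(flag_value=None, env=None, config_path=None):
    """Return the first endpoint found, checking sources from highest to lowest priority."""
    env = os.environ if env is None else env

    # 1. An explicit --endpoint value wins over everything else.
    if flag_value:
        return flag_value

    # 2. The LANGWATCH_ENDPOINT environment variable (CI, scripts).
    if env.get("LANGWATCH_ENDPOINT"):
        return env["LANGWATCH_ENDPOINT"]

    # 3. The URL persisted by a prior login, if the config file is readable.
    path = Path(config_path) if config_path else Path.home() / ".langwatch" / "config.json"
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        data = None
    if isinstance(data, dict) and data.get("control_plane_url"):
        return data["control_plane_url"]

    # 4. Fall back to the built-in cloud default.
    return DEFAULT_ENDPOINT
```

Passing `env` and `config_path` explicitly keeps the function easy to test without touching the real environment or home directory.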
Non-interactive escape hatches (for CI, agents)
When you’re driving the CLI from automation and already have a credential, skip the prompts:

| Flag | Use case | Where it lands |
|---|---|---|
| langwatch login --device | AI tools mode, skip Q1/Q2 prompts | ~/.langwatch/config.json |
| langwatch login --api-key <KEY> | CI pipeline that has LANGWATCH_API_KEY injected from secrets | $CWD/.env |
| langwatch login --token <TOKEN> | Agent harness with a pre-minted device session token (issued via dashboard “Personal Access Tokens”) | ~/.langwatch/config.json |
| langwatch login --endpoint <URL> | combine with any of the above to pin a self-hosted instance | persisted to config |
langwatch login always shows these flags in a banner above the prompts, so a fake-TTY agent (Claude Code, certain Gemini CLI sandboxes) can detect the prompt and re-invoke with the right flag instead of getting stuck.
When stdin is not a TTY (genuine CI), langwatch login with no flags errors out with the same flag list. Explicit beats implicit, so there are no surprise API-key fall-throughs.
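The decision described above (explicit flag, else interactive prompt, else an error when there is no TTY) can be modelled in a few lines of Python. This is a sketch under assumptions: `plan_login` and `LoginError` are hypothetical names, and treating the three credential flags as mutually exclusive is a guess rather than documented behaviour. The storage locations are taken from the table.

```python
# Where each credential flag stores its result, per the table above.
CREDENTIAL_FLAGS = {
    "device": "~/.langwatch/config.json",
    "api_key": "$CWD/.env",
    "token": "~/.langwatch/config.json",
}


class LoginError(Exception):
    """Raised when the login request cannot proceed without more input."""


def plan_login(flags, stdin_is_tty):
    """Return (mode, storage_path) for a parsed flag dict and a TTY indicator."""
    chosen = [name for name in CREDENTIAL_FLAGS if flags.get(name)]
    if len(chosen) > 1:
        # Assumption: combining credential flags is rejected rather than ranked.
        raise LoginError("pass only one of --device, --api-key, --token")
    if chosen:
        return chosen[0], CREDENTIAL_FLAGS[chosen[0]]
    if not stdin_is_tty:
        # No flags and nobody to answer prompts: fail loudly with the flag list.
        raise LoginError("no TTY: pass --device, --api-key <KEY>, or --token <TOKEN>")
    return "interactive", None
```

An agent harness could apply the same logic in reverse: if it knows it cannot answer prompts, it should always pass one of the explicit flags.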
Letting an agent do it
A coding assistant driving langwatch will see the always-on banner naming --device, --api-key, --token, --endpoint whenever the interactive prompt fires. If the assistant’s harness reports as a TTY but can’t actually answer prompts, the banner gives it everything it needs to re-invoke with the right flag.
Fetch documentation
langwatch docs returns any LangWatch documentation page as plain Markdown, ideal for feeding into an agent’s context before it writes code.
The .md extension is appended automatically.
Version prompts
The Prompts CLI turns your prompts into tracked files alongside your code, with lock files, tagging, and sync to the LangWatch platform.
Tag versions for deployment
Three built-in tags are available: latest (auto-assigned), production, and staging. Assign a tag to the current version.
Create custom tags with langwatch prompt tag create.
For the full Prompts CLI reference, see the Prompt Management CLI guide.
Run scenario tests
Scenarios are the LangWatch equivalent of end-to-end tests for agents: a user simulator chats with your agent, an LLM judge scores the conversation against criteria you define, and everything is recorded for later inspection.
Inspect simulation runs
Every scenario execution produces a simulation run you can inspect after the fact: full conversation, judge verdict, reasoning, met/unmet criteria, cost, and duration.
The get command renders assistant thinking blocks and tool calls as readable plain text, with no raw JSON dumps in your terminal. Use --format json on either command for structured output.
For the full scenario testing guide, see the Scenarios documentation.
Inspect traces
Traces capture every LLM call your agent makes: prompts, responses, latency, cost, errors. Search and drill into them from the terminal.
After instrumenting your code, run langwatch trace search --limit 5 and verify traces are flowing. If nothing appears, the instrumentation is wrong; there is no need to read logs.
Query analytics
Analytics aggregate your traces into performance and cost metrics without leaving the terminal.
Manage platform resources
Every LangWatch resource follows the same consistent subcommand shape. The same shape applies to:
- evaluator: create and version evaluators (answer correctness, faithfulness, custom LLM judges)
- monitor: online evaluations that score production traces automatically
- dataset: evaluation datasets (upload CSV, download, manage columns)
- agent: agent definitions used by scenarios and monitors
- dashboard and graph: custom analytics dashboards
- trigger: automations (alerts, webhooks, dataset-append on failure)
- secret: encrypted environment variables for scheduled agent runs
- workflow: reusable workflows built in the UI
- model-provider: configure OpenAI, Anthropic, Azure, or Bedrock for your project
- annotation: attach labels to traces for supervised fine-tuning data
Run langwatch <resource> --help on any of these for subcommand-level options, and add --format json to get structured output for scripting.
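For scripting against the structured output, a thin wrapper along these lines may be enough. It is a sketch, not an official client: `run_json` is a hypothetical helper, and the shape of the JSON each command returns is not specified here, so the caller has to inspect it.

```python
import json
import subprocess


def run_json(argv):
    """Run a command with --format json appended and return the parsed stdout."""
    result = subprocess.run(
        [*argv, "--format", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Assuming the dataset resource exposes a list subcommand (check langwatch dataset --help to confirm), a call would look like `run_json(["langwatch", "dataset", "list"])`.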
Organization management
These commands manage org-wide resources and require an API key with organization-level permissions.
Projects
API keys
Trigger experiments
Experiments batch-run an agent or prompt against a dataset and produce an evaluation report.
Progressive disclosure
The CLI leans heavily on --help. Every subcommand has its own, and the top-level langwatch --help is the best way to discover what’s available.
New commands and flags show up in --help the moment they ship, so you never have to wonder whether a flag exists.
AI Gateway commands
The CLI also provisions AI Gateway resources (virtual keys, budgets, provider bindings, cache rules) without touching the UI. Behaviour matches the dashboard exactly; the CLI and UI share a server-side service layer.
Virtual keys
--provider takes a provider-credential id (from langwatch gateway-providers list), not a plain provider name. Pass the flag multiple times to bind more than one provider.
Gateway budgets
--window accepts minute|hour|day|week|month|total. --on-breach is block (default) or warn. --limit is in USD. List output colourises spent-vs-limit: red at ≥100%, yellow at ≥80%.
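The colour thresholds are simple enough to restate as code, which can be useful when reproducing the same highlighting in a custom report. A minimal sketch, with `budget_colour` as a hypothetical name; only the 80% and 100% thresholds come from the text above.

```python
def budget_colour(spent_usd, limit_usd):
    """Return "red" at >=100% of the limit, "yellow" at >=80%, otherwise None."""
    if limit_usd <= 0:
        raise ValueError("limit must be positive")
    if spent_usd >= limit_usd:
        return "red"
    # spent / limit >= 0.8, written as a cross-multiplication to avoid a division.
    if spent_usd * 5 >= limit_usd * 4:
        return "yellow"
    return None
```

For a 100 USD limit, 79.99 USD spent stays uncoloured, 80 USD turns yellow, and 100 USD or more turns red.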
Provider bindings
Creating a provider binding returns the gpc_* id you then pass to --provider when creating a virtual key (VK). --slot is free-text (primary, eu-region, canary). --rotation-policy accepts manual in v1; auto and external_secret_store are v1.1 features.
Cache rules
Available matchers: --match-vk, --match-vk-prefix, --match-tag key=value, --match-principal, --match-model <name-or-*-glob>, --match-metadata key=value. --mode is respect | force | disable.
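To illustrate how a name-or-`*`-glob matcher and `key=value` arguments might be interpreted, here is a small Python sketch. It models the flag syntax only and is not the gateway's matching engine: all function names are hypothetical, and treating `*` as "any run of characters" with case-sensitive comparison is an assumption.

```python
import re

CACHE_MODES = {"respect", "force", "disable"}


def model_matches(pattern, model):
    """Match a model name against an exact name or a pattern containing * wildcards."""
    # Escape everything, then turn each escaped * back into "any characters".
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, model) is not None


def parse_key_value(raw):
    """Split a key=value argument (as used by --match-tag) into a (key, value) pair."""
    key, sep, value = raw.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {raw!r}")
    return key, value


def validate_mode(mode):
    """Reject anything other than the three documented --mode values."""
    if mode not in CACHE_MODES:
        raise ValueError(f"mode must be one of {sorted(CACHE_MODES)}")
    return mode
```

Under these assumptions, a pattern such as `gpt-4*` covers every model whose name starts with `gpt-4`, while a plain name matches only itself.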
Required token permissions map onto the RBAC grants:

| Command group | Permission |
|---|---|
| `projects list \| get` | project:view |
| `projects create` | project:create |
| `projects update` | project:update |
| `projects delete` | project:delete |
| `api-keys list` | organization:view |
| `api-keys create \| revoke` | organization:manage |
| `virtual-keys list \| get` | virtualKeys:view |
| `virtual-keys create` | virtualKeys:create |
| `virtual-keys rotate` | virtualKeys:rotate |
| `virtual-keys revoke` | virtualKeys:delete |
| `gateway-budgets *` | gatewayBudgets:* |
| `gateway-providers *` | gatewayProviders:* |
| `cache-rules *` | gatewayCacheRules:* |
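When scoping a token for automation, it can help to encode the table as a lookup so a script can report which grant a command needs before running it. The sketch below transcribes the table; `required_permission` is a hypothetical helper, and for the wildcard rows it returns the pattern exactly as listed, since how `*` expands per action is not spelled out here.

```python
# Rows with explicit actions, transcribed from the table above.
EXPLICIT = {
    ("projects", "list"): "project:view",
    ("projects", "get"): "project:view",
    ("projects", "create"): "project:create",
    ("projects", "update"): "project:update",
    ("projects", "delete"): "project:delete",
    ("api-keys", "list"): "organization:view",
    ("api-keys", "create"): "organization:manage",
    ("api-keys", "revoke"): "organization:manage",
    ("virtual-keys", "list"): "virtualKeys:view",
    ("virtual-keys", "get"): "virtualKeys:view",
    ("virtual-keys", "create"): "virtualKeys:create",
    ("virtual-keys", "rotate"): "virtualKeys:rotate",
    ("virtual-keys", "revoke"): "virtualKeys:delete",
}

# Rows listed with a wildcard action in the table.
WILDCARD = {
    "gateway-budgets": "gatewayBudgets:*",
    "gateway-providers": "gatewayProviders:*",
    "cache-rules": "gatewayCacheRules:*",
}


def required_permission(group, action):
    """Return the permission string the table lists for a command group and action."""
    if (group, action) in EXPLICIT:
        return EXPLICIT[(group, action)]
    if group in WILDCARD:
        return WILDCARD[group]
    raise KeyError(f"no permission documented for {group} {action}")
```

For example, `required_permission("virtual-keys", "rotate")` returns `"virtualKeys:rotate"`, so a rotation job needs only that grant rather than full key management.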
Use the CLI as your agent’s control plane
The CLI was designed so that an AI coding assistant can operate LangWatch end-to-end on your behalf. Skills like Tracing, Evaluations, Scenarios, and Prompt Versioning are built on top of it: the assistant reads docs via langwatch docs, runs platform operations via the subcommands, and verifies its own work by searching traces and inspecting simulation runs.
If you’re building your own agent workflows, the pattern works the same way: give the assistant the CLI and tell it what you want. It’s a small surface area with a big reach, and everything it does is auditable in the LangWatch app afterwards.