Every command supports --help, and list/get commands accept --format json. Run langwatch --help to see the full command tree.
Install
Run it directly with npx:
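For example, assuming the package is published on npm under the name langwatch (an assumption worth verifying against the install docs), you can run it without a global install:

```shell
# Run the CLI without installing it globally
# (the npm package name "langwatch" is an assumption)
npx langwatch --help
```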
Authenticate
Sign in once and the CLI saves your API key to .env in the current directory:
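A minimal sketch of that flow, assuming the auth subcommand is named login (the actual subcommand name may differ; langwatch --help lists it):

```shell
# Starts the /authorize flow and saves the resulting API key to .env
# ("login" is an assumed subcommand name; confirm with `langwatch --help`)
npx langwatch login
```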
Self-hosted
If you run a self-hosted LangWatch, point the CLI at your deployment. The authorize flow, docs fetcher, and every API call will automatically use that endpoint. Add LANGWATCH_ENDPOINT to your .env so it persists.
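A sketch of the setup, using a placeholder URL for your own instance:

```shell
# Point every CLI call at a self-hosted deployment
# (https://langwatch.example.com is a placeholder for your own instance)
export LANGWATCH_ENDPOINT="https://langwatch.example.com"

# Persist the setting by adding it to .env in the project directory
echo 'LANGWATCH_ENDPOINT=https://langwatch.example.com' >> .env
```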
Letting an agent do it
If you’re driving the CLI from a coding assistant, just ask the agent to set it up. When the API key is missing, the CLI prints a message pointing at the /authorize URL so the agent can walk the user through it.
Fetch documentation
langwatch docs returns any LangWatch documentation page as plain Markdown — ideal for feeding into an agent’s context before it writes code.
The .md extension is appended automatically.
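For example, to pull a page into an agent's context (the page path below is purely illustrative; use any path from the LangWatch docs site):

```shell
# Fetch a docs page as plain Markdown; ".md" is appended automatically
# (the path "concepts" is an illustrative placeholder)
npx langwatch docs concepts
```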
Version prompts
The Prompts CLI turns your prompts into tracked files alongside your code, with lock files, tagging, and sync to the LangWatch platform.

Tag versions for deployment
Three built-in tags are available: latest (auto-assigned), production, and staging. Assign a tag to the current version with langwatch prompt tag create.
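Assigning the production tag might look like this (the argument shape is an assumption; confirm with langwatch prompt tag create --help):

```shell
# Tag the current prompt version for deployment
# (the positional argument shape is assumed; check --help)
npx langwatch prompt tag create production
```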
For the full Prompts CLI reference, see the Prompt Management CLI guide.
Run scenario tests
Scenarios are the LangWatch equivalent of end-to-end tests for agents: a user simulator chats with your agent, an LLM judge scores the conversation against criteria you define, and everything is recorded for later inspection.

Inspect simulation runs
Every scenario execution produces a simulation run you can inspect after the fact — full conversation, judge verdict, reasoning, met/unmet criteria, cost, and duration. The get command renders assistant thinking blocks and tool calls as readable plain text — no raw JSON dumps in your terminal. Use --format json on either command for structured output.
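For instance, assuming simulation runs expose the same list/get shape as other resources (the resource name here is an assumption; langwatch --help shows the real one):

```shell
# List recent simulation runs, then drill into one
# ("simulation" is an assumed resource name; confirm with `langwatch --help`)
npx langwatch simulation list
npx langwatch simulation get <run-id>

# Structured output for scripting
npx langwatch simulation get <run-id> --format json
```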
For the full scenario testing guide, see the Scenarios documentation.
Inspect traces
Traces capture every LLM call your agent makes — prompts, responses, latency, cost, errors. Search and drill into them from the terminal: after instrumenting your app, run langwatch trace search --limit 5 and verify traces are flowing. If nothing appears, the instrumentation is wrong — no need to read logs.
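The search from the text, plus a JSON variant for scripting (whether search accepts --format json is inferred from the list/get behavior described above):

```shell
# Show the five most recent traces to confirm instrumentation is working
npx langwatch trace search --limit 5

# Same query as structured JSON
# (--format json on search is inferred; confirm with --help)
npx langwatch trace search --limit 5 --format json
```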
Query analytics
Analytics aggregate your traces into performance and cost metrics without leaving the terminal.

Manage platform resources
Every LangWatch resource follows the same consistent subcommand shape:

- evaluator — create and version evaluators (answer correctness, faithfulness, custom LLM judges)
- monitor — online evaluations that score production traces automatically
- dataset — evaluation datasets (upload CSV, download, manage columns)
- agent — agent definitions used by scenarios and monitors
- dashboard and graph — custom analytics dashboards
- trigger — automations (alerts, webhooks, dataset-append on failure)
- secret — encrypted environment variables for scheduled agent runs
- workflow — reusable workflows built in the UI
- model-provider — configure OpenAI, Anthropic, Azure, or Bedrock for your project
- annotation — attach labels to traces for supervised fine-tuning data
Run langwatch <resource> --help on any of these for subcommand-level options, and use --format json to get structured output for scripting.
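For example, using the dataset resource (the list subcommand is assumed from the shared shape described above):

```shell
# Discover dataset subcommands
npx langwatch dataset --help

# List datasets as JSON for scripting
# ("list" follows the shared resource shape; confirm with --help)
npx langwatch dataset list --format json
```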
Trigger experiments
Experiments batch-run an agent or prompt against a dataset and produce an evaluation report.

Progressive disclosure
The CLI leans heavily on --help. Every subcommand has its own, and the top-level langwatch --help is the best way to discover what’s available:
New commands and flags appear in --help the moment they ship, so you never have to wonder whether a flag exists.
Use the CLI as your agent’s control plane
The CLI was designed so that an AI coding assistant can operate LangWatch end-to-end on your behalf. Skills like Tracing, Evaluations, Scenarios, and Prompt Versioning are built on top of it — the assistant reads docs via langwatch docs, runs platform operations via the subcommands, and verifies its own work by searching traces and inspecting simulation runs.
If you’re building your own agent workflows, the pattern works the same way: give the assistant the CLI and ask it what you want. It’s a small surface area with a big reach, and everything it does is auditable in the LangWatch app afterwards.