Every command supports --help, and list/get commands accept --format json. Run langwatch --help to see the full command tree.
Install
Run it directly with npx:
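For example, assuming the package is published on npm under the name langwatch (an assumption worth verifying against the install docs), you can run it without a global install:

```shell
# Run the CLI without installing it globally
# (the npm package name "langwatch" is an assumption)
npx langwatch --help
```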
Authenticate
Sign in once and the CLI saves your API key to .env in the current directory:
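A minimal sketch of that flow, assuming the auth subcommand is named login (the actual subcommand name may differ; langwatch --help lists it):

```shell
# Starts the /authorize flow and saves the resulting API key to .env
# ("login" is an assumed subcommand name; confirm with `langwatch --help`)
npx langwatch login
```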
Self-hosted
If you run a self-hosted LangWatch, point the CLI at your deployment. The authorize flow, docs fetcher, and every API call will automatically use that endpoint. Add LANGWATCH_ENDPOINT to your .env so it persists.
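A sketch of the setup, using a placeholder URL for your own instance:

```shell
# Point every CLI call at a self-hosted deployment
# (https://langwatch.example.com is a placeholder for your own instance)
export LANGWATCH_ENDPOINT="https://langwatch.example.com"

# Persist the setting by adding it to .env in the project directory
echo 'LANGWATCH_ENDPOINT=https://langwatch.example.com' >> .env
```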
Letting an agent do it
If you’re driving the CLI from a coding assistant, just ask the agent to set it up. When the API key is missing, the CLI prints a message pointing at the /authorize URL so the agent can walk the user through it.
Fetch documentation
langwatch docs returns any LangWatch documentation page as plain Markdown — ideal for feeding into an agent’s context before it writes code.
The .md extension is appended automatically.
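For example, to pull a page into an agent's context (the page path below is purely illustrative; use any path from the LangWatch docs site):

```shell
# Fetch a docs page as plain Markdown; ".md" is appended automatically
# (the path "concepts" is an illustrative placeholder)
npx langwatch docs concepts
```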
Version prompts
The Prompts CLI turns your prompts into tracked files alongside your code, with lock files, tagging, and sync to the LangWatch platform.

Tag versions for deployment
Three built-in tags are available: latest (auto-assigned), production, and staging. Assign a tag to the current version with langwatch prompt tag create.
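Assigning the production tag might look like this (the argument shape is an assumption; confirm with langwatch prompt tag create --help):

```shell
# Tag the current prompt version for deployment
# (the positional argument shape is assumed; check --help)
npx langwatch prompt tag create production
```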
For the full Prompts CLI reference, see the Prompt Management CLI guide.
Run scenario tests
Scenarios are the LangWatch equivalent of end-to-end tests for agents: a user simulator chats with your agent, an LLM judge scores the conversation against criteria you define, and everything is recorded for later inspection.

Inspect simulation runs
Every scenario execution produces a simulation run you can inspect after the fact — full conversation, judge verdict, reasoning, met/unmet criteria, cost, and duration. The get command renders assistant thinking blocks and tool calls as readable plain text — no raw JSON dumps in your terminal. Use --format json on either command for structured output.
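For instance, assuming simulation runs expose the same list/get shape as other resources (the resource name here is an assumption; langwatch --help shows the real one):

```shell
# List recent simulation runs, then drill into one
# ("simulation" is an assumed resource name; confirm with `langwatch --help`)
npx langwatch simulation list
npx langwatch simulation get <run-id>

# Structured output for scripting
npx langwatch simulation get <run-id> --format json
```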
For the full scenario testing guide, see the Scenarios documentation.
Inspect traces
Traces capture every LLM call your agent makes — prompts, responses, latency, cost, errors. Search and drill into them from the terminal: after instrumenting your app, run langwatch trace search --limit 5 and verify traces are flowing. If nothing appears, the instrumentation is wrong — no need to read logs.
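The search from the text, plus a JSON variant for scripting (whether search accepts --format json is inferred from the list/get behavior described above):

```shell
# Show the five most recent traces to confirm instrumentation is working
npx langwatch trace search --limit 5

# Same query as structured JSON
# (--format json on search is inferred; confirm with --help)
npx langwatch trace search --limit 5 --format json
```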
Query analytics
Analytics aggregate your traces into performance and cost metrics without leaving the terminal.

Manage platform resources
Every LangWatch resource follows the same consistent subcommand shape:

- evaluator — create and version evaluators (answer correctness, faithfulness, custom LLM judges)
- monitor — online evaluations that score production traces automatically
- dataset — evaluation datasets (upload CSV, download, manage columns)
- agent — agent definitions used by scenarios and monitors
- dashboard and graph — custom analytics dashboards
- trigger — automations (alerts, webhooks, dataset-append on failure)
- secret — encrypted environment variables for scheduled agent runs
- workflow — reusable workflows built in the UI
- model-provider — configure OpenAI, Anthropic, Azure, or Bedrock for your project
- annotation — attach labels to traces for supervised fine-tuning data
Run langwatch <resource> --help on any of these for subcommand-level options, and use --format json to get structured output for scripting.
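For example, using the dataset resource (the list subcommand is assumed from the shared shape described above):

```shell
# Discover dataset subcommands
npx langwatch dataset --help

# List datasets as JSON for scripting
# ("list" follows the shared resource shape; confirm with --help)
npx langwatch dataset list --format json
```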
Trigger experiments
Experiments batch-run an agent or prompt against a dataset and produce an evaluation report.

Progressive disclosure
The CLI leans heavily on --help. Every subcommand has its own, and the top-level langwatch --help is the best way to discover what’s available:
New commands and flags appear in --help the moment they ship, so you never have to wonder whether a flag exists.
Use the CLI as your agent’s control plane
The CLI was designed so that an AI coding assistant can operate LangWatch end-to-end on your behalf. Skills like Tracing, Evaluations, Scenarios, and Prompt Versioning are built on top of it — the assistant reads docs via langwatch docs, runs platform operations via the subcommands, and verifies its own work by searching traces and inspecting simulation runs.
If you’re building your own agent workflows, the pattern works the same way: give the assistant the CLI and ask it what you want. It’s a small surface area with a big reach, and everything it does is auditable in the LangWatch app afterwards.