A virtual key (VK) ends up in a public repo, an ex-employee’s laptop is still syncing, a CI secret got mirrored to a vendor log. You need to stop the bleeding fast, without breaking every legitimate caller at the same time. This cookbook walks the end-to-end path using the gateway surfaces that exist specifically for this scenario. Estimated time: 5 min for rotation-class incidents, 15 min for revoke-class. Both scale independently of how many clients use the leaked secret.

Decision tree

Has the secret been EXPOSED in a public channel (GitHub, stacktrace, vendor log)?
├── Yes → revoke. No grace window. See "Rotation vs revoke" below.
└── No, but it's out of a trusted boundary (ex-employee laptop, lost device)?
    ├── Yes, and clients can update within 24h → rotate. 24h grace window.
    └── Yes, and clients CANNOT update within 24h → rotate now, coordinate rollover, consider dual-key pattern (out of scope for v1).
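If your incident runbook automates this decision, a minimal sketch wired to the CLI commands used later in this cookbook (the script name and the EXPOSURE argument are illustrative, not part of the product):
#!/usr/bin/env bash
# leak-response.sh: illustrative sketch only; adapt to your incident tooling.
# Usage: ./leak-response.sh <vk_id> <public|boundary>
set -euo pipefail

VK_ID="$1"
EXPOSURE="$2"   # "public" = seen in a public channel; "boundary" = left a trusted boundary only

if [ "$EXPOSURE" = "public" ]; then
  # Public exposure: revoke immediately, no grace window (Step 5).
  langwatch virtual-keys revoke "$VK_ID" --reason "public exposure, see incident ticket"
else
  # Out of a trusted boundary but not public: rotate; clients get a 24h grace window (Step 3).
  langwatch virtual-keys rotate "$VK_ID"
fi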

Rotation vs revoke — the enforcement-shape difference

Operation | Old secret behaviour | Use when
Rotate | Keeps working for 24h (grace window), then hard-fails with 401. | Secret left a trusted boundary but is not publicly exposed. Gives clients a window to update configs.
Revoke | Immediately fails with 401. No grace period. | Secret is publicly exposed, or you need to terminate a specific caller’s access right now.
Both operations keep the VK’s trace history intact for audit and post-incident analysis.

Step 1 — find the right VK

If you know the secret, look up the VK by its prefix:
# The first 25 chars of the secret are its prefix — safe to log/grep
PREFIX="lw_vk_live_01HZX9..."

# Find the VK via CLI
langwatch virtual-keys list --prefix "$PREFIX"
Or from the UI: AI Gateway → Virtual Keys and filter by prefix. Clicking the row opens the VK detail page.
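If you want the VK id in a shell variable for the steps below, and assuming the list command accepts the same --format flag the rotate command does and emits a JSON array with a vk_id field (both assumptions; check langwatch virtual-keys list --help), a sketch:
# Assumption: `--format json` is supported and each entry exposes `vk_id`.
VK_ID=$(langwatch virtual-keys list --prefix "$PREFIX" --format json | jq -r '.[0].vk_id')
echo "Incident VK: $VK_ID"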

Step 2 — pre-rotation forensics (before you touch the secret)

Before you rotate, capture the current state. Rotation creates an audit row, but deeper questions (“how many callers? from where? what did they spend?”) are better answered BEFORE you change the secret.
  1. Open the VK detail page.
  2. Click Audit history in the header. This is a deep-link into /settings/audit-log?targetKind=virtual_key&targetId=<vk_id> pre-filtered to this VK only. You’ll see every CREATE / UPDATE / ROTATE event with actor + timestamp + changed-fields diff summary, alongside any related platform actions on the same resource.
  3. Scroll down to the Usage (last 30 days) section — this is the per-VK spend sparkline. Any dramatic spike or off-hours usage is worth flagging now; after rotation the old secret’s tail-usage shows up in audit as an implicit “used during grace window” signal.
  4. Export the current spend detail for the incident ticket:
    curl -sS "https://app.langwatch.ai/api/gateway/v1/virtual-keys/$VK_ID/usage?since=-30d" \
      -H "Authorization: Bearer $LANGWATCH_API_KEY" \
      > incident-$VK_ID-pre-rotation.json
    
Why this matters: if the leak turns out to be bigger than you thought (e.g. the secret was in a compromised CI that’s been drained overnight), the pre-rotation snapshot is your only record of what the attacker’s pattern looked like before you tipped them off.
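If you want the snapshot to be tamper-evident in the incident ticket, a small wrapper around the export in item 4 works; it uses only the usage endpoint shown above plus standard tooling, and the ticket id and file names are placeholders:
# Bundle the pre-rotation evidence with a checksum and a capture timestamp (placeholder names).
TICKET="INC-1234"
SNAP="incident-${VK_ID}-pre-rotation.json"

curl -sS "https://app.langwatch.ai/api/gateway/v1/virtual-keys/$VK_ID/usage?since=-30d" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" > "$SNAP"

sha256sum "$SNAP" | tee "${SNAP}.sha256"                 # hash so the evidence can't be silently edited later
date -u +"%Y-%m-%dT%H:%M:%SZ" > "${SNAP}.captured-at"    # record when the snapshot was taken
tar czf "${TICKET}-${VK_ID}-pre-rotation.tgz" "$SNAP" "${SNAP}.sha256" "${SNAP}.captured-at"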

Step 3 — rotate

From the UI

  1. VK detail page → Rotate.
  2. Confirm: “A fresh secret will be minted and shown once. The current secret keeps working for 24h (grace window) so clients can roll over.”
  3. The new secret is displayed once. Save it to your secret store before checking “I’ve saved the secret in a safe place” and clicking Close.

From the CLI

# Mints a new secret + returns it; old continues working for 24h
langwatch virtual-keys rotate "$VK_ID"
# Pipe directly into your secret store if you want to avoid clipboard/terminal exposure:
langwatch virtual-keys rotate "$VK_ID" --format raw | \
  vault kv put -mount=secret/gateway "$VK_ID" secret=-

From the REST API

curl -sS -X POST "https://app.langwatch.ai/api/gateway/v1/virtual-keys/$VK_ID/rotate" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY"
# Response: { "vk_id": "...", "secret": "lw_vk_live_...", "rotated_at": "..." }
The response is the only time the new secret is in plaintext. LangWatch stores only a hash. If you lose the response, you must rotate again.
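Because that response is the only place the plaintext ever exists, avoid writing it to disk. A sketch that parses the response with jq and hands the secret straight to Vault, mirroring the CLI example above (adapt the mount and path to your secret store):
# Capture the rotate response once, keep rotated_at for Step 4, and push the secret to Vault.
RESP=$(curl -sS -X POST "https://app.langwatch.ai/api/gateway/v1/virtual-keys/$VK_ID/rotate" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY")

ROTATED_AT=$(jq -r '.rotated_at' <<<"$RESP")
jq -r '.secret' <<<"$RESP" | vault kv put -mount=secret/gateway "$VK_ID" secret=-
unset RESP   # don't leave the plaintext sitting in the shell environment
echo "Rotated at $ROTATED_AT; the old secret hard-fails 24h later."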

Step 4 — propagate to clients within 24h

Grace-window math: rotated_at + 24h = hard cutoff. The previous secret remains resolvable on the gateway’s auth hot-path for that window (see the resolver’s previousHashedSecret + previousSecretValidUntil fields on VirtualKey); after the window, only the new secret works. During the grace window, every request made with the OLD secret succeeds just as it did before rotation, but LangWatch records it with enough signal to see the tail:
  • The request’s audit/ledger rows continue to reference the same vk_id — rotation is a secret swap, not a VK identity change.
  • Incoming requests using the previous secret can be identified in the trace by the langwatch.virtual_key_id span attr + gateway_request_id header even though the client-visible secret has changed.
Dedicated grace-window observability (gateway_vk_grace_window_hit_total Prometheus counter + langwatch.vk.grace_window=true span attr) is a v1.1 follow-up. For v1, teams measure rollout coverage by inspecting the LangWatch trace stream for each caller and cross-referencing against gateway_request_ids known to have been minted after rotated_at. File an issue if you hit this and need the dedicated metric sooner.
Operational hint: for now, set a calendar reminder for 20 hours after rotation. Check your client fleet’s config push status (Ansible run, k8s rolling-update status, secret-manager sync logs, etc.). If any client hasn’t picked up the new secret, you have 4 hours to push it or accept the hard-cutoff fallout at hour 24.
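If you prefer to compute the cutoff and reminder rather than eyeball them, a minimal sketch assuming GNU date ($ROTATED_AT comes from the rotate response in Step 3):
# Grace-window math: hard cutoff at rotated_at + 24h, rollout check at + 20h.
ROTATED_EPOCH=$(date -u -d "$ROTATED_AT" +%s)
HARD_CUTOFF=$(date -u -d "@$((ROTATED_EPOCH + 24*3600))" +"%Y-%m-%dT%H:%M:%SZ")
REMINDER=$(date -u -d "@$((ROTATED_EPOCH + 20*3600))" +"%Y-%m-%dT%H:%M:%SZ")

echo "Old secret stops working at: $HARD_CUTOFF"
echo "Check client rollout status by: $REMINDER (leaves the 4h buffer)"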

Step 5 — revoke (if exposure is public)

If the secret has leaked publicly, don’t rotate — revoke immediately.
# UI: VK detail → Revoke. Confirmation dialog. No grace window.
# CLI:
langwatch virtual-keys revoke "$VK_ID" --reason "secret leaked in commit SHA $LEAK_SHA"
The --reason field is stored in the audit row’s metadata so post-incident reviews have context. Use it liberally. After revoke, any request presenting the revoked secret gets:
HTTP/1.1 401 Unauthorized
Content-Type: application/json

{ "error": { "type": "invalid_api_key", "message": "virtual key has been revoked" } }
Cache invalidation propagates to every gateway pod within ~60 seconds via the /internal/gateway/changes long-poll. If you need instant invalidation everywhere for a regulatory-grade incident, restart the gateway deployment — the next request to hit a cold pod pulls fresh state before serving.
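If you take the restart path, it is an ordinary rolling restart of the gateway workload; the namespace and deployment name below are placeholders for your install:
# Force every gateway pod to re-pull VK state before serving traffic (placeholder names).
kubectl -n langwatch rollout restart deployment/langwatch-gateway
kubectl -n langwatch rollout status deployment/langwatch-gateway --timeout=5m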

Step 6 — post-incident forensics

Once the bleeding has stopped, the audit log is your friend. You want to answer three questions:
  1. When did the leak start? Audit history filter → virtual_key_updated events. Look for config changes (scope expansion, rate-limit increase, new guardrail bypass) that correlate with a usage spike.
  2. Who had access? The VK’s principal + the audit actor column for every update. If actor = svc_<projectId>, a service token was used — look up which PAT that token belongs to and check whether the PAT itself was compromised.
  3. What was the blast radius? Usage-page filter by VK for the suspect window. Stream of traces available via the langwatch.virtual_key_id attribute.
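For question 3, the same usage endpoint from Step 2 can be scoped to the suspect window and attached to the ticket; the -7d value assumes the since parameter accepts other relative durations in the same form as -30d, which is not confirmed here:
# Pull spend for the suspect window only (adjust the window to the incident timeline).
curl -sS "https://app.langwatch.ai/api/gateway/v1/virtual-keys/$VK_ID/usage?since=-7d" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" \
  > incident-$VK_ID-blast-radius.json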
Export the full audit trail for the incident ticket. There are two supported paths in v1 — pick whichever fits your forensics workflow:
  • CSV from the UI — open /settings/audit-log, set Target = virtual_key and the target-id filter chip to $VK_ID, optionally narrow the time window, then click Export CSV. Drop the resulting audit_logs_<date>.csv into the incident ticket. Most expedient for one-shot incident bundling.
  • Direct PG query — for repeatable, scriptable evidence collection (preferred for SOC2 / SIEM-grade retention):
SELECT *
FROM "AuditLog"
WHERE "targetKind" = 'virtual_key'
  AND "targetId" = '<vk_id>'
ORDER BY "createdAt" DESC;
Pipe the result through psql -A -F$'\t' and gzip it into incident-<vk_id>-audit.tsv.gz. Each row carries the actor (user or svc_<projectId>), the action code, the before/after JSON, and metadata.requestId for cross-referencing the trace stream. See Audit log → Querying programmatically for the full schema.
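A concrete version of that pipe, as a sketch (the connection string and output name are placeholders for your environment):
# Scriptable evidence export: unaligned, tab-separated, gzipped per the naming convention above.
psql "$DATABASE_URL" -A -F$'\t' \
  -c "SELECT * FROM \"AuditLog\" WHERE \"targetKind\" = 'virtual_key' AND \"targetId\" = '$VK_ID' ORDER BY \"createdAt\" DESC;" \
  | gzip > "incident-${VK_ID}-audit.tsv.gz"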

What NOT to do

  • Don’t delete the VK. Revocation is soft-delete — the row stays in the database with status=revoked so traces remain attributable. Hard-deleting loses forensic context and breaks historic analytics.
  • Don’t assume the provider credential (upstream OpenAI/Anthropic API key) is safe just because you rotated the VK. If the leak vector could have exposed the provider credential too, rotate those directly in the provider console. For most providers LangWatch never holds the credential in plaintext, but a compromised operator account on the provider side is a separate threat.
  • Don’t skip the pre-rotation snapshot. It’s tempting to just hit Rotate when you’re panicking. The 30-second snapshot in Step 2 is what lets you later say “the attacker did X from country Y”; once rotated, the attacker’s pattern becomes just another row in the tail.
  • Don’t publish the rotation in a public Slack channel before it’s in your secret store. If you’re pasting the new secret, you’re in the same class of problem you were trying to solve.

See also