A VK ends up in a public repo, an ex-employee’s laptop is still syncing, a CI secret got mirrored to a vendor log. You need to stop the bleeding fast, without breaking every legitimate caller at the same time. This cookbook walks the end-to-end path using the gateway surfaces that exist specifically for this scenario. Estimated time: 5 min for rotation-class incidents, 15 min for revoke-class. Both scale independently of how many clients use the leaked secret.
Decision tree
Rotation vs revoke — the enforcement-shape difference
| Operation | Old secret behaviour | Use when |
|---|---|---|
| Rotate | Keeps working for 24h (grace window), then hard-fails with 401. | Secret left a trusted boundary but is not publicly exposed. Gives clients a window to update configs. |
| Revoke | Immediately fails with 401. No grace period. | Secret is publicly exposed, or you need to terminate a specific caller’s access right now. |
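The table reduces to a single branch, which can be useful to encode in an incident runbook script. The sketch below is only an illustration of that rule: the `Action` enum and `choose_action` function are hypothetical names, not part of any LangWatch SDK.

```python
from enum import Enum


class Action(Enum):
    ROTATE = "rotate"  # old secret keeps working for the 24h grace window
    REVOKE = "revoke"  # old secret fails with 401 immediately


def choose_action(publicly_exposed: bool, must_cut_off_caller_now: bool) -> Action:
    """Hypothetical helper mirroring the table above.

    Revoke when the secret is public or a caller must lose access right now;
    otherwise rotate so legitimate clients get time to update their configs.
    """
    if publicly_exposed or must_cut_off_caller_now:
        return Action.REVOKE
    return Action.ROTATE
```

For example, a CI secret mirrored to a vendor log that is not publicly readable would map to `choose_action(False, False)`, i.e. a rotation.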
Step 1 — find the right VK
If you know the secret, look up the VK by its prefix:
Step 2 — pre-rotation forensics (before you touch the secret)
Before you rotate, capture the current state. Rotation creates an audit row, but deeper questions (“how many callers? from where? what did they spend?”) are better answered BEFORE you change the secret.
- Open the VK detail page.
- Click Audit history in the header. This is a deep-link into /settings/audit-log?targetKind=virtual_key&targetId=<vk_id> pre-filtered to this VK only. You’ll see every CREATE / UPDATE / ROTATE event with actor + timestamp + changed-fields diff summary, alongside any related platform actions on the same resource.
- Scroll down to the Usage (last 30 days) section. This is the per-VK spend sparkline. Any dramatic spike or off-hours usage is worth flagging now; after rotation the old secret’s tail-usage shows up in audit as an implicit “used during grace window” signal.
- Export the current spend detail for the incident ticket:
Step 3 — rotate
From the UI
- VK detail page → Rotate.
- Confirm: “A fresh secret will be minted and shown once. The current secret keeps working for 24h (grace window) so clients can roll over.”
- The new secret is displayed once. Save it to your secret store before checking “I’ve saved the secret in a safe place” → Close.
From the CLI
From the REST API
Step 4 — propagate to clients within 24h
Grace-window math: rotated_at + 24h = hard cut-off. The previous secret remains resolvable on the gateway’s auth hot-path for that window (see the resolver’s previousHashedSecret + previousSecretValidUntil fields on VirtualKey); after the window, only the new secret works.
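As a worked version of that arithmetic, the following sketch computes the hard cut-off for a given rotation time and checks whether the previous secret would still be inside the window. The function names are illustrative, and treating the cut-off instant itself as already expired is an assumption; the gateway's exact boundary behaviour is not stated here.

```python
from datetime import datetime, timedelta, timezone

GRACE_WINDOW = timedelta(hours=24)  # from the rotation contract described above


def hard_cutoff(rotated_at: datetime) -> datetime:
    """Moment after which only the new secret works."""
    return rotated_at + GRACE_WINDOW


def previous_secret_accepted(rotated_at: datetime, now: datetime) -> bool:
    """True while the old secret is still inside the grace window.

    Assumption: the cut-off is exclusive (a request at exactly +24h fails).
    """
    return now < hard_cutoff(rotated_at)


rotated = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(hard_cutoff(rotated).isoformat())  # 2025-01-02T12:00:00+00:00
```

Keeping both timestamps timezone-aware (UTC here) avoids off-by-hours mistakes when the rollout deadline is communicated across teams.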
During the grace window, every request with the OLD secret succeeds just like before rotation — but LangWatch records it with enough signal to see the tail:
- The request’s audit/ledger rows continue to reference the same vk_id: rotation is a secret swap, not a VK identity change.
- Incoming requests using the previous secret can be identified in the trace by the langwatch.virtual_key_id span attr + gateway_request_id header even though the client-visible secret has changed.
Dedicated grace-window observability (gateway_vk_grace_window_hit_total Prometheus counter + langwatch.vk.grace_window=true span attr) is a v1.1 follow-up. For v1, teams measure rollout coverage by inspecting the LangWatch trace stream for each caller and cross-referencing against gateway_request_ids known to have been minted after rotated_at. File an issue if you hit this and need the dedicated metric sooner.
Step 5 — revoke (if exposure is public)
If the secret has leaked publicly, don’t rotate; revoke immediately.
The --reason field is stored in the audit row’s metadata so post-incident reviews have context. Use it liberally.
After revoke, any request presenting the revoked secret gets:
Revocations reach running gateway pods through the /internal/gateway/changes long-poll. If you need instant invalidation everywhere for a regulatory-grade incident, restart the gateway deployment: the next request to hit a cold pod pulls fresh state before serving.
Step 6 — post-incident forensics
Once the bleeding has stopped, the audit log is your friend. You want to answer three questions:
- When did the leak start? Audit history filter → virtual_key_updated events. Look for config changes (scope expansion, rate-limit increase, new guardrail bypass) that correlate with a usage spike.
- Who had access? The VK’s principal + the audit actor column for every update. If actor = svc_<projectId>, a service token was used: look up which PAT that token belongs to and check whether the PAT itself was compromised.
- What was the blast radius? Usage-page filter by VK for the suspect window. Stream of traces available via the langwatch.virtual_key_id attribute.
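Once usage or trace data for the suspect window has been exported, the blast-radius question comes down to a filter and a sum. The sketch below assumes a hypothetical `UsageRecord` shape (timestamp, VK id, cost in USD); the real export columns may differ, so treat the field names as placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical export shape; adjust to the columns you actually have.
    timestamp: datetime
    virtual_key_id: str
    cost_usd: float


def blast_radius(records, vk_id: str, start: datetime, end: datetime) -> dict:
    """Count requests and total spend for one VK inside [start, end)."""
    hits = [
        r for r in records
        if r.virtual_key_id == vk_id and start <= r.timestamp < end
    ]
    return {
        "requests": len(hits),
        "cost_usd": round(sum(r.cost_usd for r in hits), 6),
    }
```

Running it once for the suspect window and once for a comparable quiet window gives a quick baseline for how unusual the spend was.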
- CSV from the UI: open /settings/audit-log, set Target = virtual_key and the target-id filter chip to $VK_ID, optionally narrow the time window, then click Export CSV. Drop the resulting audit_logs_<date>.csv into the incident ticket. Most expedient for one-shot incident bundling.
- Direct PG query: for repeatable, scriptable evidence collection (preferred for SOC2 / SIEM-grade retention):
Run the query with psql -A -F$'\t' and gzip the output into incident-<vk_id>-audit.tsv.gz. The shape carries the actor (user or svc_<projectId>), action code, before / after JSON, and metadata.requestId for cross-referencing the trace stream. See Audit log → Querying programmatically for the full schema.
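For later analysis, the gzipped TSV can be loaded back into structured rows. This is a minimal sketch, assuming a column order of `created_at, actor, action, before, after, metadata`; that order is a guess for illustration and should be matched to your actual query. Unaligned psql output includes a header line and a trailing `(N rows)` footer, which the reader skips.

```python
import csv
import gzip
import io
import json

# Assumed column order; match this to the SELECT list of your own query.
COLUMNS = ["created_at", "actor", "action", "before", "after", "metadata"]


def read_audit_rows(raw: bytes) -> list[dict]:
    """Parse a gzipped, tab-separated psql export into dictionaries."""
    text = gzip.decompress(raw).decode("utf-8")
    # QUOTE_NONE keeps the double quotes inside JSON cells untouched.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        # Skip the header line and psql's "(N rows)" footer.
        if len(fields) != len(COLUMNS) or fields == COLUMNS:
            continue
        row = dict(zip(COLUMNS, fields))
        for key in ("before", "after", "metadata"):
            # Empty cells (SQL NULL) become None; everything else is JSON.
            row[key] = json.loads(row[key]) if row[key] else None
        rows.append(row)
    return rows
```

With rows in this form, `row["metadata"]["requestId"]` can be joined against the trace stream as described above.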
What NOT to do
- Don’t delete the VK. Revocation is soft-delete: the row stays in the database with status=revoked so traces remain attributable. Hard-deleting loses forensic context and breaks historic analytics.
- Don’t assume the provider credential (upstream OpenAI/Anthropic API key) is safe just because you rotated the VK. If the leak vector could have exposed the provider credential too, rotate it directly in the provider console. LangWatch never has these credentials in plaintext for most providers, but a compromised operator account on the provider side is a separate threat.
- Don’t skip the pre-rotation snapshot. It’s tempting to just hit Rotate when you’re panicking. The 30-second snapshot in Step 2 is what lets you later tell “the attacker did X from country Y” — once rotated, the attacker’s pattern becomes just another row in the tail.
- Don’t publish the rotation in a public Slack channel before it’s in your secret store. If you’re pasting the new secret, you’re in the same class of problem you were trying to solve.
See also
- Virtual Keys → Rotation — the operation reference.
- Virtual Keys → Revocation — the immediate-termination reference.
- Audit log — how to query the audit log programmatically.
- RBAC → virtualKeys:rotate — who can rotate which keys.
- Observability → grace-window metrics — Prometheus counters that matter during an incident.