The shape
- Each end-user has their own virtual key (VK). One per customer or per seat, your call.
- You set per-customer budgets, enforced at the gateway. When a customer hits their cap, the gateway returns 402 budget_exceeded. Your code doesn’t need to track spend.
- All audit and trace data lands in your LangWatch project. End-users never see LangWatch.
- End-users can’t escape your policy. Policy rules, model allowlists, cache rules all attach to the VK.
Step 1: Model the tenancy
Decide your scope granularity:

| Model | VK per | Budget scope | Use case |
|---|---|---|---|
| Per customer | one VK per tenant | principal or virtual_key | Most SaaS apps, one account = one VK |
| Per seat | one VK per user within tenant | principal | Per-seat billing, per-user rate limits |
| Per team | one VK per sub-org | team-scope budgets | Customers have their own teams/projects |
Step 2: Configure the upstream ModelProvider once
Your LangWatch org owns a single ModelProvider row per upstream (OpenAI, Anthropic) at ORGANIZATION scope. End-users never see these. Configure them under Settings → Model Providers → Add Model Provider (Scope = Organization), then open each row’s Advanced (Gateway) tab and set the per-credential gateway caps:

| Field | Suggested value |
|---|---|
| RPM | 100000 (your bulk-purchase cap from the upstream) |
| Fallback priority | 10 for primary, 20 for backup |
| Provider config (JSON) | region/deployment overrides as needed |
Step 3: Provision a VK when a customer signs up
Server-side code (Node.js example):

Without a routing_policy_id, dispatch falls back to the org’s default ordering (fallbackPriorityGlobal, then createdAt).
In a signup webhook:
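As a rough sketch of the provisioning step, the helper below only builds the request body for a new VK and sends nothing. The fields principal_user_id, models_allowed and routing_policy_id are named on this page; the `name` field, the function name `buildVirtualKeyRequest`, and the model id are illustrative assumptions, so check the Management REST API reference for the actual schema.

```javascript
// Sketch: build the body for a "create virtual key" request at signup.
// Assumption: this payload shape is illustrative, not the confirmed schema.
function buildVirtualKeyRequest({ customerId, modelsAllowed, routingPolicyId }) {
  if (!customerId) {
    throw new Error("customerId is required");
  }
  const body = {
    name: `customer-${customerId}`, // hypothetical display name
    principal_user_id: customerId, // set at creation time, not later
    models_allowed: [...modelsAllowed],
  };
  // Only include routing_policy_id when one was chosen; per this page,
  // dispatch otherwise uses the org's default ordering.
  if (routingPolicyId) {
    body.routing_policy_id = routingPolicyId;
  }
  return body;
}

const exampleVk = buildVirtualKeyRequest({
  customerId: "cus_123",
  modelsAllowed: ["gpt-4o-mini"], // placeholder model id
});
console.log(JSON.stringify(exampleVk, null, 2));
```

Keeping the payload builder separate from the HTTP call makes it easy to unit-test the signup webhook without touching the gateway.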
Step 4: Attach a per-customer budget
Tie the budget to the VK:

Budget changes propagate through the /changes feed, with no restart and no customer impact.
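A per-customer budget could be described along the lines of the sketch below. Only on_breach: BLOCK and the virtual_key scope are taken from this page; `scope_id`, `window`, `limit_usd` and the function name are hypothetical field names chosen for illustration.

```javascript
// Sketch: describe a per-customer budget attached to one VK.
// Assumption: field names other than on_breach are illustrative.
const ALLOWED_WINDOWS = ["minute", "hour", "month"];

function buildBudgetRequest({ vkId, limitUsd, window = "month" }) {
  if (!ALLOWED_WINDOWS.includes(window)) {
    throw new Error(`unsupported window: ${window}`);
  }
  if (!(limitUsd > 0)) {
    throw new Error("limitUsd must be a positive number");
  }
  return {
    scope: "virtual_key", // per-customer enforcement level
    scope_id: vkId, // hypothetical: which VK the budget applies to
    window, // hypothetical field name
    limit_usd: limitUsd, // hypothetical field name
    on_breach: "BLOCK", // gateway answers 402 budget_exceeded at the cap
  };
}

console.log(buildBudgetRequest({ vkId: "vk_123", limitUsd: 50 }));
```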
Step 5: End-user makes a call
Your customer’s app calls the gateway directly with the VK you gave them. The gateway then:
- Authenticates the VK.
- Checks the customer’s budget.
- Applies your models_allowed and policy_rules.
- Dispatches to OpenAI using your provider credential.
- Meters the cost against the customer’s budget.
- Emits a LangWatch trace into your project, tagged with principal_user_id = customerId.
Step 6: Bill the customer
You have two options, matching two mental models:

(a) Bill from your pricing table
You define a markup over what the gateway shows you spent. Use /api/gateway/v1/budgets or the gateway_budget_ledger ClickHouse view to read per-VK spend:
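Once per-VK spend has been read, applying the markup is plain arithmetic. The sketch below assumes spend rows shaped like `{ vkId, costUsd }`, which is a hypothetical shape rather than the documented response format, and works in integer micro-dollars so that sub-cent request costs are not lost to rounding.

```javascript
// Sketch: turn per-VK spend rows into invoice lines with a percentage markup.
// Assumption: rows look like { vkId, costUsd }; the real shape may differ.
function buildInvoiceLines(spendRows, markupPercent) {
  const microsByVk = new Map();
  for (const { vkId, costUsd } of spendRows) {
    const micros = Math.round(costUsd * 1_000_000);
    microsByVk.set(vkId, (microsByVk.get(vkId) ?? 0) + micros);
  }
  return [...microsByVk].map(([vkId, costMicros]) => ({
    vkId,
    costMicros,
    // 1 cent = 10,000 micro-dollars; the extra factor of 100 undoes the percent.
    billedCents: Math.round((costMicros * (100 + markupPercent)) / 1_000_000),
  }));
}

const invoiceLines = buildInvoiceLines(
  [
    { vkId: "vk_a", costUsd: 12.5 },
    { vkId: "vk_b", costUsd: 3 },
    { vkId: "vk_a", costUsd: 7.5 },
  ],
  20,
);
console.log(invoiceLines);
```

With a 20% markup, the example bills vk_a 2400 cents on 20.00 USD of spend and vk_b 360 cents on 3.00 USD.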
(b) Passthrough billing
For passthrough, your UI shows the same numbers you see in LangWatch plus a per-customer markup or flat management fee. The BudgetLedger row is per-request and includes cost, so you can stream it into your reporting.
Handling over-limit customers
When on_breach: BLOCK is set, the gateway returns 402 budget_exceeded, and that error is what your customer sees. Common ways to handle it:
- Show an upgrade CTA in the end-user’s UI.
- Fall back to a free-tier response (“you’ve hit your monthly cap; upgrade for unlimited”).
To detect the breach, check error.type.
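One way to keep this handling in a single place is a small classifier over the gateway's error responses. The sketch below assumes the error body has the shape `{ error: { type } }`, which is inferred from the error.type mention and not confirmed here; the action names are hypothetical labels for your own UI logic.

```javascript
// Sketch: decide what the app should do with a gateway error response.
// Assumption: the error body looks like { error: { type: "..." } }.
function classifyGatewayError(status, body) {
  const type = body?.error?.type;
  if (status === 402 && type === "budget_exceeded") {
    return { action: "show_upgrade_cta", retry: false };
  }
  if (status === 403 && type === "virtual_key_revoked") {
    return { action: "ask_for_new_key", retry: false };
  }
  if (status === 429 && type === "rate_limit_exceeded") {
    return { action: "back_off", retry: true };
  }
  // Anything unrecognized is surfaced instead of silently retried.
  return { action: "report_error", retry: false };
}

console.log(classifyGatewayError(402, { error: { type: "budget_exceeded" } }));
```

Matching on both the status code and the type keeps an unrelated 402 or 429 from another layer from being mistaken for a gateway decision.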
Rotation & revocation
Customer resets their API key in your UI → you call langwatch.rotate(vkId), get a new secret, and send it to the customer. The old secret stops working within the gateway’s cache TTL (~60 s).
Customer cancels their subscription → langwatch.revoke(vkId). The next request returns 403 virtual_key_revoked. If you want a grace period, schedule the revoke job for 24–48 h after cancellation instead.
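The grace-period variant can be sketched as a small scheduling check. Here `client` stands for any object exposing revoke(vkId), mirroring the langwatch.revoke(vkId) call above; the helper names and the idea of running this from a periodic job are assumptions for illustration.

```javascript
// Sketch: revoke a VK only once the post-cancellation grace period has passed.
const HOUR_MS = 60 * 60 * 1000;

function revokeAt(cancelledAt, graceHours = 24) {
  if (graceHours < 0) {
    throw new Error("graceHours must be >= 0");
  }
  return new Date(cancelledAt.getTime() + graceHours * HOUR_MS);
}

// `client` is any object with a revoke(vkId) method (assumed interface).
async function revokeIfDue(client, vkId, cancelledAt, now, graceHours = 24) {
  if (now < revokeAt(cancelledAt, graceHours)) {
    return false; // still inside the grace period
  }
  await client.revoke(vkId);
  return true;
}

console.log(revokeAt(new Date("2024-01-01T00:00:00Z"), 48).toISOString());
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the job deterministic in tests.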
Audit: who did what
Every write through your backend token is audited with actor = "svc_<your_project_id>", action, target, and metadata. Filter the audit log on resource_type = "virtualKey" to get a per-customer provisioning history. The log is available under /settings/audit-log in the LangWatch UI.
If you need SIEM export: the Postgres AuditLog table is queryable for gateway-shape rows (filter on targetKind IN ('virtual_key', 'budget', 'model_provider', 'routing_policy', 'cache_rule')). Set up a daily pg_dump-then-ship pipeline into your SIEM’s ingestion path. There is no public REST audit-export endpoint in v1; see Audit log → Querying programmatically for the supported paths (UI CSV download, direct SQL).
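If the rows are post-processed in Node.js before shipping, the same targetKind filter can be applied in code. This is a minimal sketch that assumes each exported row is an object with a `targetKind` property; the other row fields and the choice of newline-delimited JSON as the output format are assumptions, not something the gateway prescribes.

```javascript
// Sketch: keep only gateway-shaped audit rows, then serialize them for shipping.
const GATEWAY_TARGET_KINDS = new Set([
  "virtual_key",
  "budget",
  "model_provider",
  "routing_policy",
  "cache_rule",
]);

function gatewayAuditRows(rows) {
  return rows.filter((row) => GATEWAY_TARGET_KINDS.has(row.targetKind));
}

// One JSON object per line (newline-delimited JSON).
function toNdjson(rows) {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}

const sampleRows = [
  { targetKind: "virtual_key", action: "create" }, // hypothetical row fields
  { targetKind: "dashboard", action: "update" },
  { targetKind: "budget", action: "update" },
];
console.log(toNdjson(gatewayAuditRows(sampleRows)));
```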
Gotchas
- Never let the customer see your backend API token. It has virtualKeys:create; they’d provision more VKs charged to you.
- Rotate VKs when an end-user leaves your customer’s org. Otherwise the ex-user keeps spend access until the budget resets.
- Set principal_user_id at VK creation time, not later. Audit attribution is based on this; filling it in after the fact doesn’t retroactively re-tag old traces.
- Test the 402 path in your app before go-live. Many apps have unhandled exceptions on budget breach and crash the user’s flow.
- Budgets scoped to virtual_key are the right level for per-customer enforcement. Scoping to principal works too, but traces get messier because principal_user_id values from different customers can collide if you’re not careful with namespacing.
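For the namespacing point, one simple convention is to prefix every principal_user_id with the tenant id. The sketch below is an assumed convention, not a LangWatch requirement; the `:` separator and the function name are arbitrary choices.

```javascript
// Sketch: namespace principal_user_id so values from different tenants cannot collide.
function namespacedPrincipal(tenantId, userId) {
  for (const part of [tenantId, userId]) {
    // Reject the separator inside ids so the result can be split unambiguously.
    if (!part || part.includes(":")) {
      throw new Error("ids must be non-empty and must not contain ':'");
    }
  }
  return `${tenantId}:${userId}`;
}

console.log(namespacedPrincipal("acme", "u_42")); // prints "acme:u_42"
```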
Rate limits at your gateway level
If you want to throttle a specific customer without moving them to a different plan:

Throttled requests receive 429 rate_limit_exceeded. Combine with a short-window budget (hour, minute) for finer control.
See also
- Management REST API: every endpoint used above.
- Virtual Keys: VK lifecycle semantics.
- Budgets: hierarchical scope logic.
- Security: how backend tokens are isolated from VK secrets.
- CI smoke-test cookbook: validate the full flow in CI.