Dimensions
RPM: requests per minute
Classic leaky-bucket / token-bucket limiter: a counter that refills at limit / 60 tokens per second and caps at limit. Every request consumes 1 token. When the bucket is empty, the request returns 429 rate_limit_exceeded with a Retry-After header giving the seconds until the bucket refills.
Good for: protecting providers from a flood, enforcing a fair-share per customer, giving CI pipelines a known cap.
Not for: cost enforcement. Use budgets for spend caps.
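The refill arithmetic described above can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the class name TokenBucket, the try_acquire method, the explicit now argument, the full initial bucket, and rounding Retry-After up to whole seconds are all assumptions made for the example.

```python
import math
from typing import Tuple


class TokenBucket:
    """Per-minute bucket: refills at limit / 60 tokens per second, capped at limit."""

    def __init__(self, limit: int, now: float = 0.0) -> None:
        self.limit = limit
        self.rate = limit / 60.0      # tokens added per second
        self.tokens = float(limit)    # assumption: the bucket starts full
        self.updated = now

    def _refill(self, now: float) -> None:
        # Add the tokens earned since the last update, never exceeding the cap.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.limit), self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)

    def try_acquire(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds). Each request costs 1 token."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one whole token is available, rounded up (assumption).
        wait = (1.0 - self.tokens) / self.rate
        return False, math.ceil(wait)


bucket = TokenBucket(limit=60, now=0.0)   # 60 RPM refills 1 token per second
results = [bucket.try_acquire(now=0.0) for _ in range(61)]
```

With a limit of 60, the first 60 calls at the same instant are allowed and the 61st is rejected with a one-second wait; one second later a single token is available again.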
RPD: requests per day
Rolling 24-hour window. Every request consumes 1; the counter resets at the first request after the window ticks. RPD and RPM are enforced jointly: a request that fits under RPM still fails if RPD is exhausted.
Good for: capping long-tail daily spend (cheaper than budgets for simple per-user limits), rate-limiting evaluation runs to stay under a provider’s daily quota.
TPM: tokens per minute (v1.1)
When v1.1 ships, TPM will consume tokens based on actual usage reported by the provider (or estimated from the request body for pre-dispatch shaping). Accounting will be cross-dimension with RPM and RPD: a request fits only if every applicable bucket has room.
Precedence
The gateway evaluates limits in the order it touches them:
- Per-ModelProvider (configured on Settings → Model Providers → Advanced (Gateway) tab): protects a single upstream account from all traffic.
- Per-VK (configured on the VK drawer): protects a specific key from over-sending.
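The evaluation order and the joint RPM + RPD rule can be combined in one admission check. The sketch below is illustrative only: the admit function, the scope names, and the simple remaining-capacity counters are hypothetical, and it assumes a check-then-commit scheme in which a rejected request consumes nothing, which this page does not specify.

```python
from typing import Dict, List, Optional, Tuple

# Remaining capacity per dimension; None models a blank field (no gateway-side throttle).
Limits = Dict[str, Optional[int]]


def admit(scopes: List[Tuple[str, Limits]]) -> Tuple[bool, Optional[str], Optional[str]]:
    """Admit a request only if every configured bucket in every scope has room.

    Returns (allowed, scope, dimension); scope and dimension name the first
    bucket found empty, in evaluation order.
    """
    # Pass 1: look for the first exhausted bucket, in the order listed.
    for scope_name, remaining in scopes:
        for dimension in ("rpm", "rpd"):
            left = remaining.get(dimension)
            if left is not None and left < 1:
                return False, scope_name, dimension
    # Pass 2: every bucket has room, so charge 1 request to each.
    # Assumption: a rejected request consumes nothing (check, then commit).
    for _, remaining in scopes:
        for dimension in ("rpm", "rpd"):
            if remaining.get(dimension) is not None:
                remaining[dimension] -= 1
    return True, None, None


provider = {"rpm": 100, "rpd": None}   # hypothetical per-ModelProvider limits
vk = {"rpm": 5, "rpd": 1}              # hypothetical per-VK limits
order = [("model_provider", provider), ("vk", vk)]

first = admit(order)    # fits everywhere
second = admit(order)   # RPM still has room, but the VK's RPD is exhausted
```

The second call shows the joint rule: both RPM buckets still have room, yet the request is rejected because the per-VK daily bucket is empty.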
429 envelope
When a bucket rejects a request, the gateway responds with 429 rate_limit_exceeded and a Retry-After header. The X-LangWatch-RateLimit-Dimension header identifies which bucket tripped:
- rpm: minute bucket full
- rpd: daily bucket full (only when both RPM and RPD are configured and RPD tripped first)
- tpm: v1.1 only
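A client can use these header values to decide how to react to a rejection. The following is a hedged sketch: plan_retry and its action names are hypothetical, headers are passed as a plain mapping to keep the example self-contained, and the policy (wait on a minute bucket, switch keys on a daily bucket) is one possible choice rather than behaviour prescribed by the gateway.

```python
from typing import Mapping, Optional, Tuple


def plan_retry(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[str], float]:
    """Return (action, dimension, wait_seconds) for a gateway response.

    action is "proceed", "wait", or "switch_key". The policy is an assumption
    for illustration only.
    """
    if status != 429:
        return "proceed", None, 0.0
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    dimension = lowered.get("x-langwatch-ratelimit-dimension")
    try:
        wait = float(lowered.get("retry-after", "1"))
    except ValueError:
        wait = 1.0  # Retry-After may also be an HTTP date; fall back to 1 second
    if dimension == "rpd":
        # A daily bucket will not refill soon; prefer another key if one exists.
        return "switch_key", dimension, wait
    return "wait", dimension, wait
```

A 429 without the dimension header (for example, one reported by the upstream provider) falls through to the plain "wait" branch with dimension set to None.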
Setting limits
Per VK: VK drawer → “Rate limits (per-VK)” section. See the drawer’s (i) tooltips for per-field guidance.
Per ModelProvider: Settings → Model Providers → row → Advanced (Gateway) tab → “Rate limit (rpm)”, “Rate limit (rpd)”, “Rate limit (tpm)” fields. See Gateway provider settings → RPM.
Leave a field blank to inherit the upstream provider’s own limit (i.e., no gateway-side throttle on that dimension).
Observability
Rate-limit events surface via the gateway’s standard observability primitives; there is no dedicated rate-limit counter in v1:
- HTTP metric gateway_http_requests_total{status="429"}: counts 429s, including both gateway-enforced and upstream-reported.
- Upstream attempt metric gateway_provider_attempts_total{outcome="rate_limit"}: increments when a 429 came back from the provider (not from gateway-side enforcement). Joining on status="429" vs outcome="rate_limit" distinguishes gateway-enforced (status 429, no provider_attempts increment on that dimension) from upstream-enforced.
- Response header X-LangWatch-RateLimit-Dimension: the downstream key-switch signal (also included in the span error message for trace-level analysis).
- Trace error: the rejected request’s span carries error.type="rate_limit_exceeded" with the dimension in the message string.
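Because the span error carries the dimension in its message, rejections can be grouped by dimension after the fact. The sketch below shows one way to do that. It assumes spans are available as dictionaries with "error.type" and "error.message" keys and that the message contains the dimension as a standalone word; both the span shape and the sample messages are hypothetical, and the real message format may differ.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches the dimension names used by X-LangWatch-RateLimit-Dimension.
DIMENSION = re.compile(r"\b(rpm|rpd|tpm)\b")


def rejects_by_dimension(spans: Iterable[Dict[str, str]]) -> Counter:
    """Count rate-limit rejections per dimension from span attributes."""
    counts: Counter = Counter()
    for span in spans:
        if span.get("error.type") != "rate_limit_exceeded":
            continue  # ignore spans that failed for other reasons
        match = DIMENSION.search(span.get("error.message", "").lower())
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Hypothetical span attributes, for illustration only.
spans = [
    {"error.type": "rate_limit_exceeded", "error.message": "rate limit exceeded: rpm"},
    {"error.type": "rate_limit_exceeded", "error.message": "rate limit exceeded: rpd"},
    {"error.type": "rate_limit_exceeded", "error.message": "rate limit exceeded: rpm"},
    {"error.type": "timeout", "error.message": "upstream timed out"},
]
tally = rejects_by_dimension(spans)
```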
A dedicated gateway_rate_limit_rejects_total{dimension} counter that slices by dimension without needing to parse the trace stream is a v1.1 observability follow-up. Until then, dashboards can filter gateway_http_requests_total{status="429"} by the Retry-After header or check the X-LangWatch-RateLimit-Dimension value in trace attributes at query time.
Cross-replica coordination
In multi-replica deployments the buckets are per-replica by default: the gateway’s limiter is an in-memory golang.org/x/time/rate token bucket stored in an LRU cache, with zero external dependencies on the hot path. If your cluster runs N replicas, the effective org-wide RPM is up to N × configured_rpm (an explicit design trade-off: zero dependencies on the hot path over strict cluster-wide correctness).
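The N × configured_rpm relationship can be worked through with a short calculation. The helper names below are hypothetical, and dividing a target by the replica count is only a rough workaround that assumes traffic is spread across replicas; it is not a documented gateway feature.

```python
def effective_cap(configured_rpm: int, replicas: int) -> int:
    """Upper bound on cluster-wide RPM when every replica keeps its own bucket."""
    return configured_rpm * replicas


def per_replica_setting(target_rpm: int, replicas: int) -> int:
    """Configured value that keeps the cluster-wide bound at or under target_rpm.

    Floor division keeps replicas * result <= target_rpm. The cost is that a
    single replica throttles at this lower value even when the others are idle.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return target_rpm // replicas


cap = effective_cap(configured_rpm=600, replicas=4)       # 2400
setting = per_replica_setting(target_rpm=600, replicas=4)  # 150
```

With 4 replicas, a configured limit of 600 allows up to 2400 requests per minute cluster-wide, while configuring 150 keeps the aggregate bound at 600.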
Permissions
| Action | Permission |
|---|---|
| View rate-limit settings | virtualKeys:view + modelProviders:view |
| Edit per-VK rate limits | virtualKeys:update |
| Edit per-ModelProvider rate limits | modelProviders:update |
See also
- Virtual keys: per-VK limit configuration.
- Provider bindings → RPM: per-binding limit configuration.
- Budgets: spend caps (different shape; use both).
- Concepts → Rate limits vs budgets: when to reach for which.