Tracking LLM Costs and Tokens

By default, LangWatch automatically captures token usage and calculates cost for your LLM calls, no configuration needed.

This page explains exactly how that works, which token types are captured, which OpenTelemetry attributes are read, what happens when token counts are missing, and how to adjust costs manually when you need to.

How cost calculation works

LangWatch calculates costs in three steps:

Capture the model name from the LLM span (e.g. openai/gpt-5-mini)
Capture token usage from the span’s attributes, or estimate it when missing
Multiply by the model’s per-token prices from the pricing registry in Settings > Model Costs

The model price registry

LangWatch ships a registry of 350+ models with input, output, cache read, and cache write prices per token. Prices are checked every day against the providers’ public pricing and the registry is updated as soon as a price or model changes, so the costs you see track what providers actually charge.

Model Costs settings page listing 350+ models with input, output, cache read and cache write prices

Each entry has a regex match rule, so the same price matches the different spellings instrumentation libraries use for the same model (gpt-5-mini, openai/gpt-5-mini, openai.responses/gpt-5-mini). Bedrock-style identifiers like eu.anthropic.claude-sonnet-4-5-20250929-v1:0 are normalized and matched too. If a model isn’t in the registry, LangWatch does not invent a price: the cost shows as $0 until you add a custom entry (see Custom model costs).

Which tokens are captured

LangWatch captures the full token breakdown reported by your LLM provider:

Token type	What it is	Pricing
Input tokens	The prompt sent to the model	Input rate
Output tokens	The completion generated by the model	Output rate
Cache read tokens	Prompt tokens served from the provider’s prompt cache	Cache read rate (falls back to input rate)
Cache write tokens	Prompt tokens written into the provider’s prompt cache	Cache write rate (falls back to input rate)
Reasoning tokens	Thinking tokens from reasoning models (o-series, Claude extended thinking, etc.)	Already included in the provider’s output token count, so they are billed as output tokens; LangWatch captures the breakdown separately so you can see how much of the output was reasoning

All of these are summed across the spans of a trace and shown in the trace details, the traces table (Tokens, Tokens In, Tokens Out columns), and analytics.

OpenTelemetry attributes reference

If you use OpenTelemetry instrumentation directly (or want to patch what a framework emits), these are the span attributes LangWatch reads, in their order of precedence:

Data	Attributes
Input tokens	`gen_ai.usage.input_tokens`, `gen_ai.usage.prompt_tokens`, `ai.usage.promptTokens` (Vercel), `llm.token_count.prompt` (OpenInference)
Output tokens	`gen_ai.usage.output_tokens`, `gen_ai.usage.completion_tokens`, `ai.usage.completionTokens` (Vercel), `llm.token_count.completion` (OpenInference)
Cache read tokens	`gen_ai.usage.cache_read.input_tokens`, `gen_ai.usage.cached_tokens`, `llm.token_count.prompt_details.cache_read` (OpenInference)
Cache write tokens	`gen_ai.usage.cache_creation.input_tokens`, `llm.token_count.prompt_details.cache_write` (OpenInference)
Reasoning tokens	`gen_ai.usage.reasoning_tokens`, `llm.token_count.completion_details.reasoning` (OpenInference)
Model	`gen_ai.response.model`, `gen_ai.request.model`
Pre-computed cost	`langwatch.span.cost` (a number, in USD)

import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("my-service");

async function myLLMCallWithTracking(prompt: string) {
  return await tracer.startActiveSpan("llm_call_wrapper", async (span) => {
    span.setAttribute("gen_ai.operation.name", "chat");
    span.setAttribute("gen_ai.request.model", "gpt-5-mini");

    const response = await yourFrameworkLLMClient.generate(prompt);

    if (response.usage) {
      span.setAttribute("gen_ai.usage.input_tokens", response.usage.inputTokens);
      span.setAttribute("gen_ai.usage.output_tokens", response.usage.outputTokens);
      // Optional breakdowns
      span.setAttribute("gen_ai.usage.cache_read.input_tokens", response.usage.cachedTokens);
      span.setAttribute("gen_ai.usage.reasoning_tokens", response.usage.reasoningTokens);
    }

    span.end();
    return response;
  });
}

When token counts are missing

Some providers and frameworks don’t report usage on streamed responses, and some integrations drop it. When an LLM span arrives with the model and input/output captured but no token counts, LangWatch estimates them by tokenizing the captured input and output text with OpenAI’s o200k_base encoding, the same tokenizer used by the GPT-5 model family. Estimated tokens (and the costs computed from them) are marked with a ~ prefix in the UI, so you always know which numbers are exact and which are approximations. Exact counts reported by your provider always win over estimation.

Adjusting tokens and costs manually

If automatic capture doesn’t fit your setup, a proxy strips usage data, a provider isn’t supported, or you negotiate custom pricing, you can set token counts and cost explicitly with the TypeScript SDK:

import { setupObservability } from "langwatch/observability/node";
import { getLangWatchTracer } from "langwatch";

setupObservability();

const tracer = getLangWatchTracer("cost-tracking-example");

async function customLLMCall(prompt: string): Promise<string> {
  return await tracer.withActiveSpan("CustomLLMCall", async (span) => {
    span.setType("llm");
    span.setRequestModel("my-custom-model-v1"); // Use the exact model identifier
    span.setInput("text", prompt);

    const response = await yourCustomLLMClient.generate(prompt);

    span.setOutput("text", response.text);
    span.setMetrics({
      promptTokens: response.usage.promptTokens,
      completionTokens: response.usage.completionTokens,
      cost: 0.042, // optional: explicit cost in USD, wins over computation
    });

    return response.text;
  });
}

When cost is set, it is used as-is, even for a known model with token counts: an explicit cost is treated as authoritative and overrides the registry calculation. When only token counts are set, LangWatch multiplies them by the model’s registry prices as usual.

Custom model costs

For self-hosted models, fine-tunes, or negotiated pricing, add your own entries in Settings > Model Costs:

Click “Add New Model”
Configure the entry:
- Model Name: descriptive name (e.g. my-custom-model-v1)
- Regex Match Rule: pattern matching your model identifier (e.g. ^my-custom-model-v1$)
- Input / Output Cost: price per token
- Cache Read / Cache Write Cost: optional cache rates

Custom costs can be scoped to a project, a team, or the whole organization, the most specific scope wins, and custom entries take precedence over the built-in registry.

Common model identifier patterns

Make sure your regex matches how the model name appears in your spans:

Framework	Model Identifier Format	Regex Pattern
OpenAI SDK	`gpt-5`	`^gpt-5$`
Azure OpenAI	`gpt-5`	`^gpt-5$`
LangChain	`openai/gpt-5`	`^openai/gpt-5$`
Custom	`my-custom-model-v1`	`^my-custom-model-v1$`

Troubleshooting

Costs show $0 or are missing

Open a trace that should have cost data and click the LLM span
Check these fields:
- Model: must show the model identifier (e.g. openai/gpt-5)
- Input/Output: must contain the captured messages
- Metrics: must show prompt + completion tokens

LLM span showing model, input/output, and token metrics

Then work through:

Model missing: your framework isn’t capturing it. Use the framework auto-tracking from the Integrations menu, or set it manually with span.setRequestModel(...) as shown above.
Model present but cost is $0: the model isn’t matching the registry. Check Settings > Model Costs, the regex must match your exact model identifier. Add a custom entry if it’s not there.
Tokens are 0 but model is captured: your provider isn’t returning usage (common with streaming). Either enable usage on streaming (e.g. stream_options: { include_usage: true } for OpenAI), set metrics manually, or let LangWatch estimate them from the captured text.
Tokens show but with a ~: they were estimated by the tokenizer because the provider didn’t report exact counts. This is expected; report exact counts to remove the marker.

Vercel AI SDK

When using the Vercel AI SDK, enable telemetry so usage is captured:

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await generateText({
  model: openai("gpt-5-mini"),
  prompt: "Hello!",
  experimental_telemetry: { isEnabled: true },
});

Token usage flows through the ai.usage.promptTokens / ai.usage.completionTokens attributes automatically.

Getting Help

If you’re still experiencing issues after following this guide:

Check the LangWatch logs for any error messages
Verify your API key and endpoint configuration
Share a minimal reproduction with the specific framework you’re using
Contact support at support@langwatch.ai with:
- Your integration method (SDK, OpenTelemetry, etc.)
- Framework versions
- Sample span data from the dashboard

​How cost calculation works

​The model price registry

​Which tokens are captured

​OpenTelemetry attributes reference

​When token counts are missing

​Adjusting tokens and costs manually

​Custom model costs

​Common model identifier patterns

​Troubleshooting

​Costs show $0 or are missing

​Vercel AI SDK

​Getting Help