Tracking Time to First Token (TTFT)

Time to First Token (TTFT) measures how long your users wait between sending a request and seeing the first piece of the streamed answer. For streaming LLM applications it is usually a better proxy for perceived speed than total duration: a response can take 20 seconds to finish and still feel instant if the first token arrives in 300ms. LangWatch captures TTFT per LLM span, rolls it up to the trace, and surfaces it across the product:

Traces table with the TTFT column enabled, showing values and p95-scaled bars

Automatic capture

If your instrumentation already emits any of the signals below, TTFT shows up without any extra code:

Source	Signal
Vercel AI SDK	`ai.response.msToFirstChunk` span attribute
OpenLLMetry / OpenLIT instrumentors	`llm.content.completion.chunk` / `First Token Stream Event` span events
OTel GenAI semconv emitters	`gen_ai.server.time_to_first_token` span attribute (milliseconds)
LangWatch SDKs	`timestamps.first_token_at` on the span (see below)

When more than one LLM span in a trace reports a TTFT, the trace-level value is the smallest one: the earliest first token the user could have seen.
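The roll-up rule above can be sketched as a small function. This is only an illustration of "take the smallest TTFT among LLM spans"; the `SpanSummary` shape and the `traceTtftMs` name are hypothetical and are not part of the LangWatch data model.

```typescript
// Hypothetical span shape used only for this illustration.
interface SpanSummary {
  type: string;
  ttftMs?: number; // undefined when the span reported no TTFT signal
}

// Returns the smallest TTFT among LLM spans, or null when no LLM span reported one.
function traceTtftMs(spans: SpanSummary[]): number | null {
  const values = spans
    .filter((s) => s.type === "llm" && typeof s.ttftMs === "number")
    .map((s) => s.ttftMs as number);
  return values.length > 0 ? Math.min(...values) : null;
}
```

For a trace with two LLM spans reporting 800 ms and 300 ms, this sketch yields 300 ms, matching the "first token the user could have seen" reading.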

If you stream through the Vercel AI SDK with experimental_telemetry enabled, TTFT is captured automatically from ai.response.msToFirstChunk, so there is nothing more to do.

Setting TTFT manually

If you handle streaming yourself, measure the time between starting the call and the first chunk, then set the gen_ai.server.time_to_first_token attribute (duration in milliseconds) on your LLM span:

import { getLangWatchTracer } from "langwatch";
import OpenAI from "openai";

const tracer = getLangWatchTracer("my-service");
const client = new OpenAI();

async function streamCompletion(prompt: string): Promise<string> {
  return await tracer.withActiveSpan("LLM call", async (span) => {
    span.setType("llm");
    span.setRequestModel("gpt-5-mini");
    span.setInput(prompt);

    const startedAt = Date.now();
    let firstTokenAt: number | null = null;
    const chunks: string[] = [];

    const stream = await client.chat.completions.create({
      model: "gpt-5-mini",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    });

    for await (const chunk of stream) {
      if (firstTokenAt === null) {
        firstTokenAt = Date.now();
        // Duration in milliseconds until the first streamed token
        span.setAttribute(
          "gen_ai.server.time_to_first_token",
          firstTokenAt - startedAt
        );
      }
      chunks.push(chunk.choices[0]?.delta?.content ?? "");
    }

    // Join once and reuse the result for both the span output and the return value
    const output = chunks.join("");
    span.setOutput(output);
    return output;
  });
}

Alternatively, if you track the wall-clock timestamp of the first token instead of the elapsed duration, send the langwatch.timestamps attribute, with values in Unix epoch milliseconds:

span.setAttribute(
  "langwatch.timestamps",
  JSON.stringify({ first_token_at: firstTokenAt })  // epoch milliseconds
);

LangWatch computes TTFT as first_token_at - started_at, falling back to the span’s own start time when started_at isn’t provided.
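As a rough sketch of that computation (not LangWatch's actual implementation), the subtraction and the fallback could look like the following. The `SpanTimestamps` interface mirrors the field names used in this page; the `ttftFromTimestamps` function and its `spanStartMs` parameter are hypothetical names chosen for illustration.

```typescript
// Field names follow the timestamps object described in this page; all values are epoch milliseconds.
interface SpanTimestamps {
  started_at?: number;
  first_token_at?: number;
  finished_at?: number;
}

// Returns TTFT in milliseconds, or null when no first-token timestamp was reported.
// spanStartMs stands in for the span's own start time and is used only as a fallback.
function ttftFromTimestamps(ts: SpanTimestamps, spanStartMs: number): number | null {
  if (ts.first_token_at === undefined) return null;
  const start = ts.started_at ?? spanStartMs;
  return ts.first_token_at - start;
}
```

With `started_at` of 1718886000000 and `first_token_at` of 1718886000800, the sketch returns a TTFT of 800 ms.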

REST API

If you send spans through the REST API instead of the SDK, set first_token_at (epoch milliseconds) in the span’s timestamps object:

{
  "type": "llm",
  "span_id": "span-123",
  "timestamps": {
    "started_at": 1718886000000,
    "first_token_at": 1718886000800,
    "finished_at": 1718886004200
  }
}
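If your code measures TTFT as an elapsed duration, it has to be converted into an epoch timestamp before it goes into the timestamps object. A minimal sketch of that conversion follows; `buildLlmSpanPayload` and `LlmSpanPayload` are hypothetical names, and the object covers only the fields shown in the example above. A real request will likely need further fields, and the endpoint and authentication are not shown here.

```typescript
// Covers only the fields shown in the JSON example; not a complete request body.
interface LlmSpanPayload {
  type: "llm";
  span_id: string;
  timestamps: {
    started_at: number;
    first_token_at: number;
    finished_at: number;
  };
}

// Converts a measured TTFT duration (ms) into an absolute first_token_at (epoch ms).
function buildLlmSpanPayload(
  spanId: string,
  startedAtMs: number,
  ttftMs: number,
  finishedAtMs: number
): LlmSpanPayload {
  return {
    type: "llm",
    span_id: spanId,
    timestamps: {
      started_at: startedAtMs,
      first_token_at: startedAtMs + ttftMs,
      finished_at: finishedAtMs,
    },
  };
}
```

Calling `buildLlmSpanPayload("span-123", 1718886000000, 800, 1718886004200)` reproduces the timestamps in the JSON example above.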

Where TTFT shows up

Traces table column

On the Traces page, open the columns picker (the columns icon in the toolbar) and enable TTFT. The column behaves like the Duration column: each cell shows the value plus an inline bar scaled to the 95th percentile of the visible page, so one slow outlier doesn’t flatten every other row. Rows at or above the page’s TTFT p95 render a full red bar.

Hovering a cell shows how that trace compares against the page’s TTFT p95:

TTFT cell tooltip showing the percentage of the page's TTFT p95

The column is sortable, so you can order by TTFT to chase your slowest streaming starts.
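To make the bar scaling and tooltip concrete, here is a sketch of how a cell could be derived from the TTFT values on the visible page. It assumes a nearest-rank percentile, which may differ from the method LangWatch actually uses; `percentile`, `ttftCell`, and `TtftCell` are hypothetical names.

```typescript
// Nearest-rank percentile over a non-empty array (an assumed method, for illustration only).
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

interface TtftCell {
  barFraction: number; // width of the inline bar, from 0 to 1
  isSlow: boolean; // at or above the page p95, rendered as a full red bar
  percentOfP95: number; // value for the tooltip, as a rounded percentage
}

// Scales one TTFT value against the p95 of all TTFT values on the visible page.
function ttftCell(valueMs: number, pageValuesMs: number[]): TtftCell {
  const p95 = percentile(pageValuesMs, 95);
  const ratio = p95 > 0 ? valueMs / p95 : 0;
  return {
    barFraction: Math.min(ratio, 1), // capping at 1 keeps outliers from shrinking other bars
    isSlow: valueMs >= p95,
    percentOfP95: Math.round(ratio * 100),
  };
}
```

Scaling to the p95 rather than the maximum is the design choice the column description points at: a single very slow trace is capped at a full bar instead of compressing every other row.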

Trace details

The trace drawer shows a TTFT pill in the header next to duration, cost, and tokens:

Trace drawer header showing the TTFT pill

Analytics

In Analytics > LLM Metrics you get time-to-first-token graphs with the usual aggregations (median, p90, p95, p99), and you can build custom graphs on the same metric for dashboards and alerting.

Troubleshooting

TTFT shows “—” in the table: the trace has no LLM span reporting any of the signals above. For non-streaming calls this is expected, because there is no “first token” moment distinct from the response itself.
Value looks wrong by orders of magnitude: check the units. gen_ai.server.time_to_first_token is a duration in milliseconds; first_token_at inside langwatch.timestamps is an epoch timestamp in milliseconds.