
Why

By default, POST /api/traces/search returns the full trace shape. For an ETL pipeline that needs only a handful of fields across thousands of traces, that returns more data than required and forces a second call per trace to read event-level feedback. The projection DSL lets a caller declare the columns of interest up front; the server returns exactly that shape, with nested events, annotations, and evaluations joined server-side, in one paginated response. The feature is fully additive: a request without from/select behaves exactly as before.

The contract

Two optional fields on the search request body:
| Field | Type | Description |
| --- | --- | --- |
| from | "traces" | Entity root to read from. Only traces is supported today; defaults to traces when select is present. |
| select | string[] | Flat list of dotted-path columns to project. Must be non-empty when present. |

curl -X POST https://app.langwatch.ai/api/traces/search \
  -H "X-Auth-Token: $LANGWATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": 1717200000000,
    "endDate":   1717286400000,
    "from": "traces",
    "select": [
      "trace_id",
      "started_at",
      "metadata.user_id",
      "metrics.total_cost",
      "evaluations.name", "evaluations.score",
      "events.type", "events.metrics",
      "annotations.is_thumbs_up", "annotations.scores"
    ]
  }'
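For pipelines that issue this request from code rather than the shell, the same call can be assembled in Python. The following is a minimal sketch using only the standard library; the function name build_search_request is hypothetical, and the endpoint, header names, and body fields are copied from the curl example above.

```python
import json
import urllib.request

SEARCH_URL = "https://app.langwatch.ai/api/traces/search"


def build_search_request(api_key, start_ms, end_ms, select):
    """Build (but do not send) a projected search request.

    Hypothetical helper: mirrors the curl example, nothing more.
    """
    if not select:
        # The contract says select must be non-empty when present.
        raise ValueError("select must be non-empty when present")
    body = {
        "startDate": start_ms,
        "endDate": end_ms,
        "from": "traces",
        "select": list(select),
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Auth-Token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the response body with json.load.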

How paths map to the response

Paths group by their root, so the response mirrors the selection:
  • Scalar fields (trace_id, started_at, input, …) stay at the top level of each row.
  • metadata.* nests under a metadata object.
  • metrics.* nests under a metrics object.
  • events.*, annotations.*, evaluations.* return as nested arrays — one row per trace, with each child record projected to the requested sub-fields.
A request for the select above yields rows shaped like:
{
  "trace_id": "abc123…",
  "started_at": 1717200001234,
  "metadata": { "user_id": "u_42" },
  "metrics": { "total_cost": 0.0031 },
  "evaluations": [
    { "name": "Faithfulness", "score": 0.91 }
  ],
  "events": [
    { "type": "thumbs_up_down", "metrics": { "vote": 1 } }
  ],
  "annotations": [
    { "is_thumbs_up": true, "scores": { "quality": { "value": "5" } } }
  ]
}
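Because each row carries its child records as nested arrays, a warehouse load usually splits a row into one parent record plus one child table per collection. The sketch below shows one way to do that; explode_rows is a hypothetical helper, and it assumes trace_id was included in the select so child records can be joined back to their trace.

```python
def explode_rows(rows, collections=("evaluations", "events", "annotations")):
    """Split projected rows into parent records and per-collection child rows.

    Hypothetical helper; assumes every row has a "trace_id" key.
    """
    parents = []
    children = {name: [] for name in collections}
    for row in rows:
        # Everything that is not a nested collection stays on the parent.
        parents.append({k: v for k, v in row.items() if k not in collections})
        for name in collections:
            # A collection may be absent if it was not selected.
            for child in row.get(name) or []:
                children[name].append({"trace_id": row["trace_id"], **child})
    return parents, children
```

Each entry in the returned children mapping can then be written to its own table, keyed by trace_id.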

The schema response field

When select is present, the response envelope gains a top-level schema object listing the resolved columns — their dotted path, value type, and whether they belong to a nested collection. Use it to pre-allocate a typed reader (pandas, a Parquet writer, a warehouse table) without inferring types from the data:
{
  "traces": [ /* projected rows */ ],
  "pagination": { "totalHits": 3120, "scrollId": "…" },
  "schema": {
    "from": "traces",
    "columns": [
      { "path": "trace_id",           "type": "string", "collection": false },
      { "path": "metrics.total_cost", "type": "number", "collection": false },
      { "path": "evaluations.score",  "type": "number", "collection": true  }
    ]
  }
}
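As an illustration of pre-allocating a typed reader, the sketch below turns the schema object into two path-to-type maps, one for top-level columns and one for columns inside nested collections. The helper name split_schema and the PY_TYPES mapping are assumptions; the sketch also assumes the schema's type strings use the same vocabulary as the field catalog (string, number, boolean, json).

```python
# Assumed mapping from schema type names to Python types.
PY_TYPES = {"string": str, "number": float, "boolean": bool}


def split_schema(schema):
    """Return (scalar_types, collection_types), each keyed by dotted path."""
    scalars, collections = {}, {}
    for col in schema["columns"]:
        # "json" and any unrecognised type fall back to a generic object.
        py_type = PY_TYPES.get(col["type"], object)
        target = collections if col["collection"] else scalars
        target[col["path"]] = py_type
    return scalars, collections
```

The scalar map describes the parent table; the collection map describes the columns of the child tables.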

Field catalog

Every selectable path is listed below. A path outside this catalog is rejected at request time with HTTP 400 and the offending path named in the error, so you can correct the whole select in one round-trip.

Trace scalars

| Path | Type | Notes |
| --- | --- | --- |
| trace_id | string | |
| project_id | string | |
| started_at | number | Epoch ms; when the trace occurred. |
| inserted_at | number | Epoch ms; when first stored. |
| updated_at | number | Epoch ms; when last modified. |
| input | string | Captured input. Requires input-visibility permission; redacted to null otherwise. |
| output | string | Captured output. Requires output-visibility permission; redacted to null otherwise. |

metadata.<key>

Any metadata key (e.g. metadata.user_id, metadata.customer_id, metadata.thread_id, or your own custom keys). Returned under a metadata object.

metrics.<key>

| Path | Type | Notes |
| --- | --- | --- |
| metrics.first_token_ms | number | |
| metrics.total_time_ms | number | |
| metrics.prompt_tokens | number | |
| metrics.completion_tokens | number | |
| metrics.reasoning_tokens | number | |
| metrics.cache_read_input_tokens | number | |
| metrics.cache_creation_input_tokens | number | |
| metrics.total_cost | number | Requires cost-visibility permission; redacted to null otherwise. |
| metrics.tokens_estimated | boolean | |

evaluations.<field> (nested array)

| Path | Type |
| --- | --- |
| evaluations.name | string |
| evaluations.score | number |
| evaluations.passed | boolean |
| evaluations.label | string |
| evaluations.details | string |
| evaluations.status | string |
| evaluations.evaluator_id | string |
| evaluations.type | string |
| evaluations.is_guardrail | boolean |

events.<field> (nested array)

| Path | Type | Notes |
| --- | --- | --- |
| events.type | string | The event name, e.g. thumbs_up_down, star_rating. |
| events.timestamp | number | Epoch ms. |
| events.metrics | json | All numeric metrics for the event. |
| events.metrics.<key> | number | A single named metric. |
| events.details | json | All string details for the event. Redacted without input-visibility permission. |
| events.details.<key> | string | A single named detail. Redacted without input-visibility permission. |

annotations.<field> (nested array)

| Path | Type | Notes |
| --- | --- | --- |
| annotations.is_thumbs_up | boolean | |
| annotations.comment | string | Requires output-visibility permission; redacted to null otherwise (reviewers routinely quote model output here). |
| annotations.expected_output | string | Requires output-visibility permission; redacted to null otherwise. |
| annotations.created_at | number | Epoch ms. |
| annotations.scores | json | All score values, keyed by your AnnotationScore name. |
| annotations.scores.<name> | json | A single score, addressed by its AnnotationScore name (e.g. annotations.scores.quality). |

Annotations are manual reviews captured in the LangWatch UI (thumbs, scores, comments). They live alongside SDK-emitted events and are joined into the same response — no separate annotations call needed.
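Since an unknown path fails the whole request, a pipeline may want to check its select list locally before sending it. The sketch below mirrors the catalog as listed on this page; it is not the server's validator, and the names SCALARS, FIXED, OPEN_KEYED, and is_known are hypothetical. It assumes that keys under events.metrics, events.details, and annotations.scores contain no dots, which the catalog does not state.

```python
SCALARS = {
    "trace_id", "project_id", "started_at", "inserted_at",
    "updated_at", "input", "output",
}
FIXED = {
    "metrics": {
        "first_token_ms", "total_time_ms", "prompt_tokens",
        "completion_tokens", "reasoning_tokens", "cache_read_input_tokens",
        "cache_creation_input_tokens", "total_cost", "tokens_estimated",
    },
    "evaluations": {
        "name", "score", "passed", "label", "details",
        "status", "evaluator_id", "type", "is_guardrail",
    },
    "events": {"type", "timestamp", "metrics", "details"},
    "annotations": {
        "is_thumbs_up", "comment", "expected_output", "created_at", "scores",
    },
}
# Prefixes that accept one further caller-defined key.
OPEN_KEYED = {"events.metrics", "events.details", "annotations.scores"}


def is_known(path):
    """Return True if the path appears in the catalog transcribed above."""
    root, _, rest = path.partition(".")
    if not rest:
        return path in SCALARS
    if root == "metadata":
        return True  # any non-empty metadata key is selectable
    if rest in FIXED.get(root, ()):
        return True
    prefix, _, key = path.rpartition(".")
    return prefix in OPEN_KEYED and bool(key)


def unknown_paths(select):
    """List the paths that the local catalog copy does not recognise."""
    return [p for p in select if not is_known(p)]
```

A local check like this only saves a round-trip; the server's HTTP 400 response remains the authority on which paths are valid.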

Choosing the date axis

By default the startDate/endDate window filters traces by when they occurred. For incremental ETL you usually want everything changed since your last pull — a trace can occur weeks before it gains a later evaluation or annotation. Set dateField to switch the axis:
| dateField | Window applies to | Use for |
| --- | --- | --- |
| occurred (default) | When the trace happened | Time-bucketed reporting. |
| updated | When the trace was last modified | Incremental / CDC pulls. |

{
  "startDate": 1717200000000,   // last watermark
  "endDate":   1717286400000,   // now
  "dateField": "updated",
  "select": ["trace_id", "updated_at", "annotations.is_thumbs_up"]
}
Delivery contract on the updated axis: at-least-once across pulls, at-most-once within a single scroll. A trace modified again while you are paging moves past the cursor and is not re-delivered in the current scroll, but its new updated_at falls inside your next pull's window, which starts at your last watermark, so no change is lost across pulls. This holds as long as the CDC loop advances its watermark to the time at which the previous pull began (the value sent as endDate in the example above), not the time at which it finished. Because delivery across pulls is at-least-once, the same trace can arrive more than once, so upsert by trace_id downstream.
The updated watermark advances on every ClickHouse-side change to a trace — evaluations, metadata updates, spans, and annotation creates and deletes. One gap today: an in-place edit to an existing annotation (e.g. changing a score from 3 to 5) writes Postgres only and does not advance the trace’s ClickHouse updated_at, so dateField: "updated" does not pick it up until the trace changes again. New and removed annotations are caught. See #4736 for the cross-store watermark follow-up.
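The watermark handling described above can be sketched as a single pull step. This is an illustration under stated assumptions, not a reference client: run_pull and fetch_window are hypothetical names, fetch_window(start_ms, end_ms) is assumed to yield every projected row for a dateField "updated" window with trace_id and updated_at selected, and now_ms is assumed to be captured before the pull starts.

```python
def run_pull(fetch_window, store, watermark_ms, now_ms):
    """Run one incremental pull and return the next watermark.

    store is any dict-like sink keyed by trace_id (an assumption).
    """
    for row in fetch_window(watermark_ms, now_ms):
        current = store.get(row["trace_id"])
        # At-least-once delivery: upsert, keeping the newest version.
        if current is None or row["updated_at"] >= current["updated_at"]:
            store[row["trace_id"]] = row
    # Advance to the time this pull began, not the time it finished,
    # so changes made during paging fall inside the next window.
    return now_ms
```

Persist the returned watermark only after the pulled rows are safely stored, so a failed pull is retried over the same window.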

Performance

The projection drives column selection, so asking for only lightweight fields keeps the query lightweight. In particular, the heavy captured input/output columns are only read when you select them — omit them and a wide window stays fast. Each page is capped at 1000 rows; keep following pagination.scrollId until it is absent to sweep a full window.
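Sweeping a full window amounts to a loop that stops when pagination.scrollId is absent. The sketch below shows that loop with the transport injected, so it runs without network access. Two things here are assumptions rather than documented behaviour: post is a hypothetical callable that sends a request body and returns the decoded JSON response, and the scroll cursor is assumed to be sent back as a top-level scrollId field in the next request body.

```python
def sweep(post, body):
    """Yield every projected row in the window, page by page."""
    request = dict(body)
    while True:
        page = post(request)
        yield from page.get("traces", [])
        scroll_id = (page.get("pagination") or {}).get("scrollId")
        if not scroll_id:
            return  # no cursor means the window is exhausted
        # Assumption: the cursor goes back in the body as "scrollId".
        request = {**body, "scrollId": scroll_id}
```

Because sweep is a generator, rows can be streamed into a writer without holding the whole window in memory.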

Backwards compatibility

A request with neither from nor select returns the existing full-trace shape and no schema field. Existing integrations are unaffected.