Projection DSL & field catalog

Why

By default POST /api/traces/search returns the full trace shape. For an ETL pipeline that needs only a handful of fields across thousands of traces, that returns more data than required and forces a second call per trace to read event-level feedback. The projection DSL lets a caller declare the columns of interest up front; the server returns exactly that shape, with nested events, annotations, and evaluations joined server-side, in one paginated response. The feature is fully additive: a request without from/select behaves exactly as before.

The contract

Two optional fields on the search request body:

Field	Type	Description
`from`	`"traces"`	Entity root to read from. Only `traces` is supported today; defaults to `traces` when `select` is present.
`select`	`string[]`	Flat list of dotted-path columns to project. Must be non-empty when present.

curl -X POST https://app.langwatch.ai/api/traces/search \
  -H "X-Auth-Token: $LANGWATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": 1717200000000,
    "endDate":   1717286400000,
    "from": "traces",
    "select": [
      "trace_id",
      "started_at",
      "metadata.user_id",
      "metrics.total_cost",
      "evaluations.name", "evaluations.score",
      "events.type", "events.metrics",
      "annotations.is_thumbs_up", "annotations.scores"
    ]
  }'
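
The same request can be built from Python. This is a minimal sketch using only the standard library; the helper name `build_search_request` is hypothetical, while the URL, header, and body fields are taken from the curl example above. It builds the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
SEARCH_URL = "https://app.langwatch.ai/api/traces/search"


def build_search_request(api_key, start_ms, end_ms, select):
    """Build (but do not send) a projected search request."""
    if not select:
        # The contract requires `select` to be non-empty when present.
        raise ValueError("select must be a non-empty list of dotted paths")
    body = {
        "startDate": start_ms,
        "endDate": end_ms,
        "from": "traces",
        "select": list(select),
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; rejecting an empty `select` locally saves a round-trip that the server would reject anyway.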

How paths map to the response

Paths group by their root, so the response mirrors the selection:

Scalar fields (trace_id, started_at, input, …) stay at the top level of each row.
metadata.* nests under a metadata object.
metrics.* nests under a metrics object.
events.*, annotations.*, evaluations.* return as nested arrays — one row per trace, with each child record projected to the requested sub-fields.

A request for the select above yields rows shaped like:

{
  "trace_id": "abc123…",
  "started_at": 1717200001234,
  "metadata": { "user_id": "u_42" },
  "metrics": { "total_cost": 0.0031 },
  "evaluations": [
    { "name": "Faithfulness", "score": 0.91 }
  ],
  "events": [
    { "type": "thumbs_up_down", "metrics": { "vote": 1 } }
  ],
  "annotations": [
    { "is_thumbs_up": true, "scores": { "quality": { "value": "5" } } }
  ]
}
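
For a warehouse load you will often want one flat parent record per trace plus child records per nested collection. The sketch below shows one way to split a projected row of the shape above; `split_row` is a hypothetical helper, and it assumes `trace_id` was included in `select` so child records can be joined back to their trace.

```python
NESTED_ROOTS = ("evaluations", "events", "annotations")
OBJECT_ROOTS = ("metadata", "metrics")


def split_row(row):
    """Split one projected row into a flat parent record and child records."""
    parent = {}
    children = {root: [] for root in NESTED_ROOTS}
    for key, value in row.items():
        if key in NESTED_ROOTS:
            for child in value or []:
                # Carry the trace_id so child rows can be joined back later.
                children[key].append({"trace_id": row.get("trace_id"), **child})
        elif key in OBJECT_ROOTS and isinstance(value, dict):
            # Re-flatten metadata/metrics objects to their dotted paths.
            for sub_key, sub_value in value.items():
                parent[f"{key}.{sub_key}"] = sub_value
        else:
            parent[key] = value
    return parent, children
```

Child values that are themselves JSON (for example `events.metrics`) are left as nested dicts; whether to flatten them further depends on the target table.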

The `schema` response field

When select is present, the response envelope gains a top-level schema object listing the resolved columns — their dotted path, value type, and whether they belong to a nested collection. Use it to pre-allocate a typed reader (pandas, a Parquet writer, a warehouse table) without inferring types from the data:

{
  "traces": [ /* projected rows */ ],
  "pagination": { "totalHits": 3120, "scrollId": "…" },
  "schema": {
    "from": "traces",
    "columns": [
      { "path": "trace_id",           "type": "string", "collection": false },
      { "path": "metrics.total_cost", "type": "number", "collection": false },
      { "path": "evaluations.score",  "type": "number", "collection": true  }
    ]
  }
}
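
As a rough illustration of pre-allocating a reader from `schema`, the sketch below splits the resolved columns into scalar and collection columns and maps each to a Python type. `plan_columns` and the `PY_TYPES` mapping are hypothetical, and the sketch assumes the `type` values match the type names used in the field catalog (`string`, `number`, `boolean`, `json`).

```python
# Assumed mapping from schema type names to Python types.
PY_TYPES = {"string": str, "number": float, "boolean": bool}


def plan_columns(schema):
    """Return ({scalar path: type}, {collection path: type}) from `schema`."""
    scalars, collections = {}, {}
    for col in schema["columns"]:
        # Anything not in the mapping (e.g. "json") falls back to `object`.
        py_type = PY_TYPES.get(col["type"], object)
        target = collections if col["collection"] else scalars
        target[col["path"]] = py_type
    return scalars, collections
```

The scalar map describes the parent table; the collection map describes the columns of the child records.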

Field catalog

Every selectable path is listed below. A path outside this catalog is rejected at request time with HTTP 400 and the offending path named in the error, so you can correct the whole select in one round-trip.

Trace scalars

Path	Type	Notes
`trace_id`	string
`project_id`	string
`started_at`	number	Epoch ms — when the trace occurred.
`inserted_at`	number	Epoch ms — when first stored.
`updated_at`	number	Epoch ms — when last modified.
`input`	string	Captured input. Requires input-visibility permission; redacted to `null` otherwise.
`output`	string	Captured output. Requires output-visibility permission; redacted to `null` otherwise.

`metadata.<key>`

Any metadata key (e.g. metadata.user_id, metadata.customer_id, metadata.thread_id, or your own custom keys). Returned under a metadata object.

`metrics.<key>`

Path	Type	Notes
`metrics.first_token_ms`	number
`metrics.total_time_ms`	number
`metrics.prompt_tokens`	number
`metrics.completion_tokens`	number
`metrics.reasoning_tokens`	number
`metrics.cache_read_input_tokens`	number
`metrics.cache_creation_input_tokens`	number
`metrics.total_cost`	number	Requires cost-visibility permission; redacted to `null` otherwise.
`metrics.tokens_estimated`	boolean

`evaluations.<field>` (nested array)

Path	Type
`evaluations.name`	string
`evaluations.score`	number
`evaluations.passed`	boolean
`evaluations.label`	string
`evaluations.details`	string
`evaluations.status`	string
`evaluations.evaluator_id`	string
`evaluations.type`	string
`evaluations.is_guardrail`	boolean

`events.<field>` (nested array)

Path	Type	Notes
`events.type`	string	The event name, e.g. `thumbs_up_down`, `star_rating`.
`events.timestamp`	number	Epoch ms.
`events.metrics`	json	All numeric metrics for the event.
`events.metrics.<key>`	number	A single named metric.
`events.details`	json	All string details for the event. Redacted without input-visibility permission.
`events.details.<key>`	string	A single named detail. Redacted without input-visibility permission.

`annotations.<field>` (nested array)

Path	Type	Notes
`annotations.is_thumbs_up`	boolean
`annotations.comment`	string	Requires output-visibility permission; redacted to `null` otherwise (reviewers routinely quote model output here).
`annotations.expected_output`	string	Requires output-visibility permission; redacted to `null` otherwise.
`annotations.created_at`	number	Epoch ms.
`annotations.scores`	json	All score values, keyed by your AnnotationScore name.
`annotations.scores.<name>`	json	A single score, addressed by its AnnotationScore name (e.g. `annotations.scores.quality`).

Annotations are manual reviews captured in the LangWatch UI (thumbs, scores, comments). They live alongside SDK-emitted events and are joined into the same response — no separate annotations call needed.
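
If you generate `select` lists programmatically, a client-side check against the catalog can catch typos before the request is sent. The following is only a sketch: `unknown_paths` is a hypothetical helper, the sets simply mirror the tables above and can drift from the server, and treating everything after `metadata.` as the key is an assumption. The server's HTTP 400 remains the authority.

```python
SCALARS = {"trace_id", "project_id", "started_at", "inserted_at",
           "updated_at", "input", "output"}
FIXED = {
    "metrics": {"first_token_ms", "total_time_ms", "prompt_tokens",
                "completion_tokens", "reasoning_tokens",
                "cache_read_input_tokens", "cache_creation_input_tokens",
                "total_cost", "tokens_estimated"},
    "evaluations": {"name", "score", "passed", "label", "details", "status",
                    "evaluator_id", "type", "is_guardrail"},
    "events": {"type", "timestamp", "metrics", "details"},
    "annotations": {"is_thumbs_up", "comment", "expected_output",
                    "created_at", "scores"},
}
# Sub-fields that accept one extra, caller-defined key segment.
KEYED = {("events", "metrics"), ("events", "details"), ("annotations", "scores")}


def unknown_paths(select):
    """Return the paths in `select` that are not in the catalog above."""
    bad = []
    for path in select:
        parts = path.split(".")
        root = parts[0]
        if len(parts) == 1:
            ok = root in SCALARS
        elif root == "metadata":
            # Any non-empty key is accepted after the "metadata." prefix.
            ok = len(path) > len("metadata.")
        elif root in FIXED and parts[1] in FIXED[root]:
            keyed = len(parts) == 3 and (root, parts[1]) in KEYED and parts[2] != ""
            ok = len(parts) == 2 or keyed
        else:
            ok = False
        if not ok:
            bad.append(path)
    return bad
```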

Choosing the date axis

By default the startDate/endDate window filters traces by when they occurred. For incremental ETL you usually want everything changed since your last pull — a trace can occur weeks before it gains a later evaluation or annotation. Set dateField to switch the axis:

`dateField`	Window applies to	Use for
`occurred` (default)	When the trace happened	Time-bucketed reporting.
`updated`	When the trace was last modified	Incremental / CDC pulls.

{
  "startDate": 1717200000000,   // last watermark
  "endDate":   1717286400000,   // now
  "dateField": "updated",
  "select": ["trace_id", "updated_at", "annotations.is_thumbs_up"]
}

Delivery contract on the updated axis: at-least-once across pulls, at-most-once within a single scroll. A trace modified again while you are paging moves past the cursor and is not re-delivered in the current scroll, but its new updated_at lands in your next pull’s window (which starts at your last watermark), so no change is lost across pulls. A CDC loop gets every change as long as it advances its watermark to the time the previous pull started (that pull’s endDate), not the time it finished.
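
Because delivery is at-least-once across pulls, the consumer should be idempotent. One possible shape for a pull is sketched below; `merge_changes`, `run_pull`, and the injected `fetch_window` callable (assumed to sweep every page of a window with `dateField: "updated"`) are all hypothetical names, and the sketch assumes `trace_id` and `updated_at` are both in `select`.

```python
def merge_changes(store, rows):
    """Upsert rows by trace_id, keeping the copy with the latest updated_at."""
    for row in rows:
        current = store.get(row["trace_id"])
        if current is None or row["updated_at"] >= current["updated_at"]:
            store[row["trace_id"]] = row
    return store


def run_pull(store, watermark_ms, now_ms, fetch_window):
    """Run one CDC pull and return the watermark for the next pull."""
    # Window is [last watermark, now], as in the request example above.
    rows = fetch_window(watermark_ms, now_ms)
    merge_changes(store, rows)
    # Advance to the moment this pull began (its endDate), not to when it
    # finished, so changes made while paging fall inside the next window.
    return now_ms
```

Keying the upsert on `trace_id` means a trace that shows up in two consecutive pulls simply overwrites its earlier copy.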

The updated watermark advances on every ClickHouse-side change to a trace — evaluations, metadata updates, spans, and annotation creates and deletes. One gap today: an in-place edit to an existing annotation (e.g. changing a score from 3 to 5) writes Postgres only and does not advance the trace’s ClickHouse updated_at, so dateField: "updated" does not pick it up until the trace changes again. New and removed annotations are caught. See #4736 for the cross-store watermark follow-up.

Performance

The projection drives column selection, so asking for only lightweight fields keeps the query lightweight. In particular, the heavy captured input/output columns are only read when you select them — omit them and a wide window stays fast. Each page is capped at 1000 rows; keep following pagination.scrollId until it is absent to sweep a full window.
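
Sweeping a window might look like the sketch below. `sweep` is a hypothetical helper and `post` is an injected callable that sends one request body and returns the parsed response. How the cursor is sent back is not specified here, so passing it as `scrollId` in the request body is an assumption.

```python
def sweep(post, body):
    """Yield every row in the window by following pagination.scrollId."""
    scroll_id = None
    while True:
        page_body = dict(body)
        if scroll_id is not None:
            # Assumption: the cursor is sent back as `scrollId` in the body.
            page_body["scrollId"] = scroll_id
        page = post(page_body)
        yield from page.get("traces", [])
        scroll_id = page.get("pagination", {}).get("scrollId")
        if not scroll_id:
            # No cursor in the response means the window is exhausted.
            break
```

Writing the sweep as a generator lets the caller process each row as it arrives instead of holding a full window in memory.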

Backwards compatibility

A request with neither from nor select returns the existing full-trace shape and no schema field. Existing integrations are unaffected.
