
The Runs page is a central observability hub within your project. It displays all telemetry data captured from your connected AI agents, so you can monitor real-time agent activity, inspect individual sessions and traces, debug failures, and curate datasets for evaluation.

Navigation: Select a Project, then go to Observability → Runs in the left sidebar.

Overview

| Section | Purpose |
| --- | --- |
| Key Capabilities | What you can do on the Runs page |
| Sessions, Traces, and Spans | The three-level telemetry hierarchy |
| Streaming and Paused Mode | How live data and frozen views differ |
| The Sessions Grid | The main data grid and its columns |
| Inspect Session Details | Drill into I/O, logs, policies, and metadata |
| Traces and Spans | Navigate the session hierarchy |
| Filter Runs | Narrow down sessions by time, status, and metadata |
| Create Datasets | Group sessions for evaluation and regression testing |
| Common Use Cases | Real-world workflows on the Runs page |

Key Capabilities

  • Stream incoming telemetry in real time with automatic updates.
  • Drill down from Sessions → Traces → Spans for granular inspection.
  • Filter data by time range, status, evaluation results, metadata, and natural language queries.
  • Create static or auto-update datasets directly from sessions for evaluation and regression testing.
  • Visualize trace timelines in a List or Waterfall view with color-coded span types.

Sessions, Traces, and Spans

The Runs page organizes telemetry data into a three-level hierarchy following the OpenTelemetry standard.
| Concept | Definition | Example |
| --- | --- | --- |
| Session | A collection of related traces representing a complete user interaction or conversation | A multi-turn customer support conversation |
| Trace | A single agent workflow from input to output within a session. Each trace represents one request-response cycle | A user asks, “What is my account balance?” and the agent responds |
| Span | An individual operation within a trace. Spans nest hierarchically to form a tree structure | An LLM API call, tool invocation, or database query within a single trace |
A session contains one or more traces. Each trace contains one or more spans. You can drill down through each level to inspect execution details, timing, inputs, outputs, and errors.
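The three-level hierarchy can be sketched as plain data structures. This is an illustrative model only, not the platform's actual schema; all class and field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    """An individual operation (e.g. an LLM call); spans may nest into a tree."""
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

@dataclass
class Trace:
    """One request-response cycle; holds a tree of spans."""
    trace_id: str
    spans: List[Span] = field(default_factory=list)

@dataclass
class Session:
    """A complete user interaction; holds one or more traces."""
    session_id: str
    traces: List[Trace] = field(default_factory=list)

# A two-turn conversation: two traces, the first with a nested tool span.
tool = Span("tool.balance_lookup", 95.0, children=[Span("db.query", 40.0)])
session = Session("sess-1", [
    Trace("tr-1", [Span("llm.chat", 420.0), tool]),
    Trace("tr-2", [Span("llm.chat", 310.0)]),
])

assert len(session.traces) == 2
assert session.traces[0].spans[1].children[0].name == "db.query"
```

Drilling down on the Runs page mirrors walking this structure: session → traces → spans → child spans.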

Streaming and Paused Mode

The Runs page defaults to Streaming mode, where sessions update automatically. The streaming indicator in the top-right corner displays the current state and how recently the data was refreshed (for example, “Streaming · Updated 6s ago”). Click the Streaming indicator to pause the data feed. While paused:
  • The data grid freezes at the current point in time.
  • A counter shows how many new sessions have arrived since you paused (for example, +1, +2, +3).
  • Advanced filtering, bulk selection, and actions such as Save become available.
Click the indicator again to resume streaming. Any sessions that arrived while paused load into the view.
You can apply filters and perform bulk actions only while the stream is paused.
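The pause/resume behavior described above amounts to buffering: sessions that arrive while paused accumulate with a counter and are flushed into the view on resume. A minimal sketch of that behavior (the platform's actual implementation is not documented; names here are illustrative):

```python
class SessionFeed:
    """Buffers incoming sessions while paused; flushes them on resume."""

    def __init__(self):
        self.view = []        # what the data grid currently shows
        self.pending = []     # sessions that arrived while paused
        self.streaming = True

    def receive(self, session):
        if self.streaming:
            self.view.append(session)      # auto-refresh in real time
        else:
            self.pending.append(session)   # grid stays frozen

    @property
    def pending_count(self):
        return len(self.pending)           # the "+1, +2, +3" counter

    def toggle(self):
        self.streaming = not self.streaming
        if self.streaming:                 # resume: load buffered sessions
            self.view.extend(self.pending)
            self.pending.clear()

feed = SessionFeed()
feed.receive("s1")
feed.toggle()           # pause
feed.receive("s2")      # arrives while paused
assert feed.view == ["s1"] and feed.pending_count == 1
feed.toggle()           # resume
assert feed.view == ["s1", "s2"]
```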

Streaming vs. Paused Mode

| Feature | Streaming Mode | Paused Mode |
| --- | --- | --- |
| Data updates | Auto-refresh (real-time) | Static (frozen at the point of pause) |
| Search bar | Disabled | Enabled |
| Time range picker | Disabled | Enabled (defaults to Last 30 days) |
| Row checkboxes | Hidden | Visible |
| Save button | Disabled | Enabled (when rows are selected) |
| Policies tab in Detail Panel | Enabled | Enabled |

The Sessions Grid

The main area of the Runs page displays a data grid listing all sessions within the project. Click Columns in the top-right corner of the grid to open the Toggle columns panel. Select or clear checkboxes to show or hide columns. Available columns include: ID, Start Time, Last Updated Time, Duration, View Traces, Policies, Cost, Input Tokens, Output Tokens, Avg Latency, and PII.

Inspect Session Details

Click any session row in the data grid to open a detail panel with a complete breakdown of everything that happened during that session. The panel header displays the Session ID (with a copy icon), Latency, and Total Cost. The detail panel has two areas:
  • Timeline Visualization (left) — Displays all traces, agents, and spans that executed during the session. Toggle between List view (a vertical list with durations) and Waterfall view (a Gantt chart showing start times and durations relative to the root session). Click any item in the timeline to load its details on the right.
  • Data Tabs (right) — Four tabs organize the session data: I/O, Log View, Policies, and Metadata. The content updates based on the item you select in the timeline.
Session detail panel

I/O

Displays the execution hierarchy as interactive cards. Each card shows the item name, type (session, trace, agent, or chat), duration, cost, and ID. Click any card to navigate deeper into that item.

Log View

Displays the structured telemetry data for the selected item. Toggle between Formatted view (human-readable key-value layout) and JSON view (raw telemetry structure as sent by the SDK). Key fields include:
| Field Group | Fields |
| --- | --- |
| Identifiers | session_id, type, status |
| Performance | latency_ms, total_cost_usd |
| Tokens | total_input_tokens, total_output_tokens |
| Custom | metadata (expandable object containing custom keys) |
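Put together, a JSON-view record for a session might look like the following. Only the field names above are documented; the values and the custom metadata keys are illustrative:

```json
{
  "session_id": "sess-8f2c",
  "type": "session",
  "status": "success",
  "latency_ms": 1240,
  "total_cost_usd": 0.0042,
  "total_input_tokens": 1580,
  "total_output_tokens": 312,
  "metadata": {
    "customer_tier": "enterprise",
    "channel": "web"
  }
}
```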

Policies

Displays the evaluation results for all policies applied to the session. Each policy appears as a card showing:
  • Policy name and status badge (Pass, Fail, or Inconclusive).
  • Version, severity level, and category.
  • A Metrics Evaluated table with the metric name, threshold, and actual value.
  • Evaluation timestamp.
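The pass/fail logic implied by the Metrics Evaluated table can be sketched as a threshold comparison per metric. This is a conceptual illustration only; the metric names, threshold direction, and Inconclusive condition are assumptions:

```python
def evaluate_policy(metrics, thresholds):
    """Compare each actual metric value against its threshold.

    Returns 'Pass' if every metric meets its threshold, 'Fail' if any
    misses, and 'Inconclusive' if a metric could not be computed (None).
    """
    if any(metrics.get(name) is None for name in thresholds):
        return "Inconclusive"
    ok = all(metrics[name] >= limit for name, limit in thresholds.items())
    return "Pass" if ok else "Fail"

# Hypothetical safety policy: both scores must reach 0.8.
thresholds = {"toxicity_safety": 0.8, "pii_safety": 0.8}
assert evaluate_policy({"toxicity_safety": 0.95, "pii_safety": 0.9}, thresholds) == "Pass"
assert evaluate_policy({"toxicity_safety": 0.6, "pii_safety": 0.9}, thresholds) == "Fail"
assert evaluate_policy({"toxicity_safety": None, "pii_safety": 0.9}, thresholds) == "Inconclusive"
```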

Metadata

Organizes all attributes associated with the selected item into collapsible sections:
| Section | Contents |
| --- | --- |
| METADATA | Session ID, Type, Duration, and Total Cost |
| TOKEN USAGE | Input and output token counts |
| TIMING | Timing-related attributes |
| CONTEXT | Contextual information passed with the session |

Traces and Spans

The Runs page lets you navigate through the session hierarchy to isolate exactly where an issue occurred. This drill-down workflow helps you trace a failure from the session level all the way to the specific span — such as a failed LLM call or a timed-out tool invocation — that caused the issue.
  1. From the Sessions grid, click the View Traces link on any row to see all traces within that session.
  2. From the Traces list, click View Spans on any trace to see the individual operations that executed within it.
  3. Use the Back button at each level to navigate up to the previous level.

Filter Runs

The Runs page provides several ways to narrow down data so you can focus on the sessions that matter most. Filtering is available only in Paused mode, so pause the stream before applying filters. Click the Filters button to open the Filter runs panel:
| Filter | Description |
| --- | --- |
| Time Range | Select a preset (Lifetime, Last 15 minutes, Last hour, Last 24 hours, Last 7 days, Last 30 days) or define a Custom range. A histogram above the grid shows session distribution across the period |
| Session Status | Filter by Success, Failure, or In Progress |
| Evaluation Status | Filter by Pass or Fail |
| Policy Name | Select a specific policy to filter sessions evaluated by that policy |
| Input Tokens | Enter min/max values to filter sessions by input token count |
| Avg Latency | Enter min/max values to filter sessions by average latency |
Click Apply filters to apply your selections, or Reset to clear all active filters. In Paused mode, use the Search telemetry bar to type a natural language query. The platform parses your input and maps it to structured filters automatically. Examples:
  • duration greater than 15 seconds
  • errors in the last hour
  • payment_failed
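Conceptually, the search bar maps free text onto the same structured filters described above. A toy sketch of that kind of mapping (the platform's actual parser is not documented; the rules and output field names here are invented for illustration):

```python
import re

def parse_query(text):
    """Map a natural-language query onto structured filter fields (toy)."""
    filters = {}
    m = re.search(r"duration greater than (\d+) seconds?", text)
    if m:
        filters["min_duration_ms"] = int(m.group(1)) * 1000
    if "errors" in text:
        filters["status"] = "Failure"
    m = re.search(r"last (hour|24 hours|7 days)", text)
    if m:
        filters["time_range"] = "Last " + m.group(1)
    if not filters:                      # fall back to free-text match
        filters["text"] = text
    return filters

assert parse_query("duration greater than 15 seconds") == {"min_duration_ms": 15000}
assert parse_query("errors in the last hour") == {"status": "Failure", "time_range": "Last hour"}
assert parse_query("payment_failed") == {"text": "payment_failed"}
```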

Save Quick Filters

When you find yourself applying the same combination of filters repeatedly, save them as a Quick Filter for one-click reuse.
  1. Apply your desired filters.
  2. Click Save as Quick Filter at the bottom of the filter panel.
  3. Enter a name (for example, “Last 24h Failures”) and save.
The saved filter appears as a pill below the search bar. Click any quick filter pill to instantly apply all its associated settings.

Create Datasets

Datasets let you group sessions for focused analysis, evaluation, or regression testing. You can create them directly from the Runs page without leaving your workflow.
| Type | Description |
| --- | --- |
| Static | Fixed collection built by manually selecting sessions. You can add more sessions over time |
| Auto-update | Defined by filter criteria. The platform automatically adds all matching sessions, including new ones that arrive later |
| Static-Simulated | Created automatically when you trigger a simulation, saving those sessions as a dataset |
Datasets

Create a Static Dataset

  1. Pause the stream.
  2. Select one or more sessions using the row checkboxes.
  3. Click the Save dropdown in the header bar and select Save Selection to Dataset.
  4. Choose an existing dataset or create a new one by providing a name and description.
To add more sessions later, repeat the same steps and select the same dataset as the destination.

Create an Auto-update Dataset

  1. Apply the filters that define the sessions you want to track (for example, Status = Error, Time Range = Last 7 Days).
  2. Click the Save dropdown and select Save current filters as Auto-update Dataset.
  3. Review the active filters in the confirmation modal, then provide a name and description.
The platform continuously adds new sessions matching the criteria. This is useful for tracking trends — for example, monitoring whether error rates decrease after deploying a fix.
You cannot manually add or remove sessions from an auto-update dataset. The filter criteria control its contents entirely.
Access all your datasets by navigating to Evaluations → Datasets in the left sidebar.
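The distinction between the two dataset types comes down to membership by explicit list versus membership by stored predicate. A minimal sketch under that assumption (class and field names are illustrative, not the platform's API):

```python
class StaticDataset:
    """Fixed membership: sessions are added explicitly by the user."""

    def __init__(self, name):
        self.name = name
        self.sessions = []

    def add(self, session):
        self.sessions.append(session)

class AutoUpdateDataset:
    """Membership defined by filter criteria applied to every new session."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate   # the saved filter criteria
        self.sessions = []

    def ingest(self, session):
        if self.predicate(session):  # platform adds matches automatically
            self.sessions.append(session)

failures = AutoUpdateDataset("Recent errors", lambda s: s["status"] == "Failure")
for s in [{"id": 1, "status": "Success"}, {"id": 2, "status": "Failure"}]:
    failures.ingest(s)
assert [s["id"] for s in failures.sessions] == [2]
```

This is why you cannot hand-edit an auto-update dataset: its contents are entirely a function of the saved predicate and the incoming sessions.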

Common Use Cases

| Use Case | Workflow |
| --- | --- |
| Spot errors in real time | Keep the Runs page in Streaming mode. Watch for sessions with failure status indicators in the Policies column. When you notice a spike, pause the stream and drill down into affected sessions to inspect traces, spans, and error details |
| Debug a slow agent workflow | Pause the stream and filter for high-latency sessions. Open a session’s Detail Panel and switch to Waterfall view to identify which spans consume the most time. Check the Token Usage section on the Metadata tab to review token consumption |
| Validate a new agent version | After deploying a new version to staging, keep the Runs page in Streaming mode. Pause after collecting sufficient data. Filter by error status and review the failure rate. Save the filtered results as an auto-update dataset to track improvements over time |
| Build a regression test dataset | Pause the stream. Apply filters to select representative sessions. Save the selection as a static dataset. Use this dataset to run evaluations whenever you update an agent version or policy |
| Monitor policy compliance | Create an auto-update dataset with filters for failed evaluation status. The platform automatically captures all non-compliant sessions. Review the dataset periodically and drill into sessions to understand root causes using the Policies tab |