
The Runs page is a central observability hub within your project. It displays all telemetry data captured from your connected AI agents, so you can monitor real-time agent activity, inspect individual sessions and traces, debug failures, and curate datasets for evaluation.

Navigation: Select a Project, then go to Observability → Runs in the left sidebar.

Overview

| Section | Purpose |
| --- | --- |
| Key Capabilities | What you can do on the Runs page |
| Sessions, Traces, and Spans | The three-level telemetry hierarchy |
| Streaming and Paused Mode | How live data and frozen views differ |
| The Sessions Grid | The main data grid and its columns |
| Inspect Session Details | Drill into I/O, logs, policies, and metadata |
| Traces and Spans | Navigate the session hierarchy |
| Filter Runs | Narrow down sessions by time, status, and metadata |
| Create Datasets | Group sessions for evaluation and regression testing |
| Common Use Cases | Real-world workflows on the Runs page |

Key Capabilities

  • Stream incoming telemetry in real time with automatic updates.
  • Drill down from Sessions → Traces → Spans for granular inspection.
  • Filter data by time range, status, evaluation results, metadata, and natural language queries.
  • Create static or auto-update datasets directly from sessions for evaluation and regression testing.
  • Visualize trace timelines in a List or Waterfall view with color-coded span types.

Sessions, Traces, and Spans

The Runs page organizes telemetry data into a three-level hierarchy following the OpenTelemetry standard.
| Concept | Definition | Example |
| --- | --- | --- |
| Session | A collection of related traces representing a complete user interaction or conversation | A multi-turn customer support conversation |
| Trace | A single agent workflow from input to output within a session. Each trace represents one request-response cycle | A user asks, “What is my account balance?” and the agent responds |
| Span | An individual operation within a trace. Spans nest hierarchically to form a tree structure | An LLM API call, tool invocation, or database query within a single trace |
A session contains one or more traces. Each trace contains one or more spans. You can drill down through each level to inspect execution details, timing, inputs, outputs, and errors.
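The three-level hierarchy can be sketched as plain data structures. This is an illustrative model only, not the platform's actual schema; all class and field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    """An individual operation (e.g. an LLM call); spans may nest into a tree."""
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

@dataclass
class Trace:
    """One request-response cycle; holds a tree of spans."""
    trace_id: str
    spans: List[Span] = field(default_factory=list)

@dataclass
class Session:
    """A complete user interaction; holds one or more traces."""
    session_id: str
    traces: List[Trace] = field(default_factory=list)

# A two-turn conversation: two traces, the first with a nested tool span.
tool = Span("tool.balance_lookup", 95.0, children=[Span("db.query", 40.0)])
session = Session("sess-1", [
    Trace("tr-1", [Span("llm.chat", 420.0), tool]),
    Trace("tr-2", [Span("llm.chat", 310.0)]),
])

assert len(session.traces) == 2
assert session.traces[0].spans[1].children[0].name == "db.query"
```

Drilling down on the Runs page mirrors walking this structure: session → traces → spans → child spans.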

Streaming and Paused Mode

The Runs page defaults to Streaming mode, where sessions update automatically. The streaming indicator in the top-right corner displays the current state and how recently the data was refreshed (for example, “Streaming · Updated 6s ago”). Click the Streaming indicator to pause the data feed. While paused:
  • The data grid freezes at the current point in time.
  • A counter shows how many new sessions have arrived since you paused (for example, +1, +2, +3).
  • Advanced filtering, bulk selection, and actions such as Save become available.
Click the indicator again to resume streaming. Any sessions that arrived while paused load into the view.
You can apply filters and perform bulk actions only while the stream is paused.
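The pause/resume behavior described above amounts to buffering: sessions that arrive while paused accumulate with a counter and are flushed into the view on resume. A minimal sketch of that behavior (the platform's actual implementation is not documented; names here are illustrative):

```python
class SessionFeed:
    """Buffers incoming sessions while paused; flushes them on resume."""

    def __init__(self):
        self.view = []        # what the data grid currently shows
        self.pending = []     # sessions that arrived while paused
        self.streaming = True

    def receive(self, session):
        if self.streaming:
            self.view.append(session)      # auto-refresh in real time
        else:
            self.pending.append(session)   # grid stays frozen

    @property
    def pending_count(self):
        return len(self.pending)           # the "+1, +2, +3" counter

    def toggle(self):
        self.streaming = not self.streaming
        if self.streaming:                 # resume: load buffered sessions
            self.view.extend(self.pending)
            self.pending.clear()

feed = SessionFeed()
feed.receive("s1")
feed.toggle()           # pause
feed.receive("s2")      # arrives while paused
assert feed.view == ["s1"] and feed.pending_count == 1
feed.toggle()           # resume
assert feed.view == ["s1", "s2"]
```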

Streaming vs. Paused Mode

| Feature | Streaming Mode | Paused Mode |
| --- | --- | --- |
| Data updates | Auto-refresh (real-time) | Static (frozen at the point of pause) |
| Search bar | Disabled | Enabled |
| Time range picker | Disabled | Enabled (defaults to Last 30 days) |
| Row checkboxes | Hidden | Visible |
| Save button | Disabled | Enabled (when rows are selected) |
| Policies tab in Detail Panel | Enabled | Enabled |

The Sessions Grid

The main area of the Runs page displays a data grid listing all sessions within the project. Click Columns in the top-right corner of the grid to open the Toggle columns panel. Select or clear checkboxes to show or hide columns. Available columns include: ID, Start Time, Last Updated Time, Duration, View Traces, Policies, Cost, Input Tokens, Output Tokens, Avg Latency, and PII.

Inspect Session Details

Click any session row in the data grid to open a detail panel with a complete breakdown of everything that happened during that session. The panel header displays the Session ID (with a copy icon), Latency, and Total Cost. The detail panel has two areas:
  • Timeline Visualization (left) — Displays all traces, agents, and spans that executed during the session. Toggle between List view (a vertical list with durations) and Waterfall view (a Gantt chart showing start times and durations relative to the root session). Click any item in the timeline to load its details on the right.
  • Data Tabs (right) — Four tabs organize the session data: I/O, Log View, Policies, and Metadata. The content updates based on the item you select in the timeline.
Session detail panel

I/O

Displays the execution hierarchy as interactive cards. Each card shows the item name, type (session, trace, agent, or chat), duration, cost, and ID. Click any card to navigate deeper into that item.

Log View

Displays the structured telemetry data for the selected item. Toggle between Formatted view (human-readable key-value layout) and JSON view (raw telemetry structure as sent by the SDK). Key fields include:
| Field Group | Fields |
| --- | --- |
| Identifiers | session_id, type, status |
| Performance | latency_ms, total_cost_usd |
| Tokens | total_input_tokens, total_output_tokens |
| Custom | metadata (expandable object containing custom keys) |
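Put together, a JSON-view record for a session might look like the following. Only the field names above are documented; the values and the custom metadata keys are illustrative:

```json
{
  "session_id": "sess-8f2c",
  "type": "session",
  "status": "success",
  "latency_ms": 1240,
  "total_cost_usd": 0.0042,
  "total_input_tokens": 1580,
  "total_output_tokens": 312,
  "metadata": {
    "customer_tier": "enterprise",
    "channel": "web"
  }
}
```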

Policies

Displays the evaluation results for all policies applied to the session. Each policy appears as a card showing:
  • Policy name and status badge (Pass, Fail, or Inconclusive).
  • Version, severity level, and category.
  • A Metrics Evaluated table with the metric name, threshold, and actual value.
  • Evaluation timestamp.
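The pass/fail logic implied by the Metrics Evaluated table can be sketched as a threshold comparison per metric. This is a conceptual illustration only; the metric names, threshold direction, and Inconclusive condition are assumptions:

```python
def evaluate_policy(metrics, thresholds):
    """Compare each actual metric value against its threshold.

    Returns 'Pass' if every metric meets its threshold, 'Fail' if any
    misses, and 'Inconclusive' if a metric could not be computed (None).
    """
    if any(metrics.get(name) is None for name in thresholds):
        return "Inconclusive"
    ok = all(metrics[name] >= limit for name, limit in thresholds.items())
    return "Pass" if ok else "Fail"

# Hypothetical safety policy: both scores must reach 0.8.
thresholds = {"toxicity_safety": 0.8, "pii_safety": 0.8}
assert evaluate_policy({"toxicity_safety": 0.95, "pii_safety": 0.9}, thresholds) == "Pass"
assert evaluate_policy({"toxicity_safety": 0.6, "pii_safety": 0.9}, thresholds) == "Fail"
assert evaluate_policy({"toxicity_safety": None, "pii_safety": 0.9}, thresholds) == "Inconclusive"
```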

Metadata

Organizes all attributes associated with the selected item into collapsible sections:
| Section | Contents |
| --- | --- |
| METADATA | Session ID, Type, Duration, and Total Cost |
| TOKEN USAGE | Input and output token counts |
| TIMING | Timing-related attributes |
| CONTEXT | Contextual information passed with the session |

Traces and Spans

The Runs page lets you navigate through the session hierarchy to isolate exactly where an issue occurred. This drill-down workflow helps you trace a failure from the session level all the way to the specific span — such as a failed LLM call or a timed-out tool invocation — that caused the issue.
  1. From the Sessions grid, click the View Traces link on any row to see all traces within that session.
  2. From the Traces list, click View Spans on any trace to see the individual operations that executed within it.
  3. Use the Back button at each level to navigate up to the previous level.

Filter Runs

The Runs page provides several ways to narrow down data so you can focus on the sessions that matter most. Filtering is available only in Paused mode, so pause the stream before applying filters. Click the Filters button to open the Filter runs panel:
| Filter | Description |
| --- | --- |
| Time Range | Select a preset (Lifetime, Last 15 minutes, Last hour, Last 24 hours, Last 7 days, Last 30 days) or define a Custom range. A histogram above the grid shows session distribution across the period |
| Session Status | Filter by Success, Failure, or In Progress |
| Evaluation Status | Filter by Pass or Fail |
| Policy Name | Select a specific policy to filter sessions evaluated by that policy |
| Input Tokens | Enter min/max values to filter sessions by input token count |
| Avg Latency | Enter min/max values to filter sessions by average latency |
Click Apply filters to apply your selections, or Reset to clear all active filters. In Paused mode, use the Search telemetry bar to type a natural language query. The platform parses your input and maps it to structured filters automatically. Examples:
  • duration greater than 15 seconds
  • errors in the last hour
  • payment_failed
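Conceptually, the search bar maps free text onto the same structured filters described above. A toy sketch of that kind of mapping (the platform's actual parser is not documented; the rules and output field names here are invented for illustration):

```python
import re

def parse_query(text):
    """Map a natural-language query onto structured filter fields (toy)."""
    filters = {}
    m = re.search(r"duration greater than (\d+) seconds?", text)
    if m:
        filters["min_duration_ms"] = int(m.group(1)) * 1000
    if "errors" in text:
        filters["status"] = "Failure"
    m = re.search(r"last (hour|24 hours|7 days)", text)
    if m:
        filters["time_range"] = "Last " + m.group(1)
    if not filters:                      # fall back to free-text match
        filters["text"] = text
    return filters

assert parse_query("duration greater than 15 seconds") == {"min_duration_ms": 15000}
assert parse_query("errors in the last hour") == {"status": "Failure", "time_range": "Last hour"}
assert parse_query("payment_failed") == {"text": "payment_failed"}
```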

Save Quick Filters

When you find yourself applying the same combination of filters repeatedly, save them as a Quick Filter for one-click reuse.
  1. Apply your desired filters.
  2. Click Save as Quick Filter at the bottom of the filter panel.
  3. Enter a name (for example, “Last 24h Failures”) and save.
The saved filter appears as a pill below the search bar. Click any quick filter pill to instantly apply all its associated settings.

Create Datasets

Datasets let you group sessions for focused analysis, evaluation, or regression testing. You can create them directly from the Runs page without leaving your workflow.
| Type | Description |
| --- | --- |
| Static | Fixed collection built by manually selecting sessions. You can add more sessions over time |
| Auto-update | Defined by filter criteria. The platform automatically adds all matching sessions, including new ones that arrive later |
| Static-Simulated | Created automatically when you trigger a simulation, saving those sessions as a dataset |
Datasets

Create a Static Dataset

  1. Pause the stream.
  2. Select one or more sessions using the row checkboxes.
  3. Click the Save dropdown in the header bar and select Save Selection to Dataset.
  4. Choose an existing dataset or create a new one by providing a name and description.
To add more sessions later, repeat the same steps and select the same dataset as the destination.

Create an Auto-update Dataset

  1. Apply the filters that define the sessions you want to track (for example, Status = Error, Time Range = Last 7 Days).
  2. Click the Save dropdown and select Save current filters as Auto-update Dataset.
  3. Review the active filters in the confirmation modal, then provide a name and description.
The platform continuously adds new sessions matching the criteria. This is useful for tracking trends — for example, monitoring whether error rates decrease after deploying a fix.
You cannot manually add or remove sessions from an auto-update dataset. The filter criteria control its contents entirely.
Access all your datasets by navigating to Evaluations → Datasets in the left sidebar.
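The distinction between the two dataset types comes down to membership by explicit list versus membership by stored predicate. A minimal sketch under that assumption (class and field names are illustrative, not the platform's API):

```python
class StaticDataset:
    """Fixed membership: sessions are added explicitly by the user."""

    def __init__(self, name):
        self.name = name
        self.sessions = []

    def add(self, session):
        self.sessions.append(session)

class AutoUpdateDataset:
    """Membership defined by filter criteria applied to every new session."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate   # the saved filter criteria
        self.sessions = []

    def ingest(self, session):
        if self.predicate(session):  # platform adds matches automatically
            self.sessions.append(session)

failures = AutoUpdateDataset("Recent errors", lambda s: s["status"] == "Failure")
for s in [{"id": 1, "status": "Success"}, {"id": 2, "status": "Failure"}]:
    failures.ingest(s)
assert [s["id"] for s in failures.sessions] == [2]
```

This is why you cannot hand-edit an auto-update dataset: its contents are entirely a function of the saved predicate and the incoming sessions.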

Common Use Cases

| Use Case | Workflow |
| --- | --- |
| Spot errors in real time | Keep the Runs page in Streaming mode. Watch for sessions with failure status indicators in the Policies column. When you notice a spike, pause the stream and drill down into affected sessions to inspect traces, spans, and error details |
| Debug a slow agent workflow | Pause the stream and filter for high-latency sessions. Open a session’s Detail Panel and switch to Waterfall view to identify which spans consume the most time. Check the Token Usage section on the Metadata tab to review token consumption |
| Validate a new agent version | After deploying a new version to staging, keep the Runs page in Streaming mode. Pause after collecting sufficient data. Filter by error status and review the failure rate. Save the filtered results as an auto-update dataset to track improvements over time |
| Build a regression test dataset | Pause the stream. Apply filters to select representative sessions. Save the selection as a static dataset. Use this dataset to run evaluations whenever you update an agent version or policy |
| Monitor policy compliance | Create an auto-update dataset with filters for failed evaluation status. The platform automatically captures all non-compliant sessions. Review the dataset periodically and drill into sessions to understand root causes using the Policies tab |