Runs

A run is the top-level unit of work. It represents a single, complete execution of your AI agent — from start to finish.

What is a run?

Every time your agent processes a request, it performs a run. A run is the container for everything that happens during that execution: the LLM calls your agent makes, the groups it organizes them into, and the outcomes it produces.

Think of a run as one row in a log — "the agent did this task, made these calls, and produced this result." Runs are how you track what your agent is doing over time.

ID format: wm_run_<ulid>

When to create a run

Create a run at the start of each distinct task your agent handles. Some examples (see the sketch after the list):

- Support agent: one run per customer conversation
- Code reviewer: one run per pull request
- Content generator: one run per article or document
- Data pipeline: one run per extraction job
- RAG system: one run per user query
- Multi-step agent: one run per task attempt
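
For instance, a support agent might open one run per conversation, as in the sketch below. handleConversation and the conversationId field are hypothetical names used for illustration, not part of the SDK.

One run per conversation (sketch)
import { run, outcome, flush } from '@warpmetrics/warp';

// Hypothetical handler: one run per customer conversation.
// handleConversation and conversationId are illustrative names.
async function handleConversation(conversationId: string) {
  const r = run('Support ticket', { conversationId });

  // ... make LLM calls here and link each one with call(r, res) ...

  outcome(r, 'Resolved');
  await flush();
}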

Labels

Every run has a label that categorizes it. Runs with the same label are grouped together in the dashboard, so you can compare performance across executions of the same agent or workflow.

Labels should describe the type of work, not the specific instance. Use the opts bag for instance-specific metadata.

Label conventions
// Good: label describes the workflow type
run('Code review');
run('Support ticket');
run('Content generation');

// Bad: label contains instance-specific data
run('Code review PR #42');       // use opts instead
run('Support ticket for John');  // use opts instead

API

run(label: string, opts?: object): Run

Creates a new run with the given label. Returns a frozen run handle.

Parameters
label — Category name for this run. Runs with the same label are grouped in dashboards.
opts — Optional metadata object. Stored with the run and visible in the dashboard. Use for instance-specific data like PR numbers, user IDs, etc.
Returns

A frozen object with id (string) and _type ('run'). Pass this to group(), call(), or outcome() to build the execution tree.
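
A quick way to see the handle's shape at runtime; the ULID in the comment below is illustrative, not a real id.

Inspecting a run handle
import { run } from '@warpmetrics/warp';

const r = run('Code review');

console.log(r._type);            // 'run'
console.log(r.id);               // e.g. 'wm_run_01J...' (illustrative ULID)
console.log(Object.isFrozen(r)); // true; the handle is frozen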

Basic example

Basic run
import OpenAI from 'openai';
import { warp, run, call, outcome, flush } from '@warpmetrics/warp';

const openai = warp(new OpenAI());

// Start a run
const r = run('Summarizer');

// Make an LLM call
const res = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this article...' }],
});

// Link the call to the run
call(r, res);

// Record the outcome
outcome(r, 'Completed');

await flush();

Passing metadata with opts

The opts bag lets you attach arbitrary metadata to a run. This is useful for filtering, debugging, and correlating runs with your own systems.

Run with metadata
const r = run('Code review', {
  name: 'Review PR #42',
  pr: 42,
  repo: 'acme/api',
  author: 'alice',
  link: 'https://github.com/acme/api/pull/42',
});

Follow-up runs

When an agent retries or iterates on a task, you can chain runs together using the act primitive. A follow-up run is linked to the act that triggered it, creating a traceable improvement chain.

Chaining runs through acts
import { run, outcome, act, flush } from '@warpmetrics/warp';

// First attempt
const r1 = run('Code review');
// ... calls ...
const oc = outcome(r1, 'Failed', { reason: 'Tests failing' });

// Decide to retry
const a = act(oc, 'retry');

// Second attempt — linked to the act
const r2 = run(a, 'Code review');
// ... more calls ...
outcome(r2, 'Completed');

await flush();

When passing an act as the first argument, the second argument becomes the label and the third becomes opts:

run(actRef, 'Code review', { attempt: 2 });

What gets tracked

When you create a run, Warpmetrics tracks:

Label — Category name for grouping runs
Opts — Custom metadata you provide
Groups — All groups created under this run
Calls — All LLM calls linked directly to this run
Outcomes — Results recorded on this run
Timestamp — When the run was created
Cost — Total cost across all calls (computed)
Latency — Total latency across all calls (computed)
Tokens — Total token usage across all calls (computed)

Tips

- Use consistent labels across your codebase. The label is how Warpmetrics groups runs for comparison.
- Put instance-specific data in opts, not in the label. This keeps your dashboards clean.
- You can attach calls directly to a run without groups. Groups are optional — use them when your agent has distinct phases (see the sketch below).
- Runs are created instantly. The SDK queues events and flushes them in batches, so there's no latency impact on your agent.
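
To make the groups tip concrete, here is a minimal sketch of both shapes. It assumes group() is imported from the same package and takes a parent run plus a label, mirroring run(); that signature is inferred from this page, not confirmed by it.

Flat run vs. phased run (sketch)
import OpenAI from 'openai';
import { warp, run, group, call, outcome, flush } from '@warpmetrics/warp';

const openai = warp(new OpenAI());

// Flat: attach calls directly to the run; no groups needed.
const flat = run('Content generation');
const res = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Draft an outline...' }],
});
call(flat, res);
outcome(flat, 'Completed');

// Phased: one group per distinct phase of the agent.
// Assumption: group(parentRun, label) mirrors run(label); not confirmed by this page.
const phased = run('Content generation');
const research = group(phased, 'Research');
// ... call(research, res) for research-phase calls ...
const writing = group(phased, 'Writing');
// ... call(writing, res) for writing-phase calls ...
outcome(phased, 'Completed');

await flush();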