Outcomes

Outcomes record the result of a run, group, or call. They are the bridge between "the agent did something" and "it worked" — the foundation of success rate tracking.

What is an outcome?

An outcome is a named result attached to any entity in your execution tree. It answers the question: "What happened?"

LLM calls give you tokens, latency, and cost. But those don't tell you if the agent actually did its job. Outcomes close that gap. They let you define what success means for your agent and track it over time.

ID format: wm_oc_<ulid>

Why outcomes matter

Most LLM observability tools show you what the model said and how much it cost. That's useful, but it doesn't answer the question that matters: is your agent actually working?

Outcomes let you track:

- Success rate — what percentage of runs completed successfully?
- Failure patterns — which outcome names appear most often? What metadata do they carry?
- Phase-level quality — which step in your pipeline fails most often?
- Improvement over time — are retries helping? Is the new prompt better?

API

outcome(target: Run | Group | Response, name: string, opts?: object): Outcome

Records an outcome on a run, group, or call response. Returns a frozen outcome handle.

Parameters
- target — The run, group, or LLM response to attach this outcome to.
- name — A short label for the outcome. Use consistent names across your codebase — these are what get classified and aggregated.
- opts — Optional metadata. Use for context: reasons, error details, scores, etc.

Returns

A frozen object with id and _type: 'outcome'. Pass this to act() to trigger a follow-up action.
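
For failures you plan to follow up on, the returned handle can be passed straight to act(). The sketch below assumes act() is imported from the same package and accepts the outcome handle plus an action name; its real signature is covered in the act() documentation, so treat this as illustrative only.

Passing an outcome to act()
import { run, outcome, act, flush } from '@warpmetrics/warp';

const r = run('Code review');
// ... groups and calls ...

// Record a failure and keep the frozen outcome handle
const oc = outcome(r, 'Failed', {
  reason: 'Rate limit exceeded',
  retryable: true,
});

// Trigger a follow-up action from the outcome
// (signature assumed for illustration; see the act() docs)
act(oc, 'Retry');

await flush();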

Basic examples

Recording outcomes
import { run, outcome, flush } from '@warpmetrics/warp';

const r = run('Code review');
// ... groups and calls ...

// Simple outcome
outcome(r, 'Completed');

// Outcome with metadata
outcome(r, 'Completed', {
  linesReviewed: 342,
  issuesFound: 3,
  severity: 'medium',
});

// Failure outcome
outcome(r, 'Failed', {
  reason: 'Rate limit exceeded',
  retryable: true,
});

await flush();

Outcome targets

You can attach outcomes to any level of the execution tree:

Different outcome targets
// On a run — "the whole task succeeded"
outcome(r, 'Completed');

// On a group — "this phase succeeded"
const validation = group(r, 'Validation');
outcome(validation, 'Passed');

// On a call — "this specific LLM output was good"
const res = await openai.chat.completions.create({...});
call(validation, res);
outcome(res, 'Accurate');

Multiple outcomes can be attached to the same target. This is useful for tracking multiple quality dimensions:

Multiple outcomes on one target
outcome(res, 'Accurate');
outcome(res, 'Well Formatted');
outcome(res, 'No Hallucinations');

Classifications

Outcome names are free-form strings. To compute success rates, Warpmetrics maps outcome names to one of three classifications:

- Success: completed, approved, passed, resolved, shipped
- Failure: failed, error, rejected, timeout, invalid
- Neutral: skipped, deferred, partial, unknown

Classifications are configured in the Warpmetrics dashboard under Outcomes. When the system sees an outcome name it hasn't classified yet, it appears as unclassified until you assign it.

Only the last outcome on a run determines its success/failure status. If a run has both a "failed" and then a "completed" outcome, the run counts as a success.
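
A minimal sketch of that retry flow, reusing the run handle r from the earlier examples: the run counts as a success because 'Completed' is recorded last. The specific metadata fields are illustrative.

Last outcome wins
// First attempt fails
outcome(r, 'Failed', {
  reason: 'Timeout on first attempt',
  retryable: true,
});

// ... retry the task ...

// The retry succeeds; only this last outcome determines the run's status
outcome(r, 'Completed', { attempts: 2 });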

Naming conventions

Use consistent, human-readable names with Title Case and spaces. The outcome name is what gets classified and aggregated, so consistency matters.

Naming conventions
// Good: consistent, descriptive names
outcome(r, 'Completed');
outcome(r, 'Failed');
outcome(r, 'Rate Limited');
outcome(r, 'Validation Error');

// Bad: inconsistent casing or vague names
outcome(r, 'COMPLETED');      // use Title Case, not all caps
outcome(r, 'error_123');      // too specific — use opts for details
outcome(r, 'ok');             // too vague
outcome(r, 'rate-limited');   // use spaces, not dashes

Put specific details in the opts bag, not the name. This keeps your classifications clean.

// Good: generic name + specific opts
outcome(r, 'Failed', { reason: 'Rate limit on gpt-4o', code: 429 });

// Bad: encoding details in the name
outcome(r, 'rate-limit-gpt-4o-429');

Tips

- Always record an outcome on your runs. Without outcomes, you can track cost and latency but not whether your agent is working.
- Use outcomes on groups to pinpoint which phase of your pipeline fails most often.
- Outcome opts are searchable in the dashboard. Use them for debugging context.
- The outcome handle returned by outcome() can be passed to act() to create follow-up actions.