Outcomes
Outcomes record the result of a run, group, or call. They are the bridge between "the agent did something" and "it worked" — the foundation of success rate tracking.
What is an outcome?
An outcome is a named result attached to any entity in your execution tree. It answers the question: "What happened?"
LLM calls give you tokens, latency, and cost. But those don't tell you if the agent actually did its job. Outcomes close that gap. They let you define what success means for your agent and track it over time.
ID format: wm_oc_<ulid>
Why outcomes matter
Most LLM observability tools show you what the model said and how much it cost. That's useful, but it doesn't answer the question that matters: is your agent actually working?
Outcomes let you track success rates over time, across whole runs, individual phases, and single LLM calls.
API
outcome(target: Run | Group | Response, name: string, opts?: object): Outcome

Records an outcome on a run, group, or call response. Returns a frozen outcome handle.

target — The run, group, or LLM response to attach this outcome to.
name — A short label for the outcome. Use consistent names across your codebase — these are what get classified and aggregated.
opts — Optional metadata. Use for context: reasons, error details, scores, etc.

Returns — A frozen object with id and _type: 'outcome'. Pass this to act() to trigger a follow-up action.
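For example, here is a minimal sketch of capturing the returned handle (the run name is illustrative; the logged values reflect the shape described above):

import { run, outcome, flush } from '@warpmetrics/warp';

const r = run('Invoice triage');

// outcome() returns a frozen handle
const oc = outcome(r, 'Completed');

console.log(oc.id);    // 'wm_oc_<ulid>'
console.log(oc._type); // 'outcome'

await flush();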
Basic examples
import { run, outcome, flush } from '@warpmetrics/warp';
const r = run('Code review');
// ... groups and calls ...
// Simple outcome
outcome(r, 'Completed');
// Outcome with metadata
outcome(r, 'Completed', {
linesReviewed: 342,
issuesFound: 3,
severity: 'medium',
});
// Failure outcome
outcome(r, 'Failed', {
reason: 'Rate limit exceeded',
retryable: true,
});
await flush();

Outcome targets
You can attach outcomes to any level of the execution tree:
// On a run — "the whole task succeeded"
outcome(r, 'Completed');
// On a group — "this phase succeeded"
const validation = group(r, 'Validation');
outcome(validation, 'Passed');
// On a call — "this specific LLM output was good"
const res = await openai.chat.completions.create({...});
call(validation, res);
outcome(res, 'Accurate');

Multiple outcomes can be attached to the same target. This is useful for tracking multiple quality dimensions:
outcome(res, 'Accurate');
outcome(res, 'Well Formatted');
outcome(res, 'No Hallucinations');

Classifications
Outcome names are free-form strings. To compute success rates, Warpmetrics maps each outcome name to one of three classifications.
Classifications are configured in the Warpmetrics dashboard under Outcomes. When the system sees an outcome name it hasn't classified yet, it appears as unclassified until you assign it.
Only the last outcome on a run determines its success/failure status. If a run records a "Failed" outcome followed later by a "Completed" outcome, the run counts as a success.
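For example (a minimal sketch; the run name and metadata are illustrative), a failed attempt followed by a successful retry ends up counted as a success:

const r = run('Data sync');

// First attempt hits a rate limit
outcome(r, 'Failed', { reason: 'Rate limit exceeded', retryable: true });

// ... retry ...

// Recorded last, so the run counts as a success
outcome(r, 'Completed', { attempts: 2 });

await flush();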
Naming conventions
Use consistent, human-readable names with Title Case and spaces. The outcome name is what gets classified and aggregated, so consistency matters.
// Good: consistent, descriptive names
outcome(r, 'Completed');
outcome(r, 'Failed');
outcome(r, 'Rate Limited');
outcome(r, 'Validation Error');
// Bad: inconsistent casing or vague names
outcome(r, 'COMPLETED'); // use Title Case, not all caps
outcome(r, 'error_123'); // too specific — use opts for details
outcome(r, 'ok'); // too vague
outcome(r, 'rate-limited'); // use spaces, not dashes

Put specific details in the opts bag, not the name. This keeps your classifications clean.
// Good: generic name + specific opts
outcome(r, 'Failed', { reason: 'Rate limit on gpt-4o', code: 429 });
// Bad: encoding details in the name
outcome(r, 'rate-limit-gpt-4o-429');