Running Evals
The run() method is the core of the SDK: it triggers an evaluation and returns the results.
run() never throws on transient network failures. After 3 automatic retries on network, 5xx, or 429 errors, the SDK returns a neutral { status: "skipped", passed: true } result so your application keeps running. You only need try/catch for AuthError (invalid key), PaymentError (usage limit reached), and ValidationError (bad request). See Error Handling for the full matrix.
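A minimal sketch of the handling pattern described above. The error class names come from the text, but this page does not show how the SDK exports them, so the class definitions and the EvalClient interface below are local stand-ins (assumptions) that only keep the example self-contained. The evaluate helper and its Outcome type are hypothetical.

```typescript
// Stand-ins for the SDK's error classes. The names come from the text above;
// the definitions here are assumptions so the sketch runs on its own.
class AuthError extends Error {}
class PaymentError extends Error {}
class ValidationError extends Error {}

// Only the fields this sketch reads; the full shape is in the RunResult table.
interface RunResultLike {
  status: string;
  passed: boolean;
}

// Hypothetical minimal view of the client: just the run() method.
interface EvalClient {
  run(suite: string, options: { output: string }): Promise<RunResultLike>;
}

type Outcome = "passed" | "failed" | "skipped" | "config-error";

async function evaluate(
  lg: EvalClient,
  suite: string,
  output: string
): Promise<Outcome> {
  try {
    const result = await lg.run(suite, { output });
    // A "skipped" result also has passed: true, so check status first when
    // "evaluated and cleared" must be told apart from "not evaluated".
    if (result.status === "skipped") return "skipped";
    return result.passed ? "passed" : "failed";
  } catch (err) {
    if (
      err instanceof AuthError ||
      err instanceof PaymentError ||
      err instanceof ValidationError
    ) {
      // Not transient: a retry will not help, so report a configuration problem.
      return "config-error";
    }
    throw err; // Anything else is unexpected; let it propagate.
  }
}
```

Checking status before passed is a design choice for callers that want to log or alert when an evaluation was skipped instead of treating it as a clean pass.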
Method signature
async run(suite: string, options: RunOptions): Promise<RunResult>
Parameters
| Parameter | Type | Description |
|---|---|---|
| suite | string | The eval suite slug (e.g., "rag-faithfulness") |
| options | RunOptions | Run configuration (see below) |
RunOptions
| Property | Type | Default | Description |
|---|---|---|---|
| output | string | required | The AI-generated output to evaluate |
| input | object | {} | Input data passed to scorers (e.g., context, query) |
| trigger | string | "sdk" | Source identifier: "sdk", "cli", "github_action", "dashboard", "api" |
| triggerMeta | object | {} | Additional metadata (e.g., git SHA, PR number) |
| async | boolean | false | If true, returns immediately with a pending run ID |
RunResult
| Property | Type | Description |
|---|---|---|
| id | string | Unique run identifier |
| passed | boolean | true if status is "cleared"; the neutral "skipped" result returned after exhausted retries also reports true |
| status | string | "cleared", "aborted", "skipped", "error", or "pending" |
| scores | Score[] | All individual case scores |
| failures | Score[] | Only the failed case scores |
| passRate | number | 0.0–1.0 ratio of passed cases |
| totalCases | number | Total active cases evaluated |
| passedCases | number | Number of passing cases |
| failedCases | number | Number of failing cases |
| durationMs | number | Total evaluation time in milliseconds |
Score
| Property | Type | Description |
|---|---|---|
| case | string | Case name |
| passed | boolean | Whether the case passed its threshold |
| score | number | 0.0–1.0 score from the scorer |
| threshold | number | The case’s pass threshold |
| reason | string | Human-readable explanation |
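The tables above can be read as the following TypeScript shapes. This is a sketch transcribed from the tables, not the SDK's published typings; in particular, typing input and triggerMeta as Record<string, unknown> is an assumption, and formatFailures is a hypothetical helper that only illustrates how the failures array might be reported.

```typescript
// Shapes transcribed from the tables above (a sketch, not the SDK's own typings).
interface RunOptions {
  output: string; // required
  input?: Record<string, unknown>; // default {}
  trigger?: "sdk" | "cli" | "github_action" | "dashboard" | "api"; // default "sdk"
  triggerMeta?: Record<string, unknown>; // default {}
  async?: boolean; // default false
}

interface Score {
  case: string;
  passed: boolean;
  score: number; // 0.0–1.0
  threshold: number;
  reason: string;
}

interface RunResult {
  id: string;
  passed: boolean;
  status: "cleared" | "aborted" | "skipped" | "error" | "pending";
  scores: Score[];
  failures: Score[];
  passRate: number; // 0.0–1.0
  totalCases: number;
  passedCases: number;
  failedCases: number;
  durationMs: number;
}

// Hypothetical helper: one line per failed case, suitable for logs or CI output.
function formatFailures(result: RunResult): string[] {
  return result.failures.map(
    (f) =>
      `${f.case}: score ${f.score.toFixed(2)}, threshold ${f.threshold.toFixed(2)} (${f.reason})`
  );
}
```

A caller might print these lines whenever result.passed is false, so the reason for each failing case appears next to its score.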
Examples
Basic evaluation
const result = await lg.run("content-safety", {
  output: "The capital of France is Paris.",
});
console.log(result.status); // "cleared"
console.log(result.passRate); // 1.0
With input context
const result = await lg.run("rag-faithfulness", {
  input: {
    context: "The Eiffel Tower was completed in 1889 for the World's Fair.",
    query: "When was the Eiffel Tower built?",
  },
  output: "The Eiffel Tower was completed in 1889.",
});
With trigger metadata
const result = await lg.run("my-suite", {
  output: generatedText,
  trigger: "sdk",
  triggerMeta: {
    commitSha: process.env.GIT_SHA,
    branch: process.env.GIT_BRANCH,
    model: "gpt-4o",
    promptVersion: "v2.3",
  },
});
Async mode
const pending = await lg.run("large-suite", {
  output: "...",
  async: true,
});
console.log(pending.id); // "run_abc123"
console.log(pending.status); // "pending"
// Poll for results
const completed = await lg.getRun(pending.id);
Additional methods
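The async example fetches the run only once. If the run may still be pending at that point, a polling loop along these lines is one option. It is a sketch under assumptions: it treats every status other than "pending" as final, it assumes getRun() resolves to an object with id and status fields as in the RunResult table, and the interval and attempt limit are arbitrary illustrative values. RunFetcher and waitForRun are hypothetical names, not part of the SDK.

```typescript
// Hypothetical minimal view of the client: only getRun() is needed here.
interface RunFetcher {
  getRun(runId: string): Promise<{ id: string; status: string }>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the run leaves "pending" or the attempt budget is spent.
// Assumption: every status other than "pending" is final.
async function waitForRun(
  lg: RunFetcher,
  runId: string,
  intervalMs = 2000,
  maxAttempts = 30
): Promise<{ id: string; status: string }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await lg.getRun(runId);
    if (run.status !== "pending") return run;
    // Wait between attempts, but not after the last one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Run ${runId} still pending after ${maxAttempts} attempts`);
}
```

With a real client this might be called as waitForRun(lg, pending.id). Bounding the number of attempts keeps a stuck run from blocking the caller indefinitely.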
getRun(runId)
Fetch the status and details of a run.
const run = await lg.getRun("run_abc123");
getRunResults(runId)
Fetch detailed per-case results for a run.
const results = await lg.getRunResults("run_abc123");
health()
Check the LaunchGate API health status.
const status = await lg.health();
// { status: "healthy", version: "0.1.0", timestamp: "...", services: {...} }