
Scorers

A scorer is the evaluation function that determines whether an AI output meets your quality criteria. LaunchGate supports 5 scorer types.

Choosing a scorer

| You want to check… | Use |
| --- | --- |
| Output matches an expected string exactly | `exact_match` |
| Output contains (or excludes) specific substrings | `contains` |
| Output matches a regex pattern (dates, IDs, free text) | `regex` |
| Output is valid JSON with a required structure, field types, numeric ranges, or array lengths | `json_schema` |
| Semantic judgement: faithfulness, relevance, safety, tone | `llm_judge` |

Prefer deterministic scorers (exact_match, contains, regex, json_schema) over llm_judge where possible — they are free, instant, and reproducible. Reach for llm_judge only when the check genuinely requires language understanding.

Do not use regex to validate JSON structure. Regex against stringified JSON breaks on key reordering, whitespace, and nested objects. Use json_schema instead — it is declarative, robust, and lets you express required fields, numeric ranges (minimum/maximum), and array lengths (minItems/maxItems) natively.
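To see why, consider two payloads that are semantically identical but serialized with different key orders. A regex pinned to one ordering rejects the other, while parsing the JSON and checking its structure accepts both. A minimal stdlib sketch (the helper name is illustrative, not a LaunchGate internal):

```python
import json
import re

a = '{"answer": "42", "confidence": 0.9}'
b = '{"confidence": 0.9, "answer": "42"}'  # same data, keys reordered

# Regex pinned to a specific key order: matches `a` but not `b`.
pattern = re.compile(r'\{"answer":\s*".*?",\s*"confidence":\s*[\d.]+\}')
print(bool(pattern.search(a)))  # True
print(bool(pattern.search(b)))  # False

# Structural check after parsing: order-independent, accepts both.
def has_required_fields(raw: str) -> bool:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and {"answer", "confidence"} <= obj.keys()

print(has_required_fields(a))  # True
print(has_required_fields(b))  # True
```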

Scorer types

exact_match

Compares the output string to an expected value.

```json
{ "type": "exact_match", "config": { "case_sensitive": true } }
```

| Config | Type | Default | Description |
| --- | --- | --- | --- |
| `case_sensitive` | boolean | `true` | Whether comparison is case-sensitive |

Score: 1.0 if match, 0.0 if not.
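The comparison is simple enough to sketch. This is an illustrative reimplementation, not LaunchGate's actual code:

```python
def exact_match(output: str, expected: str, case_sensitive: bool = True) -> float:
    """Return 1.0 on an exact string match, 0.0 otherwise."""
    if not case_sensitive:
        output, expected = output.lower(), expected.lower()
    return 1.0 if output == expected else 0.0

print(exact_match("Paris", "paris"))                        # 0.0
print(exact_match("Paris", "paris", case_sensitive=False))  # 1.0
```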


regex

Tests the output against a regular expression pattern.

```json
{
  "type": "regex",
  "config": { "pattern": "\\d{4}-\\d{2}-\\d{2}", "flags": "i", "should_match": true }
}
```

| Config | Type | Default | Description |
| --- | --- | --- | --- |
| `pattern` | string | required | Regular expression pattern |
| `flags` | string | `""` | Regex flags (e.g., `"i"` for case-insensitive) |
| `should_match` | boolean | `true` | Set to `false` to assert the pattern does NOT match |

Score: 1.0 if pattern matches (or doesn’t match when should_match: false), 0.0 otherwise.
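A sketch of the scoring logic, mapping only the `"i"` flag for brevity (illustrative, not LaunchGate's implementation):

```python
import re

def regex_score(output: str, pattern: str, flags: str = "",
                should_match: bool = True) -> float:
    """Return 1.0 when the match result agrees with should_match, else 0.0."""
    flag_bits = re.IGNORECASE if "i" in flags else 0
    matched = re.search(pattern, output, flag_bits) is not None
    return 1.0 if matched == should_match else 0.0

print(regex_score("Due 2024-06-01", r"\d{4}-\d{2}-\d{2}"))                    # 1.0
print(regex_score("No date here", r"\d{4}-\d{2}-\d{2}", should_match=False))  # 1.0
```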


json_schema

Validates that the output is valid JSON conforming to a JSON Schema.

```json
{
  "type": "json_schema",
  "config": {
    "schema": {
      "type": "object",
      "required": ["answer", "confidence"],
      "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
      }
    }
  }
}
```

| Config | Type | Description |
| --- | --- | --- |
| `schema` | object | A valid JSON Schema definition |

Score: 1.0 if valid, 0.0 if invalid. The reason includes validation errors.
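Any JSON Schema validator (e.g. the `jsonschema` package in Python) can run this check. As a self-contained sketch, here is a tiny validator covering only the keywords used in the example above (`required`, `properties`, `type`, `minimum`/`maximum`); a real validator supports far more:

```python
import json

TYPES = {"object": dict, "string": str, "number": (int, float),
         "array": list, "boolean": bool}

def validate(raw: str, schema: dict) -> tuple[float, list[str]]:
    """Return (score, errors): 1.0 if the output is valid JSON matching the subset."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return 0.0, [f"not valid JSON: {exc}"]
    errors = []
    for field in schema.get("required", []):
        if field not in obj:
            errors.append(f"missing required field: {field}")
    for field, sub in schema.get("properties", {}).items():
        if field not in obj:
            continue
        value = obj[field]
        if "type" in sub and not isinstance(value, TYPES[sub["type"]]):
            errors.append(f"{field}: expected {sub['type']}")
        elif "minimum" in sub and value < sub["minimum"]:
            errors.append(f"{field}: below minimum {sub['minimum']}")
        elif "maximum" in sub and value > sub["maximum"]:
            errors.append(f"{field}: above maximum {sub['maximum']}")
    return (1.0, []) if not errors else (0.0, errors)

schema = {
    "type": "object",
    "required": ["answer", "confidence"],
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}
print(validate('{"answer": "Paris", "confidence": 0.9}', schema))  # (1.0, [])
print(validate('{"answer": "Paris", "confidence": 1.5}', schema))  # (0.0, [...])
```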


contains

Checks whether the output contains (or doesn’t contain) specific substrings.

```json
{ "type": "contains", "config": { "values": ["source:", "reference"], "mode": "any" } }
```

| Config | Type | Default | Description |
| --- | --- | --- | --- |
| `values` | string[] | required | Substrings to check for |
| `mode` | string | `"all"` | `"all"`: must contain every value; `"any"`: at least one; `"none"`: must contain none |

Score: 1.0 if condition met, 0.0 otherwise.
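The three modes reduce to `all`/`any` over substring hits. An illustrative sketch, not LaunchGate's implementation:

```python
def contains_score(output: str, values: list[str], mode: str = "all") -> float:
    """1.0 when the output satisfies the mode over the given substrings."""
    hits = [v in output for v in values]
    ok = {"all": all(hits), "any": any(hits), "none": not any(hits)}[mode]
    return 1.0 if ok else 0.0

print(contains_score("See source: docs, ref 3", ["source:", "reference"], mode="any"))  # 1.0
print(contains_score("I don't know", ["source:", "reference"], mode="none"))            # 1.0
```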


llm_judge

Uses an external LLM to evaluate the output against a rubric you define.

```json
{
  "type": "llm_judge",
  "config": {
    "rubric": "Rate how faithfully the answer reflects the provided context. Score 1.0 if fully faithful, 0.0 if it contains hallucinated information.",
    "model": "gpt-4o",
    "_provider": "openai"
  }
}
```

| Config | Type | Default | Description |
| --- | --- | --- | --- |
| `rubric` | string | required | Evaluation criteria for the LLM |
| `model` | string | varies | Model to use (e.g., `gpt-4o`, `claude-sonnet-4-20250514`) |
| `scale` | [number, number] | `[0, 1]` | Score range |
| `_provider` | string | auto | Provider: `openai`, `anthropic`, `google`, `azure_openai` |

Score: A value between 0 and 1 as determined by the LLM judge.

LLM judge scorers require a BYOK key for the corresponding provider. Without one, the case will fail with a configuration error.

Cost and latency multiplier. Each llm_judge case in a suite adds one LLM API call per run, billed against your BYOK key. A suite with four judge cases run 100 times per day produces 400 extra LLM calls per day on top of your own application traffic. Budget accordingly and prefer deterministic scorers where the check allows.
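The arithmetic above generalizes directly: extra judge calls per day equal judge cases times suite runs per day. A quick sanity check:

```python
def extra_judge_calls(judge_cases: int, runs_per_day: int) -> int:
    """Each llm_judge case adds one LLM API call per suite run."""
    return judge_cases * runs_per_day

print(extra_judge_calls(4, 100))  # 400 extra LLM calls per day
```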

Dual-sided scorers

Scorers can be configured as dual-sided, meaning they evaluate both precision and recall independently:

```json
{ "dual_sided": true, "precision_threshold": 0.8, "recall_threshold": 0.7 }
```

This is useful for cases where you want to measure both the accuracy and completeness of an output separately.
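A sketch of how the thresholds might gate a case, assuming a dual-sided case passes only when both independent scores clear their thresholds (the exact pass semantics here are an assumption, not documented LaunchGate behavior):

```python
def dual_sided_pass(precision: float, recall: float,
                    precision_threshold: float = 0.8,
                    recall_threshold: float = 0.7) -> bool:
    """Pass only when both independent scores meet their thresholds."""
    return precision >= precision_threshold and recall >= recall_threshold

print(dual_sided_pass(0.85, 0.75))  # True
print(dual_sided_pass(0.85, 0.60))  # False: recall below 0.7
```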

Scorer scope

Scorers are created at the project level and can be reused across multiple eval cases within that project.
