# Custom Scorers

Create scorers tailored to your specific evaluation needs.

## Choosing a scorer type
| Type | Best for | Requires BYOK? |
|---|---|---|
| `exact_match` | Deterministic outputs (classifications, labels) | No |
| `regex` | Pattern validation (dates, IDs, formats) | No |
| `json_schema` | Structured output validation | No |
| `contains` | Checking for required/forbidden content | No |
| `llm_judge` | Subjective quality assessment (faithfulness, tone, relevance) | Yes |
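The deterministic scorer types can be approximated locally while you design configs. Below is a minimal Python sketch of the matching semantics described in the examples that follow; it is an illustration, not the service's actual implementation (`json_schema` is omitted, since it needs a full JSON Schema validator).

```python
import re

def run_scorer(scorer: dict, output: str, expected: str = None) -> bool:
    """Local approximation of the deterministic scorer types.

    Illustrative only: field names mirror the example configs in this
    page, not the service's actual implementation.
    """
    kind = scorer["type"]
    cfg = scorer.get("config", {})
    if kind == "exact_match":
        # `expected` comes from the case, not the scorer config
        if not cfg.get("case_sensitive", True):
            return output.lower() == expected.lower()
        return output == expected
    if kind == "regex":
        matched = re.search(cfg["pattern"], output) is not None
        return matched == cfg.get("should_match", True)
    if kind == "contains":
        hits = [value in output for value in cfg["values"]]
        # "any": at least one value present; "none": no value present
        return any(hits) if cfg["mode"] == "any" else not any(hits)
    raise ValueError(f"unsupported scorer type: {kind}")
```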
## Examples

### Exact match — classification

Verify that a classifier returns the correct label:
```bash
curl -X POST https://api.launchgate.ai/v1/projects/my-project/scorers \
  -H "Authorization: Bearer $LAUNCHGATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Classification accuracy",
    "type": "exact_match",
    "config": { "case_sensitive": false }
  }'
```

Then create a case using this scorer with `expected: "positive"`.
### Regex — date format validation

Ensure outputs contain properly formatted dates:
```json
{
  "name": "Valid date format",
  "type": "regex",
  "config": {
    "pattern": "\\d{4}-\\d{2}-\\d{2}",
    "should_match": true
  }
}
```

### Regex — no PII leaked
Ensure outputs don’t contain email addresses:
```json
{
  "name": "No email in output",
  "type": "regex",
  "config": {
    "pattern": "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}",
    "should_match": false
  }
}
```

### JSON schema — structured output
Validate that AI-generated JSON conforms to your schema:
```json
{
  "name": "Valid response schema",
  "type": "json_schema",
  "config": {
    "schema": {
      "type": "object",
      "required": ["answer", "confidence", "sources"],
      "properties": {
        "answer": { "type": "string", "minLength": 1 },
        "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
        "sources": {
          "type": "array",
          "items": { "type": "string" },
          "minItems": 1
        }
      },
      "additionalProperties": false
    }
  }
}
```

### Contains — required content
Ensure outputs mention required disclaimers:
```json
{
  "name": "Includes disclaimer",
  "type": "contains",
  "config": {
    "values": ["not financial advice", "consult a professional"],
    "mode": "any"
  }
}
```

### Contains — forbidden content
Ensure outputs don’t contain competitor mentions:
```json
{
  "name": "No competitor mentions",
  "type": "contains",
  "config": {
    "values": ["CompetitorA", "CompetitorB"],
    "mode": "none"
  }
}
```

### LLM judge — faithfulness
Use an LLM to evaluate whether the output is faithful to source context:
```json
{
  "name": "RAG faithfulness judge",
  "type": "llm_judge",
  "config": {
    "rubric": "Evaluate whether the output is faithful to the provided context. Score 1.0 if all claims are supported by the context. Score 0.0 if any claims are unsupported or contradicted by the context. Score 0.5 for partial faithfulness.",
    "model": "gpt-4o",
    "provider": "openai"
  }
}
```

LLM judge scorers require a BYOK key for the specified provider.
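The judging internals aren't documented here, but conceptually an `llm_judge` scorer sends the rubric and the model output to the configured provider, then parses a numeric score from the reply. A sketch of that parsing step, with the helper name and reply format being assumptions rather than part of the API:

```python
import re

def parse_judge_score(reply: str) -> float:
    """Extract the first number from a judge reply and clamp it to [0, 1].

    Hypothetical helper: the real service's reply format is not documented.
    """
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError("judge reply contained no numeric score")
    return min(1.0, max(0.0, float(match.group())))
```

Clamping guards against a judge that wanders outside the 0.0–1.0 range the rubric asks for.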
## Dual-sided scoring

For cases where you need to measure both precision (accuracy of what’s said) and recall (completeness of what should be said):
```json
{
  "name": "Comprehensive answer judge",
  "type": "llm_judge",
  "config": {
    "rubric": "Evaluate the answer for accuracy and completeness..."
  },
  "dual_sided": true,
  "precision_threshold": 0.8,
  "recall_threshold": 0.7
}
```

Both thresholds must be met for the case to pass.
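The pass condition can be stated directly. A small sketch, assuming the judge yields separate precision and recall scores in [0, 1]:

```python
def dual_sided_pass(precision: float, recall: float,
                    precision_threshold: float = 0.8,
                    recall_threshold: float = 0.7) -> bool:
    """A case passes only when both scores meet their thresholds."""
    return precision >= precision_threshold and recall >= recall_threshold
```

A high-precision answer that omits required material still fails on recall, and vice versa.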