EvalLens
Structured output evaluation

Open source on GitHub

Catch schema drift before production.

Evaluate LLM structured outputs, pinpoint failure reasons in seconds, and run the same workflow in hosted or fully private self-hosted mode.

Built for prompt engineers, eval teams, and AI product developers.

  • Regression evals
  • Extraction QA
  • Classification audits
  • Docker deployable

How it works

01

Upload

Bring a CSV or JSONL file with id, prompt, expected, and actual fields.
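
As a rough illustration of that input shape, the sketch below checks that each JSONL row carries the four fields named above. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, not part of EvalLens, and the sample rows are invented. Whether expected and actual hold JSON objects or plain strings is an assumption here.

```python
import json

# Field names come from the upload description; the helper itself is hypothetical.
REQUIRED_FIELDS = ("id", "prompt", "expected", "actual")


def missing_fields(jsonl_text):
    """Map 1-based line numbers to the required fields that row lacks."""
    problems = {}
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in row]
        if missing:
            problems[line_number] = missing
    return problems


# Two invented rows; the second one has no "actual" field.
sample = "\n".join([
    json.dumps({"id": "1", "prompt": "Extract the city.",
                "expected": {"city": "Paris"}, "actual": {"city": "Paris"}}),
    json.dumps({"id": "2", "prompt": "Extract the city.",
                "expected": {"city": "Oslo"}}),
])

print(missing_fields(sample))
```

Running a check like this before upload is one way to catch rows that would otherwise have nothing to score.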

02

Evaluate

Score pass rate and classify schema, type, and value failures.
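
To make the three failure categories concrete, here is a minimal sketch under stated assumptions: a schema failure means the keys differ, a type failure means a key holds a value of a different Python type, and a value failure means everything lines up but the values differ. These definitions and the `classify` function are illustrative guesses, not EvalLens's actual rules.

```python
def classify(expected, actual):
    """Label one expected/actual pair as 'pass', 'schema', 'type', or 'value'.

    Assumes both sides are flat JSON objects (Python dicts).
    """
    if not isinstance(actual, dict) or set(actual) != set(expected):
        return "schema"  # missing or extra keys, or not an object at all
    for key, want in expected.items():
        if type(actual[key]) is not type(want):
            return "type"  # same key, different type (e.g. "42" vs 42)
    if actual != expected:
        return "value"  # same keys and types, different content
    return "pass"


# Invented pairs, one per category.
pairs = [
    ({"city": "Paris"}, {"city": "Paris"}),
    ({"city": "Paris", "population": 2100000}, {"city": "Paris"}),
    ({"population": 2100000}, {"population": "2100000"}),
    ({"city": "Paris"}, {"city": "Lyon"}),
]

labels = [classify(expected, actual) for expected, actual in pairs]
pass_rate = labels.count("pass") / len(labels)

print(labels)
print(pass_rate)
```

Checking schema before type before value means each row gets the most structural explanation for its failure first.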

03

Inspect

Filter row-level failures and diagnose regressions quickly.
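
One way to picture this step is a filter over per-row labels plus a comparison between two runs. In the sketch below, the result dictionaries, the label strings, and both helper names are hypothetical, and a regression is assumed to mean a row that passed in the baseline run and fails in the candidate run.

```python
def failures_in(results, category):
    """Return the sorted row IDs whose label equals the given failure category."""
    return sorted(row_id for row_id, label in results.items() if label == category)


def regressions(baseline, candidate):
    """Return rows that passed in baseline but fail in candidate, with the new label."""
    return {
        row_id: label
        for row_id, label in candidate.items()
        if label != "pass" and baseline.get(row_id) == "pass"
    }


# Invented results: row ID -> label from an earlier and a newer run.
baseline = {"1": "pass", "2": "pass", "3": "value"}
candidate = {"1": "pass", "2": "type", "3": "value", "4": "schema"}

print(failures_in(candidate, "value"))
print(regressions(baseline, candidate))
```

Row 4 is new in the candidate run, so it counts as a failure but not as a regression under this definition.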

Deploy anywhere

HOSTED

Use instantly

Open the hosted app and start evaluating in seconds with no infrastructure setup.

SELF-HOSTED

Docker deployable in minutes

Run EvalLens in your own environment for private datasets and controlled provider keys.

docker run -p 3000:3000 -e EVALLENS_MODE=self-hosted evallens

Hosted vs self-hosted

HOSTED

Bring your completed outputs

  • Use when you already have model outputs.
  • Requires expected and actual in your file.
  • Fastest path for regression checks and release gates.
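
A release gate can be as small as a threshold on the pass rate. The sketch below assumes per-row labels are already available as strings. The `gate_exit_code` helper and the 0.95 default threshold are illustrative choices, not EvalLens settings.

```python
def gate_exit_code(labels, min_pass_rate=0.95):
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    if not labels:
        return 1  # an empty run is treated as a failure, not a free pass
    pass_rate = sum(label == "pass" for label in labels) / len(labels)
    return 0 if pass_rate >= min_pass_rate else 1


# Invented runs: all rows passing, and 18 of 20 passing.
clean_run = ["pass"] * 20
drifted_run = ["pass"] * 18 + ["value", "schema"]

print(gate_exit_code(clean_run))
print(gate_exit_code(drifted_run))
```

In a CI job, the returned code could be passed to `sys.exit` so that a drop in pass rate fails the pipeline.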

SELF-HOSTED

Generate, then evaluate in one run

  • Generates missing actual outputs before scoring.
  • Bring your own OpenAI, Anthropic, or Gemini key.
  • Deploy quickly with Docker to local or server environments.
  • Deterministic eval workflow for local, staging, or CI.
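
The generate-then-evaluate flow can be pictured as filling in missing actual values before scoring. This is a sketch only: `fill_missing_actuals` is a hypothetical helper, and `fake_generate` is a deterministic stand-in for a real provider call, which the sketch does not attempt to show.

```python
def fill_missing_actuals(rows, generate):
    """Return new rows, calling generate(prompt) wherever 'actual' is missing or empty."""
    filled = []
    for row in rows:
        if row.get("actual") in (None, ""):
            # Copy the row so the input list is left untouched.
            row = {**row, "actual": generate(row["prompt"])}
        filled.append(row)
    return filled


def fake_generate(prompt):
    # Placeholder for a model call; upper-casing keeps the sketch deterministic.
    return prompt.upper()


# Invented rows: the second one has no actual output yet.
rows = [
    {"id": "1", "prompt": "hello", "expected": "HELLO", "actual": "HELLO"},
    {"id": "2", "prompt": "bye", "expected": "BYE"},
]

filled = fill_missing_actuals(rows, fake_generate)
pass_rate = sum(r["actual"] == r["expected"] for r in filled) / len(filled)

print(filled[1]["actual"])
print(pass_rate)
```

Passing the generator in as a function keeps the scoring path the same whether outputs come from a file or from a model.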

Your data stays in your environment.

Evaluate your outputs

Upload a CSV or JSONL file with id, prompt, expected, and actual columns.

Accepted formats: CSV, JSON, or JSONL