Structured output evaluation

Open source on GitHub

Catch schema drift before production.

Evaluate LLM structured outputs, pinpoint failure reasons in seconds, and run the same workflow in hosted or fully private self-hosted mode.

Built for prompt engineers, eval teams, and AI product developers.

Regression evals, Extraction QA, Classification audits, Docker deployable

How it works

Bring a CSV or JSONL file with id, prompt, expected, and actual columns.

Score the pass rate and classify each failure as a schema, type, or value failure.

Filter row-level failures and diagnose regressions quickly.
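The scoring step above can be sketched roughly as follows. This is a minimal illustration, not EvalLens's actual implementation: the function names `classify` and `evaluate` are hypothetical, and it assumes that `expected` and `actual` hold JSON strings, that a key mismatch counts as a schema failure, and that unparseable output is also counted as a schema failure.

```python
import json
from collections import Counter


def classify(expected, actual):
    """Return 'pass', 'schema', 'type', or 'value' for one expected/actual pair."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        if set(expected) != set(actual):
            return "schema"  # assumption: missing or extra keys are schema failures
        results = [classify(expected[key], actual[key]) for key in expected]
        # Report the most structural failure found among the fields.
        for kind in ("schema", "type", "value"):
            if kind in results:
                return kind
        return "pass"
    if type(expected) is not type(actual):
        return "type"  # e.g. the number 36 versus the string "36"
    return "pass" if expected == actual else "value"


def evaluate(rows):
    """Score rows that have id, prompt, expected, and actual fields.

    Returns (pass_rate, failures), where failures is a list of (id, kind).
    """
    counts = Counter()
    failures = []
    for row in rows:
        expected = json.loads(row["expected"])
        try:
            actual = json.loads(row["actual"])
        except json.JSONDecodeError:
            kind = "schema"  # assumption: unparseable output is a schema failure
        else:
            kind = classify(expected, actual)
        counts[kind] += 1
        if kind != "pass":
            failures.append((row["id"], kind))
    pass_rate = counts["pass"] / len(rows) if rows else 0.0
    return pass_rate, failures
```

Filtering the returned `failures` list by kind gives the row-level view described above, for example `[row_id for row_id, kind in failures if kind == "type"]`.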

HOSTED

Open the hosted app and start evaluating in seconds with no infrastructure setup.

SELF-HOSTED

Run EvalLens in your own environment for private datasets and controlled provider keys.

docker run -p 3000:3000 -e EVALLENS_MODE=self-hosted evallens

Your data stays in your environment.

Upload a CSV or JSONL file with id, prompt, expected, and actual columns.
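For illustration, a JSONL file with those four columns could be produced like this. It is a sketch under assumptions: the helper name `to_jsonl` and the sample rows are hypothetical, and storing `expected` and `actual` as JSON-encoded strings is a guess rather than a documented requirement.

```python
import json

REQUIRED_COLUMNS = ("id", "prompt", "expected", "actual")


def to_jsonl(rows):
    """Serialize rows to JSONL text, one JSON object per line."""
    lines = []
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")
        lines.append(json.dumps(row, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Hypothetical sample rows; the values are made up for the example.
sample_rows = [
    {
        "id": "1",
        "prompt": "Extract the name and age.",
        "expected": json.dumps({"name": "Ada", "age": 36}),
        "actual": json.dumps({"name": "Ada", "age": "36"}),
    },
]

if __name__ == "__main__":
    with open("dataset.jsonl", "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(sample_rows))
```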

Accepted file formats: CSV, JSON, or JSONL.

Sample dataset