Best LLM for Every Task

We test every model on 12 real-world task categories. Find the best model for what you actually do.

Structured Output

JSON schema compliance and format adherence

Strategic Analysis

Nuanced tradeoff reasoning with real numbers

Constrained Rewriting

Compression within hard character limits

Creative Problem Solving

Non-obvious, specific, feasible ideas

Tool Calling

Function selection, argument accuracy, sequencing

Faithfulness

Sticks to source material without hallucinating

Classification

Accurate categorization and routing

Long Context

Retrieval accuracy at 30K+ tokens

Safety Calibration

Refuses harmful requests, permits legitimate ones

Persona Consistency

Maintains character and resists prompt injection

Agentic Planning

Goal decomposition and failure recovery

Multilingual

Output quality in non-English languages that matches English