Best LLM for Every Task
We test every model on 12 real-world task categories. Find the best model for what you actually do.
Structured Output
JSON schema compliance and format adherence
Strategic Analysis
Nuanced tradeoff reasoning with real numbers
Constrained Rewriting
Compression within hard character limits
Creative Problem Solving
Non-obvious, specific, feasible ideas
Tool Calling
Function selection, argument accuracy, sequencing
Faithfulness
Sticks to source material without hallucinating
Classification
Accurate categorization and routing
Long Context
Retrieval accuracy at 30K+ tokens
Safety Calibration
Refuses harmful requests, permits legitimate ones
Persona Consistency
Maintains character and resists injection
Agentic Planning
Goal decomposition and failure recovery
Multilingual
Equivalent-quality output in non-English languages
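
A few of the categories above have criteria that can be checked mechanically, which may help make the descriptions concrete. The following is a minimal sketch in Python, not the scoring method used for these tests: the function names (`within_char_limit`, `has_required_json_keys`, `classification_accuracy`) and the idea of checking only required top-level keys are assumptions made for illustration.

```python
import json


def within_char_limit(text: str, limit: int) -> bool:
    """Constrained Rewriting: True if the rewrite fits the hard character limit."""
    return len(text) <= limit


def has_required_json_keys(output: str, required_keys: set) -> bool:
    """Structured Output: True if the output parses as a JSON object
    that contains every required top-level key (hypothetical, simplified check)."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def classification_accuracy(predicted: list, expected: list) -> float:
    """Classification: fraction of items whose predicted label matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Categories such as Strategic Analysis or Creative Problem Solving have no comparably simple check and would presumably need rubric-based or human judgment instead.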