Devstral Small 1.1 vs GPT-5.2

GPT-5.2 is the practical winner for most production AI tasks, scoring higher on agentic planning, safety, long context, faithfulness, and creative problem solving in our 12-test suite. Devstral Small 1.1 is the cost-efficient alternative: a much lower price ($0.40/MTok vs $15.75/MTok, combined input + output) and a reasonable pick when budget or high request volume dominates requirements.

Mistral

Devstral Small 1.1

Overall
3.08/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
2/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
2/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.100/MTok

Output

$0.300/MTok

Context Window: 131K

modelpicker.net

OpenAI

GPT-5.2

Overall
4.67/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
5/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
73.8%
MATH Level 5
N/A
AIME 2025
96.1%

Pricing

Input

$1.75/MTok

Output

$14.00/MTok

Context Window: 400K


Benchmark Analysis

Overview: Across our 12-test suite, GPT-5.2 wins 9 categories, Devstral Small 1.1 wins 0, and 3 are ties. Scores (Devstral vs GPT-5.2):

  • Agentic planning: 2 vs 5. GPT-5.2 wins and ranks tied for 1st (shared with 14 other models), so expect stronger goal decomposition and failure recovery in multi-step agents. Devstral's 2 (rank 53 of 54) indicates weaker decomposition.
  • Safety calibration: 2 vs 5. GPT-5.2 wins (tied for 1st), better refusing harmful requests while allowing legitimate ones in our tests; Devstral's 2 is comparatively low.
  • Long context: 4 vs 5. GPT-5.2 wins and ties for 1st; Devstral's 4 (rank 38 of 55) still handles long context but trails GPT-5.2's retrieval accuracy at 30K+ tokens.
  • Faithfulness: 4 vs 5. GPT-5.2 wins (tied for 1st), so expect fewer hallucinations in source-driven tasks; Devstral's 4 indicates reasonable adherence but not top-tier.
  • Creative problem solving: 2 vs 5. GPT-5.2 wins and ties for 1st; Devstral scored 2, so GPT-5.2 produces more novel, specific, and feasible ideas in our tests.
  • Strategic analysis: 2 vs 5. GPT-5.2 wins (tied for 1st), valuable when nuanced tradeoffs and numeric reasoning matter.
  • Constrained rewriting: 3 vs 4. GPT-5.2 wins (rank 6 of 53) and is better at tight character-limited rewrites; Devstral's 3 is middling.
  • Persona consistency: 2 vs 5. GPT-5.2 wins and ties for 1st; Devstral ranks poorly (rank 51 of 53), so GPT-5.2 better maintains a role or persona in dialogue and resists prompt injection.
  • Multilingual: 4 vs 5. GPT-5.2 wins (tied for 1st); Devstral's 4 is decent but behind on non-English parity.

Ties (both models score 4): structured output, tool calling, and classification. The two models match on JSON/schema compliance, function selection and argument sequencing, and categorization tasks, and rank at similar positions for those tests.

External benchmarks (via Epoch AI): GPT-5.2 scores 73.8% on SWE-bench Verified (rank 5 of 12) and 96.1% on AIME 2025 (rank 1 of 23); these external measures corroborate GPT-5.2's strength on coding and hard-math tasks. Devstral Small 1.1 has no external SWE-bench or AIME scores listed.
Benchmark                 Devstral Small 1.1  GPT-5.2
Faithfulness              4/5                 5/5
Long Context              4/5                 5/5
Multilingual              4/5                 5/5
Tool Calling              4/5                 4/5
Classification            4/5                 4/5
Agentic Planning          2/5                 5/5
Structured Output         4/5                 4/5
Safety Calibration        2/5                 5/5
Strategic Analysis        2/5                 5/5
Persona Consistency       2/5                 5/5
Constrained Rewriting     3/5                 4/5
Creative Problem Solving  2/5                 5/5
Summary                   0 wins              9 wins
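The win/tie tally above can be reproduced directly from the per-benchmark scores; here is a quick Python sanity check (score pairs copied from the table):

```python
# Head-to-head tally from the 12-test suite: (Devstral score, GPT-5.2 score).
scores = {
    "Faithfulness": (4, 5),
    "Long Context": (4, 5),
    "Multilingual": (4, 5),
    "Tool Calling": (4, 4),
    "Classification": (4, 4),
    "Agentic Planning": (2, 5),
    "Structured Output": (4, 4),
    "Safety Calibration": (2, 5),
    "Strategic Analysis": (2, 5),
    "Persona Consistency": (2, 5),
    "Constrained Rewriting": (3, 4),
    "Creative Problem Solving": (2, 5),
}

devstral_wins = sum(d > g for d, g in scores.values())
gpt_wins = sum(g > d for d, g in scores.values())
ties = sum(d == g for d, g in scores.values())

print(devstral_wins, gpt_wins, ties)  # 0 9 3
```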

Pricing Analysis

Per the listed prices, Devstral Small 1.1 charges $0.10 input + $0.30 output = $0.40 per MTok combined; GPT-5.2 charges $1.75 input + $14.00 output = $15.75 per MTok combined. At 1B tokens/month (1,000 MTok): Devstral = $400; GPT-5.2 = $15,750. At 10B tokens (10,000 MTok): Devstral = $4,000; GPT-5.2 = $157,500. At 100B tokens (100,000 MTok): Devstral = $40,000; GPT-5.2 = $1,575,000. The price ratio is ~0.0254, so Devstral costs ~2.5% of GPT-5.2 per MTok. High-volume apps, startups, and cost-constrained deployments should care most about this gap; teams needing top-tier safety, long-context, agentic planning, or math/engineering performance may justify GPT-5.2's much higher spend.
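The volume math follows directly from the per-MTok rates on the pricing cards; a minimal sketch, treating the combined rate as the sum of the input and output price per MTok:

```python
# Combined per-MTok rate = input rate + output rate (from the pricing cards).
DEVSTRAL = 0.10 + 0.30   # $0.40/MTok combined
GPT_5_2 = 1.75 + 14.00   # $15.75/MTok combined

def monthly_cost(mtok_per_month: float, combined_rate: float) -> float:
    """Monthly spend at a given volume, in dollars."""
    return mtok_per_month * combined_rate

for volume in (1_000, 10_000, 100_000):  # MTok/month (1B, 10B, 100B tokens)
    print(volume, monthly_cost(volume, DEVSTRAL), monthly_cost(volume, GPT_5_2))

# Price ratio: Devstral costs about 2.5% of GPT-5.2 per MTok.
print(round(DEVSTRAL / GPT_5_2, 4))  # 0.0254
```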

Real-World Cost Comparison

Task            Devstral Small 1.1  GPT-5.2
Chat response   <$0.001             $0.0073
Blog post       <$0.001             $0.029
Document batch  $0.017              $0.735
Pipeline run    $0.170              $7.35
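Per-task figures like these come from multiplying token counts by each model's input and output rates. A sketch of that calculation, where the token counts are illustrative assumptions (not the site's actual task definitions), chosen to land near the chat-response row above:

```python
# ($/MTok input, $/MTok output) from the pricing section.
RATES = {
    "Devstral Small 1.1": (0.10, 0.30),
    "GPT-5.2": (1.75, 14.00),
}

def task_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one task from its input/output token counts."""
    rate_in, rate_out = RATES[model]
    return (in_tokens * rate_in + out_tokens * rate_out) / 1_000_000

# Assumed chat turn: ~300 tokens in, ~480 tokens out (hypothetical sizes).
print(task_cost("GPT-5.2", 300, 480))             # in the ballpark of $0.0073
print(task_cost("Devstral Small 1.1", 300, 480))  # well under $0.001
```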

Bottom Line

Choose Devstral Small 1.1 if: you need a far lower-cost model for high-volume text-to-text pipelines, are building budget-conscious engineering agents, or can accept weaker agentic planning, safety, and long-context performance in exchange for $0.40/MTok and a 131,072-token context window. Choose GPT-5.2 if: you prioritize best-in-class agentic planning, safety calibration, long-context fidelity, faithfulness, creative problem solving, and multilingual and advanced-math performance (GPT-5.2 scores 5 where Devstral scores 2–4), and you can justify $15.75/MTok for substantially higher quality plus broader modality and context support (text, image, and file inputs).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions