Claude Haiku 4.5 vs Codestral 2508 for Creative Problem Solving

Winner: Claude Haiku 4.5. In our testing on Creative Problem Solving, Claude Haiku 4.5 scores 4/5 to Codestral 2508's 2/5: a clear two-point advantage and a much higher task rank (9 of 52 vs 46 of 52). Haiku's lead is supported by stronger strategic_analysis (5 vs 2), agentic_planning (5 vs 4), persona_consistency (5 vs 3), and safety_calibration (2 vs 1). Codestral 2508 wins on structured_output (5 vs 4) and is far cheaper to run ($0.30 input / $0.90 output per MTok vs Haiku's $1.00 input / $5.00 output), but when you need non-obvious, specific, feasible ideas, Haiku is the better choice in our benchmarks.

anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

mistral

Codestral 2508

Overall
3.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
4/5
Tool Calling
5/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
2/5
Persona Consistency
3/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.30/MTok

Output

$0.90/MTok

Context Window: 256K


Task Analysis

Creative Problem Solving demands non-obvious, specific, feasible ideas, plus the ability to reason about tradeoffs, decompose goals, and present usable outputs. With no external benchmark available, we rely on our internal creative_problem_solving test (Claude Haiku 4.5 = 4, Codestral 2508 = 2) as the primary signal. Several supporting capability differences explain the gap: strategic_analysis (Haiku 5 vs Codestral 2) shows Haiku gives more nuanced tradeoff reasoning; agentic_planning (5 vs 4) indicates better goal decomposition and recovery; tool_calling is tied (5 vs 5), so both can sequence tools accurately; and structured_output favors Codestral (5 vs 4), so Codestral is stronger at strict schema-compliant responses. Persona_consistency (5 vs 3) and faithfulness (both 5) mean Haiku keeps a coherent voice while avoiding hallucination. Cost also matters: Haiku output is $5.00 per MTok vs Codestral's $0.90, so scale and latency budgets can push teams toward Codestral despite its lower creative scores.
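The pricing gap above is easiest to see with concrete arithmetic. Here is a minimal Python sketch that computes workload cost from the per-MTok rates listed on this page; the model keys, token counts, and request volume are hypothetical illustration values, not part of either provider's API.

```python
# Per-MTok (million-token) pricing as listed on this page, in dollars.
PRICING = {
    "claude-haiku-4.5": {"input": 1.00, "output": 5.00},
    "codestral-2508":   {"input": 0.30, "output": 0.90},
}

def run_cost(model: str, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Total cost in dollars for a batch of identical requests."""
    p = PRICING[model]
    per_request = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return per_request * requests

# Hypothetical ideation workload: 2k-token prompt, 1k-token response, 10,000 runs.
for model in PRICING:
    print(model, f"${run_cost(model, 2_000, 1_000, 10_000):,.2f}")
```

Under these assumed workload numbers, the batch costs $70.00 on Haiku versus $15.00 on Codestral, which is the kind of gap that justifies routing high-volume, schema-bound work to the cheaper model.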

Practical Examples

Where Claude Haiku 4.5 shines (use Haiku when):

  • New product concepts: Haiku’s creative_problem_solving 4 and strategic_analysis 5 produce non-obvious, feasible feature sets and tradeoff reasoning developers can act on.
  • Complex brainstorming that needs follow-up decomposition: agentic_planning 5 helps turn a high-level idea into stepwise experiments.
  • User-facing ideation where voice and consistency matter: persona_consistency 5 reduces jarring style shifts.

Where Codestral 2508 shines (use Codestral when):
  • Schema-bound creative artifacts: structured_output 5 makes Codestral better at producing exact JSON/protocol-compliant ideas you’ll parse automatically.
  • Cost- and latency-sensitive iterations: Codestral runs at $0.30 input / $0.90 output per MTok, far cheaper than Haiku's $1.00 / $5.00, which is useful for high-volume A/B ideation.
  • Quick code-adjacent creative solutions where long context and tool_calling are needed (both score long_context 5 and tool_calling 5).

Concrete grounded example: for a startup outlining ten novel monetization experiments with analysis and recovery paths, Haiku (4 vs 2) will produce more actionable, non-obvious options and a better failure-recovery plan. For producing 1,000 structured idea cards in strict JSON for automated ingestion, Codestral's structured_output 5 and lower cost may be the pragmatic choice.
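For the structured-idea-card workflow, the practical difference shows up at ingestion time: responses must parse as strict JSON and match a fixed schema. A minimal validation sketch follows; the card fields (title, rationale, effort) are a hypothetical schema chosen for illustration, not a specification from either vendor.

```python
import json

# Hypothetical idea-card schema: field name -> required Python type.
REQUIRED = {"title": str, "rationale": str, "effort": int}

def parse_idea_card(raw: str) -> dict:
    """Parse one model response; raise ValueError on any schema violation."""
    card = json.loads(raw)
    for field, typ in REQUIRED.items():
        if not isinstance(card.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return card

ok = parse_idea_card('{"title": "Usage-based tier", "rationale": "...", "effort": 3}')
print(ok["title"])  # Usage-based tier
```

A model with stronger structured_output fails this kind of check less often, which is why the 5-vs-4 gap matters more than raw creative score once thousands of responses feed an automated pipeline.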

Bottom Line

For Creative Problem Solving, choose Claude Haiku 4.5 if you need higher-quality, non-obvious, feasible ideas with strong tradeoff reasoning and plan decomposition (Haiku: 4/5, rank 9 of 52). Choose Codestral 2508 if you need schema-exact outputs at much lower cost and higher throughput (Codestral: 2/5 on this task, but structured_output 5; pricing $0.30 input / $0.90 output per MTok).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions