Claude Haiku 4.5 vs Codestral 2508 for Structured Output
Codestral 2508 is the winner for Structured Output. In our testing Codestral scores 5/5 on the Structured Output benchmark vs Claude Haiku 4.5's 4/5, and Codestral is ranked 1 of 52 for this task while Claude Haiku 4.5 ranks 26 of 52. That 1-point advantage reflects more reliable JSON schema compliance and format adherence in our suite. Codestral is also substantially cheaper per token (input/output costs of $0.30/$0.90 per MTok) and has a larger context window (256,000 tokens vs Claude Haiku 4.5's 200,000), making it better for high-volume, cost-sensitive structured-output pipelines. Claude Haiku 4.5 remains a strong alternative when you need multimodal (image→text) structured extraction, or when its strengths in strategic analysis, persona consistency, and agentic planning matter more.
Anthropic · Claude Haiku 4.5
Pricing: Input $1.00/MTok · Output $5.00/MTok
Mistral · Codestral 2508
Pricing: Input $0.30/MTok · Output $0.90/MTok
Task Analysis
Structured Output requires strict JSON schema compliance, predictable formatting, and deterministic adherence to response_format/structured_outputs parameters. Key capabilities: strong schema adherence (format correctness), fine-grained control over generation via parameters such as max_tokens, reliable response_format/structured_outputs support, sound tool selection when outputs feed downstream systems, and a context window large enough for big schemas or reference data. In our testing the primary signal is the Structured Output test itself (Codestral 2508: 5, Claude Haiku 4.5: 4). Supporting proxies: both models score 5 on tool_calling and 5 on long_context in our tests (ties), indicating both can sequence function calls and handle large schemas. Differences emerge in modality and cost: Claude Haiku 4.5 supports text+image→text (useful when extracting structured data from images) and exposes parameters like include_reasoning, while Codestral 2508 is text→text and cheaper per MTok. These internal scores and the parameter support explain why Codestral achieved the top Structured Output score in our suite; a minimal request sketch follows.
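To make the response_format parameter concrete, here is a minimal request sketch against an OpenAI-compatible chat completions endpoint. The endpoint URL, model identifier, and environment variable name are assumptions for illustration; check your provider's documentation for the exact JSON-mode and structured-outputs syntax it supports.

```python
import json
import os

import requests

# Assumed OpenAI-compatible endpoint; adjust for your provider.
API_URL = "https://api.mistral.ai/v1/chat/completions"

payload = {
    "model": "codestral-2508",  # illustrative model identifier
    "messages": [
        {"role": "system", "content": "Reply with JSON only."},
        {"role": "user", "content": "Extract vendor and total from: 'Acme Corp, $42.50'"},
    ],
    "response_format": {"type": "json_object"},  # JSON mode
    "max_tokens": 256,
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()

# OpenAI-compatible response shape: choices[0].message.content holds the text.
content = resp.json()["choices"][0]["message"]["content"]
print(json.loads(content))  # raises ValueError if the model broke JSON mode
```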
Practical Examples
- API payload generation for billing systems: Codestral 2508 (5 vs 4). Higher schema adherence and a 1/52 rank in our tests mean fewer rejection loops and cheaper per-token runs ($0.30 input / $0.90 output per MTok); see the validate-and-retry sketch after this list.
- Large schemas with many embedded references (30K+ tokens of context): both models tie on long_context (5), but Codestral's larger window (256K vs 200K) reduces the risk of context truncation.
- Image→text forms (structured extraction from receipts or photos): choose Claude Haiku 4.5. It supports text+image→text modality and still scores 4 on Structured Output, so it is preferable when the source is visual.
- Developer workflows requiring response_format and structured_outputs parameters: both models support them; Codestral's higher structured_output score (5 vs 4) reduces post-processing.
- Cost-sensitive batch processing (thousands of requests): Codestral is materially cheaper, and the price ratio in our data favors it ($0.30/$0.90 vs Claude Haiku 4.5's $1.00/$5.00 per MTok for input/output); the cost sketch after this list works through an example batch.
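The "rejection loop" cost in the first example comes from re-prompting whenever output fails validation. Below is a minimal validate-and-retry sketch. The `call_model` callable is a hypothetical adapter around your provider's API, and the invoice schema and its fields are illustrative; it uses the jsonschema package for validation.

```python
import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Illustrative schema for a billing payload; adapt fields to your system.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "customer": {"type": "string"},
        "total_usd": {"type": "number"},
    },
    "required": ["customer", "total_usd"],
    "additionalProperties": False,
}

def parse_structured(call_model, prompt, schema=INVOICE_SCHEMA, max_attempts=3):
    """Call the model, validate the reply, and retry on bad output.

    `call_model(prompt) -> str` is a hypothetical hook for your client code.
    A model with stronger schema adherence exits this loop on the first
    attempt more often, which is where the per-request savings come from.
    """
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)                   # malformed JSON raises here
            validate(instance=data, schema=schema)   # off-schema output raises here
            return data
        except (json.JSONDecodeError, ValidationError) as err:
            last_error = err
            prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({err}). "
                "Reply with JSON matching the schema only."
            )
    raise RuntimeError(
        f"No schema-compliant output after {max_attempts} attempts: {last_error}"
    )
```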
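And a back-of-the-envelope cost comparison for the batch-processing example, using the listed per-MTok prices. The batch size and per-request token counts are illustrative assumptions, not measurements from our suite.

```python
# Rough batch-cost estimate at the listed per-MTok prices.
# Workload assumptions (illustrative): 10,000 requests,
# ~1,500 input tokens and ~500 output tokens each.
REQUESTS = 10_000
IN_TOKENS, OUT_TOKENS = 1_500, 500

PRICES = {  # (input $/MTok, output $/MTok) from the pricing above
    "Codestral 2508": (0.30, 0.90),
    "Claude Haiku 4.5": (1.00, 5.00),
}

for model, (p_in, p_out) in PRICES.items():
    cost = (REQUESTS * IN_TOKENS * p_in + REQUESTS * OUT_TOKENS * p_out) / 1_000_000
    print(f"{model}: ${cost:,.2f}")
# -> Codestral 2508: $9.00 vs Claude Haiku 4.5: $40.00 for this workload
```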
Bottom Line
For Structured Output, choose Codestral 2508 if you need the most reliable JSON/schema adherence, the lowest per-token cost, and the top-ranked Structured Output model in our tests. Choose Claude Haiku 4.5 if your workflow requires multimodal (image→text) extraction, or if its other strengths (strategic analysis, persona consistency, agentic planning) outweigh a small drop in schema adherence.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.