Claude Haiku 4.5 vs DeepSeek V3.1 for Constrained Rewriting
Winner: DeepSeek V3.1 (narrow edge). In our Constrained Rewriting tests, both Claude Haiku 4.5 and DeepSeek V3.1 scored 3/5 and share rank 31 of 52. We give DeepSeek the practical win because it pairs a stronger structured_output score (5 vs 4) with a far lower output cost ($0.75/MTok vs $5.00/MTok). That combination matters when exact format/character constraints and per-job cost predictability are the primary concerns. Claude Haiku 4.5 remains the better choice when an extreme context window (200,000 tokens) or stronger tool calling (5 vs 3) is required to enforce limits programmatically.
Claude Haiku 4.5 (Anthropic)
Pricing: Input $1.00/MTok, Output $5.00/MTok
DeepSeek V3.1 (DeepSeek)
Pricing: Input $0.15/MTok, Output $0.75/MTok
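To make the pricing gap concrete, here is a quick back-of-the-envelope cost comparison in Python. The batch size and per-output token count are illustrative assumptions, not measured values; only the two output rates come from the cards above.

```python
# Rough output-cost comparison for a constrained-rewriting batch.
# Assumptions (illustrative only): 100,000 rewrites at ~120 output tokens each.
OUTPUTS = 100_000
TOKENS_PER_OUTPUT = 120

RATES_PER_MTOK = {            # output pricing from the cards above
    "Claude Haiku 4.5": 5.00,
    "DeepSeek V3.1": 0.75,
}

total_tokens = OUTPUTS * TOKENS_PER_OUTPUT   # 12,000,000 tokens = 12 MTok
for model, rate in RATES_PER_MTOK.items():
    cost = total_tokens / 1_000_000 * rate
    print(f"{model}: ${cost:,.2f} for {total_tokens:,} output tokens")
# -> Claude Haiku 4.5: $60.00; DeepSeek V3.1: $9.00
```

At these assumed volumes the per-batch output cost differs by roughly 6.7x, which is why cost weighs so heavily in the verdict for high-volume workflows.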
Task Analysis
What Constrained Rewriting demands: precise compression to meet hard character/token limits while retaining meaning and required structure. Key capabilities: (1) structured_output (format/length adherence), (2) faithfulness (staying true to the source), (3) reliable token/character control or tool integrations for exact enforcement, and (4) sufficient context to identify what must be preserved.

No external benchmark covers this task, so our winner call relies on internal task scores and supporting proxies. Both models scored 3/5 on our constrained_rewriting test and share rank 31 of 52, so the supporting metrics decide the edge:

DeepSeek V3.1: structured_output 5, creative_problem_solving 5, tool_calling 3, long_context 5, faithfulness 5.
Claude Haiku 4.5: structured_output 4, tool_calling 5, long_context 5 (200,000-token window), faithfulness 5.

For strict format/character compliance and cost-sensitive batch workflows, structured_output and the lower output cost weigh heavier. For very large-context compression or programmatic enforcement via tooling, Haiku's tool_calling and larger context window provide the advantage.
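As a minimal sketch of the "programmatic enforcement" pattern referenced above: wrap the model call in a validate-and-retry loop that rejects over-length output. The `generate` function below is a hypothetical stand-in for whichever model client you use, not a real API; the prompts and retry count are illustrative assumptions.

```python
# Hypothetical enforcement loop for hard character limits.

def generate(prompt: str) -> str:
    # Stand-in for a real model client call; replace before use.
    raise NotImplementedError("wire up your model client here")

def rewrite_within_limit(source: str, max_chars: int, retries: int = 3) -> str:
    prompt = (
        f"Rewrite the following in at most {max_chars} characters, "
        f"preserving meaning:\n\n{source}"
    )
    for _ in range(retries):
        candidate = generate(prompt).strip()
        if len(candidate) <= max_chars:
            return candidate          # hard constraint satisfied
        # Feed the violation back so the next attempt can tighten further.
        prompt = (
            f"Your previous answer was {len(candidate)} characters; the hard "
            f"limit is {max_chars}. Shorten it:\n\n{candidate}"
        )
    raise ValueError(f"no candidate met the {max_chars}-char limit "
                     f"after {retries} attempts")
```

A loop like this makes either model's 3/5 raw score less decisive, since hard violations are caught outside the model; what it cannot cheapen is the retry cost, which again favors the lower-priced output.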
Practical Examples
1) High-volume SMS/Twitter-style rewriting (many short outputs with strict char limits): DeepSeek V3.1 is preferable; both models scored 3/5 on the constrained task, but DeepSeek's structured_output 5 vs Haiku's 4 and its lower output cost ($0.75 vs $5.00/MTok) reduce error and cut cost.
2) Single-document micro-summary where you must preserve precise clauses from a 50k+ token source: Claude Haiku 4.5 is preferable because of its 200,000-token context window and tool_calling 5, which help retain context and enforce exact-length rules programmatically.
3) Template-driven deliveries (JSON/CSV with exact fields and char caps): DeepSeek V3.1's structured_output 5 makes it less likely to violate schemas; Haiku can match this when paired with tools, but at higher per-output cost (see the validation sketch after this list).
4) Creative compression where rephrasing choices matter (tradeoffs between terseness and fidelity): DeepSeek's creative_problem_solving 5 gives it an edge in producing non-obvious compressions while staying faithful (both score faithfulness 5).
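For the template-driven case (example 3 above), a minimal validation sketch: parse the model's JSON and enforce the schema plus per-field character caps before accepting the output. The field names and caps here are hypothetical examples, not part of any model's API.

```python
import json

# Hypothetical per-field character caps for a templated delivery.
FIELD_CAPS = {"title": 40, "summary": 160}

def validate_delivery(raw: str) -> dict:
    """Parse model output and enforce schema plus per-field char caps."""
    obj = json.loads(raw)                     # raises on malformed JSON
    if set(obj) != set(FIELD_CAPS):
        raise ValueError(f"unexpected fields: {sorted(obj)}")
    for field, cap in FIELD_CAPS.items():
        value = obj[field]
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string")
        if len(value) > cap:
            raise ValueError(f"{field} is {len(value)} chars (cap {cap})")
    return obj

# Usage: reject-and-retry on ValueError, regardless of which model produced it.
ok = validate_delivery('{"title": "Q3 results", "summary": "Revenue up 8%."}')
```

The check is model-agnostic; a higher structured_output score simply means fewer rejected attempts before a delivery passes.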
Bottom Line
For Constrained Rewriting, choose DeepSeek V3.1 if your priority is exact format compliance and cost-efficiency at scale (structured_output 5; $0.75/MTok output). Choose Claude Haiku 4.5 if you need massive context retention or programmatic enforcement via tools (200,000-token window; tool_calling 5) and can accept the higher output cost ($5.00/MTok). Note: both scored 3/5 on our constrained_rewriting benchmark and share rank 31 of 52; this recommendation rests on supporting proxy metrics and cost differences.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.