Claude Haiku 4.5 vs R1 0528 for Tool Calling
Winner: R1 0528. In our testing, both Claude Haiku 4.5 and R1 0528 score 5/5 on Tool Calling and tie for rank 1, but R1 0528 offers two clear practical advantages: higher safety_calibration (4/5 vs 2/5) and much lower output cost ($2.15 vs $5.00 per MTok). Those differences make R1 0528 the better choice for production tool-calling workflows that need reliable refusals/guardrails and lower run cost. Claude Haiku 4.5 remains competitive on the core task (same tool_calling score) and offers a larger context window (200,000 tokens), image->text input, and an explicit max_output_tokens limit (64,000), which can be decisive for image-derived tool inputs or very long chains of tool calls, but it loses the cost and safety edge.
Claude Haiku 4.5 (anthropic)
Pricing: $1.00/MTok input, $5.00/MTok output

R1 0528 (deepseek)
Pricing: $0.50/MTok input, $2.15/MTok output
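To make the price gap concrete, per-MTok prices convert to run cost as below. This is a minimal sketch; the 10M-input / 2M-output workload is an illustrative assumption, not a benchmark figure.

```python
def run_cost_usd(input_tokens: float, output_tokens: float,
                 in_price: float, out_price: float) -> float:
    """Total cost in USD given per-million-token (MTok) prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example workload: 10M input tokens, 2M output tokens.
haiku_cost = run_cost_usd(10e6, 2e6, 1.00, 5.00)  # $10 input + $10 output = $20.00
r1_cost = run_cost_usd(10e6, 2e6, 0.50, 2.15)     # $5 input + $4.30 output = $9.30
```

At this output-heavy ratio, R1 0528 costs less than half as much per run, which compounds quickly in high-volume tool-calling pipelines.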
Task Analysis
What Tool Calling demands: function selection, argument accuracy, and correct sequencing, plus reliable structured outputs when invoking APIs and safety-aware refusals when a tool should not be called. The capabilities that matter most are structured_output compliance, tool_choice/tools support, reasoning and response_format controls, long_context to track multi-step sequences, and safety_calibration to avoid unsafe calls.

External benchmarks are not available for this task, so our verdict rests on our 12-test suite. Both models achieve a task score of 5/5 and tie for first place on Tool Calling. The supporting signals explain the practical differences: structured_output is 4/5 for both models, faithfulness and agentic_planning are 5/5 for both, but safety_calibration differs sharply (Claude Haiku 4.5 = 2/5 vs R1 0528 = 4/5).

Operational constraints also matter. Claude Haiku 4.5 exposes a 200,000-token context window, an explicit max_output_tokens of 64,000, and text+image->text input, which helps image-driven tool inputs. R1 0528 has a 163,840-token context window, is text->text only, and carries a documented quirk: it "returns empty responses on structured_output" and "uses reasoning tokens" that consume the output budget, a limitation that can break short structured outputs unless you allocate a high max completion tokens. Because both models tie on the core task score, these ancillary differences (safety behavior, cost, modality, and quirks) determine which model is better for a given production need.
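The tools/tool_choice mechanics described above can be sketched as a request payload in the OpenAI-compatible chat-completions format that both providers' ecosystems commonly use. The `get_weather` tool and the model name are hypothetical placeholders, and no network call is made here:

```python
def build_tool_call_request(model: str, user_message: str) -> dict:
    """Build a chat-completions payload that forces a specific tool call."""
    get_weather_tool = {  # hypothetical example tool
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [get_weather_tool],
        # Force the model to call get_weather instead of answering in prose.
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    }

payload = build_tool_call_request("deepseek-reasoner", "Weather in Oslo?")
```

Function selection and argument accuracy are then judged by whether the model emits a tool call matching this schema with valid, required arguments.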
Practical Examples
- Low-cost, safety-sensitive production APIs: R1 0528 shines. Both models scored 5/5 on tool_calling in our tests, but R1's safety_calibration is 4/5 vs Claude Haiku 4.5's 2/5, and R1's output cost is $2.15/MTok vs Claude's $5.00/MTok. That combination cuts run cost and provides stronger refusal behavior when a tool call would be unsafe.
- Image-driven tool chains (OCR → extraction → API call): choose Claude Haiku 4.5. It supports text+image->text, a 200,000-token context window, and a 64,000 max_output_tokens limit, useful when tool calling must consume image-derived context and produce long structured outputs. Its safety score is lower (2/5), but its modality and large output allowance reduce the need to stitch multiple calls together.
- Strict JSON-schema tool invocation with short outputs: both models scored 4/5 on structured_output in our tests, but R1 0528's documented quirk ("returns empty responses on structured_output, constrained_rewriting, and agentic_planning") can yield empty outputs unless you provision a large max completion tokens. For strict, short JSON responses without large output budgets, prefer Claude Haiku 4.5.
- Multi-step agentic sequences that must track long histories: both models are 5/5 on long_context in our tests and tie on agentic_planning, so either works. Choose Claude Haiku 4.5 when image context or very large single-response output is needed; choose R1 0528 when safety and cost matter more.
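The empty-structured-output quirk noted above can be guarded against in client code: give the model a generous completion budget so reasoning tokens do not crowd out the JSON, then validate the reply before using it. This is a minimal sketch; `call_model` is a hypothetical wrapper around your chat-completions client that returns the assistant message text.

```python
import json

def structured_call(call_model, payload: dict, min_budget: int = 8192) -> dict:
    """Ensure a large max_tokens, then parse and validate the JSON reply."""
    # Raise max_tokens to at least min_budget so reasoning tokens don't
    # exhaust the output budget before the structured answer is emitted.
    payload = {**payload, "max_tokens": max(payload.get("max_tokens", 0), min_budget)}
    raw = call_model(payload)
    if not raw or not raw.strip():
        # The documented failure mode: an empty response instead of JSON.
        raise ValueError("empty structured output; consider raising max_tokens")
    return json.loads(raw)
```

Pairing this with a retry at a higher budget makes the quirk a handled error rather than a silent pipeline failure.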
Bottom Line
For Tool Calling, choose Claude Haiku 4.5 if you need image->text support, a very large context window (200,000 tokens), or very large single-response outputs (64,000 max_output_tokens). Choose R1 0528 if you prioritize lower run cost ($2.15 vs $5.00 per MTok output) and stronger safety calibration (4/5 vs 2/5 in our testing), accepting that R1 can return empty structured outputs unless you allocate a high completion-token budget.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.