Claude Haiku 4.5 vs R1 0528 for Structured Output

Claude Haiku 4.5 is the better choice for Structured Output. In our testing both models score 4/5 and share the same task rank (26 of 52), but R1 0528 has a documented quirk (it can return empty responses on structured_output) that makes it unreliable for producing schema-compliant JSON in real workflows. Claude Haiku 4.5 supports the structured_outputs/response_format parameters, offers a 200,000-token context window with a 64k max output, and produced consistent structured outputs in our suite, so it wins on practical reliability despite the tie in numeric task score.

anthropic

Claude Haiku 4.5

Overall
4.33/5 Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window 200K

modelpicker.net

deepseek

R1 0528

Overall
4.50/5 Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
4/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
96.6%
AIME 2025
66.4%

Pricing

Input

$0.500/MTok

Output

$2.15/MTok

Context Window 164K


Task Analysis

Structured Output (JSON schema compliance and format adherence) demands strict response_format/structured_outputs support, predictable token budgeting (no unexpected empty outputs), enough max_output_tokens to emit large schemas, long-context handling when schemas are embedded in long prompts, and faithfulness to input constraints. In our testing both models score 4/5 on the structured_output benchmark and occupy the same task rank (26 of 52). Supporting metrics matter: tool_calling (both 5/5) helps with argument selection and sequencing when composing structured payloads; faithfulness (both 5/5) supports staying within schema rules; and safety_calibration differs (Claude Haiku 4.5 = 2/5, R1 0528 = 4/5), which can matter when schema output must omit sensitive content. Crucially, R1 0528's documented quirks report that it "returns empty responses on structured_output" and "uses reasoning tokens that consume output budget," and that it requires a high max completion tokens setting: all practical failure modes for schema adherence. Where these quirks apply, parity in structured_output score does not translate to equal reliability.

Practical Examples

  1. API response generator (small-to-medium JSON payloads): Claude Haiku 4.5 is reliable: it supports structured_outputs/response_format, a large 64k max_output_tokens, and a 200k context window, and produced non-empty, schema-compliant JSON in our runs. R1 0528 is risky: it carries the same 4/5 score, but the model returned empty structured_output responses in our tests, causing downstream failures.
  2. Large schema plus long instructions embedded in long context: Claude Haiku 4.5 is better due to its 200k context window and explicit structured_outputs support. R1 0528 has a 163,840-token context window but may consume reasoning tokens and require a high max completion tokens setting, increasing cost and the chance of truncated or empty results.
  3. Safety-constrained structured outputs (omit or redact fields flagged as sensitive): R1 0528 is preferable on safety calibration (R1 4/5 vs Haiku 2/5), so it may better refuse or redact disallowed content per policy rules.
  4. Size-constrained format compression (tight character limits): R1 0528 edges Haiku on constrained_rewriting (R1 4/5 vs Haiku 3/5), so where exact compression within hard limits is required, R1 can be advantageous if you can avoid its empty-output quirk.
  5. Cost-sensitive batch generation: R1 0528 is cheaper ($0.50/MTok input, $2.15/MTok output) than Claude Haiku 4.5 ($1.00/MTok input, $5.00/MTok output), but the savings erode if R1 returns empty structured outputs and triggers retries.

Bottom Line

For Structured Output, choose Claude Haiku 4.5 if you need dependable, schema-compliant JSON with large context and reliable non-empty responses. Choose R1 0528 if lower per-token cost, better safety calibration, or slightly stronger constrained_rewriting matters AND you can tolerate or work around R1 0528's documented empty-response behavior on structured_output (or restrict it to contexts where that quirk does not appear).
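One common way to "work around" the quirk while keeping R1's lower price is a fallback chain: try the cheaper model first and escalate only on empty or invalid output. This is a hedged sketch, not provider code: `primary`, `fallback`, and `is_valid` are hypothetical callables you would wire to your own clients and schema validator.

```python
def generate_json(prompt, primary, fallback, is_valid):
    """Try the cheaper primary model first; escalate to the fallback
    model only when the primary returns empty or invalid output."""
    raw = primary(prompt)
    if raw and is_valid(raw):
        return raw  # primary succeeded; no fallback cost incurred
    return fallback(prompt)  # e.g. route the request to the pricier model
```

The design choice here is that the fallback only runs on failure, so in the common case you pay the primary model's rate and escalate only for the fraction of requests hit by the empty-response quirk.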

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions