Claude Sonnet 4.6 vs Grok 4 for Creative Writing
Winner: Claude Sonnet 4.6. In our Creative Writing suite, Sonnet 4.6 scores 4.33 to Grok 4's 4.00, a 0.33-point advantage. Sonnet earns 5/5 on creative_problem_solving, safety_calibration, and persona_consistency in our testing, which points to stronger ideation, safer handling of sensitive prompts, and reliable voice and character maintenance. Grok 4 is stronger on constrained_rewriting (4 vs Sonnet's 3) and matches Sonnet on long_context (5) and structured_output (4), but Sonnet's higher creative problem-solving and safety scores make it the better pick for most fiction and storytelling workflows.
Claude Sonnet 4.6 (Anthropic)
Pricing: Input $3.00/MTok, Output $15.00/MTok
Grok 4 (xAI)
Pricing: Input $3.00/MTok, Output $15.00/MTok
Task Analysis
What Creative Writing demands: ideation of non-obvious plots and scenes, consistent character voice, safe handling of sensitive themes, and sometimes strict-length rewrites (microfiction, ad copy). Our Creative Writing task score is driven by three benchmarks: creative_problem_solving (idea quality), persona_consistency (voice maintenance), and constrained_rewriting (compression within hard limits). No external benchmark is available for this task, so our 3-test suite is the primary signal. In our testing Sonnet 4.6 leads on creative_problem_solving (5 vs 3) and, outside the task suite, on safety_calibration (5 vs 2), which supports stronger brainstorming and better-calibrated handling of sensitive content; its persona_consistency score of 5 supports stable character work. Grok 4 scores higher on constrained_rewriting (4 vs 3), so it handles hard character limits and editorial compression more reliably. Both models score 5 on long_context, so either can handle large drafts, but Sonnet's ideation and safety strengths are the deciding factors in our verdict.
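As a rough check on how the headline task scores could arise, the sketch below assumes the task score is an unweighted mean of the three Creative Writing benchmarks. That aggregation rule is an assumption (the weighting is not stated here), and the function names are illustrative only.

```python
from statistics import mean

# Sonnet 4.6 scores as quoted in this comparison.
SONNET = {
    "creative_problem_solving": 5,
    "persona_consistency": 5,
    "constrained_rewriting": 3,
}


def task_score(scores):
    """Unweighted mean of benchmark scores, rounded to 2 places (assumed aggregation)."""
    return round(mean(scores.values()), 2)


def implied_missing_score(task_mean, known_scores, n_benchmarks=3):
    """Back out the single unquoted score that a simple mean would imply."""
    return round(task_mean * n_benchmarks - sum(known_scores))


sonnet_score = task_score(SONNET)

# Grok 4's persona_consistency is not quoted in the text. This is only the
# value a simple mean of 4.00 would imply given its quoted 3 and 4.
grok_implied_persona = implied_missing_score(4.00, [3, 4])

print(sonnet_score, grok_implied_persona)
```

Under that assumption, Sonnet's 5, 5, and 3 average to the reported 4.33, and Grok 4's 4.00 would imply a persona_consistency of 5. Treat the second figure as an inference from the assumed formula, not a reported result.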
Practical Examples
Where Claude Sonnet 4.6 shines (based on our scores):
- Worldbuilding and plot ideation: Sonnet's 5/5 creative_problem_solving produces more non-obvious, feasible story directions when you need multiple distinct arcs.
- Maintaining complex character voice across long drafts: persona_consistency 5 and long_context 5 help Sonnet keep tone and backstory coherent over tens of thousands of tokens.
- Handling sensitive or boundary-pushing themes safely: safety_calibration 5 reduces unsafe outputs while permitting legitimate creative exploration.
Where Grok 4 shines (based on our scores):
- Microfiction, ad copy, and strict-length edits: constrained_rewriting 4 vs Sonnet's 3 — Grok more reliably compresses and preserves intent under hard character caps.
- Format-focused editing and structured rewrites: Grok ties Sonnet on structured_output (4) and matches on long_context (5), so it's good when you need precise format adherence plus extended context.
Concrete comparison point: Sonnet's creative_problem_solving 5 vs Grok's 3 means Sonnet is substantially better for ideation-heavy tasks; Grok's constrained_rewriting 4 vs Sonnet's 3 means Grok is measurably better for tight compression tasks.
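The per-benchmark comparisons above can be tabulated in a few lines. This is a minimal sketch that uses only the score pairs quoted in this article; the `compare` helper and the dictionary layout are illustrative, not part of our test harness.

```python
# Scores quoted in this comparison, as (Sonnet 4.6, Grok 4).
QUOTED = {
    "creative_problem_solving": (5, 3),
    "safety_calibration": (5, 2),
    "constrained_rewriting": (3, 4),
    "structured_output": (4, 4),
    "long_context": (5, 5),
}


def compare(scores):
    """Return {benchmark: (delta, leader)}, where delta = Sonnet minus Grok."""
    result = {}
    for name, (sonnet, grok) in scores.items():
        delta = sonnet - grok
        if delta > 0:
            leader = "Claude Sonnet 4.6"
        elif delta < 0:
            leader = "Grok 4"
        else:
            leader = "tie"
        result[name] = (delta, leader)
    return result


comparison = compare(QUOTED)
for name, (delta, leader) in comparison.items():
    # Signed delta makes the direction of each gap explicit.
    print(f"{name:26} {delta:+d}  {leader}")
```

A signed delta keeps the size and direction of each gap visible, so a one-point edge on constrained_rewriting is not read as equivalent to a two- or three-point gap on ideation or safety.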
Bottom Line
For Creative Writing, choose Claude Sonnet 4.6 if you need superior ideation, robust persona consistency, and safer handling of sensitive themes (task score 4.33, rank 5/52). Choose Grok 4 if your priority is strict-length rewrites, tight editorial compression, or format-bound microcontent (task score 4.00, rank 28/52).
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.