Qwen: Qwen3 235B A22B Instruct 2507
Qwen's efficiency model. Context window: 262K tokens.
What you need to know
Qwen3 235B A22B Instruct 2507 is primarily a strong-reasoning, long-context model. It is most useful for strategic analysis, persona consistency, and structured output, all three of which earned perfect internal scores. With a 262K-token context window and a 5/5 long-context rating, it is well suited to deep document analysis and complex planning tasks.
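Before sending a long document in a single request, a rough fit check can save a failed call. The sketch below is illustrative only. It assumes "262K" means 262,144 tokens and estimates token counts with a crude four-characters-per-token heuristic. The `estimate_tokens` and `fits_in_context` helpers and the default output reserve are hypothetical, and exact counts should come from the model's own tokenizer.

```python
# Minimal sketch: rough check of whether a document fits the context window.
# Assumptions (not from the review): "262K" == 262,144 tokens, and text averages
# ~4 characters per token. Use the model's real tokenizer for exact counts.

CONTEXT_WINDOW_TOKENS = 262_144  # assumed reading of "262K"
CHARS_PER_TOKEN = 4              # crude heuristic; varies by language and content


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(document: str, reserved_output_tokens: int = 8_192) -> bool:
    """Return True if the document plus a reserved reply budget fits the window.

    reserved_output_tokens is a hypothetical budget for the model's answer.
    """
    return estimate_tokens(document) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    doc = "word " * 200_000  # 1,000,000 characters, roughly 250,000 tokens
    print(estimate_tokens(doc))                                   # 250000
    print(fits_in_context(doc))                                   # True
    print(fits_in_context(doc, reserved_output_tokens=16_384))    # False
```

Reserving room for the reply matters because the window is shared between the prompt and the generated output.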
The model is priced aggressively for its scale, with a blended cost of $0.093/MTok. This makes it a high-value option for developers who need the reasoning capabilities of a large model without the premium pricing typically associated with top-tier proprietary systems.
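Because the price is quoted per million tokens (MTok), estimating a workload's cost takes a single multiplication. The sketch below uses only the blended $0.093/MTok figure. It ignores the separate input and output rates the blend is derived from, and the monthly volume in the example is hypothetical.

```python
# Minimal sketch: estimate spend from the blended price quoted in this review.
# Assumption: the blended rate applies uniformly to every token. Actual billing
# may price input and output tokens differently.

BLENDED_USD_PER_MTOK = 0.093  # blended cost per million tokens


def estimate_cost_usd(total_tokens: int, usd_per_mtok: float = BLENDED_USD_PER_MTOK) -> float:
    """Return the estimated cost in USD for total_tokens tokens."""
    return total_tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    # Hypothetical workload: 500 requests/day, ~20,000 tokens each, for 30 days.
    monthly_tokens = 500 * 20_000 * 30  # 300,000,000 tokens
    print(f"${estimate_cost_usd(monthly_tokens):.2f}")  # $27.90
```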
Performance varies widely by task type. The model excels at multilingual tasks and faithfulness but struggles with basic classification and tabular data. Most critically, safety calibration is a significant weakness: its score of 1/5 indicates a high risk of generating unfiltered or non-compliant content.
Use this model for complex strategic planning, long-form document processing, or applications requiring strict persona adherence. Skip this model if your use case requires rigorous safety guardrails, precise data classification, or heavy manipulation of tabular data.