mistral
Codestral 2508
Codestral 2508 is Mistral's code-specialized model, designed for low-latency, high-frequency coding tasks including fill-in-the-middle, code correction, and test generation. At $0.30 input / $0.90 output per million tokens, it is priced below most general-purpose models while targeting a focused use case: code-related workflows where faithfulness to source, reliable tool invocation, and structured output matter more than creative reasoning or strategic analysis. In our testing, it ranked 43rd out of 52 models overall — but that aggregate rank masks strong performance on the benchmarks most relevant to coding assistants.
Performance
Codestral 2508's three strongest benchmarks in our testing are tool calling (5/5, tied for 1st with 16 other models out of 54 tested), faithfulness (5/5, tied for 1st with 32 other models out of 55 tested), and structured output (5/5, tied for 1st with 24 other models out of 54 tested). Long context also scored 5/5 (tied for 1st with 36 other models out of 55 tested). These are the four dimensions most directly relevant to agentic coding — accurate function calls, reliable source adherence, schema-compliant output, and retrieval in large codebases. The model's notable weaknesses are strategic analysis (2/5, rank 44 of 54) and creative problem solving (2/5, rank 47 of 54), both of which fall well below the field median. Safety calibration scored 1/5 (rank 32 of 55). Overall rank: 43 out of 52 tested models.
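Structured output, one of the model's strongest dimensions, is typically exercised by sending a JSON schema with the request. A minimal sketch of such a request body in the OpenAI-style format that OpenRouter accepts — the schema and field names below are illustrative, not part of our benchmark suite:

```python
import json

# Illustrative structured-output request body. The "review_finding" schema
# is a made-up example; swap in whatever shape your pipeline needs.
payload = {
    "model": "mistralai/codestral-2508",
    "messages": [
        {
            "role": "user",
            "content": "Review this function for bugs: def add(a, b): return a - b",
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "review_finding",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "line": {"type": "integer"},
                    "severity": {"type": "string", "enum": ["low", "medium", "high"]},
                    "message": {"type": "string"},
                },
                "required": ["line", "severity", "message"],
                "additionalProperties": False,
            },
        },
    },
}

# Serialized exactly as it would be POSTed to /chat/completions.
body = json.dumps(payload)
```

With a schema like this attached, a 5/5 structured-output score means the model's reply can be fed straight into `json.loads` without defensive parsing.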
Pricing
Codestral 2508 costs $0.30 per million input tokens and $0.90 per million output tokens. In output costs alone, 1 million output tokens per month comes to $0.90; 10 million, $9.00. It undercuts GPT-4o ($10/MTok output), GPT-4.1 ($8/MTok output), and even Mistral Large 3 2512 ($1.50/MTok output) while outscoring all three on faithfulness and tool calling in our tests. For code-focused applications — where you run many short completions with tool calls — the economics are favorable. The 256,000-token context window accommodates large codebases without tiered pricing penalties.
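The arithmetic above is a straight per-million-token multiplication. A minimal sketch (the helper name and defaults are ours, with Codestral 2508's prices baked in):

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 input_price: float = 0.30, output_price: float = 0.90) -> float:
    """USD cost for a month of usage, given token volumes in millions of tokens."""
    return input_mtok * input_price + output_mtok * output_price

# Output-only figures from the text above:
print(monthly_cost(0, 1))   # 0.9
print(monthly_cost(0, 10))  # 9.0
# A more realistic code-assistant mix: heavy input (context), modest output.
print(monthly_cost(50, 5))  # 19.5
```

Note that for coding assistants the input side usually dominates, since each completion re-sends file context; the $0.30/MTok input price matters as much as the headline output price.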
[Chart: Pricing vs Performance — output cost per million tokens (log scale) vs average score across our 12 internal benchmarks]
Try It
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="mistralai/codestral-2508",
    messages=[
        {"role": "user", "content": "Hello, Codestral 2508!"},
    ],
)

print(response.choices[0].message.content)

Recommendation
Codestral 2508 is the right choice for developers building code-specific pipelines where tool calling accuracy, faithfulness to source, and structured output are the primary requirements. At $0.90/MTok output, it delivers 5/5 on all three of those dimensions and supports fill-in-the-middle use cases. It is not a good fit for general-purpose assistants, strategic analysis tasks, or creative work — scoring 2/5 on both strategic analysis and creative problem solving. Developers who need a single model to cover both code and reasoning should evaluate models with more balanced scores, such as Mistral Medium 3.1 (avg 4.25, $2.00/MTok output).
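Fill-in-the-middle, mentioned above, is exposed through Mistral's dedicated FIM completions endpoint rather than the chat API. A hedged sketch of the request body — the field names reflect our reading of Mistral's FIM API docs, and the model ID is an assumption; verify both against the current API reference before use:

```python
import json

# Sketch of a fill-in-the-middle request body: "prompt" is the code before
# the cursor, "suffix" the code after it, and the model fills the gap.
# Field names and the model ID are our assumptions about Mistral's FIM
# completions format -- check the official API reference before relying on this.
payload = {
    "model": "codestral-2508",
    "prompt": "def fibonacci(n: int) -> int:\n    ",
    "suffix": "\n    return result",
    "max_tokens": 64,
}

body = json.dumps(payload)
```

This prompt/suffix shape is what makes Codestral suited to editor integrations: the completion is constrained on both sides, which plays to the model's faithfulness strength.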
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.