Embeddings
What Is It?
Embeddings are lists of numbers — called vectors — that represent the meaning of text, images, or other data in a format AI systems can process mathematically. Think of them as coordinates in a vast conceptual space: words or sentences with similar meanings end up close together, while unrelated concepts sit far apart. This makes it possible for a system to answer "find me documents about contract law" by measuring geometric distance between your query and thousands of stored documents — no keyword matching required. Embeddings are the foundation of semantic search, retrieval-augmented generation (RAG), recommendation engines, and duplicate detection.
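The "geometric distance" idea above is usually implemented as cosine similarity between vectors. A minimal sketch, using toy 4-dimensional vectors (real embedding models output hundreds or thousands of dimensions, and the numbers here are purely illustrative):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means
    similar direction (similar meaning), near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (values invented).
query        = [0.9, 0.1, 0.0, 0.2]  # "contract law" query
contract_doc = [0.8, 0.2, 0.1, 0.3]  # a contract-law document
cooking_doc  = [0.1, 0.0, 0.9, 0.7]  # an unrelated recipe

# The relevant document scores higher, so it is retrieved first.
print(cosine_similarity(query, contract_doc) > cosine_similarity(query, cooking_doc))  # → True
```

A semantic search system does exactly this comparison, just at scale: embed every stored document once, embed the query at request time, and return the documents whose vectors score highest.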
Why It Matters
If you're building anything beyond a simple chat interface — a document search tool, a RAG pipeline, a recommendation system — you need an embedding model, and that's a separate selection decision from choosing a generative LLM. Embedding models are typically priced per token of input (not output), so high-volume applications like indexing large document libraries can accumulate significant cost. Quality matters too: a weak embedding model clusters unrelated documents together, which means your retrieval step feeds the generative LLM irrelevant context and degrades final answer quality regardless of how capable that LLM is. Developers should evaluate embedding models on retrieval accuracy benchmarks specific to their domain, not just on generative benchmarks.
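Because embedding models bill on input tokens, indexing cost is easy to estimate up front. A back-of-envelope sketch, with hypothetical figures (document count, average length, and the $0.10-per-million-token price are all assumptions, not quotes from any provider):

```python
def embedding_index_cost(num_docs, avg_tokens_per_doc, price_per_million_tokens):
    """Rough one-time cost to embed a document library.
    Embedding models are priced on input tokens only."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical library: 500k documents averaging 800 tokens each,
# at an assumed $0.10 per million input tokens.
cost = embedding_index_cost(500_000, 800, 0.10)
print(f"${cost:,.2f}")  # → $40.00
```

The same arithmetic applies to re-indexing after a model switch, which is why embedding model choice is sticky: changing models means re-embedding the entire corpus.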
How It Applies
ModelPicker currently tracks 52 active generative AI models across 8 providers, with input pricing ranging from $0.05 to $5.00 per million tokens. Our 12-test benchmark suite and external benchmark data (SWE-bench Verified, MATH Level 5, AIME 2025 via Epoch AI) focus on generative capabilities — reasoning, coding, instruction following — rather than embedding quality, because embedding and generative models serve distinct roles in an AI stack. When you see a model score on our long context benchmark (median score: 5/5 across our tracked models), that reflects how well a generative model handles long inputs, not its embedding capability. If your use case is RAG or semantic search, use ModelPicker to select your generative LLM for answer synthesis, then evaluate dedicated embedding models separately for the retrieval layer.
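Evaluating the retrieval layer separately typically means scoring an embedding model on metrics like recall@k over a labeled query set from your own domain. A minimal sketch of that metric (the document IDs and gold labels below are invented for illustration):

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of known-relevant documents that appear in the
    top-k results returned by an embedding-based retriever."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical evaluation: for one test query, the retriever ranked
# documents [7, 2, 9, 4, 1]; the gold labels say {2, 4, 8} are relevant.
score = recall_at_k([7, 2, 9, 4, 1], {2, 4, 8}, k=5)
print(score)  # 2 of the 3 relevant documents were found in the top 5
```

Averaging this score across a few dozen representative queries gives a domain-specific retrieval benchmark that is far more predictive of RAG answer quality than any generative leaderboard.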