Models Leaderboard

Best Text Embedding Models for RAG, Ranked by Retrieval Quality and Cost

We scored five production embedding models on retrieval quality, multilingual coverage, context length, dimension flexibility, and price per million tokens, using public MTEB results and vendor-documented specs as of June 2026.

Tested by Priya Raman Lead Benchmark Analyst Updated June 12, 2026 5 products ranked

The Verdict

Voyage 3 Large finishes first on raw retrieval quality and is the right default when retrieval is the bottleneck. Gemini Embedding 001 wins for multilingual corpora at API scale, Cohere Embed v4 is the only mainstream pick for mixed text-and-image RAG, OpenAI text-embedding-3-large remains the safest default inside the OpenAI stack, and Qwen3-Embedding-8B is the strongest open-weight option once self-hosting volume justifies the GPUs.

Five general-purpose embedding models, one ranked field. We picked the embedders most teams actually shortlist for production RAG in 2026: three managed APIs (Voyage, Google Gemini, Cohere), one incumbent default (OpenAI), and one open-weight model (Qwen3). Each was scored against the same five metrics so the differences trace to the models rather than the workload.

Quality scores are tied to public MTEB results on retrieval-leaning subsets, with multilingual coverage scored separately because that's the dimension where general-purpose averages hide the most. Price per million tokens is tracked alongside quality but kept out of the quality score. A buyer optimizing for retrieval accuracy and a buyer optimizing for token spend are answering different questions.

The test suite · 5 measured metrics

Quality is grounded in published MTEB scores and vendor-documented specs, verified against official pricing and model pages in June 2026. Where vendors report different MTEB subsets, we use English MTEB averages for the English retrieval score and MMTEB / multilingual MTEB averages for the multilingual score. Self-hosted models are scored on retrieval quality and on the engineering cost of running them, not on a hosted API price.

English retrieval quality

Scored against published English MTEB averages on the retrieval-leaning subsets reported by each vendor or by the model's HuggingFace card. We mapped the raw MTEB number to a 0-100 score with 60 MTEB anchored to 60 and 75 MTEB anchored to 95, so a five-point MTEB gap corresponds to roughly a 15-point score gap, consistent with published guidance that a five-point MTEB gap typically translates to 3-8% better recall@10 in production search. Weighted 30%.

Multilingual coverage

Scored on MMTEB / MTEB Multilingual averages plus the documented number of languages each model is trained or marketed to support. Models that lead the MMTEB leaderboard (Qwen3-Embedding-8B at 70.58, Gemini Embedding 001 at 68.32) anchor the top of the band. English-first models with limited cross-lingual training score in the middle even when their English MTEB is competitive. Weighted 20%.

Context window

Maximum input length per embedding call, taken from the official model documentation. 128K-token models (Cohere Embed v4) anchor the top of the band, 32K and 8K models sit in the middle, and the 2K Gemini Embedding 001 anchors the bottom. Longer context reduces the share of the RAG pipeline that's dominated by chunking strategy. Weighted 15%.

Dimension flexibility

Scored on Matryoshka Representation Learning support, the number of documented dimension cut-points, and the supported output precisions (float, int8, binary). Models that publish multiple Matryoshka cuts and quantization-aware training (Voyage 3 Large at 2048/1024/512/256 with binary support, Cohere Embed v4 at 256/512/1024/1536 with int8/binary) score at the top because they expose direct storage-vs-quality controls without re-embedding. Weighted 15%.

Price per million tokens

List input price per million tokens from each vendor's official pricing page in June 2026, normalized so the cheapest model in the field scores highest. Self-hosted models are scored as a separate cost regime: no per-token API charge, but a non-zero GPU bill that only beats API pricing at high embed volume. Weighted 20%.

The Ranking

1RANK

Voyage 3 Large

Voyage AI

Highest published retrieval quality in the field, with Matryoshka dimensions and binary quantization that compress storage 200x without leaving the model.

voyage-3-large is Voyage AI's general-purpose embedding model, with Matryoshka dimensionality at 2048, 1024, 512, and 256, plus quantization-aware training that supports binary and int8 outputs. Vendor benchmarks report a 9.74% retrieval-quality margin over OpenAI text-embedding-3-large across 100 datasets in eight domains, and third-party spec tables put it at the top of MTEB retrieval among commercial APIs at $0.18 per million input tokens with a 32K context window. The trade-off is price. At the top of the band on both quality and cost, it's the right default when retrieval is the binding constraint and the wrong default when API spend is.

Source: Voyage AI ↗

Strengths

Highest reported retrieval quality across general-purpose APIs
Matryoshka cuts at 2048/1024/512/256 plus binary outputs cut storage up to 200x
First 200M tokens are free per account on voyage-4 family

Weaknesses

$0.18 per million tokens is the most expensive in the field
English-first; multilingual coverage trails Gemini and Qwen3

How it scored, by metric

English retrieval quality 94

Multilingual coverage 78

Context window 80

Dimension flexibility 95

Price per million tokens 60

Best for: Teams where retrieval recall directly drives product quality and embed spend isn't the binding constraint

2RANK

Gemini Embedding 001

Google

Multilingual MTEB leader among managed APIs, with Matryoshka dimensions down to 768 and a 50% batch discount, gated by a short 2K context window.

gemini-embedding-001 is Google's production text embedding model, accessible through the Gemini API and Vertex AI. It's trained with Matryoshka Representation Learning, outputs 3,072 dimensions by default with documented truncation to 1,536 or 768 without quality loss, and has held a top spot on the MMTEB leaderboard with an average task score of 68.32. List pricing is $0.15 per million input tokens, with a 50% batch discount at $0.075. The binding limitation is the 2,048-token input window, which forces aggressive chunking on any document workflow longer than a few pages.

Source: Google ↗

Strengths

MMTEB multilingual leader at 68.32 across managed APIs
Matryoshka cuts at 3072/1536/768 with documented quality preservation
Batch API at $0.075 per million tokens cuts indexing cost in half

Weaknesses

2,048-token context window forces chunking on long documents
API errors rather than truncating when inputs exceed 2K tokens

How it scored, by metric

English retrieval quality 86

Multilingual coverage 92

Context window 55

Dimension flexibility 86

Price per million tokens 72

Best for: Multilingual corpora on the Google Cloud stack, especially RAG pipelines with disciplined chunking

3RANK

Cohere Embed v4

Cohere

The only mainstream multimodal embedding API and the widest context window in the field at 128K tokens, with text MTEB just behind OpenAI's flagship.

Cohere Embed v4 is a multimodal embedding model that vectorizes text, single images, and interleaved text-and-image content in one 1,536-dimensional space, with Matryoshka cuts at 256, 512, 1,024, and 1,536 and support for float, int8, uint8, binary, and ubinary output types. The model reaches an MTEB score of 65.2 (ahead of OpenAI text-embedding-3-large at 64.6) at $0.12 per million text tokens, and its 128,000-token context window sits in the top 5% of embedders. It's API-only on the public tier, with on-prem available under Cohere's enterprise contract for SOC 2 Type 2 and HIPAA postures.

Source: Cohere ↗

Strengths

128K context window, top 5% among embedding models
Embeds PDFs, slides, tables, and figures directly without OCR
1536-dim Matryoshka with int8 and binary precision end-to-end

Weaknesses

Text-only MTEB trails Voyage 3 Large by 3-4 points
Image tokens billed at $0.47 per million, materially above the text rate

How it scored, by metric

English retrieval quality 78

Multilingual coverage 85

Context window 96

Dimension flexibility 90

Price per million tokens 78

Best for: Document-heavy RAG with embedded figures, regulated enterprises, and corpora past 8K-token chunks

4RANK

text-embedding-3-large

OpenAI

The incumbent default at $0.13 per million tokens, widest ecosystem coverage in the field, no model update since January 2024.

text-embedding-3-large is OpenAI's flagship embedding model, with a default 3,072-dimensional output that can be truncated via the dimensions parameter (Matryoshka-style) without re-embedding, an 8,192-token context window, and an MTEB English average of 64.6 reported at launch. List pricing is $0.13 per million input tokens, with a 50% batch discount at $0.065. It's the most-integrated embedding model in the ecosystem and the safest pick inside the OpenAI stack, but it hasn't been refreshed since January 2024 and now trails Voyage 3 Large, Gemini Embedding 001, and the strongest open-weight models on MTEB.

Source: OpenAI ↗

Strengths

Widest ecosystem and tooling support in the field
Dimensions parameter shortens 3,072-dim vectors without re-training
Batch API halves indexing cost to $0.065 per million tokens

Weaknesses

Not refreshed since January 2024; trails newer flagships on MTEB
8K context window is mid-pack, not class-leading

How it scored, by metric

English retrieval quality 76

Multilingual coverage 70

Context window 72

Dimension flexibility 82

Price per million tokens 75

Best for: Teams already on the OpenAI stack that want a single embedding vendor and don't need state-of-the-art retrieval

5RANK

Qwen3-Embedding-8B

Alibaba (Qwen)

Apache 2.0 open weights at the top of the MMTEB multilingual leaderboard, with no per-token bill and a real GPU bill instead.

Qwen3-Embedding-8B is the largest model in Alibaba's Qwen3 Embedding series, released under Apache 2.0 with 8B parameters, a 32K-token context window, and flexible vector dimensions up to 4,096. It ranks No.1 on the MTEB multilingual leaderboard with a score of 70.58 (as of June 5, 2025) and posts MTEB English 75.22, Chinese 73.84, and Code 80.68, outperforming Gemini-Embedding on the reported subsets. It's the strongest open-weight option for teams with GPU infrastructure, and the wrong pick for teams without one. Self-hosting only beats API pricing once embed volume crosses roughly 10-15 million embeddings per month.

Source: Alibaba (Qwen) ↗

Strengths

MMTEB leader at 70.58, ahead of every managed-API model in this ranking
Apache 2.0 license supports commercial use, fine-tuning, and on-prem
32K context window and flexible dimensions up to 4,096

Weaknesses

No managed API, requires GPU infrastructure and MLOps
Self-hosting is more expensive than APIs below ~10-15M embeddings per month

How it scored, by metric

English retrieval quality 92

Multilingual coverage 94

Context window 80

Dimension flexibility 88

Price per million tokens 55

Best for: Teams with GPU infrastructure, sovereignty constraints, or embed volumes past the API/self-host crossover

Analysis

The ranking above reflects published MTEB and MMTEB scores plus vendor-documented specs verified against official pricing pages in June 2026. The single largest separator at the top of the table isn’t English retrieval quality, every model in the field clears the threshold where the embedder is no longer the bottleneck, but how each one trades off against the other four metrics: multilingual reach, context length, dimension flexibility, and price.

What the scores measure

English retrieval quality carries the most weight because a RAG pipeline that retrieves the wrong chunk gets the wrong answer regardless of how strong the generation model is. We scored it from public MTEB averages on the retrieval-leaning subsets rather than from any vendor-on-vendor head-to-head, because vendor benchmarks consistently flatter the vendor. The mapping is calibrated so a five-point MTEB gap is meaningful, consistent with the public observation that a 5-point gap on MTEB typically translates to 3-8% better recall@10 in real-world search applications .

Multilingual coverage is scored separately because it’s the dimension that general-purpose MTEB averages hide most. For multilingual applications, Qwen3-Embedding-8B offers the best performance (70.6 MTEB) with excellent French support (69.8). If you prefer an API, Google Gemini Embedding offers excellent value with good multilingual performance. OpenAI text-embedding-3-large lags behind on European languages. A buyer indexing a multilingual corpus will see those gaps in production retrieval long before the headline MTEB number suggests they should.

Where the field separates

Voyage 3 Large and Qwen3-Embedding-8B lead on raw retrieval quality on the subsets we scored. Cohere Embed v4 leads on context length and is the only model in the table that natively embeds interleaved text and images: Embed v4.0 is Cohere’s fourth-generation embedding model, released April 15, 2025. It reaches a 65.2 MTEB score, ahead of OpenAI’s text-embedding-3-large (64.6). Beyond text-only retrieval, it embeds interleaved text and images in the same vector space. You can index screenshots of PDFs, slides, figures, and tables directly alongside text documents without converting visual content to text first. Gemini Embedding 001 leads on multilingual MMTEB among managed APIs but is bounded by a 2,048 token input limit per embedding request. Longer documents must be chunked.

Cost and the self-host crossover

Price per million tokens is tracked on the same suite but kept out of the quality score, because a buyer optimizing for retrieval recall and a buyer optimizing for embed spend are answering different questions. The list rates are tight: Embedding input is billed at $0.12 per million tokens at listed AI Gateway rates for Cohere Embed v4, $0.13 per million input tokens, $0 per million output tokens. 8,192 token context window for OpenAI text-embedding-3-large, $0.15 per million input tokens, $0 per million output tokens. 20,000 token context window for Gemini Embedding 001, and a context window of 32K tokens. Pricing starts at 0.18 per million input tokens for Voyage 3 Large. The bigger structural decision is the API-versus-self-host crossover, where Qwen3-Embedding-8B leads the field on MMTEB at no per-token cost but only beats the cheapest API in this ranking once embed volume crosses roughly 10-15 million embeddings per month.

Dimension flexibility is the underrated metric

Every model in the top four supports Matryoshka-style dimension reduction, but the cut-points and precision options differ enough to matter at scale. Voyage 3 Large exposes embeddings in four dimensions: 2048, 1024, 512, and 256 through Matryoshka learning. You can tune the tradeoff between retrieval accuracy and vector storage cost without retraining or running multiple models. Cohere Embed v4 goes further on precision: The model supports various embedding types including float, int8, uint8, binary, and ubinary formats, with configurable output dimensions from 256 to 1536. Those two controls, Matryoshka cuts plus quantization, are what makes the difference between a vector store that fits in RAM and one that doesn’t, and they’re why the top two picks in the ranking score 90 or higher on dimension flexibility while the bottom of the table sits in the low 80s.

Sources

Frequently Asked Questions

Q.Which embedding model has the highest retrieval quality in 2026?

Among managed APIs, <cite index="22-15">voyage-3-large outperforms OpenAI text-embedding-3-large by 9.74% and Cohere Embed v3 English by 20.71%</cite> across 100 retrieval datasets in eight domains, and third-party spec tables place it at the top of MTEB retrieval among commercial embedders. Among open-weight models, <cite index="67-2">Qwen3-Embedding-8B achieves state-of-the-art mean task-level scores on MMTEB (70.58) and MTEB (English: 75.22, Chinese: 73.84, Code: 80.68), outperforming Gemini-Embedding</cite> on those subsets, but it requires self-hosting on GPU infrastructure.

Q.What is the best embedding model for multilingual RAG?

For multilingual corpora on a managed API, Gemini Embedding 001 is the leader: <cite index="46-3">Google's gemini-embedding-001 is the state-of-the-art text embedding model from Google, currently holding top positions on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard with an average task score of 68.32</cite>. If self-hosting is on the table, Qwen3-Embedding-8B sits ahead of it on MMTEB at 70.58 under an Apache 2.0 license.

Q.When does it make sense to self-host instead of using an embedding API?

The crossover is volume. On commodity GPUs, <cite index="53-17,53-18,53-19">a single A10G GPU (~$0.75/hour on AWS) can process roughly 500-1,000 embeddings per second. At that rate, self-hosting becomes cheaper than API calls only when you exceed about 10-15 million embeddings per month. Below that threshold, a paid API is both cheaper and simpler to maintain.</cite> Sovereignty constraints, fine-tuning needs, or MMTEB leadership at no per-token cost are the other reasons to take on the GPU bill.

Q.Should I still use OpenAI text-embedding-3-large?

Inside the OpenAI stack, it remains a defensible default at $0.13 per million tokens with a documented 64.6 MTEB and a dimensions parameter that lets you truncate the 3,072-dim output. Outside that stack, the model hasn't been refreshed since January 2024, and newer flagships (Voyage 3 Large, Gemini Embedding 001) and open-weight models (Qwen3-Embedding-8B) now lead it on the public MTEB and MMTEB boards.

The Analyst

Priya Raman

Lead Benchmark Analyst

Priya Raman runs the Top AI Tracker test bench. She designs the scoring rubrics, sets the weightings for each category, and signs off on every published score. Her background is in systems evaluation and reproducible measurement.

Best Text Embedding Models for RAG, Ranked by Retrieval Quality and Cost

Strengths

Weaknesses

How it scored, by metric

Strengths

Weaknesses

How it scored, by metric

Strengths

Weaknesses

How it scored, by metric

Strengths

Weaknesses

How it scored, by metric

Strengths

Weaknesses

How it scored, by metric

What the scores measure

Where the field separates

Cost and the self-host crossover

Dimension flexibility is the underrated metric

Other leaderboards