Qwen3 Embedding: A new milestone for semantic search and RAG
Have you ever wondered how AI systems “understand” the meaning of words, phrases, or even entire documents?
The answer lies in a fundamental technique of Natural Language Processing (NLP): text embeddings, or semantic vectorization.
What are text embeddings?
Embeddings are numerical representations of texts that map words, phrases, or documents into vectors in a high-dimensional space. In this vector form, semantically similar contents are close to each other, even if they use different words.
This allows computational systems to compare and interpret texts based on meaning, not just exact word matches.
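As a toy illustration, the sketch below (plain Python with made-up 4-dimensional vectors; real embeddings have hundreds or thousands of dimensions) shows how cosine similarity captures this notion of closeness:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for similar meaning, near 0.0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 4-dimensional vectors just for illustration.
car        = np.array([0.9, 0.1, 0.3, 0.0])
automobile = np.array([0.8, 0.2, 0.4, 0.1])
banana     = np.array([0.0, 0.9, 0.0, 0.8])

print(cosine_similarity(car, automobile))  # high (~0.98): different words, same meaning
print(cosine_similarity(car, banana))      # low  (~0.08): unrelated concepts
```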
In practice, embeddings are the foundation of NLP tasks such as:
Semantic search (finding relevant documents from a query)
Retrieval-augmented generation (RAG)
Recommendation systems
Question answering
Machine translation
Text classification and clustering, among others.
With the advancement of large language models (LLMs) such as GPT-4, Gemini, and Claude, the creation of embeddings has undergone a revolution in both quality and scale.
That’s where Qwen3 Embedding comes in
Developed by Alibaba’s Tongyi lab, the Qwen3 Embedding series is a family of language models specialized in generating embeddings and performing text reranking.
Released in June 2025, this series is built on the powerful Qwen3 foundation models, leveraging their multilingual capability and deep contextual understanding to deliver high-quality semantic vectors.
The goal of these models is to transform texts into high-quality semantic vectors, enabling efficient comparisons in tasks like semantic search and document retrieval. The reranking models are optimized to more precisely evaluate relevance between text pairs (e.g., query and document).
The models come in three sizes (0.6B, 4B, and 8B parameters), covering both embeddings and rerankers. This lets developers balance computational cost and performance according to their needs.
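As a minimal sketch of how this looks in practice, the snippet below loads the smallest embedding model with the sentence-transformers library. The model ID and the "query" prompt follow the Hugging Face model card; the 4B and 8B checkpoints work the same way, hardware permitting:

```python
from sentence_transformers import SentenceTransformer

# Smallest checkpoint of the series; swap in Qwen3-Embedding-4B/8B
# if you have the memory for them.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

queries = ["What is the capital of China?"]
documents = [
    "The capital of China is Beijing.",
    "Gravity pulls objects toward each other.",
]

# The model card recommends encoding queries with the "query" prompt;
# documents are encoded without it.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Similarity matrix: queries x documents (higher = more relevant).
print(model.similarity(query_embeddings, document_embeddings))
```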
The model announcement is accompanied by robust benchmarks, open-source code, and a big ambition: to set the standard for information retrieval systems, classification, bitext mining, and more!
Highlights of Qwen3 Embedding
All models are based on the Qwen3 dense foundation models, featuring a causal-attention architecture and support for inputs of up to 32,000 tokens.
Some highlights:
Cutting-edge performance: the Qwen3-Embedding-8B model ranked 1st on the MTEB Multilingual leaderboard with a score of 70.58, outperforming commercial alternatives such as Gemini Embedding and OpenAI's embedding models.
Innovative training: the team trained the models with a multi-stage strategy, including generation of high-quality synthetic data, supervised fine-tuning, and model merging across checkpoints. This approach improved the models' generalization and robustness.
Flexibility and customization: the models support custom instructions (instruction-aware), which can be used to adapt the behavior of embeddings and rerankers for specific tasks. Also, the output vector size can be adjusted between 32 and 4096 dimensions, depending on the model.
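Here is a hedged sketch of both features together, assuming a recent sentence-transformers version (which exposes truncate_dim) and the "Instruct: ... Query:" template from the model card; the task description itself is just an example:

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first N dimensions of the output vector;
# the supported range depends on the checkpoint.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

# A task-specific instruction can steer the embedding space.
# The template follows the model card; the task description is up to you.
instruction = (
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query\nQuery:"
)
query_embedding = model.encode(["best pizza in São Paulo"], prompt=instruction)
print(query_embedding.shape)  # (1, 256) instead of the full dimensionality
```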
Embeddings and reranking: two key pieces in the modern NLP pipeline
A typical pipeline for systems like neural search or RAG uses two stages:
Embedding (indexing): each document and query is transformed into a vector. The system compares vectors to find relevant candidates.
Reranking: after the initial step, “query + document” pairs are reevaluated with more context and precision, using models like Qwen3-Reranker, which apply more refined cross-encoder logic to judge relevance (a sketch of the full flow follows below).
This hybrid architecture is now standard in search engines, AI assistants, recommendation systems, and autonomous agents.
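The sketch below outlines this two-stage flow. Stage 1 uses the embedding model as above; stage 2 is left as a commented placeholder, since Qwen3-Reranker has its own scoring recipe (detailed in its model card), and qwen3_rerank here is a hypothetical helper:

```python
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

corpus = [
    "Qwen3 Embedding supports more than 100 languages.",
    "Beijing is the capital of China.",
    "Rerankers score query-document pairs jointly, with full cross-attention.",
]
query = "How do reranking models judge relevance?"

# Stage 1: embed everything once, then rank candidates by similarity.
doc_embeddings = embedder.encode(corpus)
query_embedding = embedder.encode([query], prompt_name="query")
scores = embedder.similarity(query_embedding, doc_embeddings)[0]
ranked = sorted(range(len(corpus)), key=lambda i: float(scores[i]), reverse=True)
candidates = [corpus[i] for i in ranked[:2]]  # top-k candidate set

# Stage 2: re-score each (query, candidate) pair with the cross-encoder.
# qwen3_rerank is a hypothetical helper standing in for the Qwen3-Reranker
# inference code from its model card.
# reranked = qwen3_rerank(query, candidates)
print(candidates)
```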
Multilingualism and code: the strength of Qwen3
The Qwen3 Embedding series supports more than 100 languages 💪🏼, as well as various programming languages.
This enables use cases such as:
Semantic search in multiple languages with consistent vector alignment
Multilingual document retrieval
Parallel pair mining (bitext mining)
Source code retrieval in sources like Stack Overflow or GitHub
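To see the multilingual alignment in action, here is a quick sketch comparing an English sentence with its Portuguese translation versus an unrelated Portuguese sentence (same assumed model ID as before; exact scores will vary):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

english     = "The weather is lovely today."
translation = "O tempo está ótimo hoje."         # Portuguese translation
unrelated   = "Eu gosto de programar em Python."  # unrelated Portuguese

embeddings = model.encode([english, translation, unrelated])

# The translated pair should land much closer in vector space
# than the unrelated pair.
print(model.similarity(embeddings[0:1], embeddings[1:2]).item())  # higher
print(model.similarity(embeddings[0:1], embeddings[2:3]).item())  # lower
```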
In code benchmarks (MTEB-Code), the 8B model also stood out, outperforming even commercial competitors.
How this can impact your projects
The advancement of embeddings is essential for the next generation of applications based on RAG, autonomous agents, and complex interactive systems. Models like Qwen3 Embedding serve as the “backbone” for intelligent search, recommendations, semantic classification, and large-scale clustering.
Qwen3 Embedding represents a milestone in the development of models for retrieval and textual understanding. Its results, combined with open code and research, promise to strengthen the open AI ecosystem and inspire new applications in Portuguese and other languages.
Want to see it in practice? 💡
We prepared a practical example showing how to use the embedding model for semantic search, and another showing how to use the reranking model to reorder results. You can access the notebooks directly in the “Notebooks” section of our page.
Where to find Qwen3 Embedding
The models are available under the Apache 2.0 license and can be found on Hugging Face, ModelScope, and GitHub.
The technical report of the model can be accessed at https://arxiv.org/pdf/2506.05176.
With cutting-edge performance, multilingual support, and an open architecture, Qwen3 Embedding not only improves tasks like RAG but redefines the role of embeddings in modern AI systems.
💬 And you, have you tested Qwen3 yet? Leave a comment.