Exploring Artificial Intelligence

Multimodal RAG in Practice with Open-source Models

Elisa Terumi
Nov 08, 2025

In our last post, we talked about multimodal models, which integrate multiple modalities into a single representation to perform complex tasks.

If you’re a paid subscriber, you also had access to a practical example of how to generate images using an open-source model on Google Colab.

In today’s article, we’ll see how to create a multimodal RAG system where our assistant interacts with an image database.

Follow our page on LinkedIn for more content like this! 😉

What is RAG

Traditional RAG (Retrieval-Augmented Generation) is a process with two steps:

  • Retrieval: the system searches for relevant information in a knowledge base (documents, articles, PDFs, etc.).

  • Generation: a language model uses this information as context to produce a response.

This approach allows the model to access up-to-date information without relying solely on what it learned during training (see how to create a chatbot with RAG).
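
To make the two steps concrete, here is a minimal sketch in plain Python. It is only an illustration under simplifying assumptions: retrieval uses word-count cosine similarity instead of a real embedding model, the generation step stops at building the prompt that would be sent to a language model, and the names `retrieve`, `build_prompt`, and `KNOWLEDGE_BASE` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Retrieval step: return up to k documents that share words with the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context_docs):
    """Generation step (input side): pack the retrieved context into a prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Toy knowledge base (assumption: a real one would hold documents, articles, PDFs).
KNOWLEDGE_BASE = [
    "RAG combines a retrieval step with a generation step.",
    "Google Colab provides hosted notebooks with optional GPUs.",
    "Multimodal models integrate multiple modalities into a single representation.",
]

question = "What does RAG combine?"
top_docs = retrieve(question, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In a real system, the prompt would then be passed to a language model, and the word-count similarity would typically be replaced by embeddings stored in a vector index.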

What changes in Multimodal RAG?

Multimodal RAG expands on this idea, enabling the system to search and reason over other types of data besides text, such as images, audio, and video.

Imagine you upload an X-ray image and ask:

“What does this image indicate about the left lung?”

A multimodal RAG system can:

  • Use a vision model to understand the image;

  • Search a medical database for similar cases;

  • And finally, generate an explanatory text response, combining the retrieved context with visual analysis.
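
The three steps above can be sketched as a small pipeline. This is a hedged illustration, not the implementation used later in this post: the vision model and the image encoder are passed in as plain functions (`describe_image` and `embed_image`, both hypothetical stand-ins), the case database is a toy list with made-up vectors and placeholder notes, and the final step only assembles the prompt that a language model would receive.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    image_id: str
    embedding: list  # assumption: a vector precomputed by an image encoder
    notes: str       # text stored alongside the image


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_about_image(image, question, cases, embed_image, describe_image, k=1):
    """Run the three steps and return (similar cases, prompt for the generator)."""
    # 1. Use a vision model to understand the image.
    description = describe_image(image)
    # 2. Search the database for similar cases.
    query_vec = embed_image(image)
    ranked = sorted(cases, key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    similar = ranked[:k]
    # 3. Combine retrieved context and visual analysis for the generation step.
    context = "\n".join(f"- {c.image_id}: {c.notes}" for c in similar)
    prompt = (
        f"Image description: {description}\n\n"
        f"Similar cases:\n{context}\n\n"
        f"Question: {question}"
    )
    return similar, prompt


# Toy stand-ins so the sketch runs without any model (all values are made up).
def fake_embed(image):
    return [0.9, 0.1, 0.0]


def fake_describe(image):
    return "placeholder description of the uploaded image"


cases = [
    Case("case-001", [1.0, 0.0, 0.0], "placeholder notes for case 001"),
    Case("case-002", [0.0, 1.0, 0.0], "placeholder notes for case 002"),
]

question = "What does this image indicate about the left lung?"
similar, prompt = answer_about_image(
    "xray.png", question, cases, fake_embed, fake_describe
)
print(prompt)
```

Passing the models in as functions keeps the retrieval logic independent of any particular vision or language model, so either one can be swapped without touching the pipeline.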

Practical example

Let’s get hands-on and build our multimodal RAG!

The complete code that runs on Google Colab can be downloaded from our Notebooks page.

Our multimodal RAG will be built by combining two models:

© 2025 Elisa Terumi