Multimodal Chatbot with Ollama and Docker-Compose - Part 3 (Final)
Explore the Power of a Multimodal Chatbot with Ollama and Docker-Compose
Welcome to the final part of our guide on creating an AI-powered chatbot that runs locally on your computer!
In this post, we’ll build a chatbot using a multimodal model with Ollama and Docker-Compose.
So far, we’ve explored how Ollama works and how to run it within Docker containers.
In this final stage, we’ll learn how to deploy the chatbot with Docker-Compose and set up a multimodal model so we can converse and ask questions about our images.
Review
Before moving forward, let’s recap what we’ve done in Parts 1 and 2:
Exploring Ollama: We learned what Ollama is and how it works.
Container Creation and Integration: We set up Docker and installed Ollama, creating a ready environment for chatbot development.
Now, let’s take things further using Docker-Compose and a multimodal model.
But first… What Is a Multimodal Model?
A multimodal model is an AI model that integrates and processes data from multiple sources or "modalities" (such as text, image, audio, and video) to understand and generate responses based on combined information. This allows the model to perform complex tasks, such as describing images in text and answering questions about videos.
OK, let's start!
Using Docker-Compose
Docker-Compose is a tool that enables you to define and manage multiple Docker containers in a unified environment. It simplifies configuration and automation of complex applications using a YAML file.
With Docker-Compose, you can manage multiple containers easily by defining all services in a single configuration file called docker-compose.yml.
We’ll use the docker-compose setup provided by the Open-webui project by cloning the GitHub repository (just like in the previous tutorial).
If you haven’t already, create a folder called workspace, go into that folder, and clone the repository with the following command:
git clone https://github.com/open-webui/open-webui.git
In this repository, you’ll find a pre-configured docker-compose.yml file that’s ready to run: it defines an ollama service that uses the latest available Ollama image, and an open-webui service that is built from a local Dockerfile and depends on the Ollama service to function.
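A simplified sketch of how those two services fit together is shown below. This is only an illustration: the exact field names, image tags, and environment variables in the repository’s docker-compose.yml may differ, so always check the file that ships with the project.
# Illustrative, simplified version of the two services described above
services:
  ollama:
    image: ollama/ollama:latest        # Ollama listens on port 11434 inside the container
    volumes:
      - ollama:/root/.ollama           # persist downloaded models between restarts
  open-webui:
    build: .                           # built from the Dockerfile at the repository root
    depends_on:
      - ollama                         # the web UI needs the Ollama service to be running
    ports:
      - "3000:8080"                    # UI available at http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # how the UI reaches Ollama inside the Docker network
    volumes:
      - open-webui:/app/backend/data   # persist chats and settings
volumes:
  ollama: {}
  open-webui: {}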
To launch the application with both containers, simply run:
docker-compose up
And that’s it! Open your browser and go to http://localhost:3000 to start using the application.
Repeat the process from the previous post to download a model (e.g., llama3.2) and start chatting with it!
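If you prefer the command line, you can also pull a model directly inside the running Ollama container. This assumes the compose service is named ollama, as in the sketch above; adjust the name if yours differs.
# Pull the llama3.2 model inside the Ollama service container
docker-compose exec ollama ollama pull llama3.2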
Multimodal Chatbot
Now, let’s enhance our chatbot to read and interpret images using a multimodal model.
To do this, go to the settings screen at Profile / Settings / Admin Settings / Models, and type “llava” in the “Get a model from Ollama.com” field.
LLaVA is an end-to-end trained multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding. It was recently updated to version 1.6.
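If you prefer the terminal over the settings screen, the same model can be pulled with the Ollama CLI inside the container (again assuming the service is named ollama):
# Pull the LLaVA multimodal model inside the Ollama service container
docker-compose exec ollama ollama pull llava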
Now, return to the chat screen and select the llava model from the list of available models:
We can now upload images to the model and ask it to describe the image content.
Click “Upload” and send your image.
Then, type a prompt asking the model to describe the image.
The model will provide a description as requested!
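Behind the scenes, the image is sent to Ollama as base64-encoded data through its API. If your compose file publishes Ollama’s port 11434 to the host (the default file may not), you can reproduce this step with curl; the file name my-image.png below is just a placeholder for your own image.
# Encode a local image and ask llava to describe it via the Ollama HTTP API.
# Assumes port 11434 is exposed to the host in your docker-compose.yml.
IMG=$(base64 < my-image.png | tr -d '\n')
curl http://localhost:11434/api/generate \
  -d "{\"model\": \"llava\", \"prompt\": \"Describe this image.\", \"stream\": false, \"images\": [\"$IMG\"]}"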
Final Thoughts
I hope this series has been helpful in developing a multimodal chatbot with Ollama and Docker-Compose.
Stay tuned for more tutorials and content on AI and LLMs!
How about you? Did you build your own chatbot?
Leave your comment with your experience and questions! 🩷