Before you begin this section, complete Containerize a RAG application.
In this section, you'll learn how to set up a development environment to access all the services that your generative RAG application needs. This includes:

- Adding a local database
- Adding a local or remote LLM service
[!NOTE] You can see more samples of containerized GenAI applications in the GenAI Stack demo applications.
You can use containers to set up local services, like a database. In this section, you'll explore the database service in the docker-compose.yaml file.
To run the database service:
In the cloned repository's directory, open the docker-compose.yaml file in an IDE or text editor.
In the docker-compose.yaml file, you'll see the following:
```yaml
services:
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
```
[!NOTE] To learn more about Qdrant, see the Qdrant Official Docker Image.
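Other services in the Compose file can reach this database at http://qdrant:6333, while tools on your host use the published port at http://localhost:6333. The following is a minimal sketch of how an application might verify the connection, assuming the qdrant-client Python package; the sample's actual code may differ.

```python
from qdrant_client import QdrantClient

# From the host, use the published port; inside the Compose network,
# other services reach the database at http://qdrant:6333 instead.
client = QdrantClient(url="http://localhost:6333")

# List existing collections to confirm that the database is reachable.
print(client.get_collections())
```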
Start the application. Inside the winy directory, run the following command in a terminal.
```console
$ docker compose up --build
```
Access the application. Open a browser and view the application at http://localhost:8501. You should see a simple Streamlit application.
Stop the application. In the terminal, press ctrl+c to stop the application.
The sample application uses Ollama as its LLM service. This guide provides instructions for the following scenarios:

- Run Ollama in a container
- Run Ollama outside of a container
While all platforms can use either of these scenarios, performance and GPU support may vary, so use your hardware and operating system to help you choose. Select one of the following options for your LLM service.
{{< tabs >}} {{< tab name="Run Ollama in a container" >}}
When running Ollama in a container, you should have a CUDA-supported GPU. While you can run Ollama in a container without a supported GPU, the performance may not be acceptable. Only Linux and Windows 11 support GPU access to containers.
To run Ollama in a container and provide GPU access:
Install the prerequisites for GPU access in containers. On Linux, this typically means an up-to-date NVIDIA driver and the NVIDIA Container Toolkit; on Windows 11, Docker Desktop with the WSL 2 backend and a current NVIDIA driver.
The docker-compose.yaml file already contains the necessary instructions. In your own apps, you'll need to add the Ollama service to your docker-compose.yaml. The following shows the ollama service in the updated docker-compose.yaml:
```yaml
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
[!NOTE] For more details about the Compose instructions, see Turn on GPU access with Docker Compose.
Once the Ollama container is up and running, you can pull a model with the download_model.sh script in the tools folder by using this command:
```console
. ./download_model.sh <model-name>
```
Pulling an Ollama model can take several minutes.
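If you prefer not to use the script, you can also pull a model through Ollama's REST API. The following is a rough sketch that assumes the API is reachable from your host at http://localhost:11434, which requires publishing that port in the Compose file; the request field name can vary between Ollama versions.

```python
import json

import requests

# Assumes the Ollama API is published on the host at port 11434.
OLLAMA_URL = "http://localhost:11434"

# Pulling streams progress as JSON lines; older Ollama versions use the
# "name" field, newer ones also accept "model".
with requests.post(
    f"{OLLAMA_URL}/api/pull",
    json={"name": "llama2"},
    stream=True,
    timeout=None,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```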
{{< /tab >}} {{< tab name="Run Ollama outside of a container" >}}
To run Ollama outside of a container:
Install and run Ollama on your host machine.
Pull the model to Ollama using the following command.
```console
$ ollama pull llama2
```
Remove the ollama service from the docker-compose.yaml file and update the connection variable in the winy service accordingly:
```diff
- OLLAMA=http://ollama:11434
+ OLLAMA=<your-url>
```
{{< /tab >}} {{< /tabs >}}
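Whichever option you chose, the winy service only needs the OLLAMA connection variable to find the LLM service. As a rough illustration, a client could call Ollama's generate endpoint as shown below; this is a minimal sketch using the requests package, and the sample application itself may use a higher-level framework instead.

```python
import os

import requests

# The Compose file sets OLLAMA to http://ollama:11434, or to your own URL
# when Ollama runs outside of a container.
ollama_url = os.environ.get("OLLAMA", "http://localhost:11434")

response = requests.post(
    f"{ollama_url}/api/generate",
    json={
        "model": "llama2",
        "prompt": "Recommend a wine to pair with grilled salmon.",
        "stream": False,
    },
    timeout=300,
)
response.raise_for_status()

# With "stream": False, the full answer is returned in a single JSON object.
print(response.json()["response"])
```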
At this point, you have the following services in your Compose file:

- winy, the main application
- qdrant, the vector database
- ollama, the LLM service (only if you chose to run Ollama in a container)
To start the complete application, run `docker compose up --build` again. Once the application is running, open a browser and access it at http://localhost:8501. Depending on your system and the LLM service that you chose, it may take several minutes for the application to answer.
In this section, you learned how to set up a development environment to provide access to all the services that your GenAI application needs.
Related information:
See more samples of GenAI applications in the GenAI Stack demo applications.