This document describes the installation and integration of GitLab and GitLab Duo with a self-hosted Large Language Model (LLM), running a Mistral model on Ollama. The guide describes a setup using three separate virtual machines and can be followed on AWS or GCP, though the process applies to other deployment platforms as well.
This guide is a comprehensive, end-to-end set of instructions for getting the desired setup working. It calls out references to the many areas of GitLab documentation that were used to support the creation of the final configuration. The referenced docs are important when more background is needed to adjust the implementation to a specific scenario.
We will install GitLab, the GitLab AI Gateway, and Ollama, each in its own virtual machine. While this guide uses Ubuntu 24.0x, you have flexibility in choosing any Unix-based operating system that meets your organization's requirements and preferences. However, a Unix-based operating system is mandatory for this setup, to ensure system stability, security, and compatibility with the required software stack. The instance types below provide a good balance between cost and performance for testing and evaluation, though you may need to upgrade the GPU instance type when moving to production, depending on your usage requirements and team size.
| Component | GCP | AWS | OS | Disk |
|---|---|---|---|---|
| GitLab | c2-standard-4 | c6xlarge | Ubuntu 24 | 50 GB |
| AI Gateway | e2-medium | t2.medium | Ubuntu 24 | 20 GB |
| Ollama | n1-standard-4 | g4dn.xlarge | Ubuntu 24 | 50 GB |
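As a sketch, the Ollama VM from the table above can be provisioned with the AWS CLI. The AMI ID, key pair name, and security group below are placeholders you must replace with values from your own account:

```shell
# Hypothetical example: launch the Ollama GPU VM (g4dn.xlarge) with a
# 50 GB root disk. Replace the AMI ID, key name, and security group
# with your own; choose a current Ubuntu 24.x AMI for your region.
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":50}}]'
```

The GitLab and AI Gateway VMs can be launched the same way with the instance types and disk sizes from the table.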
For more information about the component and its purpose, see AI Gateway.
```mermaid
%%{init: { "fontFamily": "GitLab Sans" }}%%
flowchart LR
  accTitle: GitLab Duo Self-Hosted architecture
  accDescr: Shows the flow from GitLab Ultimate to the AI Gateway, which connects to Ollama running Mistral.

  A[GitLab<br>Ultimate] --> C[GitLab<br>AI Gateway]
  C --> B[Ollama<br>Mistral]
```
These components work together to realize the Self-Hosted AI functionality. This guide provides detailed instructions for building a complete self-hosted AI environment using Ollama as the LLM server.
[!note] While for a full production environment, the official documentation recommends more powerful GPU instances such as 1x NVIDIA A100 (40 GB), the g4dn.xlarge instance type should be sufficient for evaluation purposes with a small team of users.
To enable access to GitLab, a static public IP address (such as an Elastic IP in AWS or an External IP in Google Cloud) is required. All other components can and should use static internal IP addresses for internal communication. We assume all VMs are on the same network and can communicate directly.
| Component | Public IP | Private IP |
|---|---|---|
| GitLab | yes | yes |
| AI Gateway | no | yes |
| Ollama | no | yes |
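Once the VMs are up, a quick way to confirm the internal network path is a request from the GitLab VM to the Ollama VM's internal address (the IP is the example address used throughout this guide; this check only succeeds after Ollama is installed and running in a later step):

```shell
# From the GitLab VM: verify the Ollama VM is reachable on its internal IP.
# Once Ollama is installed and running, its root endpoint answers
# "Ollama is running".
curl --silent "http://172.31.11.27:11434/"
```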
The rest of this guide assumes you already have an instance of GitLab up and running that meets the following requirements:
Operating GitLab Duo Self-Hosted requires both a GitLab Ultimate license and a GitLab Duo Enterprise license. The GitLab Ultimate license works with either online or offline licensing options. This documentation assumes that both licenses have been previously obtained and are available for implementation.
A valid SSL certificate (such as Let's Encrypt) must be configured for the GitLab instance. This is not just a security best practice, but a technical requirement because:
GitLab provides a convenient automated SSL setup process:
During the installation of GitLab, set the external URL with the `https://` prefix (for example, `https://gitlab.yourdomain.com`) so that a Let's Encrypt certificate is requested automatically. For details, refer to the documentation page.
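As a sketch, on an Omnibus (Linux package) installation the external URL can be supplied at install time; the domain below is a placeholder:

```shell
# Sketch for an Omnibus (Linux package) install on Ubuntu; replace the
# domain with your own. An https:// external URL makes GitLab request a
# Let's Encrypt certificate automatically during reconfigure.
sudo EXTERNAL_URL="https://gitlab.yourdomain.com" apt install gitlab-ee
```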
Before setting up GitLab Duo Self-Hosted, it's important to understand how the pieces fit together. An AI model is the AI's "brain", trained on data. This brain needs a framework to operate in, which is called an LLM Serving Platform, or simply "serving platform". In AWS, this is Amazon Bedrock; in Azure, it's Azure OpenAI Service; for ChatGPT, it's OpenAI's platform; for Claude, it's Anthropic's platform. When you host an AI model yourself, you also need to choose a serving platform. A popular option for self-hosted models is Ollama.
In this analogy, the brain part for ChatGPT is the GPT-4 model, while in the Anthropic ecosystem, it's the Claude 3.7 Sonnet model. The serving platform acts as the vital framework that connects the brain to the world, enabling it to "think" and interact effectively.
For further information about supported serving platforms and models, see LLM Serving Platforms and Models.
What is Ollama?
Ollama is a streamlined, open-source framework for running Large Language Models (LLMs) in local environments. It simplifies the traditionally complex process of deploying AI models, making it accessible to both individuals and organizations looking for efficient, flexible, and scalable AI solutions.
Key Highlights:
Designed for simplicity and performance, Ollama empowers users to harness the power of LLMs without the complexity of traditional AI infrastructure. Further details on setup and supported models will be covered later in the documentation.
While the official installation guide is available in Install the GitLab AI Gateway, here's a streamlined approach for setting up the AI Gateway. As of January 2025, the image `gitlab/model-gateway:self-hosted-v17.6.0-ee` has been verified to work with GitLab 17.7.
Ensure that ...
Replace `GITLAB_DOMAIN` with the domain name of YOUR GitLab instance in the following code snippet, then run the command to start the GitLab AI Gateway:
```shell
GITLAB_DOMAIN="gitlab.yourdomain.com"

docker run -p 5052:5052 \
  -e AIGW_GITLAB_URL=$GITLAB_DOMAIN \
  -e AIGW_GITLAB_API_URL=https://${GITLAB_DOMAIN}/api/v4/ \
  -e AIGW_AUTH__BYPASS_EXTERNAL=true \
  gitlab/model-gateway:self-hosted-v17.6.0-ee
```
The following table explains key environment variables and their roles in setting up your instance:
| Variable | Description |
|---|---|
| `AIGW_GITLAB_URL` | Your GitLab instance domain. |
| `AIGW_GITLAB_API_URL` | The API endpoint of your GitLab instance. |
| `AIGW_AUTH__BYPASS_EXTERNAL` | Configuration for handling authentication. |
During the initial setup and testing phase, you can set `AIGW_AUTH__BYPASS_EXTERNAL=true` to bypass authentication and avoid issues. However, this configuration should never be used in a production environment or on servers exposed to the internet.
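To confirm the gateway container is up, you can probe its health check endpoint from the AI Gateway VM (assuming the default port mapping from the command above; adjust host and port if you changed them):

```shell
# Probe the AI Gateway health endpoint from the gateway VM.
# An HTTP 200 status code means the gateway is up and serving requests.
curl --silent --output /dev/null --write-out "%{http_code}\n" \
  "http://localhost:5052/monitoring/healthz"
```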
Install Ollama using the official installation script:
```shell
curl --fail --silent --show-error --location "https://ollama.com/install.sh" | sh
```
Configure Ollama to listen on the internal IP by adding the `OLLAMA_HOST` environment variable to its startup configuration:

```shell
systemctl edit ollama.service
```

Add the following in the editor that opens:

```ini
[Service]
Environment="OLLAMA_HOST=172.31.11.27"
```
[!note] Replace the IP address with your actual server's internal IP address.
Reload and restart the service:
```shell
systemctl daemon-reload
systemctl restart ollama
```
Set the environment variable:
```shell
export OLLAMA_HOST=172.31.11.27
```
Install the Mistral Instruct model:
```shell
ollama pull mistral:instruct
```
The mistral:instruct model requires approximately 4.1 GB of storage space and will take a while to download depending on your connection speed.
Verify the model installation:
```shell
ollama list
```
The command should show the installed model in the list.
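Beyond `ollama list`, you can also exercise the model over Ollama's OpenAI-compatible endpoint, which is the same path the AI Gateway will use later (the IP is the example internal address from this guide):

```shell
# Send a minimal chat completion to the mistral:instruct model through
# Ollama's OpenAI-compatible API. A JSON response containing a
# "choices" array indicates the model is serving requests.
curl "http://172.31.11.27:11434/v1/chat/completions" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "mistral:instruct",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```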
Access the GitLab Web Interface
Configure Duo License
Assign Duo License to Root
[!note] Enabling Duo for just the root user is sufficient for initial setup and testing. Additional users can be granted Duo access later if needed, within your seat license limitations.
Access GitLab Duo Self-Hosted Configuration
Configure Model Settings
- Deployment name: Choose a descriptive name (for example, `Mistral-7B-Instruct-v0.3 on AWS Tokyo`)
- Model family: Select "Mistral" from the dropdown list
- Endpoint: Enter your Ollama server URL in the format `http://[Internal-IP]:11434/v1` (for example, `http://172.31.11.27:11434/v1`)
- Model identifier: Enter `custom_openai/mistral:instruct`
- API Key: Enter any placeholder text (for example, `test`), as this field cannot be left blank
Enable AI Features
These settings establish the connection between your GitLab instance and the self-hosted Ollama model through the AI Gateway, enabling AI-native features within GitLab.