Code Suggestions - Gitlabhq

Tier: Free, Premium, Ultimate
Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated

Default LLM
LLM for Amazon Q: Amazon Q Developer
Available on GitLab Duo with self-hosted models

Introduced support for Google Vertex AI Codey APIs in GitLab 16.1.
Removed support for GitLab native model in GitLab 16.2.
Introduced support for code generation in GitLab 16.3.
Generally available in GitLab 16.7.
Changed to require the GitLab Duo Pro add-on on February 15, 2024. Previously, this feature was included with Premium and Ultimate subscriptions.
Changed to require the GitLab Duo Pro or GitLab Duo Enterprise add-on for all supported GitLab versions starting October 17, 2024.
Introduced support for Fireworks AI-hosted Qwen2.5 code completion model in GitLab 17.6, with a flag named fireworks_qwen_code_completion.
Removed support for Qwen2.5 code completion model in GitLab 17.11.
Enabled Fireworks hosted Codestral by default via the feature flag use_fireworks_codestral_code_completion in GitLab 17.11.
Changed to include GitLab Duo Core in GitLab 18.0.
Enabled Fireworks hosted Codestral as the default model in GitLab 18.1.
Changed the default model for code generation to Claude Sonnet 4 in GitLab 18.2.
Removed feature flag code_suggestions_context in GitLab 18.6.
Available on the Free tier on GitLab.com with GitLab Credits in GitLab 18.10.

[!note] Code Suggestions is available for:

GitLab Duo Agent Platform. Billing is usage-based.

GitLab Duo Core, Pro, or Enterprise, GitLab Duo with Amazon Q. Billing is based on your add-on.

Use GitLab Duo Code Suggestions to write code more efficiently by using generative AI to suggest code while you're developing.

View a click-through demo.
Watch an overview.

Prerequisites

To use Code Suggestions:

If you have GitLab Duo Core, turn on IDE features.
Set up Code Suggestions.

[!note] GitLab Duo requires GitLab 17.2 or later. For GitLab Duo Core access, and for the best user experience and results, upgrade to GitLab 18.0 or later. Earlier versions might continue to work, but the experience might be degraded.

Use Code Suggestions

To use Code Suggestions:

Open your Git project in a supported IDE.
Add the project as a remote of your local repository using git remote add.
Add your project directory, including the hidden .git/ folder, to your IDE workspace or project.
Author your code. As you type, suggestions are displayed. Code Suggestions provides code snippets or completes the current line, depending on the cursor position.
Describe the requirements in natural language. Code Suggestions generates functions and code snippets based on the context provided.
When you receive a suggestion, you can do any of the following:
- To accept a suggestion, press <kbd>Tab</kbd>.
- To accept a partial suggestion, press either <kbd>Control</kbd>+<kbd>Right arrow</kbd> or <kbd>Command</kbd>+<kbd>Right arrow</kbd>.
- To reject a suggestion, press <kbd>Esc</kbd>. In Neovim, press <kbd>Control</kbd>+<kbd>E</kbd> to exit the menu.
- To ignore a suggestion, keep typing as you usually would.
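
Because the directory you add to the IDE workspace must include the hidden .git/ folder, a quick check before opening a project can save some troubleshooting. The following is a minimal sketch and not part of GitLab or its IDE extensions; the helper name `has_git_metadata` is hypothetical.

```python
from pathlib import Path


def has_git_metadata(project_dir: str) -> bool:
    """Return True if project_dir contains a .git entry.

    .git is normally a directory, but in worktrees and submodules it can be
    a file, so this checks for existence rather than for a directory.
    """
    return (Path(project_dir) / ".git").exists()
```

If the function returns `False` for the directory you opened, the IDE is probably pointed at a subfolder or at a copy of the code without its Git metadata.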

View multiple code suggestions

Introduced in GitLab 17.1.

For a code completion suggestion in VS Code, multiple suggestion options might be available. To view all available suggestions:

Hover over the code completion suggestion.
Scroll through the alternatives. Either:
- Use keyboard shortcuts:
 - On a Mac, press <kbd>Option</kbd>+<kbd>[</kbd> to view the previous suggestion, and press <kbd>Option</kbd>+<kbd>]</kbd> to view the next suggestion.
 - On Linux and Windows, press <kbd>Alt</kbd>+<kbd>[</kbd> to view the previous suggestion, and press <kbd>Alt</kbd>+<kbd>]</kbd> to view the next suggestion.
- On the dialog that's displayed, select the right or left arrow to see next or previous options.
Press <kbd>Tab</kbd> to apply the suggestion you prefer.

Code completion and generation

Code Suggestions uses code completion and code generation:

| | Code completion | Code generation |
|---|---|---|
| Purpose | Provides suggestions for completing the current line of code. | Generates new code based on a natural language comment. |
| Trigger | Triggers when typing, usually with a short delay. | Triggers when pressing <kbd>Enter</kbd> after writing a comment that includes specific keywords. |
| Scope | Limited to the current line or small block of code. | Can generate entire methods, functions, or even classes based on the context. |
| Accuracy | More accurate for small tasks and short blocks of code. | Is more accurate for complex tasks and large blocks of code because a bigger large language model (LLM) is used, additional context is sent in the request (for example, the libraries used by the project), and your instructions are passed to the LLM. |
| How to use | Code completion automatically suggests completions to the line you are typing. | You write a comment and press <kbd>Enter</kbd>, or you enter an empty function or method. |
| When to use | Use code completion to quickly complete one or a few lines of code. | Use code generation for more complex tasks, larger codebases, when you want to write new code from scratch based on a natural language description, or when the file you're editing has fewer than five lines of code. |

Code Suggestions always uses both of these features. You cannot use only code generation or only code completion.

View a code completion vs. code generation comparison demo.

Best practices for code generation

To get the best results from code generation:

Be as specific as possible while remaining concise.
State the outcome you want to generate (for example, a function) and provide details on what you want to achieve.
Add additional information, like the framework or library you want to use.
Add a space or new line after each comment. This space tells the code generator that you have completed your instructions.
Review and adjust the context available to Code Suggestions.

For example, to create a Python web service with some specific requirements, you might write something like:

# Create a web service using Tornado that allows a user to sign in, run a security scan, and review the scan results.
# Each action (sign in, run a scan, and review results) should be its own resource in the web service
...

AI is non-deterministic, so you may not get the same suggestion every time with the same input. To generate quality code, write clear, descriptive, specific tasks.
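
As a smaller illustration of these practices, the comment below names the outcome (a function), the library (the standard `re` module), and the expected behavior, and is followed by a blank line. The function body is hand-written to show the kind of code such a comment might lead to. It is not actual Code Suggestions output, and the name `extract_issue_ids` is hypothetical.

```python
import re

# Create a function that extracts issue references such as "#123" from a
# commit message and returns the issue numbers as a sorted list of unique
# integers. Use the standard library re module.

def extract_issue_ids(message: str) -> list[int]:
    # Match "#" followed by digits, skipping a "#" that is glued to a word.
    matches = re.findall(r"(?<!\w)#(\d+)", message)
    return sorted({int(m) for m in matches})
```

A vaguer comment, such as "parse the message", leaves the return type, the ordering, and the handling of duplicates open, so the result is harder to predict.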

For use cases and best practices, follow the GitLab Duo examples documentation.

Available language models

Different language models can be the source for Code Suggestions.

On GitLab.com: GitLab hosts the models and connects to them through the cloud-based AI Gateway.
On GitLab Self-Managed, two options exist:
- GitLab can host the models and connect to them through the cloud-based AI Gateway.
- Your organization can use self-hosted models, which means you host the AI Gateway and language models. You can use GitLab-managed models, other supported language models, or bring your own compatible model.

Performance

Learn about the default response times for Code Suggestions, and options for streaming, prompt caching, and configuring connections.

Response time

Code Suggestions is powered by a generative AI model.

For code completion, suggestions are usually low latency and take less than one second.
For code generation, algorithms or large code blocks might take more than five seconds to generate.

Your personal access token enables a secure API connection to GitLab.com or to your GitLab instance. This API connection securely transmits a context window from your IDE/editor to the GitLab AI Gateway, a GitLab hosted service. The gateway calls the large language model APIs, and then the generated suggestion is transmitted back to your IDE/editor.

Streaming

Streaming of code generation responses is supported in JetBrains and Visual Studio, which makes responses feel faster. Other supported IDEs return the generated code in a single block.

Streaming is not enabled for code completion.

Prompt caching

Introduced in GitLab 18.0.

Prompt caching is enabled by default on all Fireworks-hosted models to improve Code Suggestions latency.

When prompt caching is enabled, code completion prompt data is temporarily stored in memory by the model vendor.

Prompt caching significantly improves latency by avoiding the re-processing of cached prompt and input data. The cached data is never logged to any persistent storage.
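
To make the latency benefit concrete, the following toy model shows the general idea of prefix caching: work already done for an unchanged prompt prefix is not repeated. This is only an illustration under simplifying assumptions. It does not describe how any model vendor implements caching, and the class name `PrefixCache`, the block size, and the use of characters instead of tokens are all hypothetical.

```python
import hashlib


class PrefixCache:
    """Toy in-memory prefix cache; nothing is written to disk."""

    def __init__(self, block_size: int = 16) -> None:
        self.block_size = block_size
        self._seen: set[str] = set()

    def blocks_to_process(self, prompt: str) -> int:
        """Return how many full blocks of the prompt are not cached yet.

        Each block is keyed by a hash of the whole prefix that ends at that
        block, so a block is reused only when everything before it matches.
        """
        uncached = 0
        for end in range(self.block_size, len(prompt) + 1, self.block_size):
            key = hashlib.sha256(prompt[:end].encode("utf-8")).hexdigest()
            if key not in self._seen:
                self._seen.add(key)
                uncached += 1
        return uncached
```

In this model, sending the same file content again with a few extra characters at the end only costs the new blocks, which mirrors why consecutive code completion requests from the same file can benefit from caching.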

Turn off prompt caching

You can turn off prompt caching for top-level groups in the GitLab Duo settings. This also turns off prompt caching for GitLab Duo Agentic Chat.

Prerequisites:

Administrator access for GitLab Self-Managed.

On GitLab.com:

In the top bar, select Search or go to and find your group.
Select Settings > GitLab Duo.
Select Change configuration.
Disable the Prompt caching toggle.
Select Save changes.

On GitLab Self-Managed:

In the upper-right corner, select Admin.
In the left sidebar, select GitLab Duo.
Select Change configuration.
Under Prompt cache, clear the Turn on prompt caching checkbox.
Select Save changes.

Direct and indirect connections

Introduced in GitLab 17.2 with a flag named code_suggestions_direct_access. Disabled by default.

By default, code completion requests are sent from the IDE directly to the AI Gateway to minimize latency. For this direct connection to work, the IDE must be able to connect to https://cloud.gitlab.com:443. If this is not possible (for example, because of network restrictions), you can disable direct connections for all users. If you do this, code completion requests are sent indirectly through the GitLab Self-Managed instance, which in turn sends the requests to the AI Gateway. This might result in your requests having higher latency.
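
To see whether a workstation can reach the AI Gateway directly, you can test a plain TCP connection to the host and port named above. The following is a minimal sketch, not a GitLab tool, and the helper name `can_connect` is hypothetical. It only checks that a TCP connection opens. It does not validate TLS, authentication, or proxy settings, so a successful result is a necessary condition for direct connections, not proof that they work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call, run from the machine that hosts the IDE:
# can_connect("cloud.gitlab.com", 443)
```

If the call returns `False` from developer machines but the GitLab Self-Managed instance itself can reach the AI Gateway, indirect connections are the likely fit.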

Configure direct or indirect connections

Prerequisites:

You must be an administrator for the GitLab Self-Managed instance.

In the upper-right corner, select Admin.
Select Settings > General.
Expand GitLab Duo features.
Under Connection method, choose an option:
- To minimize latency for code completion requests, select Direct connections.
- To disable direct connections for all users, select Indirect connections through GitLab Self-Managed.
Select Save changes.

In the upper-right corner, select Admin.
Select Settings > General.
Expand AI-native features.
Choose an option:
- To enable direct connections and minimize latency for code completion requests, clear the Disable direct connections for code suggestions checkbox.
- To disable direct connections, select the Disable direct connections for code suggestions checkbox.

Limitations

Truncation of file content

Because of LLM limits and for performance reasons, the content of the currently opened file is truncated:

For code completion: to 32,000 tokens (roughly 128,000 characters).
For code generation: to 80,000 tokens (roughly 320,000 characters).

Content above the cursor is prioritized over content below the cursor. The content above the cursor is truncated from the left side, and content below the cursor is truncated from the right side. These numbers represent the maximum input context size for Code Suggestions.
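
The truncation direction can be sketched as follows. This is an illustration only, not the Code Suggestions implementation. It counts characters instead of tokens, and it assumes that the content above the cursor receives the whole budget first and the content below the cursor receives whatever is left; the actual split is not specified here. The function name `truncate_context` is hypothetical.

```python
def truncate_context(content: str, cursor: int, max_chars: int) -> tuple[str, str]:
    """Split content at cursor and trim it to at most max_chars characters.

    The text above the cursor is kept first and trimmed from its left side.
    Whatever budget remains goes to the text below the cursor, which is
    trimmed from its right side.
    """
    if max_chars <= 0:
        return "", ""
    above, below = content[:cursor], content[cursor:]
    kept_above = above[-max_chars:]           # drop the oldest text first
    remaining = max_chars - len(kept_above)   # budget left for text below
    kept_below = below[:remaining]            # drop the farthest text first
    return kept_above, kept_below
```

In both directions, the text closest to the cursor survives, which is the part most relevant to the suggestion.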

Support for increasing the code generation limit is proposed in issue 585841.

Output length

Because of LLM limits and for performance reasons, the output of Code Suggestions is limited:

For code completion: to 64 tokens (roughly 256 characters).
For code generation: to 2,048 tokens (roughly 7,168 characters).

Accuracy of results

We are continuing to work on the accuracy of overall generated content. However, Code Suggestions might generate suggestions that are:

Irrelevant.
Incomplete.
Likely to result in failed pipelines.
Potentially insecure.
Offensive or insensitive.

When using Code Suggestions, code review best practices still apply.

Feedback

Provide feedback about your Code Suggestions experience in issue 435783.