doc/development/feature_flags/controls.md
[!note] This document explains how to contribute to the development of the GitLab product. If you want to use feature flags to show and hide functionality in your own applications, view this feature flags information instead.
To turn on/off features behind feature flags in any of the
GitLab-provided environments, like staging and production, you need to
have access to the ChatOps bot. The ChatOps bot
is currently running on the ops instance, which is different from
GitLab.com or dev.gitlab.org.
Follow the ChatOps document to request access.
After you are added to the project, test whether your access has propagated by running:
/chatops gitlab run feature --help
When the changes are deployed to the environments, it is time to start rolling out the feature to our users. The exact procedure for rolling out a change is unspecified, as it can vary from change to change. However, in general we recommend rolling out changes incrementally, instead of enabling them for everybody right away. We also recommend that you not enable a feature until its code has been deployed. This allows you to separate rolling out a feature from a deploy, making it easier to measure the impact of both separately.
The GitLab feature library (using Flipper, and covered in the Feature flags process guide) supports rolling out changes to a percentage of actors (such as users) or for a percentage of time. This in turn can be controlled using GitLab ChatOps.
For an up-to-date list of feature flag commands, see
the source code.
All the examples in that file must be preceded by /chatops gitlab run.
If you get an error "Whoops! This action is not allowed. This incident will be reported." that means your Slack account is not allowed to change feature flags or you do not have access.
As a first step in a feature rollout, you should enable the feature on
staging.gitlab.com
and dev.gitlab.org.
These two environments have different scopes.
dev.gitlab.org is a production CE environment that has internal GitLab Inc.
traffic and is used for some development and other related work.
staging.gitlab.com has a smaller subset of GitLab.com database and repositories
and does not have regular traffic. Staging is an EE instance and can give you
a (very) rough estimate of how your feature will look and behave on GitLab.com.
Both of these instances are connected to Sentry so make sure you check the projects
there for any exceptions while testing your feature after enabling the feature flag.
For these pre-production environments, it's strongly encouraged to run the command in
#staging, #production, or #chat-ops-test, for improved visibility.
To enable a feature 25% of the time for any given actor, run the following in Slack:
/chatops gitlab run feature set new_navigation_bar 25 --actors --dev
/chatops gitlab run feature set new_navigation_bar 25 --actors --staging
See percentage of actors for the types of actors across which you can randomize the rollout.
When a feature has successfully been enabled on a pre-production environment and verified as safe and working, you can roll out the change to GitLab.com (production).
If a feature is deprecated, do not enable the flag.
Some feature flag changes on GitLab.com should be communicated with parts of the company. The developer responsible needs to determine whether this is necessary and the appropriate level of communication. This depends on the feature and what sort of impact it might have.
Guidelines:
Notify #support_gitlab-com beforehand, so that if the feature has any side effects on user experience, they can mitigate them and disable the feature flag to reduce the impact.
Explain the feature flag <feature-flag-name> in the gitlab-org/gitlab project.
#production.
Choosing which percentages to use while rolling out the feature flag depends on different factors, for example:
Let's take some examples for different types of feature flags, and how you can consider the rollout in these cases:
Suppose you're releasing a new feature that runs a few times per day in a cron job and is controlled by a newly introduced feature flag, for example, a rewritten database query for a cron job. In this case, releasing the feature flag to a percentage below 25% might give you slow feedback on whether to proceed with the rollout. Also, if the cron job fails, it retries, so the consequences of something going wrong are limited. Here, releasing at 25% or 50% is an acceptable choice.
But you have to make sure to log the result of the feature flag check to the log of your worker. For more instructions about best practices for logging, see Logging context metadata (through Rails or Grape requests).
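As a minimal sketch of logging the flag result from a worker: ExampleCronWorker, flag_enabled?, and the rewritten_cron_query flag are hypothetical names invented for illustration. In the GitLab codebase the check would go through Feature.enabled? and the worker's structured logger, which this sketch does not reproduce.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Hypothetical stand-in for the flag check; in the GitLab codebase this would
# be a call such as Feature.enabled?(:rewritten_cron_query).
FLAG_STATES = { rewritten_cron_query: true }.freeze

def flag_enabled?(name)
  FLAG_STATES.fetch(name, false)
end

# Hypothetical cron worker that records which code path it took.
class ExampleCronWorker
  def initialize(logger)
    @logger = logger
  end

  def perform
    use_new_query = flag_enabled?(:rewritten_cron_query)
    # Log the flag result so failures can be correlated with the code path.
    @logger.info({ worker: self.class.name, rewritten_cron_query: use_new_query }.to_json)
    use_new_query ? :new_query : :old_query
  end
end

log_output = StringIO.new
result = ExampleCronWorker.new(Logger.new(log_output)).perform
puts log_output.string
```

With the flag result in each log entry, a failed run can be attributed to the old or new code path before deciding whether to raise the percentage.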
Your newly introduced feature or change might be more customer-facing than whatever runs in Sidekiq jobs, but
it might not run often. In this case, choose a percentage high enough to collect some results so that you
know whether to proceed. You can consider starting with 5% or 10% in this case, while monitoring
the logs for any errors or 500 statuses returned to users.
But as you continue with the rollout and increase the percentage, you need to consider looking at the performance impact of the feature. You can consider monitoring the Latency: Apdex and error ratios dashboard on Grafana.
Sometimes, a new change might touch every aspect of the GitLab application, for example, changing
a database query on one of the core models, like User, Project, or Namespace. In this case, releasing
the feature for 1% of the requests, or even less than that (via Change Request), is highly recommended to avoid any incidents.
See this change request example of a feature flag that was released
for around 0.1% of the requests, due to the high impact of the change.
To make sure that the rollout does not affect many customers, consider following these steps:
If you are not certain what percentages to use, then choose the safe recommended option, and choose these percentages:
Between every step you'll want to wait a little while and monitor the appropriate graphs on https://dashboards.gitlab.net. The exact time to wait may differ. For some features a few minutes is enough, while for others you may want to wait several hours or even days. This is entirely up to you, just make sure it is clearly communicated to your team and the Production team if you anticipate any potential problems.
When enabling a feature flag rollout, the system will automatically block the
ChatOps command from succeeding if there are active ~"severity::1" or ~"severity::2"
incidents or in-progress change issues, for example:
/chatops gitlab run feature set gitaly_lfs_pointers_pipeline true
- Production checks fail!
- active incidents
2021-06-29 Canary deployment failing QA tests
Before enabling a feature flag, verify that you are not violating any Production Change Lock periods and are in compliance with the Feature flags and the Change Management Process.
The following /chatops commands must be performed in the Slack
#production channel.
To enable a feature for 25% of actors such as users, projects, groups or the current request or job, run the following in Slack:
/chatops gitlab run feature set some_feature 25 --actors
This sets a feature flag to true based on the following formula:
feature_flag_state = Zlib.crc32("some_feature<Actor>:#{actor.id}") % (100 * 1_000) < 25 * 1_000
# where <Actor>: is a `User`, `Group`, `Project` and actor is an instance
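The formula above can be tried out in plain Ruby. This is a sketch only: percentage_of_actors_enabled? is a hypothetical helper that mirrors the quoted formula, not the code GitLab or Flipper runs, and the user IDs are made up.

```ruby
require 'zlib'

# Hypothetical helper mirroring the formula above.
# actor_type is the class name ("User", "Group", "Project"), actor_id its ID.
def percentage_of_actors_enabled?(flag_name, actor_type, actor_id, percentage)
  key = "#{flag_name}#{actor_type}:#{actor_id}"
  Zlib.crc32(key) % (100 * 1_000) < percentage * 1_000
end

# Count how many of 10,000 made-up user IDs fall inside a 25% rollout.
enabled_count = (1..10_000).count do |id|
  percentage_of_actors_enabled?('some_feature', 'User', id, 25)
end
puts "#{enabled_count} of 10000 users enabled"
```

Because the result depends only on the flag name and the actor, the same actor gets the same answer on every check, which is what keeps a percentage-of-actors rollout stable for a given user, group, or project.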
During development, based on the nature of the feature, an actor choice should be made.
For user focused features:
Feature.enabled?(:feature_cool_avatars, current_user)
For group or namespace level features:
Feature.enabled?(:feature_cooler_groups, group)
For project level features:
Feature.enabled?(:feature_ice_cold_projects, project)
For current request:
Feature.enabled?(:feature_ice_cold_projects, Feature.current_request)
Feature gates can also be actor based, for example a feature could first be
enabled for only the gitlab project. The project is passed by supplying a
--project flag:
/chatops gitlab run feature set --project=gitlab-org/gitlab some_feature true
You can use the --user option to enable a feature flag for a specific user:
/chatops gitlab run feature set --user=myusername some_feature true
If you would like to gather feedback internally first,
feature flags scoped to a user can also be enabled
for GitLab team members with the gitlab_team_members
feature group:
/chatops gitlab run feature set --feature-group=gitlab_team_members some_feature true
You can use the --group flag to enable a feature flag for a specific group:
/chatops gitlab run feature set --group=gitlab-org some_feature true
Note that --group does not work with user namespaces. To enable a feature flag for a
generic namespace (including groups) use --namespace:
/chatops gitlab run feature set --namespace=gitlab-org some_feature true
/chatops gitlab run feature set --namespace=myusername some_feature true
Actor-based gates are applied before percentages. For example, considering the
group/project as gitlab-org/gitlab and a given example feature as some_feature, if
you run these 2 commands:
/chatops gitlab run feature set --project=gitlab-org/gitlab some_feature true
/chatops gitlab run feature set some_feature 25 --actors
Then some_feature will be enabled for both 25% of actors and always when interacting with
gitlab-org/gitlab. This is a good idea if the feature flag development makes use of group
actors.
Feature.enabled?(:some_feature, group)
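The interaction between actor gates and a percentage can be sketched as follows. ExampleFlag is a hypothetical in-memory model, not GitLab's Feature class, and the actor keys such as "Project:1" are made up. It assumes the percentage check uses the CRC32 formula quoted earlier in this document.

```ruby
require 'set'
require 'zlib'

# Hypothetical in-memory model of one flag's gates.
class ExampleFlag
  def initialize(name)
    @name = name
    @actor_gates = Set.new # actors enabled explicitly, for example via --project
    @percentage = 0        # percentage of actors, for example via `25 --actors`
  end

  def enable_actor(actor_key)
    @actor_gates << actor_key
  end

  def enable_percentage(percentage)
    @percentage = percentage
  end

  # actor_key is a string such as "Project:1" (type and ID are made up).
  def enabled?(actor_key)
    # Explicit actor gates are checked first and always win.
    return true if @actor_gates.include?(actor_key)

    Zlib.crc32("#{@name}#{actor_key}") % (100 * 1_000) < @percentage * 1_000
  end
end

flag = ExampleFlag.new('some_feature')
flag.enable_actor('Project:1')
flag.enable_percentage(25)
puts flag.enabled?('Project:1') # always true because of the actor gate
```

In this model the explicitly enabled actor is always on, while every other actor is on only if it falls inside the percentage.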
Multiple actors can be passed together in a comma-separated form:
/chatops gitlab run feature set --project=gitlab-org/gitlab,example-org/example-project some_feature true
/chatops gitlab run feature set --group=gitlab-org,example-org some_feature true
/chatops gitlab run feature set --namespace=gitlab-org,example-org some_feature true
Lastly, to verify that the feature is deemed stable in as many cases as possible, you should fully roll out the feature by enabling the flag globally by running:
/chatops gitlab run feature set some_feature true
This changes the feature flag state to be enabled always, which overrides the
existing gates (for example, --group=gitlab-org) in the above processes.
Note that if an actor-based feature gate is present, switching the
default_enabled attribute of the YAML definition from false to true
will not have any effect. The feature gate must be deleted first.
For example, a feature flag is set via ChatOps:
/chatops gitlab run feature set --project=gitlab-org/gitlab some_feature true
When the default_enabled attribute in the YAML definition is switched to
true, the feature gate must be deleted to have the desired effect:
/chatops gitlab run feature delete some_feature
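The precedence described above can be modeled in a few lines. ExampleFlagState is a hypothetical class, not GitLab's implementation. It assumes only what this section states: once a gate is stored for the flag, the YAML default is ignored until the stored gate is deleted.

```ruby
require 'set'

# Hypothetical model: a flag with a YAML default and optional stored gates.
class ExampleFlagState
  def initialize(default_enabled:)
    @default_enabled = default_enabled
    @actor_gates = nil # nil means nothing is stored for this flag
  end

  # Like: feature set --project=... some_feature true
  def enable_for(actor)
    (@actor_gates ||= Set.new) << actor
  end

  # Like: feature delete some_feature
  def delete
    @actor_gates = nil
  end

  def enabled?(actor)
    # The YAML default only applies when no gate is stored.
    return @default_enabled if @actor_gates.nil?

    @actor_gates.include?(actor)
  end
end

state = ExampleFlagState.new(default_enabled: true)
state.enable_for('gitlab-org/gitlab')
before_delete = state.enabled?('example-org/example-project') # false: the stored gate wins
state.delete
after_delete = state.enabled?('example-org/example-project')  # true: the default applies
puts "before delete: #{before_delete}, after delete: #{after_delete}"
```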
Previously, to enable a feature 25% of the time, we would run the following in Slack:
/chatops gitlab run feature set new_navigation_bar 25 --random
This command enables the new_navigation_bar feature for GitLab.com. However, this command does not enable the feature for 25% of the total users.
Instead, when the feature is checked with enabled?, it returns true 25% of the time.
Percentage of time feature flags are now deprecated in favor of percentage of actors
using the Feature.current_request actor. The problem with not using an actor is that the randomized
choice evaluates for each call into Feature.enabled? rather than once per request or job execution,
which can lead to flip-flopping between states. For example:
feature_flag_state = rand < (25 / 100.0)
For the time being, we continue to allow use of percentage of time feature flags.
During rollout, you can force it using the --ignore-random-deprecation-check switch in ChatOps.
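The flip-flopping problem can be illustrated by comparing the two approaches side by side. This is a sketch under assumptions: the helper names are hypothetical, and the "Request:" key format and the request ID are made up to stand in for whatever Feature.current_request uses internally.

```ruby
require 'zlib'

# Percentage of time: a fresh random draw on every call.
def percentage_of_time_enabled?(percentage)
  rand < (percentage / 100.0)
end

# Percentage of actors with a per-request actor: the (hypothetical) request ID
# is hashed, so every check within the same request agrees.
def request_actor_enabled?(flag_name, request_id, percentage)
  Zlib.crc32("#{flag_name}Request:#{request_id}") % (100 * 1_000) < percentage * 1_000
end

# Five checks of the same flag during one request.
time_based = Array.new(5) { percentage_of_time_enabled?(25) }
request_based = Array.new(5) { request_actor_enabled?('new_navigation_bar', 'req-123', 25) }

puts "percentage of time: #{time_based.inspect}"    # may mix true and false
puts "per-request actor:  #{request_based.inspect}" # all values identical
```

If a request checks the flag in several places, the percentage-of-time version can take the new code path in one place and the old path in another, while the actor-based version stays on one path for the whole request.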
To disable a feature flag that has been globally enabled you can run:
/chatops gitlab run feature set some_feature false
To disable a feature flag that has been enabled for a specific project you can run:
/chatops gitlab run feature set --project=gitlab-org/gitlab some_feature false
You cannot selectively disable feature flags for a specific project/group/user without applying a specific method of implementing the feature flags.
If a feature flag is disabled via ChatOps, that will take precedence over the default_enabled value in the YAML. In other words, you could have a feature enabled for on-premise installations but not for GitLab.com.
By default you cannot selectively disable a feature flag by actor.
# This will not work how you would expect.
/chatops gitlab run feature set some_feature true
/chatops gitlab run feature set --project=gitlab-org/gitlab some_feature false
However, if you add two feature flags, you can write your conditional statement in such a way that the equivalent selective disable is possible.
Feature.enabled?(:a_feature, project) && Feature.disabled?(:a_feature_override, project)
# This will enable a feature flag globally, except for gitlab-org/gitlab
/chatops gitlab run feature set a_feature true
/chatops gitlab run feature set --project=gitlab-org/gitlab a_feature_override true
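The two-flag pattern can be checked with a small in-memory model. ExampleFlags and feature_active? are hypothetical names used only for this sketch. In GitLab the condition would use Feature.enabled? and Feature.disabled? as shown above.

```ruby
require 'set'

# Hypothetical in-memory flag store.
class ExampleFlags
  def initialize
    @global = Set.new                               # flags enabled for everyone
    @per_actor = Hash.new { |h, k| h[k] = Set.new } # flag => enabled actors
  end

  def enable(flag, actor = nil)
    if actor
      @per_actor[flag] << actor
    else
      @global << flag
    end
  end

  def enabled?(flag, actor)
    @global.include?(flag) || @per_actor[flag].include?(actor)
  end

  def disabled?(flag, actor)
    !enabled?(flag, actor)
  end
end

# Same condition as in the text above.
def feature_active?(flags, project)
  flags.enabled?(:a_feature, project) && flags.disabled?(:a_feature_override, project)
end

flags = ExampleFlags.new
flags.enable(:a_feature)                               # feature set a_feature true
flags.enable(:a_feature_override, 'gitlab-org/gitlab') # --project=gitlab-org/gitlab a_feature_override true

puts feature_active?(flags, 'example-org/example-project')
puts feature_active?(flags, 'gitlab-org/gitlab')
```

The override flag is only ever enabled, never disabled, which is why this works even though a single flag cannot be selectively disabled by actor.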
When using the percentage rollout of actors on multiple feature flags, the actors for each feature flag are selected separately.
For example, the following feature flags are enabled for a certain percentage of actors:
/chatops gitlab run feature set feature-set-1 25 --actors
/chatops gitlab run feature set feature-set-2 25 --actors
If a project A has :feature-set-1 enabled, there is no guarantee that project A also has :feature-set-2 enabled.
For more detail, see This is how percentages work in Flipper.
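A quick way to see the independent selection is to apply the percentage-of-actors formula from this document to two flag names. This is a sketch with a hypothetical helper and made-up project IDs. Because the flag name is part of the hashed key, each flag selects its own subset.

```ruby
require 'zlib'

# Hypothetical helper based on the percentage-of-actors formula in this document.
def enabled_for_project?(flag_name, project_id, percentage)
  Zlib.crc32("#{flag_name}Project:#{project_id}") % (100 * 1_000) < percentage * 1_000
end

project_ids = (1..1_000).to_a # made-up project IDs
set_1 = project_ids.select { |id| enabled_for_project?('feature-set-1', id, 25) }
set_2 = project_ids.select { |id| enabled_for_project?('feature-set-2', id, 25) }

puts "feature-set-1 only: #{(set_1 - set_2).size}"
puts "feature-set-2 only: #{(set_2 - set_1).size}"
puts "both flags:         #{(set_1 & set_2).size}"
```

Code that needs both behaviors together should therefore be guarded by a single flag rather than relying on two percentage rollouts lining up.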
After turning on the feature flag, you need to monitor the relevant graphs between each step:
dashboards.gitlab.net.
feature-flag.
Latency: Apdex for services that might be impacted by your change (like sidekiq service, api service or web service). Then check out more in-depth dashboards by selecting Service Overview Dashboards and choosing a dashboard that might be related to your change.
In this illustration, you can see that the Apdex score started to decline after the feature flag was enabled at 09:46. The feature flag was then deactivated at 10:31, and the service returned to the original value:
Certain features necessitate extensive monitoring over multiple days, particularly those that are high-risk and critical to business operations. In contrast, other features may only require a 24-hour monitoring period before continuing with the rollout.
It is recommended to determine the necessary extent of monitoring before initiating the rollout.
Any feature flag change that affects GitLab.com (production) via ChatOps is automatically logged in an issue.
The issue is created in the gl-infra/feature-flag-log project, and at minimum it logs the Slack handle of the person enabling a feature flag, the time, and the name of the flag being changed.
The issue is then also posted to the GitLab internal Grafana dashboard as an annotation marker to make the change even more visible.
Changes to the issue format can be submitted in the ChatOps project.
Any feature flag change that affects any GitLab instance is automatically logged in
features_json.log.
You can search the change history in Kibana.
You can also access the feature flag change history for GitLab.com in Kibana.
A feature flag should be removed as soon as it is no longer needed. Each additional feature flag in the codebase increases the complexity of the application and reduces confidence in our testing suite covering all possible combinations. Additionally, a feature flag overwritten in some of the environments can result in undefined and untested system behavior.
development type feature flags should have a short lifecycle because their purpose
is for rolling out a persistent change. development feature flags that are older
than 2 milestones are reported to engineering managers. The
report tool runs on a
monthly basis. For example, see the report for December 2021.
If a development feature flag is still present in the codebase after 6 months we should
take one of the following actions:
To remove a feature flag, open one merge request to make the changes. In the MR:
Once the above MR has been merged, you should:
/chatops gitlab run feature delete some_feature.
When a feature gate has been removed from the codebase, the feature record still exists in the database of each environment that the flag was deployed to. The record can be deleted once the MR is deployed to all the environments:
/chatops gitlab run feature delete <feature-flag-name> --dev --pre --staging --staging-ref --production
You can use the following ChatOps command to see a feature flag's current state:
/chatops gitlab run feature get <feature-flag-name>
Since this is a read-only command, you can avoid cluttering the production channels by either:
#chat-ops-test Slack channel
The result of this command will display: