{{< details >}}
{{< /details >}}
When a pipeline contains jobs that produce multiple security reports of the same type, it is possible that the same vulnerability is present in multiple reports. This duplication is common when different scanners are used to increase coverage, but can also exist in a single report. Vulnerability deduplication automatically consolidates duplicate vulnerabilities across scans, helping you focus on unique vulnerabilities while maintaining full scanning coverage.
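The logic for deduplicating vulnerabilities varies depending on the scan type. In general, two vulnerabilities are considered duplicates when they have the same scan type, the same location, and at least one matching identifier.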
The scan type must match because each can have its own definition for the location of a vulnerability. For example, static analyzers are able to locate a file path and line number, whereas a container scanning analyzer uses the image name instead.
When comparing identifiers, GitLab does not compare CWE and WASC during deduplication because
they are "type identifiers" and are used to classify groups of vulnerabilities. Including these
identifiers would result in many vulnerabilities being incorrectly considered duplicates. Two vulnerabilities are
considered unique if none of their identifiers match.
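The identifier comparison can be sketched in Ruby. This is a minimal illustration, not GitLab's implementation. It assumes each identifier is a hash with hypothetical `type` and `value` keys, and that the identifier values shown are placeholders. Type identifiers (CWE and WASC) are filtered out first, and two vulnerabilities share an identifier only if at least one remaining identifier matches on both type and value.

```ruby
# Identifier types that classify groups of vulnerabilities and are
# therefore skipped during comparison.
TYPE_IDENTIFIERS = %w[cwe wasc].freeze

# Hypothetical helper: keeps only the identifiers that can be compared.
def comparable_identifiers(identifiers)
  identifiers.reject { |id| TYPE_IDENTIFIERS.include?(id[:type].downcase) }
end

# True when the two lists share at least one comparable identifier.
def identifiers_match?(ids_a, ids_b)
  keys_a = comparable_identifiers(ids_a).map { |id| [id[:type].downcase, id[:value]] }
  keys_b = comparable_identifiers(ids_b).map { |id| [id[:type].downcase, id[:value]] }
  !(keys_a & keys_b).empty?
end

# Placeholder identifier values for illustration only.
a = [{ type: "cve", value: "CVE-0000-0001" }, { type: "cwe", value: "79" }]
b = [{ type: "cve", value: "CVE-0000-0001" }]
c = [{ type: "cwe", value: "79" }]

puts identifiers_match?(a, b) # => true  (shared CVE identifier)
puts identifiers_match?(a, c) # => false (only a CWE in common, which is ignored)
```

Matching on identifiers alone is not enough in this model: the scan type and location must also match before two vulnerabilities count as duplicates.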
In a set of duplicated vulnerabilities, the first occurrence of a vulnerability is kept and the remaining are skipped. Security reports are processed in alphabetical file path order, and vulnerabilities are processed sequentially in the order they appear in a report.
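The processing order can be illustrated with a short Ruby sketch. It is a simplified model under stated assumptions, not GitLab's code. The report and vulnerability hashes (`path`, `vulnerabilities`, `name`, `scan_type`, `location`, `identifiers`) and the file names are hypothetical, and locations are compared as plain strings. A vulnerability is skipped when an already-kept vulnerability has the same scan type and location and shares at least one non-CWE, non-WASC identifier with it.

```ruby
require "set"

# Identifier types excluded from comparison.
IGNORED_IDENTIFIER_TYPES = %w[cwe wasc].freeze

# One [scan_type, location, identifier_type, identifier_value] key per
# comparable identifier of the vulnerability.
def dedup_keys(vuln)
  vuln[:identifiers]
    .reject { |id| IGNORED_IDENTIFIER_TYPES.include?(id[:type]) }
    .map { |id| [vuln[:scan_type], vuln[:location], id[:type], id[:value]] }
end

# Keeps the first occurrence of each vulnerability. Reports are visited in
# alphabetical file path order; vulnerabilities in the order they appear.
def deduplicate(reports)
  seen = Set.new
  kept = []
  reports.sort_by { |report| report[:path] }.each do |report|
    report[:vulnerabilities].each do |vuln|
      keys = dedup_keys(vuln)
      next if keys.any? { |key| seen.include?(key) } # duplicate: skip it
      kept << vuln
      seen.merge(keys) # only kept vulnerabilities are remembered
    end
  end
  kept
end

reports = [
  { path: "reports/b.json", vulnerabilities: [
    { name: "from b", scan_type: "sast", location: "app.rb:10",
      identifiers: [{ type: "cve", value: "CVE-0000-0001" }] }
  ] },
  { path: "reports/a.json", vulnerabilities: [
    { name: "from a", scan_type: "sast", location: "app.rb:10",
      identifiers: [{ type: "cve", value: "CVE-0000-0001" }, { type: "cwe", value: "79" }] }
  ] }
]

puts deduplicate(reports).map { |v| v[:name] }.inspect # => ["from a"]
```

`reports/a.json` sorts before `reports/b.json`, so its vulnerability is kept and the matching one from `reports/b.json` is skipped, even though `reports/b.json` comes first in the input array. A vulnerability with only CWE or WASC identifiers produces no keys, so it is always kept.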
The location used for deduplication depends on the scan type.
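As an illustration of a scan-type-specific location, the following Ruby sketch normalizes a container scanning location by dropping the image tag. This reproduces the container scanning mappings listed on this page. The method name `dedup_location` is hypothetical. The sketch assumes the location has the form `image:tag:package`, and treats a colon after the last `/` of the image reference as the start of a tag. GitLab's actual rules might only remove some kinds of tags.

```ruby
# Hypothetical helper: normalizes a container scanning location of the
# form "registry/path/image:tag:package" by removing the image tag.
def dedup_location(location)
  # The package name follows the last colon.
  image_ref, _sep, package = location.rpartition(":")
  # A colon after the final "/" separates the image name from its tag;
  # colons before it (for example, a registry port) are left alone.
  last_slash = image_ref.rindex("/") || -1
  tag_colon = image_ref.rindex(":")
  image = tag_colon && tag_colon > last_slash ? image_ref[0...tag_colon] : image_ref
  "#{image}:#{package}"
end

puts dedup_location("registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3")
# => registry.gitlab.com/group-name/project-name/image1:libcrypto3
puts dedup_location("registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3")
# => registry.gitlab.com/group-name/project-name/image1:libcrypto3
```

Because an untagged location passes through unchanged, both tagged variants compare equal to each other and to the untagged form.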
In the following container scanning examples, the image tag is dropped from the location used for deduplication:

| Location in the report | Location used for deduplication |
|---|---|
| `registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3` | `registry.gitlab.com/group-name/project-name/image1:libcrypto3` |
| `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` | `registry.gitlab.com/group-name/project-name/image1:libcrypto3` |

When security scanners analyze your code, they sometimes report the same vulnerability multiple times, especially when code is refactored or moved. Advanced vulnerability tracking uses a deduplication system that recognizes when these reports describe the same issue rather than new ones.
Imagine you have a security issue in a function. If a developer refactors the code and moves that function to a different line, the scanner might report it as a new vulnerability. Without deduplication, you'd see duplicate alerts for the same problem, making it harder to track what you actually need to fix.
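When using scope-offset signatures, GitLab creates a unique "fingerprint" for each vulnerability. The fingerprint is built from information such as the file path, the chain of scopes (classes and functions) that enclose the vulnerability, and the position of the vulnerability relative to its innermost scope.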
This combination creates a signature that stays the same even when code moves around, as long as it stays within the same scope.
Say you have this Ruby code:
```ruby
class OuterClass
  class InnerClassA
    def function_A(x)
      puts "calling call1"
      call1(x) # ← Vulnerability found here on line 5
    end
    call2("calling call 2")
  end
end
```
The scanner finds a vulnerability on line 5. Line 5 lies inside `OuterClass`, `InnerClassA`, and `function_A`, so GitLab must determine which of these scopes the vulnerability belongs to.
The scanner calculates which scope is the best fit by measuring the distance from the vulnerability to the beginning and to the end of each scope:
- `OuterClass` (lines 1-9): distance = (5 - 1) + (9 - 5) = 8
- `InnerClassA` (lines 2-8): distance = (5 - 2) + (8 - 5) = 6
- `function_A` (lines 3-6): distance = (5 - 3) + (6 - 5) = 3

The smallest distance wins, so GitLab identifies `function_A` as the scope.
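The scope selection can be sketched in Ruby. The `Scope` struct and the `scope_distance` and `best_scope` methods are hypothetical names for illustration. The sketch assumes that only scopes whose line range contains the vulnerability are candidates, and it applies the distance calculation described above to each candidate.

```ruby
# A named scope with inclusive start and end lines (hypothetical type).
Scope = Struct.new(:name, :start_line, :end_line)

# Distance from a line to the beginning and to the end of a scope.
def scope_distance(scope, line)
  (line - scope.start_line) + (scope.end_line - line)
end

# Among the scopes that contain the line, pick the smallest distance.
def best_scope(scopes, line)
  scopes
    .select { |s| s.start_line <= line && line <= s.end_line }
    .min_by { |s| scope_distance(s, line) }
end

scopes = [
  Scope.new("OuterClass", 1, 9),
  Scope.new("InnerClassA", 2, 8),
  Scope.new("function_A", 3, 6)
]

scopes.each { |s| puts "#{s.name}: #{scope_distance(s, 5)}" }
# OuterClass: 8
# InnerClassA: 6
# function_A: 3
puts best_scope(scopes, 5).name # => function_A
```

For a line inside a scope, `(line - start) + (end - line)` simplifies to `end - start`. The distance is therefore the scope's line span, and this rule selects the innermost (shortest) enclosing scope.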
GitLab creates a signature like `lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2`
to identify the location of the vulnerability. If the function or class that contains the vulnerability is moved
to a different location within its parent scope, the signature stays the same, so the vulnerability is not reintroduced.
However, if `OuterClass` is renamed, the scope is different and a new vulnerability is created.
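The following Ruby sketch shows why moving code keeps the signature stable while renaming an enclosing class does not. It assembles a string in the same shape as the example signature. The method `scope_offset_signature` is hypothetical. The bracketed index is passed through as an opaque value because its meaning is not described here. The offset is assumed to be the vulnerable line minus the first line of the innermost scope, which matches `:2` for line 5 in a function that starts on line 3.

```ruby
# Hypothetical helper: joins the file path, each enclosing scope as
# "Name[index]", and the line offset within the innermost scope.
def scope_offset_signature(path, scope_chain, scope_start_line, vuln_line)
  parts = [path] + scope_chain.map { |name, index| "#{name}[#{index}]" }
  "#{parts.join('|')}:#{vuln_line - scope_start_line}"
end

chain = [["OuterClass", 0], ["InnerClassA", 0], ["function_A", 0]]

original = scope_offset_signature("lib/outer_class.rb", chain, 3, 5)
puts original # => lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2

# function_A moved 10 lines down inside InnerClassA: same scopes, same offset.
moved = scope_offset_signature("lib/outer_class.rb", chain, 13, 15)
puts moved == original # => true

# OuterClass renamed: the scope chain changes, so the signature changes.
renamed_chain = [["RenamedClass", 0], ["InnerClassA", 0], ["function_A", 0]]
renamed = scope_offset_signature("lib/outer_class.rb", renamed_chain, 3, 5)
puts renamed == original # => false
```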
Here are some examples of how vulnerability deduplication behaves.
| Example | Scan type of first vulnerability | Scan type of second vulnerability | Location fingerprint (both) | Notes |
|---|---|---|---|---|
| 1 | `dependency_scanning` | `container_scanning` | `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` | The scan types differ, so the vulnerabilities are not deduplicated. |
| 2 | `sast` | `sast` | `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` | CWE identifiers are ignored. |
| 3 | `container_scanning` | `container_scanning` | `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` | |