doc/user/packages/package_registry/tutorial_generate_sbom.md
This tutorial shows you how to generate a software bill of materials (SBOM) in CycloneDX format with a CI/CD pipeline. The pipeline you'll build collects packages across multiple projects in a group, providing you with a comprehensive view of the dependencies in related projects.
You'll create a virtual Python environment to complete this tutorial, but you can apply the same approach to other supported package types, too.
An SBOM is a machine-readable inventory of all the software components that comprise a software product. The SBOM might include:

- Open source libraries and other third-party components.
- Component versions and licenses.
- The relationships between components.
An organization that's interested in using a software product may require an SBOM to determine how secure the product is before adopting it.
If you're familiar with the GitLab package registry, you might wonder what the difference is between an SBOM and a dependency list. The following table highlights the key differences:
| Characteristic | Dependency list | SBOM |
|---|---|---|
| Scope | Shows dependencies for individual projects or groups. | Creates an inventory of all packages published across your group. |
| Direction | Tracks what your projects depend on (incoming dependencies). | Tracks what your group publishes (outgoing packages). |
| Coverage | Based on package manifests, like package.json or pom.xml. | Covers actual published artifacts in your package registry. |
CycloneDX is a lightweight, standardized format for creating SBOMs. CycloneDX provides a well-defined schema that helps organizations:

- Describe software components and their relationships in a consistent format.
- Integrate SBOM data with security and compliance tooling.
CycloneDX supports multiple output formats, including JSON, XML, and Protocol Buffers, making it versatile for different integration needs. The specification is designed to be comprehensive yet efficient, covering everything from basic component identification to detailed metadata about software provenance.
To complete this tutorial, you need:

- A group with at least one project that publishes packages to the package registry.
- A project in that group where you can create and run a CI/CD pipeline.
- Optional. A group deploy token with the `read_package_registry` and `write_package_registry` scopes, stored in a CI/CD variable named `GROUP_DEPLOY_TOKEN`. If the token isn't set, the pipeline falls back to the job token (`CI_JOB_TOKEN`).
This tutorial involves two sets of steps: first you build the pipeline configuration stage by stage, then you review and use the files the pipeline generates.

Here's an overview of what you'll do:
1. Set up the base pipeline configuration.
1. Create the `prepare` stage.
1. Create the `collect` stage.
1. Create the `aggregate` stage.
1. Create the `publish` stage.

> [!note]
> Before implementing this solution, be aware that:
>
> - Package dependencies are not resolved (only direct packages are listed).
> - Package versions are included, but not analyzed for vulnerabilities.
First, set up the base configuration that defines the image, variables, and stages used throughout the pipeline.
In the following sections, you'll build out the pipeline by adding the configuration for each stage.
In your project:

1. Create a `.gitlab-ci.yml` file.
1. In the file, add the following base configuration:
```yaml
# Base image for all jobs
image: alpine:latest

variables:
  SBOM_OUTPUT_DIR: "sbom-output"
  SBOM_FORMAT: "CycloneDX"
  OUTPUT_TYPE: "json"
  GROUP_PATH: ${CI_PROJECT_NAMESPACE}
  AUTH_HEADER: "${GROUP_DEPLOY_TOKEN:+Deploy-Token: $GROUP_DEPLOY_TOKEN}"

before_script:
  - apk add --no-cache curl jq ca-certificates

stages:
  - prepare
  - collect
  - aggregate
  - publish
```
This configuration:
- Uses `alpine:latest` as the base image for all jobs.
- Installs `curl` for API requests, `jq` for JSON processing, and `ca-certificates` to ensure secure HTTPS connections.
- Stores all generated files in the `sbom-output` directory.
- Defines the `prepare`, `collect`, `aggregate`, and `publish` stages.
- Uses the `GROUP_DEPLOY_TOKEN` deploy token for authentication if it's set, and falls back to the CI/CD job token otherwise.

## Create the `prepare` stage

The `prepare` stage sets up a Python environment and installs the required dependencies.
In your `.gitlab-ci.yml` file, add the following configuration:
```yaml
# Set up Python virtual environment and install required packages
prepare_environment:
  stage: prepare
  script: |
    mkdir -p ${SBOM_OUTPUT_DIR}
    apk add --no-cache python3 py3-pip py3-virtualenv
    python3 -m venv venv
    source venv/bin/activate
    pip3 install cyclonedx-bom
  artifacts:
    paths:
      - ${SBOM_OUTPUT_DIR}/
      - venv/
    expire_in: 1 week
```
This stage:

- Creates the `sbom-output` directory.
- Installs Python and creates a virtual environment.
- Installs the `cyclonedx-bom` package in the virtual environment.
- Saves the output directory and virtual environment as artifacts for later stages.
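If you want to check the Python tooling locally before running the pipeline, you can reproduce this stage on your own machine. This is a minimal sketch, assuming Python 3 is installed; the CLI name provided by `cyclonedx-bom` varies by version:

```shell
# Create and activate a local virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the CycloneDX tooling used by the pipeline
pip3 install cyclonedx-bom

# Depending on the installed version, the CLI is exposed as
# cyclonedx-py (newer releases) or cyclonedx-bom (older releases)
cyclonedx-py --help
```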
## Create the `collect` stage

The `collect` stage gathers package information from your group's package registry.
In your `.gitlab-ci.yml` file, add the following configuration:
```yaml
# Collect package information and versions from the GitLab registry
collect_group_packages:
  stage: collect
  script: |
    echo "[]" > "${SBOM_OUTPUT_DIR}/packages.json"
    GROUP_PATH_ENCODED=$(echo "${GROUP_PATH}" | sed 's|/|%2F|g')
    PACKAGES_URL="${CI_API_V4_URL}/groups/${GROUP_PATH_ENCODED}/packages"
    # Optional exclusion list - you can add package types you want to exclude
    # EXCLUDE_TYPES="terraform"
    page=1
    while true; do
      # Fetch all packages without specifying type, with pagination
      response=$(curl --silent --header "${AUTH_HEADER:-"JOB-TOKEN: $CI_JOB_TOKEN"}" \
        "${PACKAGES_URL}?per_page=100&page=${page}")
      if ! echo "$response" | jq -e 'type == "array"' > /dev/null 2>&1; then
        echo "Error in API response for page $page"
        break
      fi
      # Use the unfiltered count for pagination decisions
      count=$(echo "$response" | jq '. | length')
      if [ "$count" -eq 0 ]; then
        break
      fi
      # Filter out excluded package types if EXCLUDE_TYPES is set
      if [ -n "${EXCLUDE_TYPES:-}" ]; then
        response=$(echo "$response" | jq --arg types "$EXCLUDE_TYPES" \
          '[.[] | select(.package_type as $t | ($types | split(" ") | index($t)) | not)]')
      fi
      # Merge this page of results with the existing data
      echo "$response" > "${SBOM_OUTPUT_DIR}/page.json"
      jq -s '.[0] + .[1]' "${SBOM_OUTPUT_DIR}/packages.json" "${SBOM_OUTPUT_DIR}/page.json" > "${SBOM_OUTPUT_DIR}/packages.tmp.json"
      mv "${SBOM_OUTPUT_DIR}/packages.tmp.json" "${SBOM_OUTPUT_DIR}/packages.json"
      rm "${SBOM_OUTPUT_DIR}/page.json"
      # Stop when the API returns less than a full page of results
      if [ "$count" -lt 100 ]; then
        break
      fi
      page=$((page + 1))
    done
  artifacts:
    paths:
      - ${SBOM_OUTPUT_DIR}/
    expire_in: 1 week
  dependencies:
    - prepare_environment
```
This stage:

- Initializes an empty `packages.json` file.
- URL-encodes the group path so it can be used in the API request.
- Pages through the group packages API, 100 packages at a time.
- Optionally filters out package types listed in `EXCLUDE_TYPES`.
- Merges each page of results into `packages.json` and saves it as an artifact.
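To preview the data this stage collects, you can call the same API outside the pipeline. This is a minimal sketch, assuming a personal access token and a placeholder group path; replace `<your_token>` and `my-group` with your own values:

```shell
# List the first five packages in the group, keeping only the
# fields the pipeline uses
curl --silent --header "PRIVATE-TOKEN: <your_token>" \
  "https://gitlab.com/api/v4/groups/my-group/packages?per_page=5" |
  jq '.[] | {name, version, package_type, created_at}'
```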
## Create the `aggregate` stage

The `aggregate` stage processes the collected data and generates the SBOM.
In your `.gitlab-ci.yml` file, add the following configuration:
```yaml
# Generate SBOM by aggregating package data
aggregate_sboms:
  stage: aggregate
  before_script:
    - apk add --no-cache python3 py3-pip py3-virtualenv
    - python3 -m venv venv
    - source venv/bin/activate
    - pip3 install --no-cache-dir cyclonedx-bom
  script: |
    cat > process_sbom.py << 'EOL'
    import json
    import os
    from datetime import datetime, timezone

    def analyze_version_history(packages_file):
        """Process version information by aggregating packages with same name and type"""
        version_history = {}
        package_versions = {}  # Dict to group packages by name and type
        try:
            with open(packages_file, 'r') as f:
                packages = json.load(f)
            if not isinstance(packages, list):
                return version_history
            # First, group packages by name and type
            for package in packages:
                key = f"{package.get('name')}:{package.get('package_type')}"
                if key not in package_versions:
                    package_versions[key] = []
                package_versions[key].append({
                    'id': package.get('id'),
                    'version': package.get('version', 'unknown'),
                    'created_at': package.get('created_at')
                })
            # Then process each group to create version history
            for package_key, versions in package_versions.items():
                # Sort versions by creation date, newest first
                versions.sort(key=lambda x: x.get('created_at') or '', reverse=True)
                # Use the first package's ID as the key (newest version)
                if versions:
                    package_id = str(versions[0]['id'])
                    version_history[package_id] = {
                        'versions': [v['version'] for v in versions],
                        'latest_version': versions[0]['version'],
                        'version_count': len(versions),
                        'first_published': min((v.get('created_at') for v in versions if v.get('created_at')), default=None),
                        'last_updated': max((v.get('created_at') for v in versions if v.get('created_at')), default=None)
                    }
        except Exception as e:
            print(f"Error processing version history: {e}")
        return version_history

    def merge_package_data(package_file):
        """Combine package data and generate component list"""
        merged_components = {}
        package_stats = {
            'total_packages': 0,
            'package_types': {}
        }
        try:
            with open(package_file, 'r') as f:
                packages = json.load(f)
            if not isinstance(packages, list):
                return [], package_stats
            for package in packages:
                package_stats['total_packages'] += 1
                pkg_type = package.get('package_type', 'unknown')
                package_stats['package_types'][pkg_type] = package_stats['package_types'].get(pkg_type, 0) + 1
                component = {
                    'type': 'library',
                    'name': package['name'],
                    'version': package.get('version', 'unknown'),
                    'purl': f"pkg:gitlab/{package['name']}@{package.get('version', 'unknown')}",
                    'package_type': pkg_type,
                    'properties': [{
                        'name': 'registry_url',
                        'value': package.get('_links', {}).get('web_path', '')
                    }]
                }
                # Deduplicate components by name and version
                key = f"{component['name']}:{component['version']}"
                if key not in merged_components:
                    merged_components[key] = component
        except Exception as e:
            print(f"Error merging package data: {e}")
            return [], package_stats
        return list(merged_components.values()), package_stats

    # Main processing
    version_history = analyze_version_history(f"{os.environ['SBOM_OUTPUT_DIR']}/packages.json")
    components, stats = merge_package_data(f"{os.environ['SBOM_OUTPUT_DIR']}/packages.json")
    stats['version_history'] = version_history

    # Create final SBOM document
    sbom = {
        "bomFormat": os.environ['SBOM_FORMAT'],
        "specVersion": "1.4",
        "version": 1,
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tools": [{
                "vendor": "GitLab",
                "name": "Package Registry SBOM Generator",
                "version": "1.0.0"
            }],
            "properties": [{
                "name": "package_stats",
                "value": json.dumps(stats)
            }]
        },
        "components": components
    }

    # Write results to files
    with open(f"{os.environ['SBOM_OUTPUT_DIR']}/merged_sbom.{os.environ['OUTPUT_TYPE']}", 'w') as f:
        json.dump(sbom, f, indent=2)
    with open(f"{os.environ['SBOM_OUTPUT_DIR']}/package_stats.json", 'w') as f:
        json.dump(stats, f, indent=2)
    EOL
    python3 process_sbom.py
  artifacts:
    paths:
      - ${SBOM_OUTPUT_DIR}/
    expire_in: 1 week
  dependencies:
    - collect_group_packages
```
This stage:
- Processes the collected package data from the `packages.json` file.
- Builds a version history by grouping packages with the same name and type.
- Deduplicates components and creates a package URL (`purl`) for each component.
- Writes the SBOM document and a `package_stats.json` statistics file to the output directory.
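To sanity-check the aggregate output locally (for example, after downloading the job artifacts), you can inspect the generated files with `jq`. A quick sketch:

```shell
# Count the components in the generated SBOM
jq '.components | length' sbom-output/merged_sbom.json

# Show the first component, including its purl and registry URL
jq '.components[0]' sbom-output/merged_sbom.json
```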
## Create the `publish` stage

The `publish` stage uploads the generated SBOM and statistics file to GitLab.

In your `.gitlab-ci.yml` file, add the following configuration:
```yaml
# Publish SBOM files to the GitLab package registry
publish_sbom:
  stage: publish
  script: |
    STATS=$(jq -c . "${SBOM_OUTPUT_DIR}/package_stats.json")
    # Upload generated files
    curl --header "${AUTH_HEADER:-"JOB-TOKEN: $CI_JOB_TOKEN"}" \
      --upload-file "${SBOM_OUTPUT_DIR}/merged_sbom.${OUTPUT_TYPE}" \
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/sbom/${CI_COMMIT_SHA}/merged_sbom.${OUTPUT_TYPE}"
    curl --header "${AUTH_HEADER:-"JOB-TOKEN: $CI_JOB_TOKEN"}" \
      --upload-file "${SBOM_OUTPUT_DIR}/package_stats.json" \
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/sbom/${CI_COMMIT_SHA}/package_stats.json"
    # Add a package description; build the JSON payload with jq so the
    # embedded statistics are escaped correctly
    PAYLOAD=$(jq -n --arg stats "$STATS" --arg date "$(date -u)" \
      '{description: ("Group Package Registry SBOM generated on \($date)\nStats: \($stats)")}')
    curl --header "${AUTH_HEADER:-"JOB-TOKEN: $CI_JOB_TOKEN"}" \
      --header "Content-Type: application/json" \
      --request PUT \
      --data "$PAYLOAD" \
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/sbom/${CI_COMMIT_SHA}"
  dependencies:
    - aggregate_sboms
```
This stage:

- Uploads the SBOM and statistics files to a generic package named `sbom`, versioned by the commit SHA.
- Adds a description to the package that includes the generation date and the package statistics.
When the pipeline completes, it generates these files:
- `merged_sbom.json`: The complete SBOM in CycloneDX format.
- `package_stats.json`: Statistics about your packages.

To access the generated files:

1. On the left sidebar, select **Deploy > Package registry**.
1. Select the package named `sbom`.

The SBOM file follows the CycloneDX 1.4 JSON specification, and provides details about published packages, package versions, and artifacts in your group's package registry.
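You can also download the SBOM with the generic packages API. A sketch, assuming placeholder values for the token, project ID, and commit SHA:

```shell
# Download the SBOM file published for a given commit
curl --header "PRIVATE-TOKEN: <your_token>" \
  "https://gitlab.com/api/v4/projects/<project_id>/packages/generic/sbom/<commit_sha>/merged_sbom.json" \
  --output merged_sbom.json
```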
You can also use the SBOM file for compliance and auditing purposes, such as:

- Responding to customer security questionnaires.
- Demonstrating supply chain transparency.
- Meeting regulatory requirements that call for a software inventory.
When working with CycloneDX files, consider using the following tools:

- The CycloneDX CLI, to validate SBOMs and convert between formats.
- OWASP Dependency-Track, to continuously analyze SBOMs for known vulnerabilities.
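For example, this validation sketch assumes the CycloneDX CLI is installed and available as `cyclonedx`; check your version's help output for the exact flags:

```shell
# Validate the generated SBOM against the CycloneDX JSON schema
cyclonedx validate --input-file merged_sbom.json --input-format json
```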
The statistics file provides package registry analytics and activity tracking.
For example, to analyze your package registry, you can:

- Review how many packages of each type are published.
- Identify the packages with the most published versions.

To track package registry activity, you can:

- Check when each package was first published and last updated.
- Watch for packages that haven't been updated recently.
You can use a CLI tool like jq with the statistics file
to generate analytics or activity information in a readable
JSON format.
The following code block lists several examples of jq commands you can run against the statistics file for general analysis or reporting purposes:
```shell
# Get total package count in registry
jq '.total_packages' package_stats.json

# List package types and their counts
jq '.package_types' package_stats.json

# Find packages with most versions published
jq '.version_history | to_entries | sort_by(.value.version_count) | reverse | .[0:5]' package_stats.json
```
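For activity tracking, a similar sketch (assuming the field layout produced by the `aggregate` stage) lists the five most recently updated packages:

```shell
# List the five most recently updated packages
jq '.version_history | to_entries
    | map({latest_version: .value.latest_version, last_updated: .value.last_updated})
    | sort_by(.last_updated) | reverse | .[0:5]' package_stats.json
```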
If you frequently update your package registry, you should update your SBOM accordingly. You can configure pipeline scheduling to generate an updated SBOM based on your publishing activity.
Consider the following recommendations:

- If you publish packages daily, schedule the pipeline to run daily.
- If you publish less frequently, a weekly or monthly schedule is usually enough.
To schedule the pipeline:

1. On the left sidebar, select **Build > Pipeline schedules**.
1. Select **New schedule**.
1. Enter a description, an interval pattern, and the target branch.
1. Select **Create pipeline schedule**.
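You can also create the schedule through the pipeline schedules API. A sketch, assuming placeholder values for the token and project ID, and a `main` target branch:

```shell
# Create a weekly schedule (Mondays at 04:00 UTC) through the API
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --data "description=Weekly SBOM generation" \
  --data "ref=main" \
  --data "cron=0 4 * * 1" \
  "https://gitlab.com/api/v4/projects/<project_id>/pipeline_schedules"
```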
## Troubleshooting

You might run into the following issues while completing this tutorial.
If you encounter authentication errors:
- Verify that `GROUP_DEPLOY_TOKEN` is set correctly in your CI/CD variables.
- Check that the deploy token has the `read_package_registry` and `write_package_registry` scopes.
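If you need a token, you can create a group deploy token with the API. A sketch, assuming an owner-level personal access token and a placeholder group ID:

```shell
# Create a group deploy token with the package registry scopes
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"name": "sbom-pipeline", "scopes": ["read_package_registry", "write_package_registry"]}' \
  "https://gitlab.com/api/v4/groups/<group_id>/deploy_tokens"
```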
If you're missing package types:

- Verify the packages appear in your group's package registry.
- Check the optional `EXCLUDE_TYPES` filter in the `collect` stage.
- Review the processing logic in the `aggregate` stage.

If you experience memory issues:

- Reduce the `per_page` value in the `collect` stage.
- Use a runner with more memory.
For optimal performance:

- Keep artifact expiration times short so old artifacts don't accumulate.
- Schedule the pipeline during off-peak hours.
If you encounter other issues:
- Review the job logs for error messages.
- Test the API calls with `curl` commands directly.
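For example, this quick connectivity test checks that a deploy token can reach the group packages endpoint; replace the placeholder token and group ID with your own values:

```shell
# Verify the deploy token works against the group packages API;
# --include prints the HTTP status line with the response
curl --include --header "Deploy-Token: <your_deploy_token>" \
  "https://gitlab.com/api/v4/groups/<group_id>/packages?per_page=1"
```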