syft/pkg/cataloger/internal/cpegenerate/README.md
This package generates Common Platform Enumeration (CPE) identifiers for software packages discovered by Syft. CPEs are standardized identifiers that enable vulnerability matching by linking packages to known vulnerabilities in databases like the National Vulnerability Database (NVD).
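CPE generation in Syft uses a two-tier approach to balance accuracy and coverage: a dictionary lookup against an embedded index of known CPEs, and heuristic generation from package attributes for everything else. The dictionary supplies accuracy where an entry exists, and the heuristics supply coverage where none does.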
CPEs link discovered packages to security vulnerabilities (CVEs) in tools like Grype. Without accurate CPE generation, vulnerability scanning misses security issues.
┌─────────────────────────────────────────────────────────┐
│                 Syft Package Discovery                  │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
        ┌─────────────────────┐
        │   CPE Generation    │
        │   (this package)    │
        └──────────┬──────────┘
                   │
       ┌───────────┴────────────┐
       │                        │
       ▼                        ▼
┌──────────────────┐  ┌─────────────────────┐
│ Dictionary       │  │ Heuristic           │
│ Lookup           │  │ Generation          │
│                  │  │                     │
│ • Embedded index │  │ • Ecosystem rules   │
│ • ~22K entries   │  │ • Vendor/product    │
│ • 11 ecosystems  │  │   candidates        │
└──────────────────┘  │ • Curated mappings  │
                      │ • Smart filters     │
                      └─────────────────────┘
The dictionary is generated offline and embedded into the Syft binary for fast, offline lookups.
Location: dictionary/index-generator/
Process:
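1. Cache: raw CPE data is kept in a local cache directory (.cpe-cache/)
2. Filter: hardware (h) and OS (o) CPEs are excluded (keeps only applications a)
3. Index: the remaining CPEs are organized as ecosystem → package_name → [CPE strings]
4. Embed: the result is written to data/cpe-index.json, embedded via go:embed directive

Entry Point: generate.go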
When Syft discovers a package:
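1. Dictionary lookup (FromDictionaryFind): checks the embedded index for the package; CPEs found here are marked with the source NVDDictionaryLookupSource
2. Heuristic generation (FromPackageAttributes): builds candidate CPEs from package attributes; these are marked with the source GeneratedSource

Dictionary Lookups (11 ecosystems): npm, RubyGems, PyPI, Jenkins Plugins, crates.io, PHP, Go Modules, WordPress Plugins/Themes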
Heuristic Generation (all package types): All dictionary ecosystems plus Java, .NET/NuGet, Alpine APK, Debian/RPM, and any other package type Syft discovers
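The runtime flow described above (dictionary first, heuristics otherwise) can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the Package and CPE types, the lowercase helper names, the in-memory index, and the source label strings are all assumptions made so the example runs on its own.

```go
package main

import "fmt"

// Source labels named after the identifiers mentioned in this README.
// The string values here are illustrative assumptions.
const (
	NVDDictionaryLookupSource = "nvd-cpe-dictionary"
	GeneratedSource           = "syft-generated"
)

// Package is a hypothetical, minimal stand-in for a discovered package.
type Package struct {
	Ecosystem string
	Name      string
}

// CPE pairs a CPE string with the tier that produced it.
type CPE struct {
	Value  string
	Source string
}

// index is a tiny in-memory stand-in for the embedded dictionary:
// ecosystem -> package name -> CPE strings.
var index = map[string]map[string][]string{
	"npm": {"lodash": {"cpe:2.3:a:lodash:lodash:*:*:*:*:*:node.js:*:*"}},
}

// fromDictionary returns dictionary CPEs and reports whether any were found.
func fromDictionary(p Package) ([]CPE, bool) {
	values, ok := index[p.Ecosystem][p.Name]
	if !ok || len(values) == 0 {
		return nil, false
	}
	cpes := make([]CPE, 0, len(values))
	for _, v := range values {
		cpes = append(cpes, CPE{Value: v, Source: NVDDictionaryLookupSource})
	}
	return cpes, true
}

// fromAttributes is a deliberately naive heuristic: vendor = product = name.
func fromAttributes(p Package) []CPE {
	v := fmt.Sprintf("cpe:2.3:a:%s:%s:*:*:*:*:*:*:*:*", p.Name, p.Name)
	return []CPE{{Value: v, Source: GeneratedSource}}
}

// generate tries the dictionary first and falls back to the heuristic.
func generate(p Package) []CPE {
	if cpes, ok := fromDictionary(p); ok {
		return cpes
	}
	return fromAttributes(p)
}

func main() {
	for _, p := range []Package{{"npm", "lodash"}, {"npm", "left-pad"}} {
		for _, c := range generate(p) {
			fmt.Printf("%s/%s -> %s (%s)\n", p.Ecosystem, p.Name, c.Value, c.Source)
		}
	}
}
```

Recording the source next to each CPE lets downstream consumers tell dictionary-backed identifiers apart from guessed ones.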
The heuristic generator uses per-ecosystem strategies:
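- Name variants with a _project suffix
- Go module paths (github.com/org/repo)
- npm scoped packages (@scope/package)
- Curated mappings: curl → haxx, spring-boot → pivotal, etc.

The embedded index (data/cpe-index.json) has this shape:

{
  "ecosystems": {
    "npm": {
      "lodash": ["cpe:2.3:a:lodash:lodash:*:*:*:*:*:node.js:*:*"]
    },
    "pypi": {
      "Django": ["cpe:2.3:a:djangoproject:django:*:*:*:*:*:python:*:*"]
    }
  }
}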
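To make the JSON layout concrete, here is a small sketch of decoding it and performing a lookup. The Index type, parseIndex, and Lookup are hypothetical names for illustration. The sample is inlined as a string so the program is self-contained, whereas the package embeds the real file via go:embed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Index mirrors the JSON layout shown above:
// ecosystems -> package name -> list of CPE strings.
type Index struct {
	Ecosystems map[string]map[string][]string `json:"ecosystems"`
}

// sample reproduces the example document from this README.
const sample = `{
  "ecosystems": {
    "npm": {
      "lodash": ["cpe:2.3:a:lodash:lodash:*:*:*:*:*:node.js:*:*"]
    },
    "pypi": {
      "Django": ["cpe:2.3:a:djangoproject:django:*:*:*:*:*:python:*:*"]
    }
  }
}`

// parseIndex decodes raw JSON into an Index.
func parseIndex(data []byte) (Index, error) {
	var idx Index
	err := json.Unmarshal(data, &idx)
	return idx, err
}

// Lookup returns the CPE strings for a package, matching keys exactly.
func (i Index) Lookup(ecosystem, name string) ([]string, bool) {
	cpes, ok := i.Ecosystems[ecosystem][name]
	return cpes, ok
}

func main() {
	idx, err := parseIndex([]byte(sample))
	if err != nil {
		panic(err)
	}
	if cpes, ok := idx.Lookup("pypi", "Django"); ok {
		fmt.Println(cpes[0])
	}
	_, ok := idx.Lookup("pypi", "django")
	fmt.Println("lowercase key found:", ok)
}
```

In this sketch keys are compared exactly, so "Django" and "django" are different entries. Any name normalization the real lookup performs is not shown here.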
The dictionary generator maps packages to ecosystems using reference URL patterns (npmjs.com, pypi.org, rubygems.org, etc.).
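As a rough sketch of how filtering by CPE part and URL-based ecosystem mapping could combine into an index, consider the following. The record type, the pattern table, and the function names are assumptions for illustration. The generator's real pattern set and data model are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical, trimmed-down view of one NVD product entry.
type record struct {
	CPE  string
	Refs []string
}

// patterns lists a few illustrative reference URL patterns.
var patterns = []struct{ match, ecosystem string }{
	{"npmjs.com/package/", "npm"},
	{"pypi.org/project/", "pypi"},
	{"rubygems.org/gems/", "rubygems"},
}

// cpePart returns the part field ("a", "o" or "h") of a CPE 2.3 string.
// Splitting on ":" is naive and ignores escaped colons.
func cpePart(c string) string {
	fields := strings.Split(c, ":")
	if len(fields) < 3 {
		return ""
	}
	return fields[2]
}

// classify extracts an ecosystem and package name from a reference URL.
// It keeps only the first path segment after the pattern.
func classify(url string) (ecosystem, name string, ok bool) {
	for _, p := range patterns {
		i := strings.Index(url, p.match)
		if i < 0 {
			continue
		}
		rest := strings.Trim(url[i+len(p.match):], "/")
		name = strings.SplitN(rest, "/", 2)[0]
		if name != "" {
			return p.ecosystem, name, true
		}
	}
	return "", "", false
}

// buildIndex keeps application CPEs and groups them by ecosystem and package.
func buildIndex(records []record) map[string]map[string][]string {
	index := map[string]map[string][]string{}
	for _, r := range records {
		if cpePart(r.CPE) != "a" {
			continue // drop hardware (h) and OS (o) entries
		}
		for _, ref := range r.Refs {
			eco, name, ok := classify(ref)
			if !ok {
				continue
			}
			if index[eco] == nil {
				index[eco] = map[string][]string{}
			}
			index[eco][name] = append(index[eco][name], r.CPE)
			break // the first matching reference decides the ecosystem
		}
	}
	return index
}

func main() {
	idx := buildIndex([]record{
		{"cpe:2.3:a:lodash:lodash:*:*:*:*:*:node.js:*:*", []string{"https://www.npmjs.com/package/lodash"}},
		{"cpe:2.3:o:example:os:*:*:*:*:*:*:*:*", []string{"https://pypi.org/project/os/"}},
		{"cpe:2.3:a:djangoproject:django:*:*:*:*:*:python:*:*", []string{"https://pypi.org/project/Django/"}},
	})
	fmt.Println(idx)
}
```

Entries whose references match no known pattern never reach the index in this sketch, so the heuristic tier would handle those packages at runtime.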
The CPE dictionary should be updated periodically to include new packages:
# Full workflow: pull cache → update from NVD → build index
make generate:cpe-index
# Or run individual steps:
make generate:cpe-index:cache:pull # Pull cached CPE data from ORAS
make generate:cpe-index:cache:update # Fetch updates from NVD Products API
make generate:cpe-index:build # Generate cpe-index.json from cache
Optional: Set NVD_API_KEY for faster updates (50 req/30s vs 5 req/30s)
This workflow pulls the cached CPE data, fetches updates from the NVD Products API, and rebuilds cpe-index.json.
Add dictionary support for a new ecosystem:

1. Update index-generator/generate.go
2. Run make generate:cpe-index

Improve heuristic generation:

1. Update the ecosystem-specific files (e.g. java.go, python.go)
2. Update candidate_by_package_type.go

Key files:

- generate.go - Main generation logic
- dictionary/ - Dictionary generator and embedded index
- candidate_by_package_type.go - Ecosystem-specific candidates
- filter.go - Filtering rules
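
For anyone extending the heuristics, the strategies mentioned earlier (a _project suffix variant, Go module path parsing, npm scope handling, curated vendor mappings) can be sketched as below. This is an assumption-laden illustration: the candidate type, the candidates function, and the "go"/"npm" labels are hypothetical, and the real rules in the ecosystem-specific files may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// candidate is one vendor/product pair that could be placed into a CPE.
type candidate struct {
	Vendor  string
	Product string
}

// curatedVendors holds only the two mappings named in this README.
var curatedVendors = map[string]string{
	"curl":        "haxx",
	"spring-boot": "pivotal",
}

// candidates derives vendor/product pairs from a package type and name.
func candidates(pkgType, name string) []candidate {
	product := name
	var vendors []string

	switch pkgType {
	case "go":
		// github.com/org/repo -> vendor "org", product "repo"
		parts := strings.Split(name, "/")
		if len(parts) >= 3 && parts[0] == "github.com" {
			vendors = append(vendors, parts[1])
			product = parts[2]
		}
	case "npm":
		// @scope/package -> vendor "scope", product "package"
		if strings.HasPrefix(name, "@") {
			if scope, pkg, ok := strings.Cut(name[1:], "/"); ok {
				vendors = append(vendors, scope)
				product = pkg
			}
		}
	}

	// Curated mappings are tried before the generic guesses.
	if v, ok := curatedVendors[product]; ok {
		vendors = append(vendors, v)
	}
	vendors = append(vendors, product, product+"_project")

	// De-duplicate while preserving order.
	seen := map[string]bool{}
	var out []candidate
	for _, v := range vendors {
		if seen[v] {
			continue
		}
		seen[v] = true
		out = append(out, candidate{Vendor: v, Product: product})
	}
	return out
}

func main() {
	fmt.Println(candidates("generic", "curl"))
	fmt.Println(candidates("go", "github.com/org/repo"))
	fmt.Println(candidates("npm", "@scope/package"))
}
```

Emitting several ordered candidates rather than a single guess leaves room for a later filtering step to discard implausible pairs.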