RFC-001: Project Architecture & Data Model

Author: moonD4rk
Status: Living Document
Created: 2026-04-05

1. Project Positioning

HackBrowserData is a CLI security research tool that extracts and decrypts browser data from Chromium-based browsers and Firefox across Windows, macOS, and Linux.

Key constraints:

  • Go 1.20 — the module must build with Go 1.20 to maintain Windows 7 support. Features from Go 1.21+ (log/slog, slices, maps, cmp) must not be used.
  • Supported engines: Chromium (including Yandex and Opera variants), Firefox, and Safari (macOS only; see §4.1).
  • Supported platforms: Windows (DPAPI), macOS (Keychain), Linux (D-Bus Secret Service).
  • No root-level library API — the CLI calls browser.PickBrowsers() directly; there is no importable pkg/ surface.
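
Because the slices package is off limits under the Go 1.20 constraint, small helpers of that kind have to be written locally. The following is a minimal sketch of such a helper; the name contains is hypothetical and is not taken from the codebase. Generics themselves are available, since they arrived in Go 1.18.

```go
package main

import "fmt"

// contains reports whether target is present in items.
// Hypothetical stand-in for slices.Contains, which requires Go 1.21+.
func contains[T comparable](items []T, target T) bool {
	for _, item := range items {
		if item == target {
			return true
		}
	}
	return false
}

func main() {
	browsers := []string{"chrome", "edge", "firefox"}
	fmt.Println(contains(browsers, "edge"))   // true
	fmt.Println(contains(browsers, "safari")) // false
}
```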

2. Directory Structure

HackBrowserData/
├── cmd/hack-browser-data/    # CLI entrypoint: cobra root, dump, list, version
├── browser/                  # Browser interface, PickBrowsers(), platform browser lists
│   ├── chromium/             # Chromium engine: extraction, decryption, profile discovery
│   └── firefox/              # Firefox engine: extraction, NSS key derivation
├── types/                    # Data model: Category enum, Entry structs, BrowserData
├── crypto/                   # Encryption primitives, cipher version detection
│   └── keyretriever/         # Platform-specific master key retrieval (Keychain/DPAPI/D-Bus)
├── filemanager/              # Temp file session, locked file handling (Windows)
├── output/                   # Output Writer: CSV, JSON, CookieEditor formatters
├── log/                      # Logging with level filtering
└── utils/                    # SQLite query helpers, file utilities

3. Core Data Model

3.1 Category

Category is an int enum representing 9 browser-agnostic data kinds: Password, Cookie, Bookmark, History, Download, CreditCard, Extension, LocalStorage, SessionStorage.

Three categories are classified as sensitive (Password, Cookie, CreditCard) via IsSensitive(), enabling safe-by-default export scenarios.
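
A minimal sketch of how such an enum and its IsSensitive() method could look. The constant order and the nonSensitive helper are assumptions made for illustration; types/category.go remains the authoritative definition.

```go
package main

import "fmt"

// Category identifies a browser-agnostic data kind.
type Category int

// The order of these constants is assumed, not taken from the source.
const (
	Password Category = iota
	Cookie
	Bookmark
	History
	Download
	CreditCard
	Extension
	LocalStorage
	SessionStorage
)

// IsSensitive reports whether the category holds credentials or payment data.
func (c Category) IsSensitive() bool {
	switch c {
	case Password, Cookie, CreditCard:
		return true
	default:
		return false
	}
}

// nonSensitive is a hypothetical helper: it keeps only the categories
// that a safe-by-default export would include.
func nonSensitive(cats []Category) []Category {
	var out []Category
	for _, c := range cats {
		if !c.IsSensitive() {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	all := []Category{Password, Cookie, Bookmark, History, Download,
		CreditCard, Extension, LocalStorage, SessionStorage}
	fmt.Println(len(nonSensitive(all))) // 6
}
```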

3.2 Entry Types

Each category has a corresponding Entry struct with json and csv struct tags. All structs are flat (no nesting) and use time.Time for timestamps.

| Struct | Category | Key Fields |
| --- | --- | --- |
| LoginEntry | Password | URL, Username, Password, CreatedAt |
| CookieEntry | Cookie | Host, Path, Name, Value, IsSecure, IsHTTPOnly, ExpireAt, CreatedAt |
| BookmarkEntry | Bookmark | Name, URL, Folder, CreatedAt |
| HistoryEntry | History | URL, Title, VisitCount, LastVisit |
| DownloadEntry | Download | URL, TargetPath, TotalBytes, StartTime, EndTime |
| CreditCardEntry | CreditCard | Name, Number, ExpMonth, ExpYear |
| ExtensionEntry | Extension | Name, ID, Description, Version |
| StorageEntry | LocalStorage, SessionStorage | URL, Key, Value |

StorageEntry is shared by both LocalStorage and SessionStorage.
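
As an illustration, two of these structs might be declared as below. The field names come from the table; the tag values are assumptions. Note that the standard library only interprets the json tag, and this RFC does not say how the csv tag is consumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// LoginEntry is a flat record for the Password category.
// Tag values are illustrative assumptions.
type LoginEntry struct {
	URL       string    `json:"url" csv:"url"`
	Username  string    `json:"username" csv:"username"`
	Password  string    `json:"password" csv:"password"`
	CreatedAt time.Time `json:"created_at" csv:"created_at"`
}

// StorageEntry serves both LocalStorage and SessionStorage.
type StorageEntry struct {
	URL   string `json:"url" csv:"url"`
	Key   string `json:"key" csv:"key"`
	Value string `json:"value" csv:"value"`
}

// encodeLogin serializes one entry using the json tags.
func encodeLogin(e LoginEntry) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	e := LoginEntry{
		URL:       "https://example.com",
		Username:  "alice",
		Password:  "s3cret",
		CreatedAt: time.Date(2026, 4, 5, 0, 0, 0, 0, time.UTC),
	}
	out, err := encodeLogin(e)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```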

3.3 BrowserData Container

BrowserData is the result container returned by Extract(). It holds typed slices — one per category. The container is populated field-by-field during extraction. The output layer uses makeExtractor[T]() generics to pull the correct slice for serialization.
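
The RFC does not give the signature of makeExtractor[T](), so the following is only a sketch of one way a generic accessor could bridge typed slices and a uniform serialization path. The BrowserData field names and the selector-function parameter are assumptions.

```go
package main

import "fmt"

type LoginEntry struct{ URL, Username string }
type CookieEntry struct{ Host, Name string }

// BrowserData holds one typed slice per category (only two shown here).
type BrowserData struct {
	Passwords []LoginEntry
	Cookies   []CookieEntry
}

// makeExtractor wraps a typed field selector so the caller can treat
// every category as a slice of rows. Assumed shape, not the real API.
func makeExtractor[T any](pick func(*BrowserData) []T) func(*BrowserData) []any {
	return func(d *BrowserData) []any {
		rows := pick(d)
		out := make([]any, 0, len(rows))
		for _, r := range rows {
			out = append(out, r)
		}
		return out
	}
}

func main() {
	data := &BrowserData{
		Passwords: []LoginEntry{{URL: "https://example.com", Username: "alice"}},
	}
	passwords := makeExtractor(func(d *BrowserData) []LoginEntry { return d.Passwords })
	cookies := makeExtractor(func(d *BrowserData) []CookieEntry { return d.Cookies })
	fmt.Println(len(passwords(data)), len(cookies(data))) // 1 0
}
```

The type parameter is inferred from the selector's return type, so each call site stays a one-liner while the compiler still checks that the right slice is selected.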

4. Browser Interface & Registration

4.1 BrowserKind

Each config declares an engine kind that determines source paths and extraction logic. Kinds fall into three engine families:

  • Chromium (Chromium, ChromiumYandex, ChromiumOpera) — the standard Chromium layout plus two variants that override file names or storage paths for Yandex and Opera forks. See RFC-003.
  • Firefox — NSS-based key derivation from key4.db, SQLite + JSON source files. See RFC-005.
  • Safari — macOS only, with direct Keychain-based credential extraction. See RFC-006 §7.

See types/category.go for the authoritative enum definition.

4.2 BrowserConfig

BrowserConfig is the declarative, platform-specific browser definition containing: Key (CLI matching), Name (display), Kind (engine), Storage (keychain label), UserDataDir (data path).
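
A sketch of the struct with the five fields listed above, plus a hypothetical filterByKey helper showing how Key could drive CLI matching. The field types, the example values, and the "all" wildcard are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// BrowserKind selects the engine; constant names follow section 4.1.
type BrowserKind int

const (
	Chromium BrowserKind = iota
	ChromiumYandex
	ChromiumOpera
	Firefox
	Safari
)

// BrowserConfig is a declarative browser definition.
type BrowserConfig struct {
	Key         string      // CLI matching
	Name        string      // display name
	Kind        BrowserKind // engine
	Storage     string      // keychain label
	UserDataDir string      // data path
}

// exampleConfigs returns illustrative macOS-style entries.
func exampleConfigs() []BrowserConfig {
	return []BrowserConfig{
		{Key: "chrome", Name: "Chrome", Kind: Chromium,
			Storage:     "Chrome Safe Storage",
			UserDataDir: "~/Library/Application Support/Google/Chrome"},
		{Key: "firefox", Name: "Firefox", Kind: Firefox,
			UserDataDir: "~/Library/Application Support/Firefox/Profiles"},
	}
}

// filterByKey keeps configs whose Key matches name, ignoring case.
// Treating "all" as a wildcard is an assumption.
func filterByKey(configs []BrowserConfig, name string) []BrowserConfig {
	if strings.EqualFold(name, "all") {
		return configs
	}
	var out []BrowserConfig
	for _, c := range configs {
		if strings.EqualFold(c.Key, name) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	for _, c := range filterByKey(exampleConfigs(), "Chrome") {
		fmt.Println(c.Name, c.UserDataDir)
	}
}
```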

4.3 Browser Selection Flow

There are two entry points, one for extraction and one for discovery:

PickBrowsers(opts)                    // used by `dump` — ready to Extract
  → pickFromConfigs(configs, opts)     // shared discovery core
      → platformBrowsers()             // build-tagged list for this OS
      → filter by name / profile path
      → newBrowsers(cfg)                // dispatch to chromium/firefox/safari.NewBrowsers
          → discoverProfiles()          // scan profile subdirectories
          → resolveSourcePaths()        // stat candidates, first match wins
  → newPlatformInjector(opts)          // build-tagged: returns a func(Browser)
      → for each browser:               // closure captures retriever + keychain pw lazily
          inject(b)                     // type-assert retrieverSetter / keychainPasswordSetter

DiscoverBrowsers(opts)                 // used by `list` / `list --detail`
  → pickFromConfigs(configs, opts)     // same shared discovery core, NO injection

PickBrowsers does discovery + decryption setup in one call; the returned browsers are ready for b.Extract. DiscoverBrowsers skips injection entirely, so list-style commands never trigger the macOS Keychain password prompt — they have no use for the credential. Both entry points share the same pickFromConfigs core, so filtering/profile-path/glob semantics stay consistent.
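
The injection step can be sketched as a closure that type-asserts each browser and builds the retriever at most once. Only the names retrieverSetter and KeyRetriever appear in this RFC; their method sets, the concrete browser types, and the build parameter of newInjector are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

type Browser interface{ Name() string }

// KeyRetriever stands in for the platform key chain; method set assumed.
type KeyRetriever interface {
	MasterKey(storage string) ([]byte, error)
}

// retrieverSetter is implemented only by browsers that need a master key.
type retrieverSetter interface{ SetRetriever(KeyRetriever) }

type chromiumBrowser struct {
	name string
	r    KeyRetriever
}

func (b *chromiumBrowser) Name() string                { return b.name }
func (b *chromiumBrowser) SetRetriever(r KeyRetriever) { b.r = r }

type firefoxBrowser struct{ name string }

func (b *firefoxBrowser) Name() string { return b.name }

type fakeRetriever struct{}

func (fakeRetriever) MasterKey(string) ([]byte, error) { return []byte("key"), nil }

// newInjector returns a func(Browser) that builds the retriever lazily,
// on the first browser that actually accepts one, and then reuses it.
func newInjector(build func() KeyRetriever) func(Browser) {
	var (
		once sync.Once
		r    KeyRetriever
	)
	return func(b Browser) {
		s, ok := b.(retrieverSetter)
		if !ok {
			return // this engine does not take an injected retriever
		}
		once.Do(func() { r = build() })
		s.SetRetriever(r)
	}
}

func main() {
	builds := 0
	inject := newInjector(func() KeyRetriever {
		builds++
		return fakeRetriever{}
	})
	browsers := []Browser{
		&chromiumBrowser{name: "Chrome"},
		&firefoxBrowser{name: "Firefox"},
		&chromiumBrowser{name: "Edge"},
	}
	for _, b := range browsers {
		inject(b)
	}
	fmt.Println("retriever built", builds, "time(s)") // retriever built 1 time(s)
}
```

A discovery-only path simply never calls the injector, so the retriever is never constructed and no credential prompt can be triggered.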

Key design decisions:

  • One KeyRetriever chain per process — built lazily inside newPlatformInjector and reused across every Chromium browser and every profile to prevent repeated keychain prompts on macOS.
  • Discovery is decoupled from injection: pickFromConfigs is injection-free; DiscoverBrowsers stops after it, while PickBrowsers continues into injection.
  • Profile discovery differs by engine: Chromium looks for Preferences files in subdirectories; Firefox accepts any subdirectory containing known source files.
  • Flat layout fallback — Opera-style browsers that store data directly in UserDataDir (no profile subdirectories) are handled by falling back to the base directory.
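
The Chromium profile rule and the flat-layout fallback can be sketched as follows. The function body is an assumption about how the described behaviour might be implemented (in particular, the exact condition that triggers the fallback), and makeLayout exists only to give the demo something to scan.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// discoverProfiles returns every subdirectory of userDataDir that holds a
// Preferences file. If none is found it falls back to userDataDir itself,
// which covers browsers that keep their data directly in the base directory.
func discoverProfiles(userDataDir string) ([]string, error) {
	entries, err := os.ReadDir(userDataDir)
	if err != nil {
		return nil, err
	}
	var profiles []string
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		dir := filepath.Join(userDataDir, e.Name())
		if _, err := os.Stat(filepath.Join(dir, "Preferences")); err == nil {
			profiles = append(profiles, dir)
		}
	}
	if len(profiles) == 0 {
		return []string{userDataDir}, nil // flat layout fallback
	}
	return profiles, nil
}

// makeLayout builds a throwaway user-data directory for the demo; each
// name becomes a subdirectory containing an empty Preferences file.
func makeLayout(profiles ...string) (string, error) {
	root, err := os.MkdirTemp("", "userdata-*")
	if err != nil {
		return "", err
	}
	for _, p := range profiles {
		dir := filepath.Join(root, p)
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(filepath.Join(dir, "Preferences"), nil, 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := makeLayout("Default", "Profile 1")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(root)

	profiles, err := discoverProfiles(root)
	if err != nil {
		fmt.Println("discovery failed:", err)
		return
	}
	for _, p := range profiles {
		fmt.Println(filepath.Base(p))
	}
}
```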

4.4 Platform Browser Lists

Browser configs are defined per-platform via build tags in platformBrowsers() (browser/browser_{darwin,linux,windows}.go). The supported set groups by engine family:

  • Chromium-based — the largest family, covering mainstream browsers (Chrome, Edge, Brave, Vivaldi, Opera, Chromium) across all three platforms plus regional variants and forks. Windows carries the longest list because of China-region Chromium forks (360, QQ, Sogou, DC, …) and MSIX-packaged browsers with dynamic install paths (Arc, DuckDuckGo).
  • Firefox — all three platforms, via internal NSS key derivation (RFC-005).
  • Safari — macOS only, via direct Keychain InternetPassword extraction (RFC-006 §7).

Adding a new browser is a config-only change in platformBrowsers(); this section does not need updates for new variants within an existing family.

5. Extract() Orchestration

Both Chromium and Firefox engines follow the same extraction pattern:

Extract(categories)
  1. NewSession()               → create isolated temp directory
  2. acquireFiles(session)      → copy source files to temp dir (with dedup and WAL/SHM)
  3. getMasterKey(session)       → platform-specific key retrieval
  4. for each category:
       extractCategory(data, cat, masterKey, path)
  5. defer session.Cleanup()    → remove temp directory

For details on file acquisition, see RFC-008. For encryption details, see RFC-003 (Chromium) and RFC-005 (Firefox). For key retrieval, see RFC-006.
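
The session lifecycle in steps 1, 2 and 5 can be sketched as below. In Go the deferred cleanup is registered immediately after the session is created, even though it runs last. All names here (session, acquire, extract, demo) are hypothetical, and the sketch omits deduplication, WAL/SHM companions, and key retrieval.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// session owns an isolated temp directory for one extraction run.
type session struct{ dir string }

func newSession() (*session, error) {
	dir, err := os.MkdirTemp("", "hbd-session-*")
	if err != nil {
		return nil, err
	}
	return &session{dir: dir}, nil
}

func (s *session) cleanup() error { return os.RemoveAll(s.dir) }

// acquire copies src into the session directory and returns the copy's path.
func (s *session) acquire(src string) (string, error) {
	in, err := os.Open(src)
	if err != nil {
		return "", err
	}
	defer in.Close()
	dst := filepath.Join(s.dir, filepath.Base(src))
	out, err := os.Create(dst)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return "", err
	}
	return dst, out.Close()
}

// extract runs read against a private copy of src, never the live file.
func extract(src string, read func(path string) error) error {
	s, err := newSession()
	if err != nil {
		return err // session failure aborts the run
	}
	defer s.cleanup()
	copyPath, err := s.acquire(src)
	if err != nil {
		return err
	}
	return read(copyPath)
}

// demo writes a stand-in source file, extracts from a copy of it, and
// reports what was read and whether the copy is gone afterwards.
func demo(content string) (read string, cleaned bool, err error) {
	src, err := os.CreateTemp("", "source-*")
	if err != nil {
		return "", false, err
	}
	defer os.Remove(src.Name())
	if _, err := src.WriteString(content); err != nil {
		src.Close()
		return "", false, err
	}
	if err := src.Close(); err != nil {
		return "", false, err
	}
	var copyPath string
	err = extract(src.Name(), func(path string) error {
		copyPath = path
		b, err := os.ReadFile(path)
		read = string(b)
		return err
	})
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(copyPath)
	return read, os.IsNotExist(statErr), nil
}

func main() {
	read, cleaned, err := demo("cookie-db-bytes")
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println(read, cleaned) // cookie-db-bytes true
}
```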

5.1 Collect-and-Continue Pattern

The extraction loop maximizes data recovery. Each category is extracted independently — a failure in one does not affect others. Errors are handled at three levels:

| Level | Trigger | Action |
| --- | --- | --- |
| Session failure | Temp dir cannot be created | Abort entirely, return error |
| Category failure | Source file missing or extraction error | Skip category, continue to next |
| Record failure | Single row decryption fails | Skip record, continue extraction |

Master key failure is non-fatal. If the key cannot be retrieved, categories requiring decryption (passwords, cookies, credit cards) produce empty values, while non-encrypted categories (history, bookmarks, downloads) still succeed.
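
The category and record levels of this pattern can be sketched as two loops that record a failure and move on. The types and function names below are hypothetical and only illustrate the control flow, not the project's actual extraction code.

```go
package main

import (
	"errors"
	"fmt"
)

var errMissing = errors.New("source file missing")

// source pairs a category name with a loader that may fail.
type source struct {
	category string
	load     func() ([]string, error)
}

// decryptAll handles the record level: a row that fails to decrypt is
// counted and skipped, and the remaining rows are still processed.
func decryptAll(records []string, decrypt func(string) (string, error)) (out []string, skipped int) {
	for _, r := range records {
		v, err := decrypt(r)
		if err != nil {
			skipped++
			continue
		}
		out = append(out, v)
	}
	return out, skipped
}

// extractAll handles the category level: a failing category is recorded
// and skipped, and later categories still run.
func extractAll(sources []source) (map[string][]string, []error) {
	results := make(map[string][]string)
	var errs []error
	for _, s := range sources {
		rows, err := s.load()
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.category, err))
			continue
		}
		results[s.category] = rows
	}
	return results, errs
}

func main() {
	results, errs := extractAll([]source{
		{"history", func() ([]string, error) { return []string{"https://example.com"}, nil }},
		{"cookie", func() ([]string, error) { return nil, errMissing }},
		{"bookmark", func() ([]string, error) { return []string{"Example"}, nil }},
	})
	fmt.Println(len(results), "categories extracted,", len(errs), "skipped")

	rows, skipped := decryptAll([]string{"a", "", "b"}, func(s string) (string, error) {
		if s == "" {
			return "", errors.New("decrypt failed")
		}
		return "plain:" + s, nil
	})
	fmt.Println(rows, skipped)
}
```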

5.2 Custom Extractors

The categoryExtractor interface allows browser-specific extraction logic. Yandex and Opera use custom extractors for passwords and extensions respectively, while all other categories fall through to the default Chromium implementation.
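
A sketch of the override-then-fallback lookup this implies. The method set of categoryExtractor is not given in this RFC, so the single extract method, the engine type, and the per-engine override map are all assumptions.

```go
package main

import "fmt"

type Category int

const (
	Password Category = iota
	Extension
	History
)

// categoryExtractor is assumed here to have a single method.
type categoryExtractor interface {
	extract(path string) (string, error)
}

// extractorFunc adapts a plain function to the interface.
type extractorFunc func(path string) (string, error)

func (f extractorFunc) extract(path string) (string, error) { return f(path) }

// defaultExtractor stands in for the standard Chromium implementation.
var defaultExtractor = extractorFunc(func(path string) (string, error) {
	return "default:" + path, nil
})

// engine carries the per-variant overrides, keyed by category.
type engine struct {
	name   string
	custom map[Category]categoryExtractor
}

// extractorFor returns the variant's override when one is registered and
// the default implementation otherwise.
func (e engine) extractorFor(c Category) categoryExtractor {
	if x, ok := e.custom[c]; ok {
		return x
	}
	return defaultExtractor
}

// newYandex registers a custom password extractor, as described above.
func newYandex() engine {
	return engine{name: "yandex", custom: map[Category]categoryExtractor{
		Password: extractorFunc(func(path string) (string, error) {
			return "yandex:" + path, nil
		}),
	}}
}

func main() {
	y := newYandex()
	for _, c := range []Category{Password, History} {
		out, _ := y.extractorFor(c).extract("db")
		fmt.Println(out)
	}
	// Output:
	// yandex:db
	// default:db
}
```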

6. Dependency Constraints

The module is pinned to go 1.20 in go.mod. This is enforced by a CI lint check that fails if the directive changes.
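
The RFC does not describe how the CI check is implemented. As one possible approach, a check could read go.mod and compare the go directive against the expected version; the goDirective function and the sample module path below are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goDirective returns the version named by the go directive in the given
// go.mod contents, or "" when no directive is present.
func goDirective(modFile string) string {
	sc := bufio.NewScanner(strings.NewReader(modFile))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1]
		}
	}
	return ""
}

// sample uses a placeholder module path, not the project's real one.
const sample = `module example.com/hypothetical

go 1.20

require modernc.org/sqlite v1.31.1
`

func main() {
	if v := goDirective(sample); v != "1.20" {
		fmt.Println("go directive changed:", v)
		return
	}
	fmt.Println("go directive ok")
}
```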

| Dependency | Version | Purpose |
| --- | --- | --- |
| modernc.org/sqlite | v1.31.1 (pinned) | Pure-Go SQLite. v1.32+ requires Go 1.21 |
| github.com/syndtr/goleveldb | v1.0.0 | LevelDB for Chromium localStorage/sessionStorage |
| github.com/tidwall/gjson | v1.18.0 | JSON path queries |
| github.com/spf13/cobra | v1.10.2 | CLI framework |
| github.com/moond4rk/keychainbreaker | v0.2.5 | macOS keychain decryption |
| github.com/godbus/dbus/v5 | v5.2.2 | Linux D-Bus Secret Service |
| golang.org/x/sys | v0.27.0 | Windows syscalls (DPAPI, DuplicateHandle) |

| RFC | Topic |
| --- | --- |
| RFC-002 | Chromium data file locations and storage formats |
| RFC-003 | Chromium encryption mechanisms per platform |
| RFC-004 | Firefox data file locations and storage formats |
| RFC-005 | Firefox NSS encryption and key derivation |
| RFC-006 | Platform-specific master key retrieval |
| RFC-007 | CLI commands and output formats |
| RFC-008 | File acquisition and platform quirks |
| RFC-009 | Windows locked file bypass technique |