doc/ARCHITECTURE.md
A contributor-oriented map of how sqlmap is put together: the major components, how a run flows through them, and where to start looking for a given concern.
This is a map, not a spec. It describes the durable structure and data flow; for exact signatures, option names, and enumerable lists (tampers, DBMSes, options), the source is authoritative. When this document disagrees with the code, the code wins.
sqlmap runs on both Python 2.7 and 3.x; sources are kept pure-ASCII unless a literal
non-ASCII byte is unavoidable. Compatibility shims live in lib/core/compat.py and
thirdparty/six.
| Entry | File | Purpose |
|---|---|---|
| CLI | sqlmap.py -> main() | the scanner. Applies runtime patches, parses options, runs a scan. |
| REST API | sqlmapapi.py | -s server / -c client wrappers around lib/utils/api.py. |
main() (sqlmap.py) does, in order: dirtyPatches() (monkey-patches stdlib for
quirks/security - see below), setPaths(), init() (option parsing + environment
setup), then dispatches to start() for a normal scan, or to the self-tests
(--smoke / --vuln-test / --api-test) in lib/core/testing.py.
conf and kbAlmost everything hangs off two process-global singletons defined in lib/core/data.py,
both AttribDict (attribute-accessible dicts; missing keys read back as None):
conf - the resolved user configuration (options + derived settings). What the
user asked for.kb ("knowledge base") - mutable runtime state discovered during a run
(identified DBMS, injection points, page templates, caches, locks, counters).The configuration pipeline (lib/core/):
parse/cmdline.py - argparse definition of every CLI option.core/optiondict.py - option name -> type map (used for config-file/API coercion).core/defaults.py - default values.core/option.py - the heavy lifter: _setConfAttributes(), _setKnowledgeBaseAttributes(),
_setHTTPHandlers() (installs the global urllib opener incl. keep-alive), DBMS/encoding
setup, etc. Merges CLI + config file + defaults into conf/kb.core/settings.py - constants, version, regexes, thresholds. New constants go here.Identifiers in the codebase are camelCase.
| Path | Responsibility |
|---|---|
lib/core/ | conf/kb model, common helpers, settings, enums, dump, session, agent, option parsing |
lib/controller/ | the scan orchestrator (controller.py), detection checks (checks.py), enumeration dispatch (action.py), DBMS handler selection (handler.py) |
lib/request/ | HTTP layer: connect.py (sending), comparison.py (the true/false oracle), inject.py (value extraction), protocol handlers, response processing |
lib/techniques/ | the exploitation engines: blind/inference.py, error/use.py, union/{test,use}.py, dns/ |
lib/parse/ | parsing of inputs: CLI, config, HTTP request/log files, HTML, sitemap, and the XML payload/boundary loader (payloads.py) |
lib/utils/ | feature modules: api.py (REST), hashdb.py (session), crawler.py, hash.py (cracking), har.py, brute.py, search.py, ... |
lib/takeover/ | OS-level takeover: shells, file access, UDF, registry, Metasploit, xp_cmdshell |
plugins/generic/ | DBMS-agnostic enumeration/fingerprint/filesystem/takeover base classes |
plugins/dbms/<dbms>/ | per-DBMS subclasses + dialect (one dir per supported DBMS) |
tamper/ | payload-mutation scripts (WAF bypass), one tamper() per file |
data/xml/ | the data-driven engine: boundaries.xml, payloads/*.xml, queries.xml, errors.xml |
data/ (other) | wordlists/common tables/columns (txt/), UDFs (udf/), stored procs (procs/), shells (shell/) |
tests/ | stdlib-unittest suite (offline); see section 11 |
thirdparty/ | vendored dependencies (six, bottle, chardet, ...) - no pip at runtime |
extra/ | auxiliary tools (e.g. vulnserver used by --vuln-test) |
lib/controller/controller.py: start())For each target:
initTargetEnv() / setupTargetEnv() (lib/core/target.py):
resolve URL/params, open the per-target output dir and session file
(conf.hashDBFile), and resume anything already known (DBMS, injection points,
cached values) from the session.lib/controller/checks.py): checkConnection(),
checkWaf() (fills kb.identifiedWafs), checkStability() /
dynamic-content detection (establishes kb.pageTemplate, kb.matchRatio).heuristicCheckSqlInjection() (cheap error-based hint).checkSqlInjection(place, parameter, value) per parameter, driven by
the data engine (section 5). Confirmed points are appended to kb.injections.lib/controller/handler.py: setHandler() identifies the
back-end DBMS and assigns conf.dbmsHandler, the object through which all
enumeration is dispatched (section 7).action() (lib/controller/action.py) routes the requested operation
(--banner, --dbs, --tables, --dump, --sql-query, --os-shell, ...) to
conf.dbmsHandler methods, and feeds results to conf.dumper.If nothing is injectable, the dead-end advisory (level/risk, technique, --text-only,
--tamper - definitive when kb.identifiedWafs is set) is raised as
SqlmapNotVulnerableException.
Detection behavior lives in data, not code - data/xml/, loaded by
lib/parse/payloads.py (loadBoundaries(), loadPayloads()):
boundaries.xml - injection boundaries: prefix/suffix pairs and the
clause/where/parameter-type context they apply to (e.g. quote vs. numeric contexts).payloads/*.xml - the tests, one file per technique
(boolean_blind, error_based, inline_query, stacked_queries, time_blind,
union_query), each with the request template and the comparison/grep logic that
decides success.getSortedInjectionTests() (lib/core/common.py) orders the candidate tests by the
identified/likely DBMS, --level, and --risk. The agent (lib/core/agent.py)
forges the actual payload string - applying boundary prefix/suffix, the [RANDNUM]/
[DELIMITER]-style markers, comments, and tamper scripts. Requests go out via
lib/request/connect.py; the oracle lib/request/comparison.py decides true/false
by comparing the response against kb.pageTemplate (difflib ratio vs. kb.matchRatio,
plus titles/errors/HTTP-code signals).
Once a parameter is injectable, value extraction is dispatched by
lib/request/inject.py: getValue() to the matching engine in lib/techniques/:
| Technique | Engine | Mechanism |
|---|---|---|
| boolean-based blind | blind/inference.py: bisection() | binary-search each character via true/false oracle |
| time-based blind / stacked | blind/inference.py (time compare) | same bisection, oracle is a measured delay |
| error-based | error/use.py: errorUse() | parse the value straight out of a provoked DB error |
| UNION query | union/{test,use}.py | column-count detection then UNION SELECT extraction |
| inline query | (inline, via inject) | value embedded in the original query position |
| DNS exfiltration | dns/ | --dns-domain out-of-band channel |
bisection() is the hot loop; it caches the --charset table in
kb.cache.charsetAsciiTbl and respects the kb.disableShiftTable runaway-guard latch
(intentional). Multi-threaded extraction is coordinated via kb.locks and
getCurrentThreadData() (lib/core/threads.py).
Enumeration is DBMS-agnostic at the top and specialized underneath:
plugins/generic/ - base classes for each concern: fingerprint.py,
enumeration.py, databases.py, entries.py, users.py, filesystem.py,
takeover.py, syntax.py, misc.py, search.py, custom.py, connector.py
(direct DB connection for -d).plugins/dbms/<dbms>/ - one directory per supported DBMS, subclassing the generic
pieces and supplying dialect specifics.data/xml/queries.xml - per-DBMS SQL query templates (banner, current user, table
enumeration, casting, etc.) keyed by DBMS. The generic code asks for a query by name;
the dialect comes from XML.conf.dbmsHandler (set in handler.py) is the live object that action() calls into.
conf.dumper is a Dump instance (lib/core/dump.py): console tables
plus per-table file export in CSV / HTML / SQLITE / JSONL (--dump-format). Logging
is via logger (lib/core/log.py).<output>/<host>/session.sqlite). hashDBWrite() / hashDBRetrieve()
(lib/core/common.py, backed by lib/utils/hashdb.py) cache injection points,
fingerprint, and extracted values so a re-run resumes instead of re-testing
(--flush-session discards it; --fresh-queries ignores cached query results). A
stale-session nudge fires on resume when the file is older than HASHDB_STALE_DAYS.lib/request/connect.py (Connect.getPage) is the single HTTP chokepoint. Around it:
protocol handlers (httpshandler, redirecthandler, chunkedhandler, rangehandler,
persistent connections via lib/request/keepalive.py), response processing (basic.py), and the
comparison oracle (comparison.py).
Tamper scripts (tamper/) mutate the payload just before sending to evade WAF/IPS.
Each file exposes a tamper(payload, **kwargs) and a __priority__; --tamper=a,b,c
chains them in priority order. They are payload-string transforms only (no engine
coupling), which is why they compose freely.
lib/utils/api.py runs a Bottle server (sqlmapapi.py -s) that drives sqlmap scans as
subprocesses and exposes them over HTTP. Key pieces: DataStore/Task (task registry),
an IPC SQLite Database (the subprocess writes results/logs/errors back through
StdDbOut), and the route handlers (/task/*, /option/*, /scan/*, /version, ...).
The contract is documented in sqlmapapi.yaml (OpenAPI) and REST-API.md.
--report-json reuses the same assembly code (_assembleData / _sanitizeScanData)
that the /scan/<id>/data endpoint uses, so the CLI report and the API result can't
drift; RESTAPI_VERSION is the API contract version (major exposed as integer).
Two complementary layers:
tests/) - stdlib unittest only (no pytest/pip),
green on py2 + py3. _testutils.py bootstraps global state and provides the
property/fuzz harness (Rng - a cross-version-identical PRNG - and for_all). Run:
python -B -m unittest discover -s tests -p "test_*.py" (-B matters: a cached .pyc
makes a getFileType(__file__) doctest see binary).lib/core/testing.py, hidden switches): --smoke-test
(doctests + regex sanity over the whole tree), --vuln-test (end-to-end scans against
the bundled extra/vulnserver), --api-test (live REST round-trip). The CI workflow
(.github/workflows/tests.yml) runs all of these.| I want to change... | Start in |
|---|---|
| a CLI option | lib/parse/cmdline.py (+ optiondict.py, defaults.py) |
| a constant/threshold | lib/core/settings.py |
| how injection is detected | data/xml/boundaries.xml + data/xml/payloads/*.xml, then lib/controller/checks.py |
| how a value is extracted | lib/request/inject.py + the relevant lib/techniques/ engine |
| the true/false decision | lib/request/comparison.py |
| a per-DBMS query/dialect | data/xml/queries.xml + plugins/dbms/<dbms>/ |
| enumeration behavior | plugins/generic/*.py |
| dump/output format | lib/core/dump.py |
| a WAF-bypass transform | add a file under tamper/ |
| the REST API surface | lib/utils/api.py (+ keep sqlmapapi.yaml in sync) |
| session/resume behavior | lib/utils/hashdb.py + hashDB* in lib/core/common.py |
| a stdlib monkey-patch / security shim | lib/core/patch.py |