docs/code-indexing/cobol/file-detection.md
GitNexus detects COBOL files through two mechanisms: extension-based mapping and directory-based override for extensionless files. This document covers both, plus the copybook/program classification logic.

COBOL program extensions:

| Extension | Type |
|---|---|
| `.cbl` | COBOL program |
| `.cob` | COBOL program |
| `.cobol` | COBOL program |

Copybook extensions:

| Extension | Type | Notes |
|---|---|---|
| `.cpy` | Copybook | Standard |
| `.copy` | Copybook | Standard |
| `.gnm` / `.GNM` | Copybook | Enterprise (GnuCOBOL naming) |
| `.fd` / `.FD` | Copybook | File Description fragment |
| `.wrk` / `.WRK` | Copybook | Working-Storage fragment |
| `.sel` / `.SEL` | Copybook | SELECT clause fragment |
| `.open` / `.OPEN` | Copybook | File OPEN fragment |
| `.close` / `.CLOSE` | Copybook | File CLOSE fragment |
| `.ini` / `.INI` | Copybook | Initialization fragment |
| `.def` / `.DEF` | Copybook | Definition fragment |

All extension matching in `getLanguageFromFilename` is case-sensitive: extensions are matched exactly as written in the tables above, which is why uppercase variants such as `.GNM` are listed separately.
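A minimal sketch of that case-sensitive lookup, assuming the extension is everything from the last `.` in the basename. `EXTENSION_KIND` and `cobolKindFromFilename` are hypothetical names, not GitNexus code, and the sketch returns the program/copybook kind from the tables rather than a language name as `getLanguageFromFilename` does.

```typescript
type CobolKind = "program" | "copybook";

// Hypothetical table mirroring the two tables above; keys are exact strings.
const PROGRAM_EXTS = [".cbl", ".cob", ".cobol"];
const COPYBOOK_EXTS = [".cpy", ".copy"];
// Fragment extensions that the table lists in both lowercase and uppercase.
const COPYBOOK_EXTS_BOTH_CASES = [".gnm", ".fd", ".wrk", ".sel", ".open", ".close", ".ini", ".def"];

const EXTENSION_KIND = new Map<string, CobolKind>();
for (const ext of PROGRAM_EXTS) EXTENSION_KIND.set(ext, "program");
for (const ext of COPYBOOK_EXTS) EXTENSION_KIND.set(ext, "copybook");
for (const ext of COPYBOOK_EXTS_BOTH_CASES) {
  EXTENSION_KIND.set(ext, "copybook");
  EXTENSION_KIND.set(ext.toUpperCase(), "copybook");
}

// Returns the kind for a filename, or null when the extension is not listed.
// Assumption: the extension starts at the last "." of the basename.
// The Map lookup is exact, so matching is case-sensitive.
function cobolKindFromFilename(filename: string): CobolKind | null {
  const base = filename.slice(filename.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot === -1) return null; // extensionless: handled by the directory override
  return EXTENSION_KIND.get(base.slice(dot)) ?? null;
}
```

Under these assumptions `c/ANAZI.GNM` resolves to a copybook, while `copy/WORKGRID.CPY` and `src/PAYROLL.CBL` resolve to nothing, because only the lowercase forms of those extensions appear in the tables.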

GITNEXUS_COBOL_DIRS

Many enterprise COBOL repositories use extensionless files -- the filename alone identifies the program (e.g., `s/BGTABFL` is the source for program `BGTABFL`). GitNexus handles these files through the `GITNEXUS_COBOL_DIRS` environment variable.

Set `GITNEXUS_COBOL_DIRS` to a comma-separated list of directory names:
```bash
# Extensionless files under any s/, c/, or wfproc/ directory (at any depth) are treated as COBOL
export GITNEXUS_COBOL_DIRS=s,c,wfproc
```
The matching is case-insensitive and checks all path segments:
- `/repo/s/BGTABFL` -- matches segment `s` -- COBOL
- `/repo/src/c/CPSESP` -- matches segment `c` -- COBOL
- `/repo/wfproc/WF001` -- matches segment `wfproc` -- COBOL
- `/repo/docs/README` -- no matching segment -- skipped

```mermaid
flowchart TD
    A[getLanguageFromPath] --> B[getLanguageFromFilename]
    B --> C{Known extension?}
    C -->|Yes .cbl/.cob/.cobol/.cpy/...| D[Return COBOL]
    C -->|Yes .ts/.py/.java/...| E[Return other language]
    C -->|No match| F{Has extension?}
    F -->|"Has dot in basename"| G[Return null]
    F -->|"No dot = extensionless"| H{GITNEXUS_COBOL_DIRS set?}
    H -->|No| G
    H -->|Yes| I{"Any path segment matches a configured dir?"}
    I -->|Yes| D
    I -->|No| G
    style D fill:#e8f5e9,stroke:#2e7d32
    style G fill:#ffebee,stroke:#c62828
```
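The flowchart can be read as the following sketch. It is not the GitNexus implementation: `languageFromPath`, `languageFromFilename`, and `basename` are hypothetical names, the extension table is a tiny stand-in, the configured directories are passed in as an already-lowercased `Set` instead of being read from the environment, and paths are assumed to use `/` separators.

```typescript
// Tiny stand-in for getLanguageFromFilename (hypothetical; the real table is larger).
const KNOWN_EXTENSIONS: ReadonlyMap<string, string> = new Map([
  [".cbl", "cobol"],
  [".cpy", "cobol"],
  [".ts", "typescript"],
  [".py", "python"],
]);

function basename(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

function languageFromFilename(path: string): string | null {
  const base = basename(path);
  const dot = base.lastIndexOf(".");
  return dot === -1 ? null : (KNOWN_EXTENSIONS.get(base.slice(dot)) ?? null);
}

// Follows the flowchart. `cobolDirs` must already be trimmed and lowercased;
// an empty set stands for "GITNEXUS_COBOL_DIRS not set".
function languageFromPath(path: string, cobolDirs: ReadonlySet<string>): string | null {
  const byExtension = languageFromFilename(path);
  if (byExtension !== null) return byExtension; // known extension
  if (basename(path).includes(".")) return null; // has a dot, but unknown extension
  if (cobolDirs.size === 0) return null; // no directories configured
  // Extensionless: COBOL if any path segment matches, ignoring case.
  const matches = path.split("/").some((seg) => cobolDirs.has(seg.toLowerCase()));
  return matches ? "cobol" : null;
}
```

With `new Set(["s", "c", "wfproc"])`, the four example paths listed above give `"cobol"`, `"cobol"`, `"cobol"`, and `null`, while `/repo/s/notes.txt` gives `null` because its basename contains a dot.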

The `GITNEXUS_COBOL_DIRS` value is parsed once (on the first call) and cached in a `Set<string>`:
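```typescript
// From gitnexus/src/core/ingestion/utils.ts
// Module-level cache; this declaration is assumed (it is not part of the excerpt).
let _cobolDirs: Set<string> | undefined;

const getCobolDirs = (): Set<string> => {
  // After the first call, return the cached set.
  if (_cobolDirs) return _cobolDirs;
  const raw = process.env.GITNEXUS_COBOL_DIRS;
  // Trim and lowercase each entry so segment checks are case-insensitive.
  _cobolDirs = raw
    ? new Set(raw.split(',').map(d => d.trim().toLowerCase()))
    : new Set();
  return _cobolDirs;
};
```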
The path segment check splits the full path on `/` and tests each segment, compared case-insensitively, against the cached set; one matching segment is enough.
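Because the value is parsed once and cached, changing the environment variable after the first call has no effect for the rest of the process. A minimal sketch of the same parse-once pattern, with the environment read injected as a function so it runs without Node typings; `makeCobolDirsGetter` and `getDirs` are hypothetical names. Unlike the excerpt above, this sketch also drops empty entries, so a trailing comma does not add an empty-string directory that could match the empty first segment produced by splitting an absolute path (a design choice of the sketch, not necessarily of GitNexus).

```typescript
type EnvReader = () => string | undefined;

// Returns a getter that parses the comma-separated list on its first call
// and returns the same cached Set on every later call.
function makeCobolDirsGetter(readEnv: EnvReader): () => Set<string> {
  let cached: Set<string> | undefined;
  return () => {
    if (cached) return cached; // an empty Set is still truthy, so it is cached too
    const raw = readEnv();
    cached = new Set(
      (raw ?? "")
        .split(",")
        .map((d) => d.trim().toLowerCase())
        .filter((d) => d.length > 0), // drop empty entries, e.g. from "s,c,"
    );
    return cached;
  };
}

// Usage: the value read on the first call wins.
let env: string | undefined = " s, C ,wfproc,";
const getDirs = makeCobolDirsGetter(() => env);
const first = getDirs();
env = "other"; // ignored: the set was already cached
const second = getDirs();
console.log(Array.from(first), first === second);
```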
After a file is identified as COBOL, it must be classified as either a program (to be parsed for symbols) or a copybook (to be loaded into the copybook map for COPY expansion).
A COBOL file is classified as a copybook if ANY of these conditions is true:
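
- Its extension is one of the copybook extensions (`.cpy`, `.copy`, `.gnm`, `.fd`, `.wrk`, `.sel`, `.open`, `.close`, `.ini`, `.def`)
- It sits in a copybook directory (`c`, `copy`, `copybooks`, `copylib`, `cpy`)

A file is classified as a program if:

- Its extension is one of the program extensions (`.cbl`, `.cob`, `.cobol`), OR

Copybook names are derived from the filename by dropping the directory and extension and uppercasing the rest.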
Examples:

- `c/CPSESP` -- name: `CPSESP`
- `copy/workgrid.cpy` -- name: `WORKGRID`
- `c/ANAZI.GNM` -- name: `ANAZI`

This name is used to resolve `COPY CPSESP.` statements during expansion.
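
A sketch of the classification and naming rules, with a naive `COPY` lookup on top. The exact copybook-directory condition is not spelled out above, so checking only the file's immediate parent directory is an assumption, as are lowercasing the extension, treating every non-copybook as a program, and uppercasing the `COPY` operand before lookup. `classifyCobolFile`, `copybookName`, `buildCopybookMap`, and `resolveCopy` are hypothetical names, not the real `isCobolCopybook()` / `getCopybookName()`; the regex ignores `OF`/`IN` library qualifiers and `REPLACING` clauses.

```typescript
type CobolFileKind = "copybook" | "program";

// Stand-ins for the lists above (hypothetical names, not pipeline.ts constants).
const COPYBOOK_EXTS = new Set([
  ".cpy", ".copy", ".gnm", ".fd", ".wrk", ".sel", ".open", ".close", ".ini", ".def",
]);
const COPYBOOK_DIR_NAMES = new Set(["c", "copy", "copybooks", "copylib", "cpy"]);

// Split a "/"-separated path into directory, basename, and extension
// (the extension starts at the last "." in the basename, or is "").
function splitPath(path: string): { dir: string; base: string; ext: string } {
  const slash = path.lastIndexOf("/");
  const base = path.slice(slash + 1);
  const dir = slash === -1 ? "" : path.slice(0, slash);
  const dot = base.lastIndexOf(".");
  return { dir, base, ext: dot === -1 ? "" : base.slice(dot) };
}

// Copybook if ANY condition holds; otherwise treated as a program (assumption).
// Assumptions: the extension is lowercased first, and only the immediate
// parent directory is compared against the copybook directory names.
function classifyCobolFile(path: string): CobolFileKind {
  const { dir, ext } = splitPath(path);
  const parent = dir.slice(dir.lastIndexOf("/") + 1).toLowerCase();
  if (COPYBOOK_EXTS.has(ext.toLowerCase()) || COPYBOOK_DIR_NAMES.has(parent)) {
    return "copybook";
  }
  return "program";
}

// Name = basename without its extension, uppercased (matches the examples).
function copybookName(path: string): string {
  const { base, ext } = splitPath(path);
  return base.slice(0, base.length - ext.length).toUpperCase();
}

// Map copybook name -> path for every file classified as a copybook.
function buildCopybookMap(paths: readonly string[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const p of paths) {
    if (classifyCobolFile(p) === "copybook") map.set(copybookName(p), p);
  }
  return map;
}

// Naive resolution of a "COPY name." statement: take the first operand,
// uppercase it, and look it up in the copybook map.
function resolveCopy(statement: string, copybooks: ReadonlyMap<string, string>): string | null {
  const name = /\bCOPY\s+([A-Za-z0-9-]+)/i.exec(statement)?.[1];
  if (name === undefined) return null;
  return copybooks.get(name.toUpperCase()) ?? null;
}
```

For example, `buildCopybookMap(["c/CPSESP", "copy/workgrid.cpy", "s/BGTABFL"])` holds `CPSESP` and `WORKGRID` but not `BGTABFL`, and `resolveCopy("COPY CPSESP.", map)` returns `"c/CPSESP"`.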

Source files:

- `gitnexus/src/core/ingestion/utils.ts` -- `getLanguageFromPath()`, `getLanguageFromFilename()`, `getCobolDirs()`
- `gitnexus/src/core/ingestion/pipeline.ts` -- `isCobolCopybook()`, `getCopybookName()`, `COPYBOOK_EXTENSIONS`, `COBOL_PROGRAM_EXTENSIONS`