docs/plans/2026-03-26-feat-cobol-full-language-coverage-plan.md
Deepened on: 2026-03-26

Research agents used: COBOL expert (Phase 1+2), graph value analyst, codebase explorer

Sections enhanced: Phase 1 (5 features), Phase 2 (4 features), graph value ranking
- `USING BY REFERENCE WS-A BY CONTENT WS-B BY VALUE WS-C`
- `ADDRESS OF` and `OMITTED` must be filtered from parameter lists

Implement the remaining 25 unhandled COBOL language features and fix 10 partial features to achieve ~95% coverage (up from 71.9%). The goal is to build the richest possible knowledge graph from COBOL codebases, enabling a future `modernize` MCP command (out of scope for this plan) that would use the graph to assist with COBOL-to-modern-language migration.
The COBOL processor currently handles 54 of 89 applicable language features fully and 10 more partially; counting the partial ones gives the 71.9% figure (64 of 89). The 25 unhandled features represent real data loss in the knowledge graph:
Implement features in 4 phases, ordered by graph value density (edges created per LOC of implementation). Each phase is independently shippable and testable.
The highest-ROI features: they create new ACCESSES and IMPORTS edges that directly improve impact analysis.
Critical research finding: Multi-line statement accumulation is the dominant challenge. CALL USING, STRING/UNSTRING, and multi-line data item clauses all span multiple lines in production COBOL. The free-format path processes each line independently — these features need statement accumulators (like SORT/SELECT) or the free-format path needs multi-line awareness. Estimated LOC increased from 110 to 150 to account for accumulator infrastructure.
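The accumulation idea above can be sketched as follows. This is a minimal illustration, not the preprocessor's actual code: `accumulateStatements`, `LogicalStatement`, and the contents of `STATEMENT_VERBS` are hypothetical names chosen for the example. It assumes a statement ends either at a trailing period or when the next line starts with a known verb.

```typescript
// Hypothetical verb list; the real preprocessor would need a fuller set.
const STATEMENT_VERBS = new Set([
  'CALL', 'STRING', 'UNSTRING', 'MOVE', 'DISPLAY', 'PERFORM', 'IF', 'SET', 'INSPECT',
]);

interface LogicalStatement {
  text: string; // physical lines joined with single spaces, period removed
  line: number; // 1-based line number of the first physical line
}

function accumulateStatements(lines: string[]): LogicalStatement[] {
  const out: LogicalStatement[] = [];
  let parts: string[] = [];
  let startLine = 0;

  // Emit whatever has been collected so far as one logical statement.
  const flush = (): void => {
    if (parts.length === 0) return;
    out.push({ text: parts.join(' '), line: startLine });
    parts = [];
  };

  lines.forEach((raw, index) => {
    let text = raw.trim();
    if (text === '') return; // skip blank lines

    // A new verb closes the previous statement even without a period.
    const firstWord = (text.split(/\s+/)[0] ?? '').toUpperCase();
    if (STATEMENT_VERBS.has(firstWord)) flush();

    if (parts.length === 0) startLine = index + 1;

    // A trailing period terminates the statement on this line.
    const terminated = text.endsWith('.');
    if (terminated) text = text.slice(0, -1).trimEnd();
    if (text !== '') parts.push(text);
    if (terminated) flush();
  });

  flush(); // keep an unterminated trailing statement rather than dropping it
  return out;
}
```

With this shape, a `CALL ... USING` whose parameters continue on the next line reaches the CALL extraction logic as a single string. The sketch only inspects line endings and leading words, so a verb name appearing as the first token of a continuation line would split the statement early.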
- File: `cobol-preprocessor.ts` (`parseExecSqlBlock`)
- Detect `INCLUDE` as the operation, extract the member name, and emit it as a `copies[]` entry
- Edge reason: `sql-include`
- Example: `EXEC SQL INCLUDE SQLCA END-EXEC` and `EXEC SQL INCLUDE CUSTCOPY END-EXEC`

Research insights (EXEC SQL INCLUDE):
- `EXEC SQL INCLUDE CUST_TBL_DCL END-EXEC`: regex must use `[A-Z][A-Z0-9_-]+`
- `EXEC SQL INCLUDE 'DBRMLIB.MEMBER' END-EXEC` (z/OS PDS qualified name)
- Add `INCLUDE` to `OP_MAP` in `parseExecSqlBlock`; extract member via `RE_SQL_INCLUDE = /^INCLUDE\s+(?:'([^']+)'|"([^"]+)"|([A-Z][A-Z0-9_-]+))/i`

- File: `cobol-preprocessor.ts` (`processLogicalLine` CALL section)
- `calls[].parameters: string[]`
- Add `parameters?: string[]` to the `calls` array type in `CobolRegexResults`
- File: `cobol-processor.ts` (CALL edge block)
- Edge reason: `cobol-call-using`
- Example: `CALL 'AUDITLOG' USING CUST-ID WS-AMOUNT` -> 2 ACCESSES edges

Research insights (CALL USING forms):
- `CALL 'PGM' USING BY REFERENCE WS-A BY CONTENT WS-B BY VALUE WS-C`
- `CALL 'PGM' USING ADDRESS OF WS-A`
- `CALL 'PGM' USING OMITTED WS-B`
- Add `ADDRESS`, `OMITTED`, `LENGTH` to `USING_KEYWORDS` (already has `BY`/`VALUE`/`REFERENCE`/`CONTENT`)

- File: `cobol-preprocessor.ts` (new section in `extractProcedure`)
- Add `strings: Array<{ sources: string[]; target: string; type: 'string' | 'unstring'; line: number; caller: string | null }>` to `CobolRegexResults`
- Edge reasons: `cobol-string-read` / `cobol-string-write`

Research insights (STRING/UNSTRING):
- `DELIMITED BY`. Filter: `STRING`, `DELIMITED`, `BY`, `SIZE`, `ALL`, `INTO`, `WITH`, `POINTER`, `ON`, `OVERFLOW`, `NOT`, `END-STRING`
- Literals (`'text'`) must be filtered: quote-aware tokenization needed
- `STRING ... DISPLAY` without period between them

- File: `cobol-preprocessor.ts` (`parseDataItemClauses`)
- Add `dependingOn?: string`, `occursMax?: number`, `occursKeys?: Array<{direction: string; fields: string[]}>`, `indexedBy?: string[]` to data items
- Edge reason: `cobol-depends-on`
- Example: `05 WS-TABLE OCCURS 100 DEPENDING ON WS-COUNT` -> edge

Research insights (OCCURS):
- `OCCURS 0 TO n DEPENDING ON` (zero minimum) and `OCCURS UNBOUNDED DEPENDING ON` (V6.4)
- `DEPENDING ON WS-COUNT(WS-IDX)`: strip subscripts before storing
- `05 WS-TABLE\n OCCURS 100\n DEPENDING ON WS-COUNT.`: the current `RE_DATA_ITEM` only gets the first line, rest is empty. Fixing properly requires a data item accumulator (like SELECT). Defer full fix to Phase 3; implement same-line capture now.
- `ASCENDING KEY IS WS-KEY-1 WS-KEY-2`: capture for SEARCH ALL resolution
- `INDEXED BY IDX-1 IDX-2`: capture for SET/SEARCH context

- File: `cobol-preprocessor.ts` (`parseDataItemClauses`)
- `values?: string[]` on data items (currently only populated for 88-level)
- Example: `01 WS-STATUS PIC X VALUE 'A'` -> `values: ['A']`

Research insights (VALUE forms):
- Hex: `VALUE X'F1F2F3F4'`, National: `VALUE N'text'`, DBCS: `VALUE G'text'`
- `VALUE ALL '*'`
- `VALUE -123.45`, `VALUE +1`
- `VALUE IS` optional: both `VALUE 'A'` and `VALUE IS 'A'` are valid
- `VALUE 100.`: is `.` a decimal point or a terminator? `parseDataItemClauses` already strips the trailing period, so this is handled
- `VALUE 1.0E5`: extend numeric regex if needed
- Use an `extractValue(rest)` function, not a single complex regex

IMS/DB support and error handling flows.
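Looking back at the Phase 1 VALUE forms, the suggested `extractValue(rest)` function could look roughly like the sketch below. Only the function name comes from the plan; the return shape is an assumption made for illustration. In particular, keeping the `X`/`N`/`G` prefix on prefixed literals and dropping the `ALL` keyword are choices of this sketch, and doubled quotes inside a literal (`'IT''S'`) are not handled.

```typescript
// Sketch of extractValue(rest). `rest` is assumed to be the clause text of a
// data item with the trailing period already stripped by the caller.
function extractValue(rest: string): string | undefined {
  // VALUE must be a standalone word so names like WS-VALUE do not match.
  const clause = /(?:^|\s)VALUE\s+(?:IS\s+)?(.*)$/i.exec(rest);
  if (!clause) return undefined;
  let operand = (clause[1] ?? '').trim();

  // VALUE ALL '*': skip the ALL keyword and keep the repeated literal.
  const all = /^ALL\s+/i.exec(operand);
  if (all) operand = operand.slice(all[0].length);

  // Quoted literal, optionally prefixed with X (hex), N (national), G (DBCS).
  const quoted = /^([XNG])?(['"])(.*?)\2/i.exec(operand);
  if (quoted) {
    const prefix = quoted[1];
    const body = quoted[3] ?? '';
    // Assumption: prefixed literals keep their prefix so hex is not mistaken for text.
    return prefix ? `${prefix.toUpperCase()}'${body}'` : body;
  }

  // Signed numeric with optional decimals and exponent (e.g. -123.45, 1.0E5).
  const numeric = /^[+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)?/i.exec(operand);
  if (numeric) return numeric[0];

  // Anything else: take the first word (for instance a figurative constant).
  const word = /^[A-Z][A-Z0-9-]*/i.exec(operand);
  return word ? word[0].toUpperCase() : undefined;
}
```

Splitting the work into an ordered series of small patterns keeps each form testable on its own, which is the motivation the plan gives for avoiding a single complex regex.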
- File: `cobol-preprocessor.ts` (`processLogicalLine`: add `RE_EXEC_DLI_START` check alongside SQL/CICS)
- Add `execDliBlocks: Array<{ line: number; verb: string; pcbNumber?: number; segmentName?: string; intoField?: string; fromField?: string; whereField?: string; psbName?: string }>` to `CobolRegexResults`
- `<ims>:<segmentName>` Record node with reason `dli-{verb}`; ACCESSES edges to INTO/FROM data areas; PSB ACCESSES for `SCHD`
- Example: `EXEC DLI GU USING PCB(1) SEGMENT(CUSTOMER) INTO(WS-CUST) END-EXEC`

Research insights (dual IMS interface):
- `CALL 'CBLTDLI' USING function-code PCB io-area SSA1..SSA15`
- `matchAll` on segment regex

- File: `cobol-preprocessor.ts` (`processLogicalLine`: detect `DECLARATIVES` keyword, track `USE AFTER` blocks)
- When `DECLARATIVES.` is encountered, switch to declaratives mode. Extract `USE` statements binding sections to files/modes.
- Add `declaratives: Array<{ sectionName: string; useType: 'error' | 'debug' | 'label' | 'reporting'; target: string; line: number }>` to `CobolRegexResults`
- Edge reason: `cobol-declarative-error-handler`

Research insights (DECLARATIVES syntax):
- `USE AFTER STANDARD {EXCEPTION|ERROR} ON {file-name|INPUT|OUTPUT|I-O|EXTEND}`
- `END DECLARATIVES.` must NOT reset PROCEDURE DIVISION state
- `DECLARATIVES` is already in `EXCLUDED_PARA_NAMES`: no false paragraph risk

- File: `cobol-preprocessor.ts` (`extractProcedure`: new `RE_SET` regex)
- Add `sets: Array<{ targets: string[]; form: 'to-true'|'to-value'|'up-by'|'down-by'|'address-of'|'to-null'|'to-entry'; value?: string; entryTarget?: string; entryIsLiteral?: boolean; line: number; caller: string | null }>` to `CobolRegexResults`
- Edge reasons: `cobol-set-condition` (TO TRUE), `cobol-set-index` (TO/UP/DOWN), `cobol-set-address` (ADDRESS OF). SET ENTRY with literal -> CALLS edge.
- Examples: `SET WS-EOF TO TRUE`, `SET IDX-1 TO 5`, `SET IDX-1 UP BY 1`

Research insights (SET forms by frequency):
- `SET condition TO TRUE`: 80-90% of all SET usage. Multiple targets: `SET COND-A COND-B TO TRUE`
- `SET index TO/UP BY/DOWN BY`: ~8%. Multiple indices: `SET IDX-1 IDX-2 UP BY 1`
- `SET pointer TO ADDRESS OF data-item` / `SET ADDRESS OF data-item TO pointer`: ~2%
- `SET proc-ptr TO ENTRY "PROGNAME"`: rare, but creates a CALLS edge (like dynamic CALL)
- `SET COND-A OF WS-RECORD TO TRUE` (strip `OF WS-RECORD`)

- File: `cobol-preprocessor.ts` (`extractProcedure`: new `inspectAccum` accumulator like SORT)
- Add `inspects: Array<{ inspectedField: string; counters: string[]; form: 'tallying'|'replacing'|'converting'|'tallying-replacing'; line: number; caller: string | null }>` to `CobolRegexResults`
- Edge reasons: `cobol-inspect-read` / `cobol-inspect-write` / `cobol-inspect-tally`
- Example: `INSPECT WS-FIELD TALLYING WS-COUNT FOR ALL 'A'` -> read on WS-FIELD, write on WS-COUNT

Research insights (INSPECT forms by frequency):
- `INSPECT WS-STR REPLACING ALL 'A' BY 'B'`
- `INSPECT WS-STR TALLYING WS-CNT FOR ALL 'A'`: multiple counters possible
- `INSPECT WS-STR CONVERTING 'abc' TO 'ABC'`
- `([A-Z][A-Z0-9-]+)\s+FOR\b` `matchAll` pattern

Fix the 10 partial features and small gaps.
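As an illustration of the Phase 2 SET forms listed earlier, the following sketch classifies a single, already-accumulated SET statement into the `form` values from the planned `sets` entry. `parseSet` and `ParsedSet` are hypothetical names, and the single-regex split on `TO` / `UP BY` / `DOWN BY` is an assumption of this example rather than the planned `RE_SET`.

```typescript
type SetForm =
  | 'to-true' | 'to-value' | 'up-by' | 'down-by'
  | 'address-of' | 'to-null' | 'to-entry';

interface ParsedSet {
  targets: string[];
  form: SetForm;
  value?: string;
  entryTarget?: string;
  entryIsLiteral?: boolean;
}

function parseSet(statement: string): ParsedSet | undefined {
  const text = statement.trim().replace(/\.$/, '');
  const m = /^SET\s+(.+?)\s+(TO|UP\s+BY|DOWN\s+BY)\s+(.+)$/i.exec(text);
  if (!m) return undefined;
  const lhs = m[1] ?? '';
  const op = (m[2] ?? '').toUpperCase().replace(/\s+/g, ' ');
  const rhs = (m[3] ?? '').trim();

  // SET ADDRESS OF data-item TO pointer
  if (/^ADDRESS\s+OF\s+/i.test(lhs)) {
    const item = lhs.replace(/^ADDRESS\s+OF\s+/i, '').trim();
    return { targets: [item], form: 'address-of', value: rhs };
  }

  // Strip qualifiers such as "OF WS-RECORD" before splitting multiple targets.
  const targets = lhs
    .replace(/\s+(?:OF|IN)\s+[A-Z][A-Z0-9-]*/gi, '')
    .split(/\s+/)
    .filter(Boolean);

  if (op === 'UP BY') return { targets, form: 'up-by', value: rhs };
  if (op === 'DOWN BY') return { targets, form: 'down-by', value: rhs };

  const upper = rhs.toUpperCase();
  if (upper === 'TRUE') return { targets, form: 'to-true' };
  if (upper === 'NULL' || upper === 'NULLS') return { targets, form: 'to-null' };
  if (/^ADDRESS\s+OF\s+/i.test(rhs)) {
    return { targets, form: 'address-of', value: rhs.replace(/^ADDRESS\s+OF\s+/i, '').trim() };
  }
  if (/^ENTRY\s+/i.test(rhs)) {
    const raw = rhs.replace(/^ENTRY\s+/i, '').trim();
    // A quoted entry name is a literal and can become a CALLS edge.
    const entryIsLiteral = /^['"]/.test(raw);
    return { targets, form: 'to-entry', entryTarget: raw.replace(/^['"]|['"]$/g, ''), entryIsLiteral };
  }
  return { targets, form: 'to-value', value: rhs };
}
```

Checking the `ADDRESS OF` left-hand form before stripping `OF` qualifiers matters, because otherwise the qualifier rule would remove the data item itself.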
- `calls[].returning?: string`
- Edge reason: `cobol-call-returning`
- `isOptional: boolean` in `FileDeclaration` interface
- `/\bALTERNATE\s+RECORD\s+KEY\s+(?:IS\s+)?([A-Z][A-Z0-9-]+)/i`
- `alternateKeys?: string[]`
- `/\bPROGRAM-ID\.\s*([A-Z][A-Z0-9-]+)(?:\s+IS\s+COMMON)?/i`
- `isCommon: boolean` on Module node
- Add `isExternal?: boolean`, `isGlobal?: boolean` to data item interface
- `graph.addNode({ ..., properties: { ..., author, dateWritten } })`

Low-priority but nice for completeness.
- `/\bINITIALIZE\s+([A-Z][A-Z0-9-]+)/i`
- Edge reason: `cobol-initialize`
- `cobol-data-flow.ts` module if it exceeds 1500.

Research agent analyzed all 5 MCP tools (query, context, impact, detect_changes, rename) against planned edge types:
| Edge Type | QUERY | CONTEXT | IMPACT | DETECT | RENAME | Overall |
|---|---|---|---|---|---|---|
| `cobol-call-using` | 4/5 | 5/5 | 5/5 | 4/5 | 4/5 | 9.2/10 |
| `cobol-error-handler` | 5/5 | 4/5 | 5/5 | 5/5 | 2/5 | 9.0/10 |
| `dli-*` (IMS verbs) | 4/5 | 4/5 | 5/5 | 4/5 | 2/5 | 8.2/10 |
| `cobol-string-*` | 4/5 | 3/5 | 3/5 | 3/5 | 2/5 | 6.2/10 |
Key finding: cobol-call-using alone would fix ~40% of missing caller references in COBOL graphs.
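To make the `cobol-call-using` extraction concrete, here is a minimal sketch of pulling parameter names out of an accumulated CALL statement. `extractCallParameters` and `STOP_WORDS` are hypothetical; `USING_KEYWORDS` is named in the plan, but its exact contents here (including `OF`, so that `ADDRESS OF WS-A` yields `WS-A`) are an assumption. Literals and subscripted names are simply skipped in this version.

```typescript
// Keywords that appear inside a USING phrase but are not data names.
const USING_KEYWORDS = new Set([
  'BY', 'REFERENCE', 'CONTENT', 'VALUE', 'ADDRESS', 'OF', 'OMITTED', 'LENGTH',
]);

// Hypothetical: words that end the USING phrase of a CALL statement.
const STOP_WORDS = new Set(['RETURNING', 'GIVING', 'ON', 'NOT', 'END-CALL']);

function extractCallParameters(statement: string): string[] {
  const tokens = statement.trim().replace(/\.$/, '').split(/\s+/);
  const start = tokens.findIndex((t) => t.toUpperCase() === 'USING');
  if (start === -1) return [];

  const params: string[] = [];
  for (const token of tokens.slice(start + 1)) {
    const upper = token.toUpperCase();
    if (STOP_WORDS.has(upper)) break;       // USING phrase is over
    if (USING_KEYWORDS.has(upper)) continue; // passing-mode keyword, not a parameter
    // Keep only plain COBOL data names; literals and subscripts are dropped.
    if (/^[A-Z][A-Z0-9-]*$/.test(upper)) params.push(upper);
  }
  return params;
}
```

Each returned name would then correspond to one ACCESSES edge from the calling paragraph, which matches the plan's example of `CALL 'AUDITLOG' USING CUST-ID WS-AMOUNT` producing two edges.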
This plan provides the graph data foundation for a future modernize MCP command (out of scope) that would:
MCP tool enhancements needed (after this plan ships):
- Add `cobol-call-using`, `cobol-error-handler`, `dli-*` to the IMPACT tool's default `relationTypes` for COBOL repos
- `IMPACT_RELATION_CONFIDENCE`
- `VALID_RELATION_TYPES` set (`local-backend.ts:52`)

- `docs/plans/2026-03-25-feat-cobol-100-percent-feature-coverage-plan.md`
- `docs/code-indexing/cobol/` (7 documentation files)