COBOL Graph Model

docs/code-indexing/cobol/graph-model.md

This document describes the graph nodes and edges that GitNexus creates for COBOL codebases. The COBOL graph model is richer than the models built for most tree-sitter languages because it captures domain-specific constructs: file declarations (SELECT), file descriptions (FD entries), data hierarchies, SQL tables, CICS maps, and cross-program contracts.

Entity-Relationship Diagram

```mermaid
erDiagram
    File ||--o{ Module : DEFINES
    File ||--o{ Function : DEFINES
    File ||--o{ Namespace : DEFINES
    File ||--o{ Record : DEFINES
    File ||--o{ Property : DEFINES
    File ||--o{ Const : DEFINES
    File ||--o{ CodeElement : DEFINES
    File ||--o{ Constructor : DEFINES
    File }o--o{ File : IMPORTS

    Module ||--o{ Record : CONTAINS
    Module ||--o{ Constructor : CONTAINS
    Module }o--o{ CodeElement : ACCESSES
    Module }o--o{ Module : CALLS
    Module }o--o{ Module : CONTRACTS
    Module }o--o{ Property : RECEIVES

    Record ||--o{ Property : CONTAINS
    Record ||--o{ Const : CONTAINS
    Record }o--o{ Record : REDEFINES

    Property ||--o{ Property : CONTAINS
    Property ||--o{ Const : CONTAINS
    Property }o--o{ Property : REDEFINES
    Property }o--o{ CodeElement : RECORD_KEY_OF
    Property }o--o{ CodeElement : FILE_STATUS_OF

    CodeElement ||--o{ CodeElement : CONTAINS
    CodeElement ||--o{ Record : CONTAINS

    Function }o--o{ Function : CALLS
```
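
Tooling that consumes the graph may want to mirror the diagram as a small TypeScript schema. The sketch below is hypothetical. `NodeLabel`, `EdgeType`, and `isAllowedEdge` are illustrative names, not GitNexus's own types. The triples are transcribed from the diagram only, so any edge the diagram does not draw is reported as not allowed.

```typescript
// Hypothetical schema mirroring the ER diagram; not GitNexus's own type definitions.
type NodeLabel =
  | "File" | "Module" | "Function" | "Namespace" | "Record"
  | "Property" | "Const" | "CodeElement" | "Constructor";

type EdgeType =
  | "DEFINES" | "IMPORTS" | "CONTAINS" | "CALLS" | "ACCESSES"
  | "CONTRACTS" | "RECEIVES" | "REDEFINES" | "RECORD_KEY_OF" | "FILE_STATUS_OF";

type Triple = readonly [NodeLabel, EdgeType, NodeLabel];

// A File DEFINES every other node type.
const DEFINED_BY_FILE: ReadonlySet<NodeLabel> = new Set<NodeLabel>([
  "Module", "Function", "Namespace", "Record", "Property", "Const", "CodeElement", "Constructor",
]);

// Every other relationship drawn in the diagram, as [source, edge, target].
const ALLOWED: readonly Triple[] = [
  ["File", "IMPORTS", "File"],
  ["Module", "CONTAINS", "Record"],
  ["Module", "CONTAINS", "Constructor"],
  ["Module", "ACCESSES", "CodeElement"],
  ["Module", "CALLS", "Module"],
  ["Module", "CONTRACTS", "Module"],
  ["Module", "RECEIVES", "Property"],
  ["Record", "CONTAINS", "Property"],
  ["Record", "CONTAINS", "Const"],
  ["Record", "REDEFINES", "Record"],
  ["Property", "CONTAINS", "Property"],
  ["Property", "CONTAINS", "Const"],
  ["Property", "REDEFINES", "Property"],
  ["Property", "RECORD_KEY_OF", "CodeElement"],
  ["Property", "FILE_STATUS_OF", "CodeElement"],
  ["CodeElement", "CONTAINS", "CodeElement"],
  ["CodeElement", "CONTAINS", "Record"],
  ["Function", "CALLS", "Function"],
];

function isAllowedEdge(src: NodeLabel, edge: EdgeType, dst: NodeLabel): boolean {
  if (src === "File" && edge === "DEFINES") return DEFINED_BY_FILE.has(dst);
  return ALLOWED.some(([s, e, d]) => s === src && e === edge && d === dst);
}
```

Checking emitted edges against a list like this is one way to catch a source/target mismatch before it reaches the graph.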

Node Types

| Node Type | COBOL Concept | Created From | Example |
|---|---|---|---|
| Module | PROGRAM-ID | PROGRAM-ID. BGTABFL | Name: BGTABFL, description may include author and date |
| Function | Paragraph | PROCESS-RECORD. at column 8 | Name: PROCESS-RECORD |
| Namespace | Procedure section | MAIN-LOGIC SECTION. at column 8 | Name: MAIN-LOGIC |
| Record | 01-level data item | 01 WK-EMPLOYEE. | Description: level:01 section:working-storage |
| Property | 02-49/66/77 data item | 05 WK-NAME PIC X(30). | Description: level:05 pic:X(30) section:working-storage |
| Const | 88-level condition | 88 WK-ACTIVE VALUE "A". | Description: level:88 values:A |
| CodeElement | SELECT, FD, SQL table, CICS map, cursor, transid | Various | Description varies by subtype |
| Constructor | ENTRY point | ENTRY "SUBPROG" USING WK-DATA | Description: entry params:WK-DATA |
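
The level-number rules in the table can be expressed as a small classifier. The sketch below is hypothetical. `classifyDataItem` and its regular expressions are not taken from parse-worker.ts. It assumes one data description entry per string, with columns 1-6 already stripped, and builds a description in the `level:NN pic:... section:...` format shown above. For 88-level items it keeps only quoted literals.

```typescript
// Hypothetical classifier for one data description entry; not GitNexus's parser.
type DataNodeType = "Record" | "Property" | "Const";

interface DataItem {
  level: number;
  name: string;
  nodeType: DataNodeType;
  description: string;
}

// Level number, data name, then the rest of the entry (trailing period dropped).
const DATA_ENTRY = /^\s*(\d{1,2})\s+([A-Z0-9][A-Z0-9-]*)(.*?)\.?\s*$/i;

function classifyDataItem(line: string, section: string): DataItem | null {
  const m = DATA_ENTRY.exec(line);
  if (!m) return null;
  const level = Number(m[1]);
  const levelText = String(level).padStart(2, "0");
  const name = m[2].toUpperCase();
  const rest = m[3];

  if (level === 88) {
    // Condition names: keep the quoted literal(s) from the VALUE clause (numeric literals are skipped).
    const values = Array.from(rest.matchAll(/"([^"]*)"|'([^']*)'/g), (v) => v[1] ?? v[2]);
    return { level, name, nodeType: "Const", description: `level:88 values:${values.join(",")}` };
  }

  let nodeType: DataNodeType;
  if (level === 1) nodeType = "Record";
  else if ((level >= 2 && level <= 49) || level === 66 || level === 77) nodeType = "Property";
  else return null; // not a level number this sketch recognizes

  const parts = [`level:${levelText}`];
  const pic = /\bPIC(?:TURE)?\s+(?:IS\s+)?(\S+)/i.exec(rest);
  if (pic) parts.push(`pic:${pic[1]}`);
  parts.push(`section:${section}`);
  return { level, name, nodeType, description: parts.join(" ") };
}
```

For example, `classifyDataItem("           05  WK-NAME PIC X(30).", "working-storage")` yields a Property whose description is `level:05 pic:X(30) section:working-storage`.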

CodeElement Subtypes

CodeElement is used for multiple COBOL constructs, distinguished by their description prefix:

| Subtype | ID Pattern | Description Format | Example |
|---|---|---|---|
| File SELECT | CodeElement:{path}:SELECT:{name} | select org:INDEXED access:DYNAMIC ... | SELECT MASTER-FILE |
| FD entry | CodeElement:{path}:FD:{name} | fd record:{recordName} | FD MASTER-FILE |
| SQL table | CodeElement:{path}:sql-table:{name} | sql-table op:SELECT | Table EMPLOYEES |
| SQL cursor | CodeElement:{path}:sql-cursor:{name} | sql-cursor | Cursor C-EMPLOYEES |
| CICS map | CodeElement:{path}:cics-map:{name} | cics-map cmd:SEND MAP | Map EMPMENU |
| CICS transid | CodeElement:{path}:cics-transid:{name} | cics-transid cmd:START | Transid EMP1 |
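
A File SELECT node gathers several clauses into one ID and one description. The sketch below assumes the whole SELECT entry has already been joined into a single string. `parseSelect` and `clause` are hypothetical helpers. They read only the ORGANIZATION, ACCESS MODE, RECORD KEY, and FILE STATUS clauses. The description stops where the table shows `...`, because the remaining fields are not documented here.

```typescript
// Hypothetical SELECT parser; builds the ID and description prefix shown in the table.
interface SelectInfo {
  id: string;
  description: string;
  recordKey?: string;  // data name from RECORD KEY IS ...
  fileStatus?: string; // data name from FILE STATUS IS ...
}

// First operand after a clause keyword, e.g. clause(text, "RECORD KEY") -> "EMP-ID".
function clause(text: string, keyword: string): string | undefined {
  const re = new RegExp(`\\b${keyword.replace(/ /g, "\\s+")}(?:\\s+IS)?\\s+([A-Z0-9-]+)`, "i");
  return re.exec(text)?.[1]?.toUpperCase();
}

function parseSelect(path: string, entry: string): SelectInfo | null {
  const name = /\bSELECT\s+(?:OPTIONAL\s+)?([A-Z0-9-]+)/i.exec(entry)?.[1]?.toUpperCase();
  if (!name) return null;
  const org = clause(entry, "ORGANIZATION");
  const access = clause(entry, "ACCESS MODE") ?? clause(entry, "ACCESS");
  const parts = ["select"];
  if (org) parts.push(`org:${org}`);
  if (access) parts.push(`access:${access}`);
  return {
    id: `CodeElement:${path}:SELECT:${name}`,
    description: parts.join(" "),
    recordKey: clause(entry, "RECORD KEY"),
    fileStatus: clause(entry, "FILE STATUS"),
  };
}
```

A later pass would resolve the `recordKey` and `fileStatus` names to Property nodes to emit the RECORD_KEY_OF and FILE_STATUS_OF edges, which carry confidence 0.8 in the Edge Types table.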

Edge Types

| Edge Type | Source | Target | Created By | Confidence | Example |
|---|---|---|---|---|---|
| DEFINES | File | any node | File defines its symbols | 1.0 | File -> Module BGTABFL |
| CALLS | Function | Function | PERFORM X [THRU Y] | (via call-processor) | PROCESS-RECORD -> CALC-TAX |
| CALLS | Module | Module | CALL "BGTABUP" | (via call-processor) | BGTABFL -> BGTABUP |
| CALLS | Module | Module | EXEC CICS LINK PROGRAM('X') | (via call-processor) | BGTABFL -> BGTABUP |
| IMPORTS | File | File | COPY copybook | (via import-processor) | Source file -> Copybook file |
| CONTAINS | Module | Record | Data hierarchy root | 1.0 | BGTABFL -> WK-EMPLOYEE |
| CONTAINS | Record | Property | Data hierarchy | 1.0 | WK-EMPLOYEE -> WK-NAME |
| CONTAINS | Property | Property | Nested data items | 1.0 | WK-ADDRESS -> WK-CITY |
| CONTAINS | Record/Property | Const | 88-level parent | 1.0 | WK-STATUS -> WK-ACTIVE |
| CONTAINS | CodeElement (FD) | Record | FD record link | 1.0 | FD:MASTER-FILE -> MASTER-RECORD |
| CONTAINS | CodeElement (SELECT) | CodeElement (FD) | SELECT-FD link | 0.9 | SELECT:MASTER-FILE -> FD:MASTER-FILE |
| CONTAINS | Module | Constructor | ENTRY in module | 1.0 | BGTABFL -> SUBPROG |
| REDEFINES | Record | Record | 01 X REDEFINES Y | 1.0 | WK-DATE-NUM -> WK-DATE-ALPHA |
| REDEFINES | Property | Property | 05 X REDEFINES Y | 1.0 | WK-CODE-NUM -> WK-CODE-ALPHA |
| RECORD_KEY_OF | Property | CodeElement (SELECT) | RECORD KEY IS field | 0.8 | WK-EMP-ID -> SELECT:MASTER-FILE |
| FILE_STATUS_OF | Property | CodeElement (SELECT) | FILE STATUS IS field | 0.8 | WK-FS -> SELECT:MASTER-FILE |
| ACCESSES | Module | CodeElement | EXEC SQL/CICS | 0.9 | BGTABFL -> sql-table:EMPLOYEES |
| RECEIVES | Module | Property | PROCEDURE USING | 0.8 | BGTABFL -> WK-INPUT-REC |
| CONTRACTS | Module | Module | Shared copybook detection | 0.9 | BGTABFL -> BGTABUP (via CPSESP) |
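
The three CALLS rows can be approximated with regular expressions. This is a sketch under stated assumptions, not the call-processor's logic. `extractCalls` is hypothetical. It recognizes only literal program names (a dynamic CALL WS-PROG is skipped) and ignores inline PERFORM blocks. It keeps the THRU endpoint as a separate field because this document does not say how GitNexus expands a THRU range.

```typescript
// Hypothetical extractor for CALLS sources; not GitNexus's call-processor.
type CallKind = "perform" | "call" | "cics-link";

interface CallSite {
  kind: CallKind; // "perform" -> Function CALLS Function; the others -> Module CALLS Module
  target: string;
  thru?: string;  // end of a PERFORM X THRU Y range, left unexpanded
}

// A COBOL name that cannot be cut short by backtracking.
const NAME = "([A-Z0-9][A-Z0-9-]*)(?![A-Z0-9-])";

const PATTERNS: ReadonlyArray<readonly [CallKind, RegExp]> = [
  ["perform", new RegExp(`\\bPERFORM\\s+${NAME}(?:\\s+(?:THRU|THROUGH)\\s+${NAME})?`, "gi")],
  ["call", /\bCALL\s+["']([A-Z0-9-]+)["']/gi],
  // Stop at END-EXEC so one LINK cannot borrow another command's PROGRAM option.
  ["cics-link", /\bEXEC\s+CICS\s+LINK\b(?:(?!END-EXEC)[\s\S])*?\bPROGRAM\s*\(\s*["']([A-Z0-9-]+)["']\s*\)/gi],
];

// PERFORM followed by these words starts a loop, not an out-of-line paragraph call.
const PERFORM_KEYWORDS = new Set(["UNTIL", "VARYING", "WITH", "TEST"]);

function extractCalls(procedureText: string): CallSite[] {
  const sites: CallSite[] = [];
  for (const [kind, re] of PATTERNS) {
    // Results are grouped by pattern, not by source position.
    for (const m of procedureText.matchAll(re)) {
      const target = m[1].toUpperCase();
      if (kind === "perform" && (PERFORM_KEYWORDS.has(target) || /^\d+$/.test(target))) continue;
      sites.push(m[2] ? { kind, target, thru: m[2].toUpperCase() } : { kind, target });
    }
  }
  return sites;
}
```

Resolving each target to a Function or Module node, and setting the confidence, would be a separate step.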

Full Annotated Example

Given this COBOL program:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EMPMAINT.
       AUTHOR. Development Team.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMP-FILE
               ASSIGN TO "EMPLOYEE.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS EMP-ID
               FILE STATUS IS WS-FILE-STATUS.

       DATA DIVISION.
       FILE SECTION.
       FD  EMP-FILE.
       01  EMP-RECORD.
           05  EMP-ID             PIC 9(6).
           05  EMP-NAME           PIC X(30).

       WORKING-STORAGE SECTION.
       01  WS-FLAGS.
           05  WS-FILE-STATUS     PIC X(02).
           05  WS-EOF-FLAG        PIC X(01).
               88  WS-EOF         VALUE "Y".

       LINKAGE SECTION.
       01  LK-SEARCH-KEY          PIC 9(6).

       PROCEDURE DIVISION USING LK-SEARCH-KEY.
       MAIN-LOGIC SECTION.
       MAIN-START.
           PERFORM OPEN-FILE
           PERFORM PROCESS-RECORDS
           PERFORM CLOSE-FILE
           STOP RUN.

       OPEN-FILE.
           OPEN I-O EMP-FILE.

       PROCESS-RECORDS.
           MOVE LK-SEARCH-KEY TO EMP-ID
           EXEC SQL
               SELECT EMP_SALARY INTO :WS-SALARY
               FROM EMPLOYEES
               WHERE EMP_ID = :EMP-ID
           END-EXEC
           CALL "EMPREPORT".

       CLOSE-FILE.
           CLOSE EMP-FILE.
```

The graph produced contains:

Nodes:

  • Module: EMPMAINT (description: author:Development Team)
  • Namespace: MAIN-LOGIC
  • Function: MAIN-START, OPEN-FILE, PROCESS-RECORDS, CLOSE-FILE
  • Record: EMP-RECORD, WS-FLAGS, LK-SEARCH-KEY
  • Property: EMP-ID, EMP-NAME, WS-FILE-STATUS, WS-EOF-FLAG
  • Const: WS-EOF (values: Y)
  • CodeElement: SELECT:EMP-FILE, FD:EMP-FILE, sql-table:EMPLOYEES
  • (The program has no COPY statements; each COPY would add a File IMPORTS edge to the copybook.)

Edges:

  • DEFINES: File -> all nodes
  • CONTAINS: EMPMAINT -> EMP-RECORD, EMPMAINT -> WS-FLAGS, EMPMAINT -> LK-SEARCH-KEY
  • CONTAINS: EMP-RECORD -> EMP-ID, EMP-RECORD -> EMP-NAME
  • CONTAINS: WS-FLAGS -> WS-FILE-STATUS, WS-FLAGS -> WS-EOF-FLAG
  • CONTAINS: WS-EOF-FLAG -> WS-EOF
  • CONTAINS: FD:EMP-FILE -> EMP-RECORD
  • CONTAINS: SELECT:EMP-FILE -> FD:EMP-FILE
  • CALLS: MAIN-START -> OPEN-FILE, MAIN-START -> PROCESS-RECORDS, MAIN-START -> CLOSE-FILE
  • CALLS: EMPMAINT -> EMPREPORT (external CALL)
  • ACCESSES: EMPMAINT -> sql-table:EMPLOYEES
  • RECEIVES: EMPMAINT -> LK-SEARCH-KEY (PROCEDURE USING)
  • RECORD_KEY_OF: EMP-ID -> SELECT:EMP-FILE
  • FILE_STATUS_OF: WS-FILE-STATUS -> SELECT:EMP-FILE
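
The data-hierarchy CONTAINS edges listed above follow from level numbers alone. Each data item belongs to the nearest open item with a lower level, and an 88-level condition belongs to the item just above it. The sketch below implements that rule with a stack. It assumes the entries are already parsed into (level, name) pairs in source order. `buildContainsEdges` is a hypothetical helper. It leaves out 66/77 items and the FD/SELECT links, which do not come from level numbers.

```typescript
// Hypothetical helper; rebuilds data-hierarchy CONTAINS edges from level numbers.
interface DataEntry {
  level: number;
  name: string;
}

interface ContainsEdge {
  type: "CONTAINS";
  from: string;
  to: string;
  confidence: number;
}

function buildContainsEdges(moduleName: string, entries: readonly DataEntry[]): ContainsEdge[] {
  const edges: ContainsEdge[] = [];
  const stack: DataEntry[] = []; // open group items, lowest level at the bottom
  const link = (from: string, to: string): void => {
    edges.push({ type: "CONTAINS", from, to, confidence: 1.0 });
  };

  for (const e of entries) {
    if (e.level === 88) {
      // Condition names hang off the item directly above and never become parents.
      const parent = stack[stack.length - 1];
      if (parent) link(parent.name, e.name);
    } else if (e.level === 1) {
      // A new 01 record starts a fresh hierarchy rooted at the module.
      stack.length = 0;
      link(moduleName, e.name);
      stack.push(e);
    } else if (e.level >= 2 && e.level <= 49) {
      // Pop siblings and deeper items until the top has a strictly lower level.
      while (stack.length > 0 && stack[stack.length - 1].level >= e.level) stack.pop();
      const parent = stack[stack.length - 1];
      if (parent) link(parent.name, e.name);
      stack.push(e);
    } else {
      // 66 and 77 entries are out of scope here; they also close the current hierarchy.
      stack.length = 0;
    }
  }
  return edges;
}
```

Given EMPMAINT's data entries in source order, this reproduces the eight data-hierarchy CONTAINS edges in the list above, each with the 1.0 confidence from the Edge Types table.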

How COBOL Differs from Tree-Sitter Languages

| Aspect | COBOL | Tree-Sitter Languages |
|---|---|---|
| Node variety | 8 types (Module, Function, Namespace, Record, Property, Const, CodeElement, Constructor) | Typically 4-6 (Function, Class, Method, Interface, Module, Const) |
| Domain edges | RECORD_KEY_OF, FILE_STATUS_OF, ACCESSES, RECEIVES, CONTRACTS, REDEFINES | Primarily CALLS, IMPORTS, EXTENDS, IMPLEMENTS |
| Data hierarchy | Deep CONTAINS chains (01 -> 05 -> 10 -> 88) | Flat class members |
| Cross-program calls | CALL "name" + CICS LINK PROGRAM | Import-based resolution |
| Contract detection | Shared COPY copybook between caller/callee | Not applicable |
| Metadata | AUTHOR, DATE-WRITTEN on Module | JSDoc/docstring (not indexed) |
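
Contract detection can be pictured as an intersection test over copybook sets. The sketch below guesses at the shape of the check and is not a port of detectCrossProgamContracts(). It assumes the caller/callee pairs (from CALLS edges) and each program's COPY list are already known. It emits one edge per pair that shares at least one copybook, records the shared names, and uses the 0.9 confidence from the Edge Types table. `detectContracts` and `ContractEdge` are hypothetical names.

```typescript
// Hypothetical contract check; not GitNexus's detectCrossProgamContracts().
interface ContractEdge {
  type: "CONTRACTS";
  from: string;
  to: string;
  via: string[]; // copybooks included by both programs
  confidence: number;
}

function detectContracts(
  calls: ReadonlyArray<readonly [string, string]>, // [caller, callee] pairs
  copybooks: ReadonlyMap<string, ReadonlySet<string>>, // program -> copybooks it COPYs
): ContractEdge[] {
  const edges: ContractEdge[] = [];
  const seen = new Set<string>();
  for (const [caller, callee] of calls) {
    const key = `${caller}->${callee}`;
    if (caller === callee || seen.has(key)) continue;
    seen.add(key);
    const mine = copybooks.get(caller);
    const theirs = copybooks.get(callee);
    if (!mine || !theirs) continue; // callee not indexed, or no COPY statements
    const shared = Array.from(mine).filter((c) => theirs.has(c)).sort();
    if (shared.length > 0) {
      edges.push({ type: "CONTRACTS", from: caller, to: callee, via: shared, confidence: 0.9 });
    }
  }
  return edges;
}
```

A general-purpose copybook included by many programs would also match this test, so filtering such copybooks may be needed in practice.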

Source Files

  • gitnexus/src/core/ingestion/workers/parse-worker.ts -- processCobolRegexOnly(), node/edge emission logic
  • gitnexus/src/core/ingestion/pipeline.ts -- detectCrossProgamContracts() for CONTRACTS edges
  • gitnexus/src/core/ingestion/cobol-preprocessor.ts -- CobolRegexResults interface (all extracted data)