Hybrid hancom-ai Options & CLI Refactoring Implementation Plan

docs/superpowers/plans/2026-04-29-hybrid-hancom-ai-options.md

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Expose 5 hancom-ai-specific knobs as --hybrid-hancom-ai-* CLI options, then refactor CLIOptions so opendataloader-pdfua reuses the entire core option set instead of redefining its own.

Architecture: Three independent commits. (1) Add 5 options + gate validation in core. (2) Extract addAllTo / applyAllTo public API on CLIOptions. (3) Refactor pdfua's Main.java and RemediationConfig to consume that API; collapse 9 RemediationConfig constructors into a Builder; embed Config directly. Hard-break removed getters since all call sites are internal/test.

Tech Stack: Java 11, Apache Commons CLI, JUnit 5, Maven multi-module (opendataloader-pdf-core, opendataloader-pdf-cli, opendataloader-pdfua).


File Structure

opendataloader-pdf (core)

| File | Action | Responsibility |
| --- | --- | --- |
| java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java | Modify | Add 5 option constants, 5 OPTION_DEFINITIONS entries, parsing in applyHybridOptions, hancom-ai gate. Add public addAllTo(Options) and applyAllTo(Config, CommandLine). |
| java/opendataloader-pdf-cli/src/test/java/org/opendataloader/pdf/cli/CLIOptionsTest.java | Modify | Tests for each new option's parsing + gate enforcement. |
| options.json (root) | Regenerate | Auto-generated from CLIOptions via npm run sync. |
| Python/Node bindings | Regenerate | Auto-generated by npm run sync. |

opendataloader-pdfua

| File | Action | Responsibility |
| --- | --- | --- |
| src/main/java/org/opendataloader/pdf/Main.java | Modify | Replace self-defined hybrid options with CLIOptions.addAllTo(). Add applyPdfuaDefaults(). Use new RemediationConfig.Builder. |
| src/main/java/org/opendataloader/pdf/remediation/RemediationConfig.java | Modify | Embed core Config. Replace 9 constructors with one Builder. Remove getHybridUrl() / getHybridMode() (delegate via getHybridConfig()). |
| src/main/java/org/opendataloader/pdf/remediation/RemediationProcessor.java | Modify | Replace per-field hybrid forwarding (lines 163-170) with config = remediationConfig.getCoreConfig(). |
| src/test/java/.../AuditBundleEmitterTest.java | Modify | Switch to Builder. |
| src/test/java/.../CertificateIssuerTest.java | Modify | Switch to Builder (4 instantiations). |
| src/test/java/.../AuditManifestBuilderTest.java | Modify | Switch to Builder (4 instantiations). |
| src/test/java/.../RemediationConfigAuditBundleTest.java | Modify | Switch to Builder. |

Phase 1 — Add 5 hybrid-hancom-ai-* options to core

Task 1.1: Add option name + description constants

Files:

  • Modify: opendataloader-pdf/java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java (insert after line 137, the existing HYBRID_FALLBACK_DESC)

  • Step 1: Add the 5 long-option constants and descriptions

Insert after the HYBRID_FALLBACK_DESC block (currently around line 137) and before the // ===== Stdout Output ===== section header:

java
    // ===== Hybrid hancom-ai backend-specific =====
    private static final String HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_LONG_OPTION =
            "hybrid-hancom-ai-regionlist-strategy";
    private static final String HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_DESC =
            "DLA label 7 (regionlist) handling. Requires --hybrid=hancom-ai. "
            + "Values: table-first (default; check TSR overlap), list-only (skip TSR, always treat as list)";

    private static final String HYBRID_HANCOM_AI_OCR_STRATEGY_LONG_OPTION =
            "hybrid-hancom-ai-ocr-strategy";
    private static final String HYBRID_HANCOM_AI_OCR_STRATEGY_DESC =
            "OCR strategy. Requires --hybrid=hancom-ai. "
            + "Values: off (stream-only), auto (default; stream first, OCR fallback), force (OCR-only)";

    private static final String HYBRID_HANCOM_AI_IMAGE_CACHE_LONG_OPTION =
            "hybrid-hancom-ai-image-cache";
    private static final String HYBRID_HANCOM_AI_IMAGE_CACHE_DESC =
            "Page image cache backing. Requires --hybrid=hancom-ai. "
            + "Values: memory (default), disk";

    private static final String HYBRID_HANCOM_AI_SAVE_CROPS_LONG_OPTION =
            "hybrid-hancom-ai-save-crops";
    private static final String HYBRID_HANCOM_AI_SAVE_CROPS_DESC =
            "Persist cropped figure images to disk for debugging. Requires --hybrid=hancom-ai";

    private static final String HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_LONG_OPTION =
            "hybrid-hancom-ai-crop-output-dir";
    private static final String HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_DESC =
            "Output directory for --hybrid-hancom-ai-save-crops. Requires --hybrid=hancom-ai";
  • Step 2: Verify file still compiles

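Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli compile -DskipTests -q Expected: the build succeeds. With -q, Maven prints nothing on success, so look for no error output and exit status 0. The constants are still unused at this point, which is fine: javac does not warn about unused private static final fields.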

  • Step 3: Commit
bash
cd opendataloader-pdf && git add java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java
git commit -m "$(cat <<'EOF'
feat(cli): add --hybrid-hancom-ai-* option name constants

Defines long-name and description constants for 5 hancom-ai-specific
flags. Wiring into OPTION_DEFINITIONS and parser comes in subsequent
commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 1.2: Register options in OPTION_DEFINITIONS

Files:

  • Modify: opendataloader-pdf/java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java (the OPTION_DEFINITIONS list, currently lines 166-208)

  • Step 1: Add 3 exported entries to OPTION_DEFINITIONS (after HYBRID_FALLBACK_LONG_OPTION line, around line 196)

Insert these three lines immediately after the new OptionDefinition(HYBRID_FALLBACK_LONG_OPTION, ...) line and before the new OptionDefinition(TO_STDOUT_LONG_OPTION, ...) line:

java
            new OptionDefinition(HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_LONG_OPTION, null, "string",
                    "table-first", HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_DESC, true),
            new OptionDefinition(HYBRID_HANCOM_AI_OCR_STRATEGY_LONG_OPTION, null, "string",
                    "auto", HYBRID_HANCOM_AI_OCR_STRATEGY_DESC, true),
            new OptionDefinition(HYBRID_HANCOM_AI_IMAGE_CACHE_LONG_OPTION, null, "string",
                    "memory", HYBRID_HANCOM_AI_IMAGE_CACHE_DESC, true),
  • Step 2: Add 2 hidden entries (exported=false) after the legacy options block (around line 208)

Insert after the NO_JSON_REPORT_LONG_OPTION line (the last legacy entry), before the closing ) of the list. If the list is built with a method call, which the closing ); suggests, Java does not accept a trailing comma in the argument list: add a comma after the NO_JSON_REPORT_LONG_OPTION entry and leave the last new entry without one.

java
            new OptionDefinition(HYBRID_HANCOM_AI_SAVE_CROPS_LONG_OPTION, null, "boolean",
                    false, HYBRID_HANCOM_AI_SAVE_CROPS_DESC, false),
            new OptionDefinition(HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_LONG_OPTION, null, "string",
                    null, HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_DESC, false)
  • Step 3: Add "options registered" sanity tests

Modify opendataloader-pdf/java/opendataloader-pdf-cli/src/test/java/org/opendataloader/pdf/cli/CLIOptionsTest.java — append the following test methods at the end of the class (before the closing }):

java
    @Test
    void testDefineOptions_containsHybridHancomAiRegionlistStrategy() {
        assertTrue(options.hasOption("hybrid-hancom-ai-regionlist-strategy"));
    }

    @Test
    void testDefineOptions_containsHybridHancomAiOcrStrategy() {
        assertTrue(options.hasOption("hybrid-hancom-ai-ocr-strategy"));
    }

    @Test
    void testDefineOptions_containsHybridHancomAiImageCache() {
        assertTrue(options.hasOption("hybrid-hancom-ai-image-cache"));
    }

    @Test
    void testDefineOptions_containsHybridHancomAiSaveCrops() {
        assertTrue(options.hasOption("hybrid-hancom-ai-save-crops"));
    }

    @Test
    void testDefineOptions_containsHybridHancomAiCropOutputDir() {
        assertTrue(options.hasOption("hybrid-hancom-ai-crop-output-dir"));
    }
  • Step 4: Run the sanity tests

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest -q Expected: All 5 new tests PASS. (No parsing yet — these only verify Apache Commons CLI registration.)

  • Step 5: Commit
bash
cd opendataloader-pdf && git add java/opendataloader-pdf-cli
git commit -m "$(cat <<'EOF'
feat(cli): register --hybrid-hancom-ai-* in OPTION_DEFINITIONS

Three options (regionlist-strategy, ocr-strategy, image-cache) are
exported to options.json and wrapper bindings; two debug options
(save-crops, crop-output-dir) stay hidden via exported=false.

Sanity tests verify Apache Commons CLI registration. Parsing logic
and Config wiring come next.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 1.3: Parse new options into HybridConfig

Files:

  • Modify: opendataloader-pdf/java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java, applyHybridOptions() method (around lines 463-557)

  • Step 1: Write failing test for --hybrid-hancom-ai-regionlist-strategy=list-only

Append to CLIOptionsTest.java:

java
    @Test
    void testCreateConfig_withHybridHancomAiRegionlistStrategy() throws ParseException {
        String[] args = {"--hybrid", "hancom-ai",
                         "--hybrid-hancom-ai-regionlist-strategy", "list-only",
                         testPdf.getAbsolutePath()};
        CommandLine cmd = parser.parse(options, args);
        Config config = CLIOptions.createConfigFromCommandLine(cmd);
        assertEquals("list-only", config.getHybridConfig().getRegionlistStrategy());
    }
  • Step 2: Run, expect FAIL

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest#testCreateConfig_withHybridHancomAiRegionlistStrategy -q Expected: FAIL — value still default table-first because parsing hasn't been added yet.

  • Step 3: Add parsing for all 5 new options

The block below uses Locale.ROOT, so add import java.util.Locale; if CLIOptions.java does not already import it. Insert the block inside applyHybridOptions(), immediately after the existing if (commandLine.hasOption(HYBRID_FALLBACK_LONG_OPTION)) { config.getHybridConfig().setFallbackToJava(true); } line and before if (commandLine.hasOption(TO_STDOUT_LONG_OPTION)) { ... }:

java
        if (commandLine.hasOption(HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_LONG_OPTION)) {
            String value = commandLine.getOptionValue(HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_LONG_OPTION);
            if (value != null && !value.trim().isEmpty()) {
                config.getHybridConfig().setRegionlistStrategy(value.trim().toLowerCase(Locale.ROOT));
            }
        }
        if (commandLine.hasOption(HYBRID_HANCOM_AI_OCR_STRATEGY_LONG_OPTION)) {
            String value = commandLine.getOptionValue(HYBRID_HANCOM_AI_OCR_STRATEGY_LONG_OPTION);
            if (value != null && !value.trim().isEmpty()) {
                config.getHybridConfig().setOcrStrategy(value.trim().toLowerCase(Locale.ROOT));
            }
        }
        if (commandLine.hasOption(HYBRID_HANCOM_AI_IMAGE_CACHE_LONG_OPTION)) {
            String value = commandLine.getOptionValue(HYBRID_HANCOM_AI_IMAGE_CACHE_LONG_OPTION);
            if (value != null && !value.trim().isEmpty()) {
                config.getHybridConfig().setImageCache(value.trim().toLowerCase(Locale.ROOT));
            }
        }
        if (commandLine.hasOption(HYBRID_HANCOM_AI_SAVE_CROPS_LONG_OPTION)) {
            config.getHybridConfig().setSaveCrops(true);
        }
        if (commandLine.hasOption(HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_LONG_OPTION)) {
            String value = commandLine.getOptionValue(HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_LONG_OPTION);
            if (value != null && !value.trim().isEmpty()) {
                config.getHybridConfig().setCropOutputDir(value.trim());
            }
        }
  • Step 4: Run the test, expect PASS

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest#testCreateConfig_withHybridHancomAiRegionlistStrategy -q Expected: PASS.

  • Step 5: Add coverage tests for the other four options

Append to CLIOptionsTest.java:

java
    @Test
    void testCreateConfig_withHybridHancomAiOcrStrategy() throws ParseException {
        String[] args = {"--hybrid", "hancom-ai",
                         "--hybrid-hancom-ai-ocr-strategy", "force",
                         testPdf.getAbsolutePath()};
        CommandLine cmd = parser.parse(options, args);
        Config config = CLIOptions.createConfigFromCommandLine(cmd);
        assertEquals("force", config.getHybridConfig().getOcrStrategy());
    }

    @Test
    void testCreateConfig_withHybridHancomAiImageCache() throws ParseException {
        String[] args = {"--hybrid", "hancom-ai",
                         "--hybrid-hancom-ai-image-cache", "disk",
                         testPdf.getAbsolutePath()};
        CommandLine cmd = parser.parse(options, args);
        Config config = CLIOptions.createConfigFromCommandLine(cmd);
        assertEquals("disk", config.getHybridConfig().getImageCache());
    }

    @Test
    void testCreateConfig_withHybridHancomAiSaveCrops() throws ParseException {
        String[] args = {"--hybrid", "hancom-ai",
                         "--hybrid-hancom-ai-save-crops",
                         testPdf.getAbsolutePath()};
        CommandLine cmd = parser.parse(options, args);
        Config config = CLIOptions.createConfigFromCommandLine(cmd);
        assertTrue(config.getHybridConfig().isSaveCrops());
    }

    @Test
    void testCreateConfig_withHybridHancomAiCropOutputDir() throws ParseException {
        String[] args = {"--hybrid", "hancom-ai",
                         "--hybrid-hancom-ai-crop-output-dir", "/tmp/crops",
                         testPdf.getAbsolutePath()};
        CommandLine cmd = parser.parse(options, args);
        Config config = CLIOptions.createConfigFromCommandLine(cmd);
        assertEquals("/tmp/crops", config.getHybridConfig().getCropOutputDir());
    }
  • Step 6: Run all new tests, expect PASS

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest -q Expected: All tests PASS.

  • Step 7: Commit
bash
cd opendataloader-pdf && git add java/opendataloader-pdf-cli
git commit -m "$(cat <<'EOF'
feat(cli): parse --hybrid-hancom-ai-* into HybridConfig

Wires --hybrid-hancom-ai-{regionlist-strategy,ocr-strategy,image-cache,
save-crops,crop-output-dir} through applyHybridOptions(). HybridConfig
setters validate enum values and throw IllegalArgumentException on
invalid input.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
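
The commit message above relies on the HybridConfig setters rejecting unknown values. HybridConfig itself is not shown in this plan, so the following is only a minimal sketch of what such a validating setter could look like. OcrStrategySketch and its normalize helper are hypothetical names; the allowed values are taken from the --hybrid-hancom-ai-ocr-strategy description in Task 1.1.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for HybridConfig; the real class is not shown in this plan.
final class OcrStrategySketch {
    // Allowed values, copied from the --hybrid-hancom-ai-ocr-strategy description.
    private static final Set<String> ALLOWED = Set.of("off", "auto", "force");

    private String ocrStrategy = "auto"; // default per the option description

    // Rejects anything outside the allowed set and leaves the old value untouched.
    void setOcrStrategy(String value) {
        if (value == null || !ALLOWED.contains(value)) {
            throw new IllegalArgumentException(
                    "Invalid OCR strategy '" + value + "'. Allowed: off, auto, force");
        }
        this.ocrStrategy = value;
    }

    String getOcrStrategy() {
        return ocrStrategy;
    }

    // Mirrors the CLI-side normalization in Step 3: trim, then lowercase with Locale.ROOT.
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }
}
```

If the real setters turn out not to validate, the CLI layer would need an equivalent check so that a typo in a strategy value fails fast instead of being stored.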

Task 1.4: Add hancom-ai gate validation

Files:

  • Modify: opendataloader-pdf/java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java, applyHybridOptions()

  • Step 1: Write failing test — using the option without --hybrid=hancom-ai should throw

Append to CLIOptionsTest.java:

java
    @Test
    void testCreateConfig_hybridHancomAiOption_withoutHancomAi_throws() {
        String[] args = {"--hybrid-hancom-ai-regionlist-strategy", "list-only",
                         testPdf.getAbsolutePath()};
        // No --hybrid set, defaults to off.
        IllegalArgumentException ex = assertThrows(IllegalArgumentException.class, () -> {
            CommandLine cmd = parser.parse(options, args);
            CLIOptions.createConfigFromCommandLine(cmd);
        });
        assertTrue(ex.getMessage().contains("--hybrid-hancom-ai-"),
                "Error should mention the offending prefix, got: " + ex.getMessage());
        assertTrue(ex.getMessage().contains("hancom-ai"),
                "Error should mention required backend, got: " + ex.getMessage());
    }

    @Test
    void testCreateConfig_hybridHancomAiOption_withDoclingFast_throws() {
        String[] args = {"--hybrid", "docling-fast",
                         "--hybrid-hancom-ai-ocr-strategy", "force",
                         testPdf.getAbsolutePath()};
        assertThrows(IllegalArgumentException.class, () -> {
            CommandLine cmd = parser.parse(options, args);
            CLIOptions.createConfigFromCommandLine(cmd);
        });
    }
  • Step 2: Run, expect FAIL

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest#testCreateConfig_hybridHancomAiOption_withoutHancomAi_throws -q Expected: FAIL — currently no validation, so the call succeeds silently.

  • Step 3: Add the gate at the end of applyHybridOptions()

Insert this block at the very end of applyHybridOptions() (after the --to-stdout handling, before the closing }):

java
        boolean usesHancomAiOnly =
                commandLine.hasOption(HYBRID_HANCOM_AI_REGIONLIST_STRATEGY_LONG_OPTION) ||
                commandLine.hasOption(HYBRID_HANCOM_AI_OCR_STRATEGY_LONG_OPTION) ||
                commandLine.hasOption(HYBRID_HANCOM_AI_IMAGE_CACHE_LONG_OPTION) ||
                commandLine.hasOption(HYBRID_HANCOM_AI_SAVE_CROPS_LONG_OPTION) ||
                commandLine.hasOption(HYBRID_HANCOM_AI_CROP_OUTPUT_DIR_LONG_OPTION);
        if (usesHancomAiOnly && !Config.HYBRID_HANCOM_AI.equals(config.getHybrid())) {
            throw new IllegalArgumentException(
                    "Options --hybrid-hancom-ai-* require --hybrid=hancom-ai (got --hybrid="
                    + config.getHybrid() + ")");
        }
  • Step 4: Run both gate tests, expect PASS

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest -q Expected: All tests PASS, including both new gate tests AND all earlier ones (which use --hybrid hancom-ai).

  • Step 5: Commit
bash
cd opendataloader-pdf && git add java/opendataloader-pdf-cli
git commit -m "$(cat <<'EOF'
feat(cli): reject --hybrid-hancom-ai-* without --hybrid=hancom-ai

Without this guard, these flags would silently no-op when the user
selects a different backend. Fail fast with a message naming the
offending prefix and required backend.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
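
The gate in Step 3 lists each of the five options explicitly. As a point of comparison, here is a small sketch of the same rule expressed as a prefix match, modelled with plain collections rather than Commons CLI so it stands alone. HancomAiGateSketch and its check method are illustrative names, not part of the codebase.

```java
import java.util.Set;

// Illustrative only: models the gate with plain collections instead of Commons CLI.
final class HancomAiGateSketch {
    private static final String PREFIX = "hybrid-hancom-ai-";
    private static final String REQUIRED_BACKEND = "hancom-ai";

    /**
     * @param presentOptions long option names that appeared on the command line
     * @param backend        the resolved --hybrid value, e.g. "off" or "docling-fast"
     */
    static void check(Set<String> presentOptions, String backend) {
        // Any option carrying the backend-specific prefix triggers the gate.
        boolean usesHancomAiOnly = presentOptions.stream()
                .anyMatch(name -> name.startsWith(PREFIX));
        if (usesHancomAiOnly && !REQUIRED_BACKEND.equals(backend)) {
            throw new IllegalArgumentException(
                    "Options --hybrid-hancom-ai-* require --hybrid=hancom-ai (got --hybrid="
                    + backend + ")");
        }
    }
}
```

A prefix match would cover any future --hybrid-hancom-ai-* option automatically, whereas the explicit list in the plan has to be extended by hand; the explicit list is easier to grep and cannot match an unrelated option by accident.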

Task 1.5: Regenerate options.json and wrapper bindings

Files:

  • Modify: opendataloader-pdf/options.json

  • Modify: opendataloader-pdf/python/..., opendataloader-pdf/node/... (auto-generated)

  • Step 1: Run npm sync

Run: cd opendataloader-pdf && npm run sync Expected: Command completes; git status shows changes to options.json and python/node binding files. The 3 exported new options (regionlist-strategy, ocr-strategy, image-cache) appear; the 2 hidden ones (save-crops, crop-output-dir) do not.

  • Step 2: Spot-check options.json

Verify the diff contains the 3 new options with correct defaults:

bash
cd opendataloader-pdf && grep -A 2 "hybrid-hancom-ai" options.json

Expected: 3 entries for hybrid-hancom-ai-regionlist-strategy, hybrid-hancom-ai-ocr-strategy, hybrid-hancom-ai-image-cache with their defaults table-first / auto / memory. No entry for save-crops or crop-output-dir.

  • Step 3: Commit
bash
cd opendataloader-pdf && git add options.json python node
git commit -m "$(cat <<'EOF'
chore: regenerate bindings for --hybrid-hancom-ai-* options

Auto-generated by npm run sync. Adds the 3 exported hancom-ai
options to options.json, Python bindings, and Node bindings.
The 2 hidden debug options (save-crops, crop-output-dir) stay
out of the public surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
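
How npm run sync decides what to emit is not described in this plan. Assuming it simply keeps the definitions whose exported flag is true, the expected result of Step 2 can be modelled as below. Def is a hypothetical, cut-down stand-in for OptionDefinition that carries only the two fields needed here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of OptionDefinition; only the fields needed here are included.
final class ExportFilterSketch {
    static final class Def {
        final String longName;
        final boolean exported;

        Def(String longName, boolean exported) {
            this.longName = longName;
            this.exported = exported;
        }
    }

    // The five options from Phase 1 with the exported flags chosen in Task 1.2.
    static final List<Def> DEFS = List.of(
            new Def("hybrid-hancom-ai-regionlist-strategy", true),
            new Def("hybrid-hancom-ai-ocr-strategy", true),
            new Def("hybrid-hancom-ai-image-cache", true),
            new Def("hybrid-hancom-ai-save-crops", false),
            new Def("hybrid-hancom-ai-crop-output-dir", false));

    // Names a generator would emit if it kept only exported definitions.
    static List<String> exportedNames() {
        return DEFS.stream()
                .filter(d -> d.exported)
                .map(d -> d.longName)
                .collect(Collectors.toList());
    }
}
```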

Phase 2 — Extract reusable CLIOptions API

This phase is a pure refactor — behavior unchanged.

Task 2.1: Add public addAllTo(Options) method

Files:

  • Modify: opendataloader-pdf/java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java, defineOptions() (around line 210)

  • Step 1: Write a test that exercises addAllTo against a fresh Options instance

Append to CLIOptionsTest.java:

java
    @Test
    void testAddAllTo_registersAllOptions() {
        Options ext = new Options();
        CLIOptions.addAllTo(ext);
        assertTrue(ext.hasOption("hybrid"));
        assertTrue(ext.hasOption("hybrid-mode"));
        assertTrue(ext.hasOption("hybrid-hancom-ai-regionlist-strategy"));
        assertTrue(ext.hasOption("format"));
        assertTrue(ext.hasOption("threads"));
    }
  • Step 2: Run, expect FAIL (compile error — method does not exist)

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest#testAddAllTo_registersAllOptions -q Expected: COMPILATION FAILURE — cannot find symbol: addAllTo.

  • Step 3: Add addAllTo and refactor defineOptions to use it

Replace the existing defineOptions method (currently lines ~210-217) with:

java
    public static Options defineOptions() {
        Options options = new Options();
        addAllTo(options);
        return options;
    }

    /**
     * Registers every core CLI option onto an external {@link Options} instance.
     * Used by downstream CLIs (e.g. opendataloader-pdfua) that want to inherit
     * the entire core option set and add their own options on top.
     *
     * @param options the Options instance to populate
     */
    public static void addAllTo(Options options) {
        for (OptionDefinition def : OPTION_DEFINITIONS) {
            options.addOption(def.toOption());
        }
    }
  • Step 4: Run, expect PASS

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest -q Expected: All tests PASS — addAllTo works and all existing tests still work via defineOptions.

  • Step 5: Commit
bash
cd opendataloader-pdf && git add java/opendataloader-pdf-cli
git commit -m "$(cat <<'EOF'
refactor(cli): extract CLIOptions.addAllTo(Options) public API

Splits Options registration out of defineOptions() so downstream CLIs
can register every core option onto their own Options instance and
add their own options on top. defineOptions() now delegates.

Pure refactor — no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 2.2: Add public applyAllTo(Config, CommandLine) method

Files:

  • Modify: opendataloader-pdf/java/opendataloader-pdf-cli/src/main/java/org/opendataloader/pdf/cli/CLIOptions.java, createConfigFromCommandLine() (around lines 219-287)

  • Step 1: Write a test that exercises applyAllTo directly

Append to CLIOptionsTest.java:

java
    @Test
    void testApplyAllTo_appliesCoreOptions() throws ParseException {
        Options ext = new Options();
        CLIOptions.addAllTo(ext);
        ext.addOption(null, "downstream-only", true, "Downstream-specific");

        String[] args = {"--hybrid", "hancom-ai",
                         "--threads", "2",
                         "--downstream-only", "value",
                         testPdf.getAbsolutePath()};
        CommandLine cmd = parser.parse(ext, args);

        Config config = new Config();
        config.setOutputFolder(tempDir.toString());
        CLIOptions.applyAllTo(config, cmd);

        assertEquals("hancom-ai", config.getHybrid());
        assertEquals(2, config.getThreads());
        // downstream-only is not consumed by applyAllTo — that is the caller's job.
        assertEquals("value", cmd.getOptionValue("downstream-only"));
    }
  • Step 2: Run, expect FAIL (compile error)

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -Dtest=CLIOptionsTest#testApplyAllTo_appliesCoreOptions -q Expected: COMPILATION FAILURE — cannot find symbol: applyAllTo.

  • Step 3: Refactor createConfigFromCommandLine to delegate to applyAllTo

Replace the existing createConfigFromCommandLine (around lines 219-287) with this two-method form:

java
    public static Config createConfigFromCommandLine(CommandLine commandLine) {
        Config config = new Config();
        if (commandLine.hasOption(CLIOptions.FOLDER_OPTION)) {
            config.setOutputFolder(commandLine.getOptionValue(CLIOptions.FOLDER_OPTION));
        } else {
            String argument = commandLine.getArgs()[0];
            File file = new File(argument);
            file = new File(file.getAbsolutePath());
            config.setOutputFolder(file.isDirectory() ? file.getAbsolutePath() : file.getParent());
        }
        applyAllTo(config, commandLine);
        return config;
    }

    /**
     * Applies every core CLI option from the parsed command line onto the given Config.
     * Caller is responsible for setting required Config state that is not represented
     * by a CLI option (e.g. output folder when no positional input file is provided).
     *
     * Used by downstream CLIs that build their own Options + Config and want core
     * options applied without paying for the positional-arg-based output-folder
     * fallback that {@link #createConfigFromCommandLine} performs.
     *
     * @param config       Config to populate
     * @param commandLine  parsed CommandLine
     */
    public static void applyAllTo(Config config, CommandLine commandLine) {
        if (commandLine.hasOption(CLIOptions.PASSWORD_OPTION)) {
            config.setPassword(commandLine.getOptionValue(CLIOptions.PASSWORD_OPTION));
        }
        if (commandLine.hasOption(CLIOptions.KEEP_LINE_BREAKS_LONG_OPTION)) {
            config.setKeepLineBreaks(true);
        }
        if (commandLine.hasOption(CLIOptions.PDF_REPORT_LONG_OPTION)) {
            config.setGeneratePDF(true);
        }
        if (commandLine.hasOption(CLIOptions.MARKDOWN_REPORT_LONG_OPTION)) {
            config.setGenerateMarkdown(true);
        }
        if (commandLine.hasOption(CLIOptions.HTML_REPORT_LONG_OPTION)) {
            config.setGenerateHtml(true);
        }
        if (commandLine.hasOption(CLIOptions.HTML_IN_MARKDOWN_LONG_OPTION)) {
            config.setUseHTMLInMarkdown(true);
        }
        if (commandLine.hasOption(CLIOptions.MARKDOWN_IMAGE_LONG_OPTION)) {
            config.setAddImageToMarkdown(true);
        }
        if (commandLine.hasOption(CLIOptions.NO_JSON_REPORT_LONG_OPTION)) {
            config.setGenerateJSON(false);
        }
        if (commandLine.hasOption(CLIOptions.REPLACE_INVALID_CHARS_LONG_OPTION)) {
            config.setReplaceInvalidChars(commandLine.getOptionValue(CLIOptions.REPLACE_INVALID_CHARS_LONG_OPTION));
        }
        if (commandLine.hasOption(CLIOptions.USE_STRUCT_TREE_LONG_OPTION)) {
            config.setUseStructTree(true);
        }
        if (commandLine.hasOption(INCLUDE_HEADER_FOOTER_LONG_OPTION)) {
            config.setIncludeHeaderFooter(true);
        }
        if (commandLine.hasOption(DETECT_STRIKETHROUGH_LONG_OPTION)) {
            config.setDetectStrikethrough(true);
        }
        if (commandLine.hasOption(CLIOptions.READING_ORDER_LONG_OPTION)) {
            config.setReadingOrder(commandLine.getOptionValue(CLIOptions.READING_ORDER_LONG_OPTION));
        }
        if (commandLine.hasOption(CLIOptions.MARKDOWN_PAGE_SEPARATOR_LONG_OPTION)) {
            config.setMarkdownPageSeparator(commandLine.getOptionValue(CLIOptions.MARKDOWN_PAGE_SEPARATOR_LONG_OPTION));
        }
        if (commandLine.hasOption(CLIOptions.TEXT_PAGE_SEPARATOR_LONG_OPTION)) {
            config.setTextPageSeparator(commandLine.getOptionValue(CLIOptions.TEXT_PAGE_SEPARATOR_LONG_OPTION));
        }
        if (commandLine.hasOption(CLIOptions.HTML_PAGE_SEPARATOR_LONG_OPTION)) {
            config.setHtmlPageSeparator(commandLine.getOptionValue(CLIOptions.HTML_PAGE_SEPARATOR_LONG_OPTION));
        }
        applyContentSafetyOption(config, commandLine);
        applySanitizeOption(config, commandLine);
        applyFormatOption(config, commandLine);
        applyTableMethodOption(config, commandLine);
        applyImageOptions(config, commandLine);
        applyPagesOption(config, commandLine);
        applyHybridOptions(config, commandLine);
        applyThreadsOption(config, commandLine);
        config.normalize();
    }

The output-folder positional handling stays in createConfigFromCommandLine only — pdfua does not have positional input args, so it should not pay for that branch.
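
To make the fallback that stays behind concrete, the else-branch of createConfigFromCommandLine can be isolated as a small function. This is a sketch using only java.io.File; OutputFolderSketch and deriveOutputFolder are hypothetical names introduced for illustration.

```java
import java.io.File;

final class OutputFolderSketch {
    // Same rule as the else-branch of createConfigFromCommandLine:
    // a directory argument is its own output folder; a file argument uses its parent.
    static String deriveOutputFolder(String argument) {
        File file = new File(new File(argument).getAbsolutePath());
        return file.isDirectory() ? file.getAbsolutePath() : file.getParent();
    }
}
```

Because the rule needs a first positional argument (commandLine.getArgs()[0]), a caller without positional inputs has to set the output folder itself before calling applyAllTo, as the test in Step 1 does.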

  • Step 4: Run all CLI tests, expect PASS

Run: cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli test -q Expected: All tests PASS — both the new applyAllTo test and every existing test (the latter still go through createConfigFromCommandLine, which now delegates).

  • Step 5: Commit
bash
cd opendataloader-pdf && git add java/opendataloader-pdf-cli
git commit -m "$(cat <<'EOF'
refactor(cli): extract CLIOptions.applyAllTo(Config, CommandLine)

Splits per-option Config wiring out of createConfigFromCommandLine()
so downstream CLIs (opendataloader-pdfua) can apply core options to
their own Config without inheriting the positional-arg output-folder
fallback. createConfigFromCommandLine() delegates.

Pure refactor — no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 2.3: Install core to local Maven for pdfua

Per opendataloader-pdfua/CLAUDE.md, pdfua consumes core via Maven; cross-repo changes require mvn install on core first.

  • Step 1: Install core to local Maven

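Run: cd opendataloader-pdf/java && mvn install -DskipTests -q Expected: the build succeeds. Because -q suppresses Maven's BUILD SUCCESS banner, success shows as no error output and exit status 0. The new core artifact (with addAllTo / applyAllTo) is then available to pdfua from the local Maven repository.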


Phase 3 — Refactor pdfua to consume core CLIOptions

Task 3.1: Add Builder to RemediationConfig

Files:

  • Modify: opendataloader-pdfua/src/main/java/org/opendataloader/pdf/remediation/RemediationConfig.java

This task adds a Builder without removing the existing constructors yet — this lets us migrate call sites incrementally while keeping the test suite green.

  • Step 1: Add coreConfig field, the Builder class, and a builder-based constructor

After the existing private final MockMode mockMode; line (around line 52), add:

java
    private final org.opendataloader.pdf.api.Config coreConfig;

After the last existing constructor (around line 125, ending this.mockMode = ...;), add:

java
    private RemediationConfig(Builder b) {
        this.input = b.input;
        this.output = b.output;
        this.lang = b.lang;
        this.hybrid = b.coreConfig.getHybrid();
        this.hybridUrl = b.coreConfig.getHybridConfig().getUrl();
        this.hybridMode = b.coreConfig.getHybridConfig().getMode();
        this.enrichPictureDescription = b.enrichPictureDescription;
        this.threads = b.threads;
        this.conformances = EnumSet.copyOf(b.conformances);
        this.reportLevel = b.reportLevel;
        this.fontEmbedMode = b.fontEmbedMode;
        this.auditBundleMode = b.auditBundleMode;
        this.orgMeta = b.orgMeta != null ? b.orgMeta : OrgMeta.defaults();
        this.mockMode = b.mockMode != null ? b.mockMode : MockMode.OFF;
        this.coreConfig = b.coreConfig;
    }

    public static Builder builder() { return new Builder(); }

    public org.opendataloader.pdf.api.Config getCoreConfig() { return coreConfig; }

    public static final class Builder {
        private String input;
        private String output;
        private String lang = "en";
        private org.opendataloader.pdf.api.Config coreConfig = new org.opendataloader.pdf.api.Config();
        private boolean enrichPictureDescription = true;
        private int threads = 1;
        private Set<ConformanceLevel> conformances = EnumSet.of(ConformanceLevel.UA1);
        private ReportLevel reportLevel = ReportLevel.NONE;
        private FontEmbedMode fontEmbedMode = FontEmbedMode.OFF;
        private AuditBundleMode auditBundleMode = AuditBundleMode.NONE;
        private OrgMeta orgMeta = OrgMeta.defaults();
        private MockMode mockMode = MockMode.OFF;

        public Builder input(String v) { this.input = v; return this; }
        public Builder output(String v) { this.output = v; return this; }
        public Builder lang(String v) { this.lang = v; return this; }
        public Builder coreConfig(org.opendataloader.pdf.api.Config v) { this.coreConfig = v; return this; }
        public Builder enrichPictureDescription(boolean v) { this.enrichPictureDescription = v; return this; }
        public Builder threads(int v) { this.threads = v; return this; }
        public Builder conformances(Set<ConformanceLevel> v) { this.conformances = v; return this; }
        public Builder reportLevel(ReportLevel v) { this.reportLevel = v; return this; }
        public Builder fontEmbedMode(FontEmbedMode v) { this.fontEmbedMode = v; return this; }
        public Builder auditBundleMode(AuditBundleMode v) { this.auditBundleMode = v; return this; }
        public Builder orgMeta(OrgMeta v) { this.orgMeta = v; return this; }
        public Builder mockMode(MockMode v) { this.mockMode = v; return this; }

        public RemediationConfig build() { return new RemediationConfig(this); }
    }
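
One detail in the Builder constructor deserves a check: `EnumSet.copyOf(Collection)` throws `IllegalArgumentException` when handed an empty collection that is not itself an `EnumSet`, and `conformances(Set<ConformanceLevel>)` accepts any `Set`. The sketch below reproduces that behaviour with a stand-in enum (`Level` is hypothetical, not the project's `ConformanceLevel`) and shows one possible guard.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for ConformanceLevel; the real enum lives in pdfua.
enum Level { UA1, UA2 }

class ConformanceCopy {
    // Copies any Set into an EnumSet without tripping over the empty-collection case.
    static EnumSet<Level> safeCopy(Set<Level> in) {
        return in.isEmpty() ? EnumSet.noneOf(Level.class) : EnumSet.copyOf(in);
    }

    // Returns true when the plain EnumSet.copyOf call rejects the input.
    static boolean plainCopyFails(Set<Level> in) {
        try {
            EnumSet.copyOf(in);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

An empty `HashSet` makes the plain call fail, while an empty `EnumSet` passes, so the problem only shows up when a caller builds the set some other way (for example from parsed CLI input).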

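In addition, in each existing constructor (the 5 of them, ending at line 125), add this line after the existing this.mockMode = ...; assignment. Because coreConfig is final, it can be assigned only once per construction path: if a constructor delegates to another through this(...), add the line only to the constructor that actually assigns the fields.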

java
        this.coreConfig = buildCoreConfigForLegacyConstructor();

And add this private helper after the constructors (before public String getInput()):

java
    private org.opendataloader.pdf.api.Config buildCoreConfigForLegacyConstructor() {
        org.opendataloader.pdf.api.Config c = new org.opendataloader.pdf.api.Config();
        if (this.hybrid != null) {
            c.setHybrid(this.hybrid);
        }
        if (this.hybridUrl != null) {
            c.getHybridConfig().setUrl(this.hybridUrl);
        }
        if (this.hybridMode != null) {
            c.getHybridConfig().setMode(this.hybridMode);
        }
        return c;
    }

This keeps the existing 5 constructors source-compatible during migration. They will be removed in Task 3.5.
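
To see why the bridge keeps behaviour stable, here is a minimal sketch with stand-in types (`CoreCfg` and `RemediationCfg` are hypothetical simplifications, not the real `Config` and `RemediationConfig`, and the `"off"` / `"auto"` defaults are assumptions). A legacy positional constructor and the Builder path end up with the same embedded hybrid state, which is the property the migration relies on.

```java
// Hypothetical stand-in for org.opendataloader.pdf.api.Config (hybrid fields only).
class CoreCfg {
    String hybrid = "off";   // assumed core default
    String url;              // assumed to default to null
    String mode = "auto";    // assumed core default
}

// Hypothetical stand-in for RemediationConfig during the migration window.
class RemediationCfg {
    private final String input;
    private final CoreCfg coreConfig;

    // Legacy positional constructor: synthesizes the embedded config from flat arguments.
    RemediationCfg(String input, String hybrid, String hybridUrl, String hybridMode) {
        this.input = input;
        CoreCfg c = new CoreCfg();
        if (hybrid != null) c.hybrid = hybrid;
        if (hybridUrl != null) c.url = hybridUrl;
        if (hybridMode != null) c.mode = hybridMode;
        this.coreConfig = c;
    }

    // Builder path: adopts the caller-supplied config as-is.
    private RemediationCfg(Builder b) {
        this.input = b.input;
        this.coreConfig = b.coreConfig;
    }

    static Builder builder() { return new Builder(); }
    String getInput() { return input; }
    CoreCfg getCoreConfig() { return coreConfig; }

    static final class Builder {
        private String input;
        private CoreCfg coreConfig = new CoreCfg();
        Builder input(String v) { this.input = v; return this; }
        Builder coreConfig(CoreCfg v) { this.coreConfig = v; return this; }
        RemediationCfg build() { return new RemediationCfg(this); }
    }
}
```

A test along these lines, comparing both construction paths field by field, could guard the migration until the legacy constructors are removed.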

  • Step 2: Verify pdfua still compiles and tests still pass

Run: cd opendataloader-pdfua && mvn test -q Expected: exit code 0 and all tests pass; with -q Maven prints only errors, so no BUILD SUCCESS banner appears. (We added members; we did not change existing behavior.)

  • Step 3: Commit
bash
cd opendataloader-pdfua && git add src/main/java/org/opendataloader/pdf/remediation/RemediationConfig.java
git commit -m "$(cat <<'EOF'
feat(remediation): add RemediationConfig.Builder + embedded coreConfig

Adds a Builder API and a coreConfig field carrying the full core
opendataloader-pdf Config. Existing 5 constructors keep working —
they construct a synthetic coreConfig from the flat hybrid fields
to maintain semantics. Migration of call sites and removal of the
legacy constructors follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3.2: Refactor pdfua/Main.java to use core CLIOptions

Files:

  • Modify: opendataloader-pdfua/src/main/java/org/opendataloader/pdf/Main.java

  • Step 1: Replace buildOptions() body — remove hybrid options, add addAllTo

Replace buildOptions() (lines 94-125) with:

java
    static Options buildOptions() {
        Options options = new Options();
        // Inherit every core opendataloader-pdf CLI option (--hybrid, --hybrid-url,
        // --hybrid-mode, --hybrid-hancom-ai-*, --threads, etc.).
        org.opendataloader.pdf.cli.CLIOptions.addAllTo(options);

        // pdfua-specific options
        options.addRequiredOption(null, "input", true, "PDF file or directory");
        options.addRequiredOption(null, "output", true, "Output directory");
        options.addOption(null, "lang", true, "Document language ISO 639-1 (default: en)");
        options.addOption(null, "enrich-picture-description", false, "Enable AI picture descriptions (default: on)");
        options.addOption(null, "no-enrich-picture-description", false, "Disable AI picture descriptions");
        options.addOption(null, "conformance", true,
            "Conformance levels, comma-separated: ua1, ua2, wtpdf-reuse, wtpdf-accessibility (default: ua1)");
        options.addOption(null, "report", true, "Report level: processing, quality, all, none (default: all)");
        options.addOption(null, "font-embed", true, "Font embedding policy for missing /FontFile: "
            + "substitute (default; fall back to bundled Liberation fonts; minor visual risk for Arial/Times), "
            + "exact-only (system fonts only), off");
        options.addOption(null, "audit-bundle", true,
            "Emit per-PDF audit bundle: full (default), json (machine logs), html, none");
        options.addOption(null, "org-name", true, "Organization name for VPAT/accessibility statement (default: 'Customer Organization')");
        options.addOption(null, "org-contact", true, "Organization contact email (default: [email protected])");
        options.addOption(null, "product-name", true, "Product name for VPAT (default: opendataloader-pdfua)");
        options.addOption(null, "mock", true,
            "Audit-bundle Phase 1.5 trail fill mode. Values: "
            + "all (default; overwrite every Phase 1.5 field with template values, "
            + "preview mode — MOCK badge rendered) | "
            + "fill (fill only unsupported/empty fields; preserve real measurements) | "
            + "off (no mock — placeholder fields stay as unsupported; real evidence run).");
        return options;
    }

Note: --threads is removed here because it now comes from core CLIOptions. The pdfua-specific note about HancomAIClient thread-safety becomes a runtime warning instead — that is handled in BatchProcessor and is out of scope for this plan. (If the warning text was important, file a separate issue.)

  • Step 2: Replace runWithOptions body

Replace runWithOptions (lines 52-92) with:

java
    private static void runWithOptions(String[] args) {
        Options options = buildOptions();

        CommandLineParser parser = new DefaultParser();
        HelpFormatter formatter = new HelpFormatter();
        try {
            CommandLine cmd = parser.parse(options, args);

            // 1. Apply all core options to a Config
            org.opendataloader.pdf.api.Config coreConfig = new org.opendataloader.pdf.api.Config();
            org.opendataloader.pdf.cli.CLIOptions.applyAllTo(coreConfig, cmd);

            // 2. Apply pdfua's hybrid defaults wherever the Config still holds core's default values
            applyPdfuaDefaults(coreConfig);

            // 3. Read pdfua-specific options
            String lang = cmd.getOptionValue("lang", "en");
            boolean enrichPictureDescription = !cmd.hasOption("no-enrich-picture-description");
            int threads = coreConfig.getThreads();
            Set<ConformanceLevel> conformances = parseConformances(cmd.getOptionValue("conformance", "ua1"));
            RemediationConfig.ReportLevel reportLevel = parseReportLevel(cmd.getOptionValue("report", "all"));
            FontEmbedMode fontEmbedMode = parseFontEmbedMode(cmd.getOptionValue("font-embed", "substitute"));
            AuditBundleMode auditBundleMode = AuditBundleMode.parse(cmd.getOptionValue("audit-bundle", "full"));
            MockMode mockMode = parseMockMode(cmd.getOptionValue("mock", "all"));

            OrgMeta orgMeta = new OrgMeta(
                cmd.getOptionValue("org-name"),
                cmd.getOptionValue("org-contact"),
                cmd.getOptionValue("product-name")
            ).withFallbackDefaults();

            RemediationConfig config = RemediationConfig.builder()
                .input(cmd.getOptionValue("input"))
                .output(cmd.getOptionValue("output"))
                .lang(lang)
                .coreConfig(coreConfig)
                .enrichPictureDescription(enrichPictureDescription)
                .threads(threads)
                .conformances(conformances)
                .reportLevel(reportLevel)
                .fontEmbedMode(fontEmbedMode)
                .auditBundleMode(auditBundleMode)
                .orgMeta(orgMeta)
                .mockMode(mockMode)
                .build();

            new BatchProcessor(config).run();
        } catch (ParseException e) {
            formatter.printHelp("opendataloader-pdfua", options);
            throw new IllegalArgumentException("Invalid arguments: " + e.getMessage(), e);
        }
    }

    /**
     * Apply pdfua's defaults to a core Config. Unlike opendataloader-pdf,
     * pdfua assumes a hybrid backend is always running, so a Config that still
     * holds core's defaults (hybrid off, no URL, mode auto) gets pdfua's
     * defaults instead. The check compares values, not flag presence: an
     * explicit --hybrid off or --hybrid-mode auto is treated like an omitted flag.
     */
    private static void applyPdfuaDefaults(org.opendataloader.pdf.api.Config c) {
        if (org.opendataloader.pdf.api.Config.HYBRID_OFF.equals(c.getHybrid())) {
            c.setHybrid(org.opendataloader.pdf.api.Config.HYBRID_HANCOM_AI);
        }
        if (c.getHybridConfig().getUrl() == null) {
            c.getHybridConfig().setUrl("http://localhost:18008");
        }
        if (org.opendataloader.pdf.api.Config.HYBRID_MODE_AUTO.equals(c.getHybridConfig().getMode())) {
            c.getHybridConfig().setMode(org.opendataloader.pdf.api.Config.HYBRID_MODE_FULL);
        }
    }
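
Because the helper above compares against core's default values, it cannot tell an omitted flag from an explicit `--hybrid off`. If that distinction matters, one option is to key the defaults on whether the flag was present. The sketch below uses a plain `Map<String, String>` as a hypothetical stand-in for the parsed `CommandLine` and `HybridSettings` as a stand-in for the core `Config`; neither is the project's API, and the default values shown are assumptions.

```java
import java.util.Map;

// Hypothetical stand-in for the hybrid part of the core Config.
class HybridSettings {
    String hybrid = "off";   // assumed core default
    String url;              // assumed to default to null
    String mode = "auto";    // assumed core default
}

class PdfuaDefaults {
    // Applies pdfua's defaults only for flags absent from the parsed arguments.
    // 'parsed' maps long option names (without leading dashes) to their values.
    static void apply(HybridSettings c, Map<String, String> parsed) {
        if (!parsed.containsKey("hybrid")) c.hybrid = "hancom-ai";
        if (!parsed.containsKey("hybrid-url")) c.url = "http://localhost:18008";
        if (!parsed.containsKey("hybrid-mode")) c.mode = "full";
    }
}
```

With Commons CLI the presence check would be `cmd.hasOption("hybrid")`, which means the helper would need the `CommandLine` as a second parameter. Whether that change is worth the wider signature is a design decision for the plan's owner.
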
  • Step 3: Run pdfua tests

Run: cd opendataloader-pdfua && mvn test -q Expected: PASS. MainAuditBundleCliTest exercises CLI parsing and should now route through core CLIOptions transparently.

  • Step 4: Manual smoke test of inherited core options

Run: cd opendataloader-pdfua && mvn -q exec:java -Dexec.mainClass=org.opendataloader.pdf.Main -Dexec.args="--input src/test/resources --output /tmp/pdfua-test --hybrid-hancom-ai-regionlist-strategy list-only --hybrid-hancom-ai-ocr-strategy auto --audit-bundle none --mock off" 2>&1 | head -20

If the project doesn't have exec:java configured, skip this step — the test suite covers wiring.

Expected: Either runs without "Unrecognized option" errors for --hybrid-hancom-ai-*, or fails on a downstream concern (network/file). The point is that argument parsing accepts the flags.

  • Step 5: Commit
bash
cd opendataloader-pdfua && git add src/main/java/org/opendataloader/pdf/Main.java
git commit -m "$(cat <<'EOF'
refactor(pdfua): inherit core CLIOptions instead of redefining

Replace pdfua's hand-rolled --hybrid/--hybrid-url/--hybrid-mode/
--threads definitions with org.opendataloader.pdf.cli.CLIOptions.
addAllTo(). pdfua now automatically inherits every core option,
including the new --hybrid-hancom-ai-* flags.

pdfua's deviating defaults (hybrid=hancom-ai, url=localhost:18008,
mode=full) are now in an explicit applyPdfuaDefaults() helper
instead of buried in cmd.getOptionValue() defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3.3: Simplify RemediationProcessor's hybrid forwarding

Files:

  • Modify: opendataloader-pdfua/src/main/java/org/opendataloader/pdf/remediation/RemediationProcessor.java (lines 163-170)

  • Step 1: Read the current hybrid block to confirm context

Run: cd opendataloader-pdfua && sed -n '160,175p' src/main/java/org/opendataloader/pdf/remediation/RemediationProcessor.java Expected output (approximately):

java
        Config config = ...;  // may need to inspect actual surrounding code
        ...
        if (remediationConfig.getHybridUrl() != null || !"docling-fast".equals(remediationConfig.getHybrid())) {
            config.setHybrid(remediationConfig.getHybrid());
            if (remediationConfig.getHybridUrl() != null) {
                config.getHybridConfig().setUrl(remediationConfig.getHybridUrl());
            }
            config.getHybridConfig().setMode(remediationConfig.getHybridMode());
        }
  • Step 2: Replace the hybrid block with the embedded coreConfig

Replace those lines (currently around 163-170) with:

java
        // RemediationConfig now carries the full core Config. Adopt it directly.
        config = remediationConfig.getCoreConfig();

If the local Config config variable is constructed earlier in the method with other settings, you must merge them — read the full method first to verify. The simplest correct rewrite is:

  1. Find where the local Config is first assigned in processSingle or whichever method contains lines 163-170.
  2. Replace Config config = new Config(); with Config config = remediationConfig.getCoreConfig();
  3. Delete the lines 163-170 hybrid block.
  4. Keep any other config.setXxx() calls in the method — they continue to work on the borrowed Config instance.

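If processSingle mutates coreConfig with other settings and that mutation should not leak back to the caller, clone it first. The spec's intent ("RemediationConfig embeds Config directly") allows in-place mutation, but check the instance's lifetime before relying on it: Main (Task 3.2) builds one RemediationConfig per run and passes it to BatchProcessor, so the same Config is seen by every document in that run. In-place mutation is safe when the values set in processSingle are identical for every document; clone when they vary per document or when --threads is greater than 1.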
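
The aliasing concern can be made concrete. In the sketch below, `Cfg` is a hypothetical stand-in for the core `Config`, `outputDir` is an illustrative per-document setting, and `copyOf` is an assumed helper (the plan does not say whether `Config` offers a clone or copy constructor). Mutating a borrowed instance is visible to the owner; mutating a copy is not.

```java
// Hypothetical stand-in for the core Config.
class Cfg {
    String hybrid = "hancom-ai";
    String outputDir;

    // Assumed helper: field-by-field copy so per-document changes stay local.
    static Cfg copyOf(Cfg src) {
        Cfg c = new Cfg();
        c.hybrid = src.hybrid;
        c.outputDir = src.outputDir;
        return c;
    }
}

class ProcessorSketch {
    // Borrowing: the caller's Cfg is mutated in place.
    static Cfg borrow(Cfg shared, String outDir) {
        shared.outputDir = outDir;
        return shared;
    }

    // Copying: the caller's Cfg is left untouched.
    static Cfg isolate(Cfg shared, String outDir) {
        Cfg local = Cfg.copyOf(shared);
        local.outputDir = outDir;
        return local;
    }
}
```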

  • Step 3: Run pdfua tests

Run: cd opendataloader-pdfua && mvn test -q Expected: PASS.

  • Step 4: Commit
bash
cd opendataloader-pdfua && git add src/main/java/org/opendataloader/pdf/remediation/RemediationProcessor.java
git commit -m "$(cat <<'EOF'
refactor(remediation): forward coreConfig directly to RemediationProcessor

Replace the per-field hybrid forwarding block (setHybrid + setUrl +
setMode) with config = remediationConfig.getCoreConfig(). Now that
RemediationConfig embeds the full core Config, no manual copy is
needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3.4: Migrate test call sites to Builder

Files:

  • Modify: opendataloader-pdfua/src/test/java/org/opendataloader/pdf/audit/AuditBundleEmitterTest.java

  • Modify: opendataloader-pdfua/src/test/java/org/opendataloader/pdf/audit/certificate/CertificateIssuerTest.java

  • Modify: opendataloader-pdfua/src/test/java/org/opendataloader/pdf/audit/manifest/AuditManifestBuilderTest.java

  • Modify: opendataloader-pdfua/src/test/java/org/opendataloader/pdf/remediation/RemediationConfigAuditBundleTest.java

  • Step 1: Find every new RemediationConfig( call site

Run:

bash
cd opendataloader-pdfua && grep -rn "new RemediationConfig(" src/test

Record each file:line and inspect the constructor arguments.

  • Step 2: For each call site, replace with Builder

For a call like:

java
RemediationConfig cfg = new RemediationConfig(
    "input.pdf", "/tmp/out", "en", "hancom-ai", "http://localhost:18008", "full",
    true, 1, EnumSet.of(ConformanceLevel.UA1), ReportLevel.NONE);

Rewrite as:

java
org.opendataloader.pdf.api.Config core = new org.opendataloader.pdf.api.Config();
core.setHybrid("hancom-ai");
core.getHybridConfig().setUrl("http://localhost:18008");
core.getHybridConfig().setMode("full");

RemediationConfig cfg = RemediationConfig.builder()
    .input("input.pdf")
    .output("/tmp/out")
    .lang("en")
    .coreConfig(core)
    .enrichPictureDescription(true)
    .threads(1)
    .conformances(EnumSet.of(ConformanceLevel.UA1))
    .reportLevel(ReportLevel.NONE)
    .build();

If the test only cares about a subset (e.g. doesn't pass hybrid args), use the relevant builder methods only — Builder defaults handle the rest.

  • Step 3: Run pdfua tests

Run: cd opendataloader-pdfua && mvn test -q Expected: PASS.

  • Step 4: Commit
bash
cd opendataloader-pdfua && git add src/test
git commit -m "$(cat <<'EOF'
test(remediation): migrate RemediationConfig instantiations to Builder

Switches all 4 test files (AuditBundleEmitterTest, CertificateIssuerTest,
AuditManifestBuilderTest, RemediationConfigAuditBundleTest) from
positional constructors to RemediationConfig.builder() + .build().

Prepares for removal of the legacy constructors in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3.5: Remove legacy constructors and getHybridUrl/getHybridMode

Files:

  • Modify: opendataloader-pdfua/src/main/java/org/opendataloader/pdf/remediation/RemediationConfig.java

  • Step 1: Verify no production code references the legacy constructors

Run:

bash
cd opendataloader-pdfua && grep -rn "new RemediationConfig(" src/main

Expected: No output. (Main.java should now use builder().)

If any matches appear, migrate them to Builder before continuing.

  • Step 2: Verify no code references getHybridUrl / getHybridMode

Run:

bash
cd opendataloader-pdfua && grep -rn "getHybridUrl\|getHybridMode" src

Expected: Only matches in RemediationConfig.java itself (the methods being removed).

If any other call sites appear, replace with getCoreConfig().getHybridConfig().getUrl() / .getMode().

  • Step 3: Remove the 5 legacy constructors and the buildCoreConfigForLegacyConstructor helper

In RemediationConfig.java, delete:

  • All 5 public constructors (currently lines 54-125)
  • The buildCoreConfigForLegacyConstructor() helper added in Task 3.1

Keep the private RemediationConfig(Builder b) constructor.

  • Step 4: Remove getHybridUrl() and getHybridMode() getters; redirect getHybrid() to delegate

Replace the three lines (currently around 130-132):

java
    public String getHybrid()                       { return hybrid; }
    public String getHybridUrl()                    { return hybridUrl; }
    public String getHybridMode()                   { return hybridMode; }

with:

java
    public String getHybrid()                       { return coreConfig.getHybrid(); }
    public org.opendataloader.pdf.hybrid.HybridConfig getHybridConfig() {
        return coreConfig.getHybridConfig();
    }

Also remove the now-unused fields hybrid, hybridUrl, hybridMode (lines ~42-44) and their three assignments (this.hybrid, this.hybridUrl, this.hybridMode) in the private RemediationConfig(Builder b) constructor. All hybrid state is then read through coreConfig.

  • Step 5: Verify compile and tests

Run: cd opendataloader-pdfua && mvn compile -q && mvn test -q Expected: both commands exit with code 0 and all tests pass; with -q Maven prints only errors, so no BUILD SUCCESS banner appears.

  • Step 6: Commit
bash
cd opendataloader-pdfua && git add src/main/java/org/opendataloader/pdf/remediation/RemediationConfig.java
git commit -m "$(cat <<'EOF'
refactor(remediation)!: remove legacy RemediationConfig constructors

BREAKING: Removes the 5 positional constructors and the flat
hybrid/hybridUrl/hybridMode fields. RemediationConfig now consists
of pdfua-specific fields plus an embedded core Config; hybrid state
is read via getCoreConfig().getHybrid() / getHybridConfig().

getHybridUrl() and getHybridMode() are removed — call sites use
getHybridConfig().getUrl() / .getMode() instead. Verified that no
production or test code outside RemediationConfig itself referenced
those getters.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3.6: Final integration check

  • Step 1: Full build + test on both repos

Run:

bash
# Run from the directory that contains both checkouts; the subshells keep each cd local.
(cd opendataloader-pdf/java && mvn install -DskipTests -q)
(cd opendataloader-pdfua && mvn clean package -q)

Expected: both builds succeed (exit code 0; with -q Maven prints no BUILD SUCCESS banner).

  • Step 2: Run the full test suite for both

Run:

bash
# Run from the directory that contains both checkouts; the subshells keep each cd local.
(cd opendataloader-pdf/java && mvn test -q)
(cd opendataloader-pdfua && mvn test -q)

Expected: All tests pass in both repos.

  • Step 3: Verify --help output for both CLIs lists the new options
bash
cd opendataloader-pdf/java && mvn -pl opendataloader-pdf-cli exec:java \
  -Dexec.mainClass=org.opendataloader.pdf.cli.CLIMain -Dexec.args="--help" 2>&1 | \
  grep "hybrid-hancom-ai"

Expected: 5 lines, one per new --hybrid-hancom-ai-* option (among them regionlist-strategy, ocr-strategy, image-cache).

bash
cd opendataloader-pdfua && java -jar target/*-jar-with-dependencies.jar --input dummy --output dummy --help 2>&1 | grep "hybrid-hancom-ai"

Expected: 5 lines, one per new option (the same options, inherited from core).

Note: pdfua's --help output requires --input / --output to be supplied because both are registered with addRequiredOption. If this is awkward, verify instead by inspecting the Options object programmatically in a quick MainTest.
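
A common way around the required-option problem is to look for `--help` before the strict parse runs. The sketch below shows only that pre-scan on the raw argument array; `HelpPrescan` and `wantsHelp` are hypothetical names, and whether `Main` should adopt this is outside what the plan specifies.

```java
import java.util.Arrays;

class HelpPrescan {
    // True when the raw arguments ask for help; meant to run before required options are enforced.
    static boolean wantsHelp(String[] args) {
        return Arrays.asList(args).contains("--help");
    }
}
```

In `Main`, such a check would sit ahead of `parser.parse(options, args)` and print usage through the existing `HelpFormatter` before returning, so `--help` alone would no longer trip the required-option validation.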

  • Step 4: No final commit needed; all Phase 3 commits have already been made in Tasks 3.1 through 3.5.

Self-Review

Spec coverage check:

| Spec section | Tasks |
| --- | --- |
| 5 new options with --hybrid-hancom-ai-* prefix | Task 1.1, 1.2 |
| Validation gate (require --hybrid=hancom-ai) | Task 1.4 |
| addAllTo(Options) public API | Task 2.1 |
| applyAllTo(Config, CommandLine) public API | Task 2.2 |
| pdfua/Main inherits core options | Task 3.2 |
| applyPdfuaDefaults explicit override block | Task 3.2 |
| RemediationConfig Builder + embedded Config | Task 3.1, 3.5 |
| Hard break: remove getHybridUrl/Mode, 9 constructors | Task 3.5 |
| RemediationProcessor simplified | Task 3.3 |
| All test call sites migrated | Task 3.4 |
| npm run sync to regenerate bindings | Task 1.5 |
| mvn install -DskipTests (core) before pdfua build | Task 2.3, 3.6 |

All spec items mapped to at least one task.

Placeholder check: No "TBD", "TODO", "fill in details", or stub steps. Every code step contains the actual code. Test code is complete (assertions, args). Commit messages are pre-written.

Type consistency: coreConfig field name used consistently. getCoreConfig() getter name used consistently. Builder method names match field names. applyPdfuaDefaults signature matches between definition (Task 3.2) and call site (also Task 3.2).

