src/README.md
## Configuration Loading (`src/Config.cpp`)
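The sketch below illustrates two of the configuration-loading responsibilities listed in this section: dispatching on a dictionary node's `type` field, and resolving a dictionary file against additional search paths. It is a minimal, self-contained sketch under stated assumptions. The names `DictFormat`, `ParseDictFormat`, `CandidatePaths`, and `ResolveDictPath` are hypothetical and do not come from `src/Config.cpp`. The set of `type` strings is limited to the formats named in this document, and JSON parsing and dictionary construction are omitted.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical loader kinds; only the formats named in this README are modeled.
enum class DictFormat { Text, Ocd2, Group, Unknown };

// Map the "type" field of a dictionary node to a loader kind.
DictFormat ParseDictFormat(const std::string& type) {
  if (type == "text") return DictFormat::Text;
  if (type == "ocd2") return DictFormat::Ocd2;
  if (type == "group") return DictFormat::Group;
  return DictFormat::Unknown;
}

// Locations to try, in order: the file name as given, then the name
// joined onto each additional search path.
std::vector<std::string> CandidatePaths(const std::string& fileName,
                                        const std::vector<std::string>& searchPaths) {
  std::vector<std::string> candidates{fileName};
  for (const auto& dir : searchPaths) {
    candidates.push_back((std::filesystem::path(dir) / fileName).string());
  }
  return candidates;
}

// Return the first candidate that exists on disk, or an empty string.
std::string ResolveDictPath(const std::string& fileName,
                            const std::vector<std::string>& searchPaths) {
  for (const auto& candidate : CandidatePaths(fileName, searchPaths)) {
    if (std::filesystem::exists(candidate)) return candidate;
  }
  return "";
}
```

Trying the bare name first and then each search path in order keeps lookup deterministic: the earliest location that contains the file wins.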
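- Reads JSON configuration files (`data/config/*.json`) and parses the segmenter definition and the conversion chain.
- Loads dictionaries of different formats (plain text, `.ocd2`, dictionary groups) according to each dictionary's `type` field, with support for additional search paths.
- Builds `Converter` objects that hold the segmenter and the conversion chain.

## Segmentation (`src/MaxMatchSegmentation.cpp`)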
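As an illustration of the segmentation module, here is a minimal sketch of maximum forward matching. It assumes a `std::set<std::string>` stands in for the segmentation dictionary; the real code queries a `Dict` through prefix matching. At each position the longest dictionary key is taken. Characters that match nothing are collected, one UTF-8 character at a time, into a buffer that is emitted as its own segment. `MaxForwardMatch` and `Utf8CharLength` are illustrative names only.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Byte length of the UTF-8 character whose lead byte is c.
// Assumes well-formed UTF-8; an invalid lead byte is treated as 1 byte.
std::size_t Utf8CharLength(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c >> 5) == 0x6) return 2;
  if ((c >> 4) == 0xE) return 3;
  if ((c >> 3) == 0x1E) return 4;
  return 1;
}

// Hypothetical maximum forward matching over a set-based dictionary.
std::vector<std::string> MaxForwardMatch(const std::string& text,
                                         const std::set<std::string>& dict) {
  std::size_t maxKeyBytes = 0;
  for (const auto& key : dict) maxKeyBytes = std::max(maxKeyBytes, key.size());

  std::vector<std::string> segments;
  std::string unmatched;  // run of consecutive characters with no match
  std::size_t pos = 0;
  while (pos < text.size()) {
    // Try the longest candidate first and shrink until a key matches.
    std::size_t matchLen = 0;
    for (std::size_t len = std::min(maxKeyBytes, text.size() - pos); len > 0; --len) {
      if (dict.count(text.substr(pos, len)) > 0) { matchLen = len; break; }
    }
    if (matchLen == 0) {
      // No match: consume exactly one UTF-8 character into the buffer.
      std::size_t charLen = std::min(Utf8CharLength(text[pos]), text.size() - pos);
      unmatched.append(text, pos, charLen);
      pos += charLen;
    } else {
      // Flush pending unmatched text, then emit the matched key.
      if (!unmatched.empty()) { segments.push_back(unmatched); unmatched.clear(); }
      segments.push_back(text.substr(pos, matchLen));
      pos += matchLen;
    }
  }
  if (!unmatched.empty()) segments.push_back(unmatched);
  return segments;
}
```

For example, with the dictionary `{"一个", "人"}`, the input `"一个好人"` yields the segments `"一个"`, `"好"`, `"人"`: the unmatched `好` is kept whole rather than split into bytes.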
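- Segmentation uses the `mmseg` type, i.e., Maximum Forward Matching.
- Input text is split into `Segments` using the dictionary's longest prefix match at each position; unmatched UTF-8 fragments are preserved intact, advancing one character length at a time.

## Conversion Chain (`src/ConversionChain.cpp`, `src/Conversion.cpp`)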
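The conversion-chain behaviour described in this section can be sketched as follows. This sketch assumes each conversion node is reduced to a plain `std::map` from source to target strings. `Mapping`, `ConvertSegment`, and `ConvertChain` are hypothetical names, not the actual `Conversion` or `ConversionChain` classes. Each node scans a segment from left to right and replaces the longest matching prefix, and its output becomes the next node's input.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one conversion node's dictionary.
using Mapping = std::map<std::string, std::string>;

// Replace the longest matching prefix at each position. Unmatched bytes are
// copied through unchanged; stepping byte by byte is safe for UTF-8 input
// because no valid key can begin with a UTF-8 continuation byte.
std::string ConvertSegment(const std::string& segment, const Mapping& dict) {
  std::size_t maxKey = 0;
  for (const auto& entry : dict) maxKey = std::max(maxKey, entry.first.size());

  std::string out;
  std::size_t pos = 0;
  while (pos < segment.size()) {
    bool matched = false;
    for (std::size_t len = std::min(maxKey, segment.size() - pos); len > 0; --len) {
      auto it = dict.find(segment.substr(pos, len));
      if (it != dict.end()) {
        out += it->second;
        pos += len;
        matched = true;
        break;
      }
    }
    if (!matched) out += segment[pos++];
  }
  return out;
}

// Apply each node in order; every segment flows through the whole chain.
std::vector<std::string> ConvertChain(std::vector<std::string> segments,
                                      const std::vector<Mapping>& chain) {
  for (const auto& node : chain) {
    for (auto& segment : segments) segment = ConvertSegment(segment, node);
  }
  return segments;
}
```

Because nodes run in sequence, a later node sees the output of an earlier one. With node 1 mapping `a` to `b` and node 2 mapping `bb` to `X`, the segment `aa` becomes `bb` and then `X`.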
- `ConversionChain` holds a sequence of `Conversion` objects that are applied in order; each node relies on a dictionary to replace segments with target values through longest prefix matching.

## Dictionary System
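To make the dictionary roles in this section concrete, here is a simplified and hypothetical mirror of them: an abstract interface with prefix and all-prefix matching, a plain-text dictionary, and a group. The sketch makes two assumptions for illustration. First, text dictionaries hold one entry per line as `key<TAB>values`, with multiple values separated by spaces. Second, a group returns the first member dictionary's match for `MatchPrefix`, and for `MatchAllPrefixes` keeps the earliest dictionary's entry for each key length. Traversal and serialization are left out. `SimpleDict`, `SimpleTextDict`, `SimpleDictGroup`, and `Entry` are not OpenCC types.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Entry {
  std::string key;
  std::vector<std::string> values;
};

// Hypothetical abstract interface for prefix lookups.
class SimpleDict {
 public:
  virtual ~SimpleDict() = default;
  // Longest key that is a prefix of `word`, if any.
  virtual std::optional<Entry> MatchPrefix(const std::string& word) const = 0;
  // Every key that is a prefix of `word`, longest first.
  virtual std::vector<Entry> MatchAllPrefixes(const std::string& word) const = 0;
};

// Plain-text dictionary parsed from "key<TAB>value1 value2 ..." lines.
class SimpleTextDict : public SimpleDict {
 public:
  explicit SimpleTextDict(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
      auto tab = line.find('\t');
      if (tab == std::string::npos) continue;  // skip malformed lines
      Entry e{line.substr(0, tab), {}};
      std::istringstream vs(line.substr(tab + 1));
      for (std::string v; vs >> v;) e.values.push_back(v);
      entries_[e.key] = e;
    }
  }
  std::optional<Entry> MatchPrefix(const std::string& word) const override {
    auto all = MatchAllPrefixes(word);
    if (all.empty()) return std::nullopt;
    return all.front();
  }
  std::vector<Entry> MatchAllPrefixes(const std::string& word) const override {
    std::vector<Entry> result;
    for (std::size_t len = word.size(); len > 0; --len) {
      auto it = entries_.find(word.substr(0, len));
      if (it != entries_.end()) result.push_back(it->second);
    }
    return result;
  }

 private:
  std::map<std::string, Entry> entries_;
};

// Ordered collection of dictionaries consulted in sequence.
class SimpleDictGroup : public SimpleDict {
 public:
  explicit SimpleDictGroup(std::vector<std::shared_ptr<SimpleDict>> dicts)
      : dicts_(std::move(dicts)) {}
  std::optional<Entry> MatchPrefix(const std::string& word) const override {
    for (const auto& d : dicts_) {
      if (auto m = d->MatchPrefix(word)) return m;  // first match wins
    }
    return std::nullopt;
  }
  std::vector<Entry> MatchAllPrefixes(const std::string& word) const override {
    // emplace keeps the first entry seen per key length (earliest dictionary).
    std::map<std::size_t, Entry, std::greater<std::size_t>> byLength;
    for (const auto& d : dicts_) {
      for (const auto& e : d->MatchAllPrefixes(word)) byLength.emplace(e.key.size(), e);
    }
    std::vector<Entry> result;
    for (const auto& kv : byLength) result.push_back(kv.second);
    return result;
  }

 private:
  std::vector<std::shared_ptr<SimpleDict>> dicts_;
};
```

Putting a phrase dictionary ahead of a character dictionary in the group lets multi-character phrases take precedence while single characters still act as a fallback.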
- `Dict` unifies prefix matching, all-prefix matching, and dictionary traversal.
- `TextDict` (`.txt`) builds dictionaries from tab-delimited plain text; `MarisaDict` (`.ocd2`) provides high-performance trie structures; `DictGroup` can compose multiple dictionaries into a sequential collection.
- `SerializableDict` defines serialization and file-loading logic, which the command-line tools use to convert between different formats.

## API Encapsulation
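The split between a C++ class and a C API in this section follows a common pattern: a C++ object hidden behind an opaque C handle. Below is a hypothetical, self-contained sketch of that pattern. None of these names are OpenCC's. `UpperConverter` (which only upper-cases ASCII) stands in for a `Config` + `Converter` pair, and `demo_open`, `demo_convert_utf8`, `demo_free`, and `demo_close` only mirror the shape of an open / convert / free / close API. In this sketch, a length of `(size_t)-1` means "convert the whole NUL-terminated string", which is a convention chosen for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Stand-in for the high-level C++ interface, with a partial-length overload.
class UpperConverter {
 public:
  std::string Convert(const std::string& in) const {
    std::string out = in;
    for (char& ch : out) {
      if (ch >= 'a' && ch <= 'z') ch = static_cast<char>(ch - 'a' + 'A');
    }
    return out;
  }
  // Convert only the first `length` bytes of a buffer.
  std::string Convert(const char* in, std::size_t length) const {
    return Convert(std::string(in, length));
  }
};

extern "C" {
typedef void* demo_t;  // opaque handle seen by C callers and bindings

demo_t demo_open(void) { return new UpperConverter(); }

// Returns a malloc'ed, NUL-terminated string; release it with demo_free.
char* demo_convert_utf8(demo_t handle, const char* input, std::size_t length) {
  if (length == static_cast<std::size_t>(-1)) length = std::strlen(input);
  const auto* conv = static_cast<const UpperConverter*>(handle);
  std::string out = conv->Convert(input, length);
  char* buffer = static_cast<char*>(std::malloc(out.size() + 1));
  if (buffer != nullptr) std::memcpy(buffer, out.c_str(), out.size() + 1);
  return buffer;
}

void demo_free(char* str) { std::free(str); }

int demo_close(demo_t handle) {
  delete static_cast<UpperConverter*>(handle);
  return 0;
}
}
```

A dedicated free function lets callers in other languages release the result without knowing which allocator produced it, and the opaque handle keeps C++ types out of the C header.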
- `SimpleConverter` (high-level C++ interface) encapsulates `Config` + `Converter`, providing overloads for converting strings, pointer buffers, and a partial length of input.
- `opencc.h` exposes the C API (`opencc_open`, `opencc_convert_utf8`, etc.) for language bindings and command-line reuse.
- The `opencc` command-line tool (`src/tools/CommandLine.cpp`) demonstrates batch conversion, stream reading, auto-flushing, and handling of input and output that refer to the same file.

- … `Match` and related functions.
- … (`.ocd`).
- … (`.ocd2`).