scipy/io/_fast_matrix_market/README.md
Fast and full-featured Matrix Market I/O library for C++ and Python.
Originally written by Adam Lugowski and published at https://github.com/alugowski/fast_matrix_market
The Python bindings are interchangeable with scipy.io._mmio methods to read and write Matrix Market files.
The core implementation is written in C++17 and exposed to Python with pybind11. The C++ code directly reads and writes numpy arrays for a zero-copy interface.
Parallelism is via a thread pool built on C++ threads.
The core speedup over a pure Python implementation comes from two sources: fast sequential parsing/formatting and parallelism.
C++17 offers <charconv> with very fast sequential methods. Integer overloads work very well, but the floating-point overloads are only available in very recent versions of only some compilers.
Note that the standard library methods like strtod internally lock on the system locale.
This locking severely restricts usable parallelism with these methods.
The dependencies below fill the gap until floating-point <charconv> support becomes dependable.
std::from_chars. This algorithm has been incorporated into GCC 12, LLVM as of 2021, and with ongoing work to update MSVC's existing floating-point std::from_chars to fast_float.to_chars implementations.For future maintainers:
Once floating-point <charconv> is available then add the following definitions to enable it:
'-DFMM_FROM_CHARS_DOUBLE_SUPPORTED''-DFMM_FROM_CHARS_LONG_DOUBLE_SUPPORTED''-DFMM_TO_CHARS_DOUBLE_SUPPORTED''-DFMM_TO_CHARS_LONG_DOUBLE_SUPPORTED'and the dependencies may be dropped.
The C++ library is in fast_matrix_market/include/fast_matrix_market/, and the pybind11-based Python bindings are in src/.
Matrix Market files are read and written in two phases: header and body.
The header routines are in header.hpp and types.hpp.
The entrypoints for the body methods are in read_body.hpp and write_body.hpp, with *_threads.hpp versions that
parallelize the core method.
parse_handlers.hpp and formatters.hpp contain methods to define I/O from file to datastructure and datastructure to file, respectively.
These include the logic on how to read or populate dense arrays, coordinate matrices, compressed CSR/CSR, etc.
Both parse handlers and formatters allow breaking down the datastructure into independent chunks for processing in parallel.
thirdparty/task_thread_pool is a simple thread pool class similar to Python's ThreadPoolExecutor.
This works well and avoids a dependency on a heavy package like TBB.
field_conv.hpp contains abstractions to read/write integer, floating point, and complex numbers. This exists primarily
to use the best available method from:
<charconv> methods std::from_chars and std::to_charsstrtod()chunking.hpp enables reading from a stream in large chunks with newline boundaries.
The Python bindings build an extension called _core from src/_core*.cpp.
src/pystreambuf.h adapts Python stream objects to C++ streams.
This enables using BytesIO, GZipFile, etc, from C++.