ADRs/0033-shape-buffer-trie.md
Implemented
Proposed by: Adam Gibson (19-12-2024) Discussed with: Paul Dubs
The libnd4j library requires efficient storage and lookup of shape information for neural network operations. Shape information is usually calculated multilple times per operation and can be expensive to maintain. One goal is to reduce the overhead by getting rid of ShapeDescriptors which were unnecessary extra allocations rather than just using only the shape buffers. This was all stored in a ShapeDescriptor-based cache with an unordered map,
The primary challenges include:
We implement a shape buffer trie data structure (DirectShapeTrie) to manage and
cache shape information, replacing the previous unordered map implementation.
The trie structure is chosen for the following characteristics:
Root
├── 2 (rank)
│ ├── 3,4 (shape values)
│ │ └── [ptr: shape_buffer_1]
│ └── 5,6 (shape values)
│ └── [ptr: shape_buffer_2]
└── 3 (rank)
├── 2,3,4 (shape values)
│ │ └── [ptr: shape_buffer_3]
└── 4,5,6 (shape values)
└── [ptr: shape_buffer_4]
In this example:
The shape buffer cache implements striped locking using an array of mutexes:
mutable std::array<MUTEX_TYPE, NUM_STRIPES> _mutexes;
This design provides:
The striping mechanism:
Memory Efficiency:
Performance:
Thread Safety:
Concurrency:
Implementation Complexity:
Memory Overhead:
Performance Trade-offs:
sd::LongType* createBuffer(int length);
void deleteBuffer(sd::LongType* buffer);
sd::LongType* lookupBuffer(const sd::LongType* shape, int length);
void registerBuffer(const sd::LongType* shape, int length);
void decrementRef(const sd::LongType* buffer);