ADRs/0035-samediff-extended-storage-format.md
Status: Implemented
Proposed by: Adam Gibson (15-04-2025)
The current SameDiff serialization relies on FlatBuffers for graph representation and handles large arrays (>2GB) using a chunking mechanism. However, this approach has several limitations:
We need a more robust serialization format that addresses these challenges while maintaining compatibility with existing systems.
We have implemented a unified container format for SameDiff that encapsulates both graph structure and arrays in a single file, with support for optional externalization and sharding when needed. This format maintains full backward compatibility with the original serialization approach.
Multi-Format Support:
SDNB Format:
SDZ Format:
Metadata Management:
Sharding Support:
Backward Compatibility:
SDNB Format Structure:
MAGIC_BYTES (4 bytes: "SDNB")
VERSION (4 bytes)
MANIFEST_OFFSET (8 bytes)
MANIFEST_LENGTH (8 bytes)
METADATA_OFFSET (8 bytes)
[FLATBUFFER_GRAPH_DATA]
[APPENDED_ARRAYS_DATA]
[SERIALIZED_MANIFEST]
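The fixed-size fields above add up to a 32-byte header (4 + 4 + 8 + 8 + 8). Below is a minimal Java sketch that encodes and parses such a header with `java.nio.ByteBuffer`. It rests on several assumptions:

- Byte order is big-endian (ByteBuffer's default). The ADR does not state a byte order.
- VERSION is a signed 32-bit int, and each offset is a signed 64-bit long.
- The class and method names (`SdnbHeaderSketch`, `encode`, `decode`) are hypothetical and are not part of the SameDiff API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of the fixed SDNB header; not the SameDiff implementation. */
final class SdnbHeaderSketch {
    static final byte[] MAGIC = "SDNB".getBytes(StandardCharsets.US_ASCII);
    // 4 (magic) + 4 (version) + 8 (manifest offset) + 8 (manifest length) + 8 (metadata offset)
    static final int HEADER_SIZE = 32;

    final int version;
    final long manifestOffset;
    final long manifestLength;
    final long metadataOffset;

    SdnbHeaderSketch(int version, long manifestOffset, long manifestLength, long metadataOffset) {
        this.version = version;
        this.manifestOffset = manifestOffset;
        this.manifestLength = manifestLength;
        this.metadataOffset = metadataOffset;
    }

    /** Serializes the header in field order; big-endian byte order is an assumption. */
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.BIG_ENDIAN);
        buf.put(MAGIC);
        buf.putInt(version);
        buf.putLong(manifestOffset);
        buf.putLong(manifestLength);
        buf.putLong(metadataOffset);
        return buf.array();
    }

    /** Parses a header, rejecting input that is too short or lacks the "SDNB" magic bytes. */
    static SdnbHeaderSketch decode(byte[] bytes) {
        if (bytes.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Header too short: " + bytes.length + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, 0, HEADER_SIZE).order(ByteOrder.BIG_ENDIAN);
        byte[] magic = new byte[4];
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("Not an SDNB file (bad magic bytes)");
        }
        // Java evaluates arguments left to right, so fields are read in header order.
        return new SdnbHeaderSketch(buf.getInt(), buf.getLong(), buf.getLong(), buf.getLong());
    }
}
```

The header sits at a fixed position and has a fixed size. A reader can therefore locate the manifest with a single 32-byte read and then seek directly to MANIFEST_OFFSET, without scanning the graph or array data.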
SDZ Format Structure:
ZIP_HEADER
[ENTRY: model.sdnb] # Graph structure shard
[ENTRY: model.shard0-of-N.sdnb] # Alternative naming for graph shard
[ENTRY: model.shard1-of-N.sdnb] # Variable shard 1
[ENTRY: model.shard2-of-N.sdnb] # Variable shard 2
...
[ENTRY: model.shardM-of-N.sdnb] # Variable shard M
ZIP_DIRECTORY
ZIP_END
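The entry names above follow the pattern `<base>.shard<i>-of-<N>.sdnb`. The Java sketch below shows how a reader could recognize and order such entries with a regular expression, and check that the set is complete. The regex and the class name `SdzEntryNames` are assumptions inferred from the listing; they are not taken from the SDZSerializer source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for recognizing SDZ shard entry names; not part of SameDiff. */
final class SdzEntryNames {
    // Matches e.g. "model.shard2-of-5.sdnb", capturing base name, shard index, and total count.
    private static final Pattern SHARD =
            Pattern.compile("(.+)\\.shard(\\d+)-of-(\\d+)\\.sdnb");

    /** Builds the entry name for shard i of n. */
    static String shardName(String base, int i, int n) {
        return base + ".shard" + i + "-of-" + n + ".sdnb";
    }

    /**
     * Returns shard entry names sorted by index, after checking that all entries
     * agree on N and that every index 0..N-1 appears exactly once.
     */
    static List<String> orderedShards(List<String> entryNames) {
        Map<Integer, String> byIndex = new TreeMap<>();
        int expected = -1;
        for (String name : entryNames) {
            Matcher m = SHARD.matcher(name);
            if (!m.matches()) {
                continue; // e.g. a single unsharded "model.sdnb" entry
            }
            int index = Integer.parseInt(m.group(2));
            int total = Integer.parseInt(m.group(3));
            if (index >= total) {
                throw new IllegalArgumentException("Shard index out of range in " + name);
            }
            if (expected == -1) {
                expected = total;
            } else if (expected != total) {
                throw new IllegalArgumentException("Inconsistent shard count in " + name);
            }
            byIndex.put(index, name);
        }
        // Distinct indices in [0, N) plus a count of N means no shard is missing.
        if (expected != -1 && byIndex.size() != expected) {
            throw new IllegalArgumentException("Expected " + expected + " shards, found " + byIndex.size());
        }
        return new ArrayList<>(byIndex.values());
    }
}
```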
Sharding Strategy:
API Design:
// SDNB Format API
SameDiffSerializer.save(sameDiff, file, saveUpdaterState, metadata);
SameDiffSerializer.saveAutoShard(sameDiff, baseFile, saveUpdaterState, metadata);
SameDiffSerializer.saveSharded(sameDiff, baseFile, saveUpdaterState, estimatedShards, metadata);
SameDiff model = SameDiffSerializer.load(file, loadUpdaterState);
SameDiff shardedModel = SameDiffSerializer.loadSharded(baseFile, loadUpdaterState);
// SDZ Format API
SDZSerializer.save(sameDiff, outputZipFile, saveUpdaterState, metadata);
SameDiff sdzModel = SDZSerializer.load(modelZipFile, loadUpdaterState);
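This ADR does not spell out how `saveAutoShard()` chooses shard boundaries, or how `estimatedShards` is derived. The Java sketch below illustrates one plausible approach, greedy packing:

- The graph stays in shard 0.
- Variables are added to the current variable shard, in order, until the next one would exceed a byte budget; then a new shard is started.
- A variable larger than the budget gets a shard of its own.

The greedy rule, the byte budget, and the class name `ShardPlanner` are all hypothetical. This is not the documented strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical greedy shard planner; the real saveAutoShard() strategy may differ. */
final class ShardPlanner {
    /**
     * Assigns variables (name -> size in bytes, in map iteration order) to variable
     * shards so that no shard exceeds maxShardBytes, unless a single variable is itself
     * larger than the budget, in which case it occupies a shard alone.
     */
    static List<List<String>> plan(Map<String, Long> variableBytes, long maxShardBytes) {
        if (maxShardBytes <= 0) {
            throw new IllegalArgumentException("maxShardBytes must be positive");
        }
        List<List<String>> shards = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Map.Entry<String, Long> e : variableBytes.entrySet()) {
            long size = e.getValue();
            // Close the current shard if adding this variable would overflow it.
            if (!current.isEmpty() && currentBytes + size > maxShardBytes) {
                shards.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e.getKey());
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            shards.add(current);
        }
        return shards;
    }

    /** Total file count N in "shardI-of-N": one graph shard plus the variable shards. */
    static int totalShards(List<List<String>> variableShards) {
        return 1 + variableShards.size();
    }
}
```

For example, take variables of 600, 100, 500, 2000 and 10 bytes with a 1000-byte budget. The sketch produces four variable shards: `[600, 100]`, `[500]`, `[2000]` and `[10]`. Adding the graph shard gives N = 5.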
The SDZ format addresses the need for single-file distribution of large models through the following implementation:
ZIP Container: The SDZ format uses a standard ZIP archive as its container, enabling compatibility with standard zip tools for inspection and extraction.
Internal Structure:
Sharding Implementation:
SDZSerializer.save() internally calls SameDiffSerializer.saveAutoShard() to create the SDNB files.
Loading Process:
SDZSerializer.load() extracts all SDNB files to a temporary directory.
ZIP Operations:
Optimizations:
The SDZ format balances compression benefits against performance requirements:
Serialization Performance:
Deserialization Performance:
Storage Efficiency:
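In `java.util.zip`, the compression trade-off is made per entry:

- `DEFLATED` entries are compressed. The stream computes their sizes and CRC itself.
- `STORED` entries are copied verbatim. Their size, compressed size and CRC-32 must be set before the entry is written.

Dense floating-point weights often compress poorly, so storing large array shards uncompressed is one way to favor speed. The Java sketch below writes one entry of each kind to an in-memory archive. This ADR does not say whether SDZSerializer actually uses STORED entries. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper contrasting STORED and DEFLATED entries; not the SDZSerializer code. */
final class ZipEntryModes {
    /** Writes bytes uncompressed; STORED requires size and CRC before putNextEntry(). */
    static void addStoredEntry(ZipOutputStream zos, String name, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());
        zos.putNextEntry(entry);
        zos.write(data);
        zos.closeEntry();
    }

    /** Writes bytes compressed; the stream computes sizes and CRC itself. */
    static void addDeflatedEntry(ZipOutputStream zos, String name, byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);
        zos.putNextEntry(entry);
        zos.write(data);
        zos.closeEntry();
    }

    /** Builds an in-memory archive with a compressed graph shard and a stored weight shard. */
    static byte[] buildExample(byte[] graph, byte[] weights) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            addDeflatedEntry(zos, "model.shard0-of-2.sdnb", graph);  // small, often compressible
            addStoredEntry(zos, "model.shard1-of-2.sdnb", weights);  // large, copied verbatim
        }
        return bos.toByteArray();
    }
}
```

STORED entries also make the raw bytes of a shard directly addressable inside the archive. That matters if a loader ever wants to read a shard without extracting it first.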
FlatBuffers Compatibility vs. Unlimited Model Size:
Single File Format vs. Performance:
Metadata Extensibility vs. Format Complexity:
Cross-Platform Support vs. Optimization:
Simplified Deployment:
Enhanced Model Storage:
Better Metadata Management:
First-Class Sharding:
Complete Backward Compatibility:
Implementation Complexity:
Performance Considerations:
Tool Ecosystem:
public static SameDiff load(File file, boolean loadUpdaterState) throws IOException {
    // Check if it's a ZIP file first (SDZ format)
    if (isZipFile(file)) {
        return SDZSerializer.load(file, loadUpdaterState);
    }
    // Not a ZIP, check if it's a native SDNB file
    if (isValidSdnbFile(file)) {
        return SameDiffSerializer.load(file, loadUpdaterState);
    }
    // Check if it's a base name for sharded files
    if (hasShardedFiles(file)) {
        return SameDiffSerializer.loadSharded(file, loadUpdaterState);
    }
    // Unsupported format
    throw new UnsupportedOperationException("Unrecognized model format");
}
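The dispatcher above relies on three helpers whose bodies the ADR does not show: `isZipFile`, `isValidSdnbFile` and `hasShardedFiles`. The Java sketch below gives plausible versions, with these assumptions:

- The first two check magic bytes. ZIP local file headers begin with `PK\u0003\u0004`, and the SDNB header begins with the ASCII bytes `SDNB`.
- The shard check assumes shard files sit next to the base path and follow the `<base>.shard<i>-of-<N>.sdnb` naming shown in the SDZ listing.
- All three bodies are assumptions, and so is the class name `FormatDetection`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical format-detection helpers; the real implementations may differ. */
final class FormatDetection {
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};
    private static final byte[] SDNB_MAGIC = {'S', 'D', 'N', 'B'};

    /** Reads up to four leading bytes; returns fewer if the file is shorter. */
    private static byte[] leadingBytes(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            byte[] head = new byte[4];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n == -1) {
                    break;
                }
                read += n;
            }
            return Arrays.copyOf(head, read);
        }
    }

    /** True for an existing file that starts with a ZIP local file header signature. */
    static boolean isZipFile(File file) throws IOException {
        return file.isFile() && Arrays.equals(leadingBytes(file), ZIP_MAGIC);
    }

    /** True for an existing file that starts with the "SDNB" magic bytes. */
    static boolean isValidSdnbFile(File file) throws IOException {
        return file.isFile() && Arrays.equals(leadingBytes(file), SDNB_MAGIC);
    }

    /** True if sibling files named "<base>.shard<i>-of-<N>.sdnb" exist next to the base path. */
    static boolean hasShardedFiles(File baseFile) {
        File dir = baseFile.getAbsoluteFile().getParentFile();
        if (dir == null) {
            return false;
        }
        Pattern shard = Pattern.compile(Pattern.quote(baseFile.getName()) + "\\.shard\\d+-of-\\d+\\.sdnb");
        File[] matches = dir.listFiles((d, name) -> shard.matcher(name).matches());
        return matches != null && matches.length > 0;
    }
}
```

The existence checks (`file.isFile()`) matter because, for sharded models, the caller passes a base name such as `model` that need not exist as a file itself.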
public static void save(SameDiff sameDiff, File outputZipFile, boolean saveUpdaterState,
                        Map<String, String> metadata) throws IOException {
    // Create temporary directory for SDNB files
    Path tempDir = Files.createTempDirectory("sdz-serializer-save-");
    try {
        // Save using SDNB serializer to temporary directory
        File internalSavePath = new File(tempDir.toFile(), "model");
        SameDiffSerializer.saveAutoShard(sameDiff, internalSavePath, saveUpdaterState, metadata);
        // Collect all files to add to ZIP
        List<File> filesToZip = new ArrayList<>();
        findAllFilesRecursively(tempDir.toFile(), filesToZip);
        // Create ZIP archive
        createZipArchive(outputZipFile, filesToZip);
    } finally {
        // Clean up temporary directory
        FileUtils.deleteDirectory(tempDir.toFile());
    }
}
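The save path depends on `findAllFilesRecursively` and `createZipArchive`, which the ADR does not define. The Java sketch below shows one way to write them with `java.nio.file` and `java.util.zip`. Two points differ from, or go beyond, the call above:

- It adds a `baseDir` parameter to `createZipArchive`. This is an assumption not present in the call above; it lets each ZIP entry be named by its path relative to the temporary directory.
- Entry names use forward slashes, as the ZIP format requires.

The class name `SdzSaveHelpers` is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical versions of the helpers used by SDZSerializer.save(). */
final class SdzSaveHelpers {
    /** Appends every regular file under dir (recursively) to out, in sorted path order. */
    static void findAllFilesRecursively(File dir, List<File> out) throws IOException {
        try (Stream<Path> paths = Files.walk(dir.toPath())) {
            out.addAll(paths.filter(Files::isRegularFile)
                    .sorted()
                    .map(Path::toFile)
                    .collect(Collectors.toList()));
        }
    }

    /** Writes files into a ZIP, naming each entry by its path relative to baseDir. */
    static void createZipArchive(File outputZipFile, List<File> files, File baseDir) throws IOException {
        Path base = baseDir.toPath().toAbsolutePath().normalize();
        try (OutputStream fos = Files.newOutputStream(outputZipFile.toPath());
             ZipOutputStream zos = new ZipOutputStream(fos)) {
            for (File f : files) {
                Path rel = base.relativize(f.toPath().toAbsolutePath().normalize());
                // ZIP entry names always use '/' regardless of the host OS separator.
                String entryName = rel.toString().replace(File.separatorChar, '/');
                zos.putNextEntry(new ZipEntry(entryName));
                Files.copy(f.toPath(), zos);
                zos.closeEntry();
            }
        }
    }
}
```

Sorting the walked paths gives the archive a deterministic entry order. With the shard naming used here, that also places `shard0` (the graph) first.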
public static SameDiff load(File modelZipFile, boolean loadUpdaterState) throws IOException {
    // Extract ZIP to temporary directory
    Path tempDir = Files.createTempDirectory("sdz-serializer-load-");
    try {
        // Extract ZIP contents
        extractZip(modelZipFile, tempDir.toFile());
        // Determine the path to load from
        File loadPath = determineLoadPath(tempDir.toFile());
        // Load using SDNB serializer
        return SameDiffSerializer.load(loadPath, loadUpdaterState);
    } finally {
        // Clean up temporary directory
        FileUtils.deleteDirectory(tempDir.toFile());
    }
}
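On the load side, `extractZip` should reject entries whose names resolve outside the target directory, such as `../x` or absolute paths. This is the "zip slip" problem, and it matters because an SDZ file may come from an untrusted source. The Java sketch below performs that check before writing each entry. The ADR does not show the actual `extractZip` body, so this is an assumed implementation. The class name `SdzLoadHelpers` is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical extractZip for SDZSerializer.load(); not the library's code. */
final class SdzLoadHelpers {
    static void extractZip(File zipFile, File targetDir) throws IOException {
        Path target = targetDir.toPath().toAbsolutePath().normalize();
        try (InputStream fis = Files.newInputStream(zipFile.toPath());
             ZipInputStream zis = new ZipInputStream(fis)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path out = target.resolve(entry.getName()).normalize();
                // Reject "zip slip" entries such as "../../etc/passwd" or absolute paths.
                if (!out.startsWith(target)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // ZipInputStream signals EOF at the end of the current entry,
                    // so this copies exactly one entry and leaves the stream open.
                    Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zis.closeEntry();
            }
        }
    }
}
```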
For existing users:
Loading Existing Models:
Converting to SDZ Format:
Call SDZSerializer.save() on existing SameDiff instances.
When to Use Each Format: