ADRs/0015 - Unified Resource Manager.md
Discussion
Proposed by: Adam Gibson (12th Jan 2022)
TODO:
A number of current downloaders exist for various resources deeplearning4j needs to function. These include the following:
These have accumulated over the years and have made download-related logic complex to maintain.
Relevant ADRs include: Omnihub zoo download, Omnihub zoo download implementations, and Omnihub replace old model zoo.
All resources are hosted on GitHub LFS. This ADR proposes a resource abstraction that binds the various resource types into a single abstraction and downloader.
A Resource is how we handle this. It is aware of the following concepts:
A Resource manages a remote resource such as a file, similar to the current resource types in deeplearning4j-common. These resources are mostly stored on Git LFS.
As part of this, the unified resource abstraction is cache-aware and exposes the cache so that users can delete cached files if they wish.
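As an illustration of what a cache-aware resource could look like, here is a minimal sketch. Every name in it (`CachedResource`, `SimpleResource`, `rootUrl`, `cacheDirectory`, `deleteFromCache`) is hypothetical and chosen for this example only. It is not the interface this ADR settles on, and it leaves the actual download step out.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical interface: names are illustrative, not the real deeplearning4j API.
interface CachedResource {
    /** File name of the resource, relative to both the remote root and the cache. */
    String fileName();

    /** Remote location the resource would be downloaded from (assumed to be a URL prefix). */
    String rootUrl();

    /** Local directory where downloaded copies are kept. */
    File cacheDirectory();

    /** Where the cached copy lives on disk. */
    default File localPath() {
        return new File(cacheDirectory(), fileName());
    }

    /** True if a cached copy is already present. */
    default boolean existsLocally() {
        return localPath().isFile();
    }

    /** Lets users clear the cached copy; returns true if a file was removed. */
    default boolean deleteFromCache() throws IOException {
        return Files.deleteIfExists(localPath().toPath());
    }
}

// Minimal value-style implementation used only to exercise the interface.
class SimpleResource implements CachedResource {
    private final String fileName;
    private final String rootUrl;
    private final File cacheDirectory;

    SimpleResource(String fileName, String rootUrl, File cacheDirectory) {
        this.fileName = fileName;
        this.rootUrl = rootUrl;
        this.cacheDirectory = cacheDirectory;
    }

    @Override public String fileName() { return fileName; }
    @Override public String rootUrl() { return rootUrl; }
    @Override public File cacheDirectory() { return cacheDirectory; }
}
```

Keeping the cache location on the resource itself means a caller can inspect or clear a single cached file without knowing which downloader produced it.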
For existing datasets we use the old sources but have a common abstraction for knowing which dataset we want to download.
Another problem is file verification.
The legacy model zoo uses simpler Adler-32 checksums for verification, while some download cache verification implementations use MD5.
We use MD5 and standardize on it for all resources.
Note that, to avoid a maintenance burden, MD5 checksum verification is optional. By default, if a resource returns null or an empty string as its checksum, verification is not performed. This distinction is important for resource types such as test resources versus end-user assets like pretrained model weights.
This is also important for compatibility. Because the zoo module still uses its legacy checksum verification, MD5 checksum verification can be added later.
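The optional verification rule described above can be sketched as follows. The class and method names (`ChecksumVerifier`, `md5Hex`, `verify`) are assumptions made for this example, not existing deeplearning4j code. Only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names are illustrative only.
class ChecksumVerifier {

    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Returns true when the file matches the expected MD5, or when no checksum
     * was supplied (null or empty), in which case verification is skipped.
     */
    static boolean verify(Path file, String expectedMd5)
            throws IOException, NoSuchAlgorithmException {
        if (expectedMd5 == null || expectedMd5.isEmpty()) {
            return true; // verification is optional: nothing to compare against
        }
        return expectedMd5.equalsIgnoreCase(md5Hex(file));
    }
}
```

Under this sketch a test resource can return no checksum and skip the check, while an end-user asset such as pretrained weights can supply one and have it enforced.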
This leads us to 5 resource types: