docs/cpp/source/api/data/datasets.md
The dataset abstraction defines how to access individual samples in your data.
All datasets inherit from Dataset and must implement get(), which returns the sample at a given index, and size(), which returns the number of samples when it is known.
A dataset that manages its own state across batches (e.g., position in a stream).
Unlike Dataset, it produces batches directly without external samplers.
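The stateful contract described above can be illustrated with a minimal, library-independent sketch. The class `IntStream` below is hypothetical and does not derive from any libtorch type. It only mirrors the idea that a stateful dataset tracks its own position, hands out batches directly, signals exhaustion with an empty optional, and can be rewound with `reset()`. A real implementation would derive from the library's stateful dataset base class and would also need to handle serialization of its state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful dataset (not a libtorch class).
// It owns its read position, so no external sampler is needed.
class IntStream {
 public:
  explicit IntStream(std::vector<int> values) : values_(std::move(values)) {}

  // Returns the next batch of up to `batch_size` values,
  // or std::nullopt once the stream is exhausted.
  std::optional<std::vector<int>> get_batch(std::size_t batch_size) {
    assert(batch_size > 0);
    if (position_ >= values_.size()) {
      return std::nullopt;
    }
    const std::size_t end = std::min(position_ + batch_size, values_.size());
    std::vector<int> batch(
        values_.begin() + static_cast<std::ptrdiff_t>(position_),
        values_.begin() + static_cast<std::ptrdiff_t>(end));
    position_ = end;  // Advance the internal state past the returned batch.
    return batch;
  }

  // Rewinds the stream so that the next epoch starts from the beginning.
  void reset() { position_ = 0; }

 private:
  std::vector<int> values_;
  std::size_t position_ = 0;
};
```

A consumer would call `get_batch()` in a loop until it returns an empty optional, then call `reset()` before the next pass over the data.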
Interface for reading chunks of data from a data source. Used with
ChunkDataset for large-scale data loading.
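As a rough illustration of what a chunk reader does, the sketch below splits an in-memory vector into fixed-size chunks. The class `InMemoryChunkReader` is hypothetical and standalone. Its method names (`read_chunk`, `chunk_count`, `reset`) mirror those of the libtorch interface, but it does not derive from it, and a real reader would typically load each chunk from files or another external source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical chunk reader over an in-memory vector (not a libtorch class).
class InMemoryChunkReader {
 public:
  InMemoryChunkReader(std::vector<int> data, std::size_t chunk_size)
      : data_(std::move(data)), chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  // Returns the examples belonging to the chunk at `chunk_index`.
  // The last chunk may be shorter than `chunk_size_`.
  std::vector<int> read_chunk(std::size_t chunk_index) {
    assert(chunk_index < chunk_count());
    const std::size_t begin = chunk_index * chunk_size_;
    const std::size_t end = std::min(begin + chunk_size_, data_.size());
    return std::vector<int>(
        data_.begin() + static_cast<std::ptrdiff_t>(begin),
        data_.begin() + static_cast<std::ptrdiff_t>(end));
  }

  // Returns the total number of chunks, rounding up for a partial last chunk.
  std::size_t chunk_count() {
    return (data_.size() + chunk_size_ - 1) / chunk_size_;
  }

  // Nothing to clear here; a file-backed reader might reopen its handles.
  void reset() {}

 private:
  std::vector<int> data_;
  std::size_t chunk_size_;
};
```

Keeping `read_chunk()` indexed by chunk number lets the caller decide the order in which chunks are visited, for example to shuffle at chunk granularity.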
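#include <torch/torch.h>

#include <string>

// A custom dataset derives from Dataset<Self> (CRTP) and overrides get() and size().
class CustomDataset : public torch::data::datasets::Dataset<CustomDataset> {
 public:
  explicit CustomDataset(const std::string& root) {
    // Load images_ and labels_ from the root directory.
  }

  // Returns the (data, target) pair stored at the given index.
  torch::data::Example<> get(size_t index) override {
    return {images_[index], labels_[index]};
  }

  // Returns the number of samples, taken from the first dimension of images_.
  torch::optional<size_t> size() const override {
    return images_.size(0);
  }

 private:
  torch::Tensor images_, labels_;
};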
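Example: chaining map() calls to normalize each MNIST image and then stack the samples of a batch into a single example:
auto dataset = torch::data::datasets::MNIST("./data")
    .map(torch::data::transforms::Normalize<>(0.1307, 0.3081))
    .map(torch::data::transforms::Stack<>());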