Datasets

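The dataset abstraction defines how to access individual samples in your data. All datasets inherit from Dataset and must implement get(), which returns the sample at a given index, and size(), which returns the number of samples as an optional value (empty when the size is unknown).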

Dataset Base Class

{doxygenclass}
:members:
:undoc-members:
{doxygenclass}
:members:
:undoc-members:
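To illustrate the contract described above, here is a minimal, library-free sketch of the pattern: a base class template that assembles a batch by calling the derived class's get() once per index. The names SketchDataset, Sample, and Squares are hypothetical stand-ins and are not part of libtorch; the real base class works with tensors and has a richer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a single sample: one feature value and a label.
struct Sample {
  float data;
  int target;
};

// Simplified base class using the curiously recurring template pattern:
// a batch is built by calling the derived class's get() for each index.
template <typename Self>
class SketchDataset {
 public:
  std::vector<Sample> get_batch(const std::vector<std::size_t>& indices) {
    std::vector<Sample> batch;
    batch.reserve(indices.size());
    for (std::size_t i : indices) {
      batch.push_back(static_cast<Self&>(*this).get(i));
    }
    return batch;
  }
};

// Hypothetical dataset whose i-th sample is (i * i, i % 2).
class Squares : public SketchDataset<Squares> {
 public:
  explicit Squares(std::size_t n) : n_(n) {}

  // Returns the sample at `index`; the index must be in range.
  Sample get(std::size_t index) {
    assert(index < n_);
    return {static_cast<float>(index * index), static_cast<int>(index % 2)};
  }

  // Returns the number of samples.
  std::optional<std::size_t> size() const { return n_; }

 private:
  std::size_t n_;
};
```

A derived class only supplies get() and size(); batching is handled once in the base class, which is why samplers and data loaders can work with any dataset that follows this shape.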

StatefulDataset

A dataset that manages its own state across batches (e.g., position in a stream). Unlike Dataset, it produces batches directly without external samplers.

{doxygenclass}
:members:
:undoc-members:
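The idea of a dataset that tracks its own position can be sketched without libtorch as follows. StreamSketch is a hypothetical class, not the library's API: it hands out consecutive batches from an in-memory stream, signals exhaustion with an empty optional, and can be rewound with reset(). The method names loosely mirror the libtorch interface and should be checked against the reference above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stateful dataset over an in-memory stream of integers.
class StreamSketch {
 public:
  explicit StreamSketch(std::vector<int> stream) : stream_(std::move(stream)) {}

  // Returns the next batch of up to `batch_size` items and advances the
  // internal position, or std::nullopt once the stream is exhausted.
  std::optional<std::vector<int>> get_batch(std::size_t batch_size) {
    assert(batch_size > 0);
    if (position_ >= stream_.size()) {
      return std::nullopt;
    }
    const std::size_t end = std::min(position_ + batch_size, stream_.size());
    std::vector<int> batch(
        stream_.begin() + static_cast<std::ptrdiff_t>(position_),
        stream_.begin() + static_cast<std::ptrdiff_t>(end));
    position_ = end;
    return batch;
  }

  // Rewinds the stream so the next get_batch() starts from the beginning.
  void reset() { position_ = 0; }

 private:
  std::vector<int> stream_;
  std::size_t position_ = 0;
};
```

Because the dataset decides what each batch contains, no index sampler is involved; the caller only asks for the next batch until an empty optional comes back.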

ChunkDataReader

Interface for reading chunks of data from a data source. Used with ChunkDataset for large-scale data loading.

{doxygenclass}
:members:
:undoc-members:
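As a rough, library-free illustration of a chunk reader, the sketch below serves fixed-size chunks from an in-memory vector. InMemoryChunkReader is hypothetical; its three methods (read_chunk, chunk_count, reset) are modeled on the operations the libtorch interface is understood to require, so treat the exact signatures as assumptions and confirm them in the reference above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reader that splits an in-memory vector into fixed-size chunks.
class InMemoryChunkReader {
 public:
  InMemoryChunkReader(std::vector<int> data, std::size_t chunk_size)
      : data_(std::move(data)), chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  // Returns a copy of the chunk at `chunk_index`; the last chunk may be short.
  std::vector<int> read_chunk(std::size_t chunk_index) const {
    assert(chunk_index < chunk_count());
    const std::size_t begin = chunk_index * chunk_size_;
    const std::size_t end = std::min(begin + chunk_size_, data_.size());
    return std::vector<int>(
        data_.begin() + static_cast<std::ptrdiff_t>(begin),
        data_.begin() + static_cast<std::ptrdiff_t>(end));
  }

  // Number of chunks, rounding up so a partial final chunk is counted.
  std::size_t chunk_count() const {
    return (data_.size() + chunk_size_ - 1) / chunk_size_;
  }

  // Nothing to clear here; a file-backed reader might reopen its source.
  void reset() {}

 private:
  std::vector<int> data_;
  std::size_t chunk_size_;
};
```

In a real setting each chunk would typically correspond to a file or shard on disk, so that only a few chunks need to be resident in memory at once.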

Custom Dataset Example

cpp
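#include <torch/torch.h>

#include <string>

class CustomDataset : public torch::data::datasets::Dataset<CustomDataset> {
 public:
  explicit CustomDataset(const std::string& root) {
    // Load data from the root directory into images_ and labels_,
    // with one sample per row along dimension 0.
  }

  // Returns the sample at `index` as a (data, target) pair.
  torch::data::Example<> get(size_t index) override {
    return {images_[index], labels_[index]};
  }

  // Returns the number of samples (the size of dimension 0).
  torch::optional<size_t> size() const override {
    return images_.size(0);
  }

 private:
  torch::Tensor images_, labels_;
};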

MapDataset

{doxygenclass}
:members:
:undoc-members:

ChunkDataset

{doxygenclass}
:members:
:undoc-members:

SharedBatchDataset

{doxygenclass}
:members:
:undoc-members:

Built-in Datasets

MNIST

{doxygenclass}
:members:
:undoc-members:

Example:

cpp
auto dataset = torch::data::datasets::MNIST("./data")
    .map(torch::data::transforms::Normalize<>(0.1307, 0.3081))
    .map(torch::data::transforms::Stack<>());
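For intuition about the two map() calls above, the following library-free sketch mimics what they do conceptually: normalization computes (x - mean) / stddev for every element, and stacking collates a list of equally sized samples into one batch. The Image and Batch types and the normalize and stack functions are hypothetical stand-ins that operate on plain vectors; the libtorch transforms operate on tensors and on Example objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat image: a vector of pixel values.
using Image = std::vector<float>;

// Hypothetical batch: `rows` images of `cols` pixels stored contiguously.
struct Batch {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> values;
};

// Applies (x - mean) / stddev to every pixel.
Image normalize(const Image& image, float mean, float stddev) {
  assert(stddev != 0.0f);
  Image out;
  out.reserve(image.size());
  for (float x : image) {
    out.push_back((x - mean) / stddev);
  }
  return out;
}

// Concatenates equally sized images into a single row-major batch.
Batch stack(const std::vector<Image>& images) {
  assert(!images.empty());
  Batch batch{images.size(), images.front().size(), {}};
  batch.values.reserve(batch.rows * batch.cols);
  for (const Image& image : images) {
    assert(image.size() == batch.cols);
    batch.values.insert(batch.values.end(), image.begin(), image.end());
  }
  return batch;
}
```

The order matters in the same way as in the chained map() calls: each sample is normalized first, and only then are the samples collated into a batch.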

Example Struct

{doxygenstruct}
:members:
:undoc-members: