.. _remote_fs:

##################
Remote Filesystems
##################

PyTorch Lightning enables working with data from a variety of filesystems, including local filesystems and several cloud storage providers such as `S3 <https://aws.amazon.com/s3/>`_ on `AWS <https://aws.amazon.com/>`_, `GCS <https://cloud.google.com/storage>`_ on `Google Cloud <https://cloud.google.com/>`_, or `ADL <https://azure.microsoft.com/solutions/data-lake/>`_ on `Azure <https://azure.microsoft.com/>`_.

This applies to saving and loading checkpoints, as well as to logging. To work with a different filesystem, prefix file paths with its protocol, such as ``s3://``, when writing and reading data.

.. code-block:: python

    from lightning.pytorch import Trainer

    # `default_root_dir` is the default path used for logs and checkpoints; `model` is your LightningModule
    trainer = Trainer(default_root_dir="s3://my_bucket/data/")
    trainer.fit(model)
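
The protocol prefix is what determines which storage backend handles a path. As a rough illustration (not Lightning's own code), the sketch below resolves two paths with fsspec's ``fsspec.core.url_to_fs`` helper. It assumes fsspec is installed (it is a Lightning dependency) and uses the in-memory ``memory://`` protocol as a stand-in for ``s3://`` so it runs without cloud credentials; ``backend_for`` is a hypothetical helper name.

.. code-block:: python

    from fsspec.core import url_to_fs


    def backend_for(url: str) -> str:
        """Return the name of the fsspec filesystem class that would handle ``url``."""
        fs, _path = url_to_fs(url)
        return type(fs).__name__


    # "memory://" stands in for a cloud protocol such as "s3://" so this runs offline;
    # a path without any protocol falls back to the local filesystem.
    print(backend_for("memory://my_bucket/data/"))  # MemoryFileSystem
    print(backend_for("/tmp/lightning_logs"))  # LocalFileSystem

With ``s3://my_bucket/data/`` the same call would resolve to the S3 backend, provided ``s3fs`` is installed.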

For logging, remote filesystem support depends on the particular logger integration being used. Consult :ref:`the documentation of the individual logger <loggers-api-references>` for more details.

.. code-block:: python

    from lightning.pytorch import Trainer
    from lightning.pytorch.loggers import TensorBoardLogger

    logger = TensorBoardLogger(save_dir="s3://my_bucket/logs/")

    trainer = Trainer(logger=logger)
    trainer.fit(model)

You can also resume training from a checkpoint stored on a remote filesystem.

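.. code-block:: python

    from lightning.pytorch import Trainer

    # `model` is your LightningModule; `tmpdir` is the directory for new logs and checkpoints
    trainer = Trainer(default_root_dir=tmpdir, max_steps=3)
    trainer.fit(model, ckpt_path="s3://my_bucket/ckpts/classifier.ckpt")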
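
If the remote checkpoint might not exist yet (for example, on the first run of a job), one option is to check for it before calling ``fit``. The helper below, ``existing_ckpt_path``, is a hypothetical sketch built on fsspec's ``url_to_fs``; it writes a placeholder file to the in-memory ``memory://`` filesystem, standing in for an S3 bucket, so the check can be demonstrated offline.

.. code-block:: python

    from typing import Optional

    import fsspec
    from fsspec.core import url_to_fs


    def existing_ckpt_path(url: str) -> Optional[str]:
        """Return ``url`` if a file exists there, otherwise ``None``."""
        fs, path = url_to_fs(url)
        return url if fs.exists(path) else None


    # Stand-in for a checkpoint saved by a previous run (placeholder bytes, not a real checkpoint).
    with fsspec.open("memory://my_bucket/ckpts/classifier.ckpt", "wb") as f:
        f.write(b"placeholder")

    print(existing_ckpt_path("memory://my_bucket/ckpts/classifier.ckpt"))  # memory://my_bucket/ckpts/classifier.ckpt
    print(existing_ckpt_path("memory://my_bucket/ckpts/missing.ckpt"))  # None

Passing the result as ``ckpt_path``, e.g. ``trainer.fit(model, ckpt_path=existing_ckpt_path("s3://my_bucket/ckpts/classifier.ckpt"))``, resumes when the file exists and starts from scratch when it is ``None``. With ``s3://`` this additionally requires the S3 backend and valid credentials.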

PyTorch Lightning uses `fsspec <https://filesystem-spec.readthedocs.io/>`_ internally to handle all filesystem operations.
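
Because reads and writes go through fsspec, the same file API works regardless of where the data lives. The sketch below is an illustration of that fsspec pattern rather than Lightning's internal code: it writes a small text file through ``fsspec.open`` and reads it back. ``write_then_read`` is a hypothetical helper, and ``memory://`` again stands in for a remote protocol.

.. code-block:: python

    import fsspec


    def write_then_read(url: str, text: str) -> str:
        """Write ``text`` to ``url`` and read it back through fsspec."""
        with fsspec.open(url, "w") as f:
            f.write(text)
        with fsspec.open(url, "r") as f:
            return f.read()


    # Swap "memory://" for "s3://", "gs://", etc. once the matching backend is installed.
    print(write_then_read("memory://my_bucket/logs/metrics.csv", "step,loss\n0,1.25\n"))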

The most common filesystems supported by Lightning are:

- **Local filesystem**: ``file://`` - The default; paths without a protocol use it, and it is available out of the box with Lightning.
- **Amazon S3**: ``s3://`` - Amazon S3 remote binary store, using the `s3fs <https://s3fs.readthedocs.io/>`__ library. Run ``pip install fsspec[s3]`` to install it.
- **Google Cloud Storage**: ``gcs://`` or ``gs://`` - Google Cloud Storage, using `gcsfs <https://gcsfs.readthedocs.io/en/stable/>`__. Run ``pip install fsspec[gcs]`` to install it.
- **Microsoft Azure Storage**: ``adl://``, ``abfs://`` or ``az://`` - Microsoft Azure Storage, using `adlfs <https://github.com/fsspec/adlfs>`__. Run ``pip install fsspec[adl]`` to install it.
- **Hadoop File System**: ``hdfs://`` - Hadoop Distributed File System, using `PyArrow <https://arrow.apache.org/docs/python/>`__ as the backend. Run ``pip install fsspec[hdfs]`` to install it.
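
Each remote protocol needs its backend package installed. One quick check, sketched below, is to ask fsspec to import the filesystem class for each protocol with ``fsspec.registry.get_filesystem_class``, which raises ``ImportError`` when the backend package is missing and ``ValueError`` for an unknown protocol. The ``backend_status`` helper is hypothetical, and a successful import does not verify credentials, network access, or, for HDFS, the PyArrow and Java libraries needed when a connection is made.

.. code-block:: python

    from fsspec.registry import get_filesystem_class

    # Protocols from the list above.
    PROTOCOLS = ["file", "s3", "gs", "abfs", "hdfs"]


    def backend_status(protocols):
        """Map each protocol to True if fsspec can import its filesystem class."""
        status = {}
        for protocol in protocols:
            try:
                get_filesystem_class(protocol)
                status[protocol] = True
            except (ImportError, ValueError):
                # ImportError: backend package (e.g. s3fs) missing; ValueError: unknown protocol
                status[protocol] = False
        return status


    for protocol, ok in backend_status(PROTOCOLS).items():
        print(f"{protocol}: {'available' if ok else 'not installed'}")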

You can list every protocol fsspec knows about, along with the class that implements it, with:

.. code-block:: python

    from fsspec.registry import known_implementations

    print(known_implementations)

See the :ref:`CheckpointIO Plugin <checkpointing_expert>` documentation for details on customizing how checkpoints are saved and loaded.