docs/source/accelerator/index.md
Since PyTorch 2.1, the community has made significant progress in streamlining the process of integrating new accelerators into the PyTorch ecosystem. These improvements include, but are not limited to: refinements to the PrivateUse1 Dispatch Key, the introduction and enhancement of core subsystem extension mechanisms, and the device-agnostic refactoring of key modules (e.g., torch.accelerator, memory management). Taken together, these advances provide the foundation for a robust, flexible, and developer-friendly pathway for accelerator integration.
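As a rough illustration of what the device-agnostic refactoring means for user code, the sketch below selects a device without naming any specific backend. It assumes a PyTorch release recent enough to ship the `torch.accelerator` module and falls back to `"cpu"` otherwise. `pick_device` is a hypothetical helper name used only for this example, not part of PyTorch.

```python
def pick_device() -> str:
    """Return the current accelerator's device type, or "cpu" as a fallback.

    Hypothetical helper for illustration; not part of the PyTorch API.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing to query.
        return "cpu"

    # Assumption: older releases may not provide torch.accelerator at all.
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        current = accel.current_accelerator()
        if current is not None:
            # e.g. "cuda", "xpu", or the name of an out-of-tree backend
            return current.type
    return "cpu"


device = pick_device()
print(f"Selected device type: {device}")
```

Because the helper never hard-codes a backend name, the same script should run unchanged on any accelerator that reports itself through `torch.accelerator`, including one integrated out of tree.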
This guide is a work in progress. For more details, please refer to the [roadmap](https://github.com/pytorch/pytorch/issues/158917).
This integration pathway offers several major benefits:
This document is intended for:
This guide provides a comprehensive overview of the modern integration pathway for new accelerators in PyTorch. It walks through the full integration surface, from low-level device primitives to higher-level domain modules like compilation and quantization. The structure is modular and scenario-driven: each topic is paired with corresponding code examples from torch_openreg, an official reference implementation. The series is structured around four major axes:
AMP, Compiler, ONNX, Distributed, and so on.
The goal is to help developers:
Next, we will delve into each chapter of this guide. Each chapter focuses on a key aspect of integration, providing detailed explanations and illustrative examples. Since some chapters build upon previous ones, readers are encouraged to follow the sequence to achieve a more coherent understanding.
:glob:
:maxdepth: 1
device
hooks
guard
autoload
operators
amp
profiler