docs/cpp/source/api/autograd/modes.md
PyTorch provides RAII guards to control gradient computation behavior.
torch::NoGradGuard is an RAII guard that disables gradient computation within its scope.
Its constructor disables gradient computation, and its destructor restores the previous gradient mode.
Example:

    {
      torch::NoGradGuard no_grad;
      // No gradients are computed in this scope
      auto result = model->forward(input);
    }
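
The save-and-restore behavior described above can be illustrated without the library. The following is a minimal sketch, not PyTorch's implementation: `g_grad_enabled`, `ToyNoGradGuard`, and `restores_mode_after_exception` are hypothetical names introduced only for this illustration. It shows why an RAII guard is safer than toggling a flag by hand: the previous mode comes back even when the scope is left through an exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a thread-local gradient mode flag.
thread_local bool g_grad_enabled = true;

// Toy analogue of a no-grad guard: remembers the previous mode, turns
// gradients off, and restores the previous mode when the scope ends.
class ToyNoGradGuard {
 public:
  ToyNoGradGuard() : prev_(g_grad_enabled) { g_grad_enabled = false; }
  ~ToyNoGradGuard() { g_grad_enabled = prev_; }
  ToyNoGradGuard(const ToyNoGradGuard&) = delete;
  ToyNoGradGuard& operator=(const ToyNoGradGuard&) = delete;

 private:
  bool prev_;  // mode that was active when the guard was constructed
};

// Returns true if the mode was off inside the guarded scope and is back on
// afterwards, even though the scope was left via an exception.
bool restores_mode_after_exception() {
  bool was_off_inside = false;
  try {
    ToyNoGradGuard no_grad;
    was_off_inside = !g_grad_enabled;
    throw std::runtime_error("simulated failure inside the guarded scope");
  } catch (const std::runtime_error&) {
    // The guard's destructor has already run during stack unwinding.
  }
  return was_off_inside && g_grad_enabled;
}
```

Because the guard restores whatever mode it found rather than unconditionally re-enabling gradients, constructing it while gradients are already disabled leaves them disabled after the scope ends.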
c10::InferenceMode is an RAII guard analogous to NoGradMode, designed for use
when you are certain your operations will have no interactions with autograd
(e.g., model inference). Compared to NoGradMode, code run under this mode gets
better performance by disabling autograd-related work like view tracking and version
counter bumps. However, tensors created inside InferenceMode have more limitations
when interacting with the autograd system.
c10::InferenceMode is an RAII guard that enables inference mode within its scope.
It is more efficient than NoGradGuard for inference-only workloads.
Its constructor takes a boolean that enables or disables inference mode (enabled by default).
Inference Tensors:
InferenceMode can be enabled for a given block of code. Inside InferenceMode,
all newly allocated (non-view) tensors are marked as inference tensors. Inference tensors:
- Do not have a version counter, so an error will be raised if you try to read their version (e.g., because you saved this tensor for backward).
- Are immutable outside InferenceMode. An error will be raised if you try to:
  - Mutate their data outside InferenceMode.
  - Set requires_grad=True on them outside InferenceMode.

  To work around this, make a clone outside InferenceMode to get a normal tensor before mutating.

A non-view tensor is an inference tensor if and only if it was allocated inside InferenceMode.
A view tensor is an inference tensor if and only if it is a view of an inference tensor.
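
The classification rules above can be restated as a small toy model. This is a sketch under stated assumptions, not how the library represents tensors: `ToyTensor`, `g_mode_on`, `allocate`, `view_of`, `clone_of`, and `add_in_place` are hypothetical names, and the toy view copies the value instead of sharing storage. It only encodes three rules: allocation decides the flag for non-view tensors, views inherit the flag from their base, and in-place updates on inference tensors are rejected outside the mode.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the thread-local inference mode flag.
thread_local bool g_mode_on = false;

// Toy tensor that records only the property discussed above.
struct ToyTensor {
  bool is_inference;
  double value;
};

// A non-view tensor is an inference tensor iff it is allocated while the mode is on.
ToyTensor allocate(double value) { return ToyTensor{g_mode_on, value}; }

// A view is an inference tensor iff its base is one, whatever the current mode.
// (A real view shares storage; this toy copies the value for simplicity.)
ToyTensor view_of(const ToyTensor& base) {
  return ToyTensor{base.is_inference, base.value};
}

// A clone is a fresh allocation, so it follows the allocation rule.
ToyTensor clone_of(const ToyTensor& src) { return allocate(src.value); }

// In-place update: rejected for inference tensors outside the mode.
void add_in_place(ToyTensor& t, double delta) {
  if (t.is_inference && !g_mode_on) {
    throw std::runtime_error(
        "in-place update of an inference tensor outside inference mode");
  }
  t.value += delta;
}
```

In this model, a tensor allocated with the mode on and then cloned with the mode off yields a normal tensor that `add_in_place` accepts, which mirrors the clone workaround described above.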
Performance Guarantees:
Inside an InferenceMode block:
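- Like NoGradMode, no operation records grad_fn, even if its inputs have
  requires_grad=True. This applies to both inference tensors and normal tensors.
- View operations on inference tensors do not do view tracking. View and non-view inference tensors are indistinguishable.
- In-place operations on inference tensors are guaranteed not to do a version bump.

For more implementation details, see the RFC-0011-InferenceMode.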
Basic Example:

    {
      c10::InferenceMode guard;
      // Optimized inference without gradient tracking
      auto result = model->forward(input);
    }
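Inference Workload Example:

    c10::InferenceMode guard;
    // saved_model is assumed to be the path of a serialized TorchScript module
    auto model = torch::jit::load(saved_model);
    // preprocess_tensors and postprocess_tensors stand for application-specific helpers
    auto inputs = preprocess_tensors(data);
    auto out = model.forward(inputs);
    auto outputs = postprocess_tensors(out);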
Nested InferenceMode:
Unlike some other guards, InferenceMode can be nested with different enabled/disabled states:

    {
      c10::InferenceMode guard(true);
      // InferenceMode is on
      {
        c10::InferenceMode guard(false);
        // InferenceMode is off
      }
      // InferenceMode is on
    }
    // InferenceMode is off
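
The on/off/on/off sequence in the comments above follows from each guard saving the state it found and restoring it on exit. Here is a minimal sketch of that mechanism, assuming a single thread-local flag; `g_inference_enabled`, `ToyInferenceMode`, and `nested_trace` are hypothetical names for illustration and are not the library's internals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the thread-local inference mode flag.
thread_local bool g_inference_enabled = false;

// Toy guard that accepts an enabled/disabled argument, defaulting to enabled.
class ToyInferenceMode {
 public:
  explicit ToyInferenceMode(bool enabled = true) : prev_(g_inference_enabled) {
    g_inference_enabled = enabled;
  }
  ~ToyInferenceMode() { g_inference_enabled = prev_; }
  ToyInferenceMode(const ToyInferenceMode&) = delete;
  ToyInferenceMode& operator=(const ToyInferenceMode&) = delete;

 private:
  bool prev_;  // state to restore when this guard goes out of scope
};

// Records the flag at the same four points as the nested example.
std::vector<bool> nested_trace() {
  std::vector<bool> trace;
  {
    ToyInferenceMode outer(true);
    trace.push_back(g_inference_enabled);  // inside the outer guard
    {
      ToyInferenceMode inner(false);
      trace.push_back(g_inference_enabled);  // inside the inner guard
    }
    trace.push_back(g_inference_enabled);  // inner guard destroyed
  }
  trace.push_back(g_inference_enabled);  // outer guard destroyed
  return trace;
}
```

Starting from a disabled flag, the recorded states are on, off, on, off, the same order as in the commented example.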
InferenceMode is preferred over NoGradMode for pure inference workloads because
it provides better performance. Key differences:
- InferenceMode also affects tensor creation while NoGradMode doesn't.
- Tensors created inside InferenceMode are marked as inference tensors with
  certain limitations that apply after exiting InferenceMode.
- InferenceMode can be nested with enabled/disabled states.

The legacy AutoNonVariableTypeMode guard (now renamed to
AutoDispatchBelowADInplaceOrView) was commonly used for inference workloads
but is unsafe: it can silently bypass safety checks and produce wrong results.
For inference-only workloads (e.g. loading a pretrained JIT model and
running inference in C++ runtime), use c10::InferenceMode as a drop-in
replacement. It preserves the performance characteristics while providing
correctness guarantees.
For custom autograd kernels that need to redispatch below the Autograd
dispatch key, use AutoDispatchBelowADInplaceOrView instead:

    class ROIAlignFunction : public torch::autograd::Function<ROIAlignFunction> {
     public:
      static torch::autograd::variable_list forward(
          torch::autograd::AutogradContext* ctx,
          const torch::autograd::Variable& input,
          const torch::autograd::Variable& rois,
          double spatial_scale, int64_t pooled_height,
          int64_t pooled_width, int64_t sampling_ratio, bool aligned) {
        // Stash what the backward pass will need
        ctx->saved_data["spatial_scale"] = spatial_scale;
        ctx->save_for_backward({rois});
        // Redispatch below the Autograd key for the rest of this scope
        at::AutoDispatchBelowADInplaceOrView guard;
        auto result = roi_align(input, rois, spatial_scale,
                                pooled_height, pooled_width, sampling_ratio, aligned);
        return {result};
      }
      // backward() is not shown in this excerpt
    };