These optimizers use gradient descent with optional enhancements like momentum. They are the foundation of neural network training and work well when you can afford careful hyperparameter tuning.
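The core idea can be shown without libtorch. Below is a minimal, self-contained C++ sketch of gradient descent with optional heavy-ball momentum on a hypothetical toy objective f(x) = (x - 3)^2. The names `toy_grad` and `minimize_toy` are illustrative and are not part of any PyTorch API.

```cpp
#include <cassert>

// Gradient of the toy objective f(x) = (x - 3)^2, minimized at x = 3.
double toy_grad(double x) { return 2.0 * (x - 3.0); }

// Runs `steps` iterations of gradient descent starting from x0.
// momentum == 0 gives plain gradient descent: x <- x - lr * f'(x).
// momentum > 0 adds a velocity that accumulates past gradients
// (heavy-ball momentum): v <- momentum * v + f'(x); x <- x - lr * v.
double minimize_toy(double x0, double lr, double momentum, int steps) {
  assert(lr > 0.0 && momentum >= 0.0 && momentum < 1.0 && steps >= 0);
  double x = x0;
  double velocity = 0.0;
  for (int i = 0; i < steps; ++i) {
    velocity = momentum * velocity + toy_grad(x);
    x -= lr * velocity;
  }
  return x;
}
```

With a deliberately small learning rate of 0.01 and 500 steps from x = 0, `minimize_toy(0.0, 0.01, 0.0, 500)` stops roughly 1e-4 short of the minimum. `minimize_toy(0.0, 0.01, 0.9, 500)` ends many orders of magnitude closer. On this toy problem, momentum makes up for the small step size.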
Stochastic gradient descent (SGD) is the classic optimization algorithm. When properly tuned, SGD with momentum is often the best choice for convolutional neural networks. It needs more careful learning-rate selection than adaptive methods, but it frequently reaches the best final accuracy.
When to use:
Key parameters:

- `lr`: learning rate (typical: 0.01 to 0.1 for CNNs)
- `momentum`: momentum factor; accelerates convergence (typical: 0.9)
- `weight_decay`: L2 regularization coefficient
- `nesterov`: use Nesterov momentum (often improves convergence); requires `momentum > 0` and zero dampening
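As we read the PyTorch documentation for SGD, these parameters combine into the per-step update below. Here $\gamma$ is `lr`, $\mu$ is `momentum`, $\lambda$ is `weight_decay`, and $\tau$ is `dampening`. Dampening is an `SGDOptions` setting not listed above, and it defaults to 0. $\theta$ is a parameter and $f$ the loss.

$$
\begin{aligned}
g_t &= \nabla_\theta f(\theta_{t-1}) + \lambda\,\theta_{t-1} \\
b_t &= \begin{cases} g_t, & t = 1 \\ \mu\, b_{t-1} + (1 - \tau)\, g_t, & t > 1 \end{cases} \\
d_t &= \begin{cases} g_t + \mu\, b_t, & \text{if Nesterov} \\ b_t, & \text{otherwise} \end{cases} \\
\theta_t &= \theta_{t-1} - \gamma\, d_t
\end{aligned}
$$

When $\mu = 0$, no momentum buffer is kept and $d_t = g_t$, which reduces to plain gradient descent with L2 weight decay.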
Example:
```cpp
#include <torch/torch.h>

// Standard SGD with momentum - good for CNNs.
// Assumes `model` is a module holder (e.g. torch::nn::Sequential) defined elsewhere.
auto optimizer = torch::optim::SGD(
    model->parameters(),
    torch::optim::SGDOptions(0.01)  // learning rate
        .momentum(0.9)              // momentum factor
        .weight_decay(1e-4)         // L2 regularization
        .nesterov(true));           // Nesterov momentum
```
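To see what this configuration does to a single value, here is a hypothetical scalar re-implementation of the SGD update rule as PyTorch documents it. `ScalarSgd` and its fields are illustrative names, not libtorch API. The struct mirrors the `lr`, `momentum`, `dampening`, `weight_decay` and `nesterov` options. Like libtorch's SGD, it rejects Nesterov without momentum or with nonzero dampening; here that check is an `assert`.

```cpp
#include <cassert>

// Hypothetical scalar version of the SGD update, for illustration only;
// it is NOT part of the libtorch API.
struct ScalarSgd {
  double lr;
  double momentum = 0.0;
  double dampening = 0.0;
  double weight_decay = 0.0;
  bool nesterov = false;
  double buffer = 0.0;      // momentum buffer b_{t-1}
  bool has_buffer = false;  // false until the first step with momentum

  // Applies one update to `param` given its gradient; returns the new value.
  double step(double param, double grad) {
    assert(!nesterov || (momentum > 0.0 && dampening == 0.0));
    double g = grad + weight_decay * param;  // L2 term folded into the gradient
    if (momentum != 0.0) {
      // The first step seeds the buffer with g; later steps decay and accumulate.
      buffer = has_buffer ? momentum * buffer + (1.0 - dampening) * g : g;
      has_buffer = true;
      g = nesterov ? g + momentum * buffer : buffer;  // Nesterov look-ahead
    }
    return param - lr * g;
  }
};
```

Consider the example's settings (lr 0.01, momentum 0.9, weight_decay 1e-4, Nesterov), a parameter of 1.0 and a gradient of 0.5. Weight decay raises the gradient to 0.5001, which also seeds the buffer. The Nesterov step then moves by 0.01 × (0.5001 + 0.9 × 0.5001), so the parameter becomes 0.9904981.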