base/BFGS.md
The smile.math.BFGS class implements three closely related quasi-Newton
minimization algorithms, all accessible through overloaded minimize() static
methods:
| Variant | Method signature | Memory | Constraints |
|---|---|---|---|
| BFGS | minimize(func, x, gtol, maxIter) | O(n²) | none |
| L-BFGS | minimize(func, m, x, gtol, maxIter) | O(m·n) | none |
| L-BFGS-B | minimize(func, m, x, l, u, gtol, maxIter) | O(m·n) | box bounds |
The Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is the gold standard for moderate-sized unconstrained smooth minimization. It belongs to the class of quasi-Newton methods: instead of computing the exact Hessian ∇²f(x) (expensive) it maintains a running approximation H of the inverse Hessian, updated cheaply at each iteration using only gradient differences.
Each BFGS iteration performs three steps:
$$ H_{k+1} = H_k
where sₖ = xₖ₊₁ − xₖ and yₖ = ∇f(xₖ₊₁) − ∇f(xₖ).
Because H is stored as a dense n × n matrix, the memory cost is O(n²), which makes BFGS impractical for problems with more than a few thousand variables.
Limited-memory BFGS (L-BFGS) avoids storing the full inverse Hessian approximation. Instead it keeps a rolling window of the last m pairs of (position difference s, gradient difference y) vectors and uses a two-loop recursion to compute the matrix–vector product Hₖ g implicitly.
Memory is O(m·n) — independent of the problem dimension. A history size of
m = 5 (the default suggested range is 3–7) is sufficient for most problems.
Limited-memory BFGS with Bounds (L-BFGS-B) extends L-BFGS to handle box constraints:
lᵢ ≤ xᵢ ≤ uᵢ for each variable xᵢ
At each iteration the algorithm:
All variants share the same pattern: the function value at the minimum is
returned, and the solution vector x is updated in-place.
All three variants require a
smile.util.function.DifferentiableMultivariateFunction, which has two methods:
/** Returns f(x). */
double f(double[] x);
/** Computes the gradient g = ∇f(x) and returns f(x). */
double g(double[] x, double[] g);
The g() method must fill the supplied array g with the partial derivatives
and also return the function value. This dual-purpose design avoids redundant
evaluations when the value and gradient are computed together.
double minimum = BFGS.minimize(func, x, gtol, maxIter);
| Parameter | Meaning |
|---|---|
func | DifferentiableMultivariateFunction to minimize |
x | initial guess; overwritten with the solution on return |
gtol | gradient convergence tolerance (e.g. 1e-5); must be > 0 |
maxIter | maximum number of outer iterations |
Returns the function value at the best point found.
Convergence is declared when either:
gtol, or4 × ε (machine epsilon × 10⁻⁸).If maxIter is reached without convergence a warning is logged and the best
value found so far is returned.
double minimum = BFGS.minimize(func, m, x, gtol, maxIter);
| Parameter | Meaning |
|---|---|
m | number of correction pairs to retain (3 ≤ m ≤ 7 recommended; m = 5 is a common choice) |
| (others) | same as BFGS |
Use L-BFGS instead of BFGS when n > ~1000 or when memory is limited. For small n the dense BFGS method typically converges in fewer iterations.
double minimum = BFGS.minimize(func, m, x, l, u, gtol, maxIter);
| Parameter | Meaning |
|---|---|
l | lower bounds array, length n |
u | upper bounds array, length n |
| (others) | same as L-BFGS |
Both l and u must have the same length as x. The initial point x need
not satisfy the bounds; x is clamped to [l, u] before the first iteration.
Use Double.NEGATIVE_INFINITY / Double.POSITIVE_INFINITY to leave individual
components unbounded.
The internal convergence constants can be tuned via JVM system properties:
| Property | Default | Meaning |
|---|---|---|
smile.bfgs.epsilon | 1e-8 | Base epsilon used to derive TOLX and TOLF |
Set on the command line with -Dsmile.bfgs.epsilon=1e-10.
The Rosenbrock banana function f(x, y) = (1-x)² + 100(y-x²)² has a
narrow curved valley; its global minimum is 0 at (1, 1).
import smile.math.BFGS;
import smile.util.function.DifferentiableMultivariateFunction;
DifferentiableMultivariateFunction rosenbrock = new DifferentiableMultivariateFunction() {
@Override
public double f(double[] x) {
double t1 = 1.0 - x[0];
double t2 = 10.0 * (x[1] - x[0] * x[0]);
return t1 * t1 + t2 * t2;
}
@Override
public double g(double[] x, double[] g) {
double t1 = 1.0 - x[0];
double t2 = 10.0 * (x[1] - x[0] * x[0]);
g[1] = 20.0 * t2;
g[0] = -2.0 * (x[0] * g[1] + t1);
return t1 * t1 + t2 * t2;
}
};
double[] x = {-1.2, 1.0}; // starting point
double fmin = BFGS.minimize(rosenbrock, x, 1e-5, 500);
System.out.printf("Minimum: f = %.2e at (%.6f, %.6f)%n", fmin, x[0], x[1]);
// Minimum: f = 3.45e-10 at (1.000000, 1.000000)
int n = 10_000;
// Quadratic bowl: f(x) = Σ i·xᵢ² — minimum is 0 at the origin
DifferentiableMultivariateFunction bowl = new DifferentiableMultivariateFunction() {
@Override
public double f(double[] x) {
double sum = 0;
for (int i = 0; i < x.length; i++) sum += (i + 1.0) * x[i] * x[i];
return sum;
}
@Override
public double g(double[] x, double[] g) {
double sum = 0;
for (int i = 0; i < x.length; i++) {
g[i] = 2.0 * (i + 1.0) * x[i];
sum += (i + 1.0) * x[i] * x[i];
}
return sum;
}
};
double[] x = new double[n];
java.util.Arrays.fill(x, 1.0); // start at all-ones
// m = 5 history pairs — dense BFGS would need a 10000×10000 matrix
double fmin = BFGS.minimize(bowl, 5, x, 1e-8, 1000);
System.out.printf("Minimum: %.6e%n", fmin);
Find the minimum of f(x₀, x₁) = (x₀ − 3)² + (x₁ − 4)² + 1 subject to
3.5 ≤ x₀ ≤ 5 and 3.5 ≤ x₁ ≤ 5. The unconstrained minimum (3, 4) is
outside the box, so the constrained minimum is at the boundary (3.5, 4).
DifferentiableMultivariateFunction func = new DifferentiableMultivariateFunction() {
@Override
public double f(double[] x) {
double d0 = x[0] - 3, d1 = x[1] - 4;
return d0 * d0 + d1 * d1 + 1;
}
@Override
public double g(double[] x, double[] g) {
double d0 = x[0] - 3, d1 = x[1] - 4;
g[0] = 2 * d0;
g[1] = 2 * d1;
return d0 * d0 + d1 * d1 + 1;
}
};
double[] x = {0.0, 0.0}; // starting point (outside bounds is fine)
double[] l = {3.5, 3.5};
double[] u = {5.0, 5.0};
double fmin = BFGS.minimize(func, 5, x, l, u, 1e-8, 500);
System.out.printf("x = (%.4f, %.4f), f = %.4f%n", x[0], x[1], fmin);
// x = (3.5000, 4.0000), f = 1.2500
Problem size Constraints Recommended variant
──────────────────────────────────────────────────────────
n ≤ 1 000 none BFGS
n > 1 000 none L-BFGS (m = 5)
any n box bounds L-BFGS-B (m = 5)
General guidelines:
n > 1000 to avoid the O(n²) Hessian storage.DifferentiableMultivariateFunction.All variants use the same two stopping tests:
Step-length test — converged when the largest relative step component
falls below TOLX = 4ε:
max_i |Δxᵢ| / max(|xᵢ|, 1) < TOLX
Gradient test — converged when the scaled infinity-norm of the gradient
falls below gtol:
max_i |∂f/∂xᵢ| · max(|xᵢ|, 1) / max(f, 1) < gtol
L-BFGS-B uses the projected gradient norm instead, which respects the box constraints:
max_i |proj_gradient_i| < gtol
where proj_gradient_i = clip(xᵢ − gᵢ, lᵢ, uᵢ) − xᵢ.
All three variants share the same Wolfe-condition backtracking line search. Starting from a full step α = 1, the step is reduced using:
with the minimum step bounded below at 0.1 × α to prevent steps from
shrinking too fast, and bounded above at stpmax = 100 × max(‖x‖, n) to
prevent excessively large jumps into undefined regions.
The sufficient decrease condition (Armijo) is checked with ftol = 1e-4.
SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.