SMILE — Regression User Guide

The smile.regression package provides a comprehensive suite of regression algorithms. This guide describes the core abstractions, explains each algorithm with its API, key parameters, and usage patterns, and ends with a comparative summary and practical tips.


Table of Contents

  1. Package Overview
  2. Core Abstractions
  3. Linear Models
  4. Generalized Linear Models
  5. Tree and Ensemble Regressors
  6. Support Vector Regression
  7. Gaussian Process Regression
  8. Neural Network and RBF
  9. Algorithm Comparison
  10. Common Patterns and Tips

1. Package Overview

The smile.regression package provides a complete suite of supervised regression algorithms for predicting a continuous response variable. The package follows a consistent design:

  • Static fit(…) factory methods — every algorithm exposes one or more fit methods; there is no mutable builder to misconfigure.
  • Uniform Regression<T> interface — a single predict(T x) for scalar output, plus batch, list, and dataset overloads; optional online update.
  • DataFrame integration — algorithms that work with named columns implement DataFrameRegression and accept Formula + DataFrame.
  • Serializable — every model implements java.io.Serializable for persistence.
  • Properties round-trip — hyperparameters can be read from and written to java.util.Properties via nested Options records, enabling configuration-file workflows.

The package also contains two sub-packages:

  • smile.regression.glm — GLM distribution families (link functions, variance, deviance, log-likelihood).
  • smile.regression.gam — B-spline and smoothing spline primitives for GAM.

2. Core Abstractions

2.1 Regression Interface

java
public interface Regression<T> extends ToDoubleFunction<T>, Serializable

The central contract that every regression model implements.

| Method | Description |
|--------|-------------|
| double predict(T x) | Predicts the response value for a single instance. |
| double[] predict(T[] x) | Batch prediction over an array. |
| double[] predict(List<T> x) | Batch prediction over a list. |
| double[] predict(Dataset<T,?> x) | Batch prediction over a Dataset. |
| boolean online() | Returns true if the model supports incremental updates. |
| void update(T x, double y) | Online update with one sample (default: throws). |
| void update(T[] x, double[] y) | Batch online update. |
| void update(Dataset<T,Double> batch) | Batch update from a Dataset. |
| double applyAsDouble(T x) | Alias for predict; satisfies ToDoubleFunction<T>. |

The interface also declares a nested Trainer<T, M>:

java
interface Trainer<T, M extends Regression<T>> {
    M fit(T[] x, double[] y);
    M fit(T[] x, double[] y, Properties params);
}

Use Trainer when writing generic code that works with any regression algorithm or when hyperparameters come from a configuration file.
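
As a sketch (the trainer variable, the trainX/trainY/testX arrays, and the props object below are illustrative, not part of the SMILE API), a Trainer can delegate to the Properties-driven SVM.fit overload from Section 6.1, so generic evaluation code never names a concrete algorithm:

java
// Illustrative: wrap the Properties-driven SVM.fit overload in a generic Trainer.
Regression.Trainer<double[], Regression<double[]>> trainer = new Regression.Trainer<>() {
    @Override
    public Regression<double[]> fit(double[][] x, double[] y) {
        return fit(x, y, new Properties());   // fall back to defaults
    }
    @Override
    public Regression<double[]> fit(double[][] x, double[] y, Properties params) {
        return SVM.fit(x, y, params);         // reads the smile.svm.* keys
    }
};

Regression<double[]> model = trainer.fit(trainX, trainY, props);  // props loaded from a config file
double[] predictions = model.predict(testX);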

Ensemble average. The static factory method Regression.ensemble(models...) returns a model whose prediction is the arithmetic mean of all constituent models. The ensemble is an online learner only if every base model is.

java
Regression<double[]> avg = Regression.ensemble(model1, model2, model3);
double y = avg.predict(sample);  // mean of three predictions

Online learning check.

java
if (model.online()) {
    model.update(newX, newY);
}

2.2 DataFrameRegression

java
public interface DataFrameRegression extends Regression<Tuple>

An extension for models trained on DataFrame objects via a Formula. It adds:

| Method | Description |
|--------|-------------|
| Formula formula() | The formula used at training time. |
| StructType schema() | The feature schema of the design matrix. |
| double predict(Tuple x) | Predicts from a named-column Tuple. |
| double[] predict(DataFrame df) | Predicts all rows of a DataFrame. |

All tree-based models (RegressionTree, RandomForest, GradientTreeBoost) and GLM, GAM, LinearModel implement DataFrameRegression.

The static helper DataFrameRegression.of(formula, data, trainer) adapts a trainer that produces a Regression<double[]> into a DataFrameRegression, applying the formula to build the design matrix automatically:

java
DataFrameRegression dfModel = DataFrameRegression.of(formula, data,
        (xs, ys) -> RidgeRegression.fit(xs, ys, 0.1));

2.3 LinearModel

The shared result type produced by OLS, RidgeRegression, LASSO, and ElasticNet. Implements DataFrameRegression and provides comprehensive statistical diagnostics.

Key accessors:

| Method | Description |
|--------|-------------|
| double[] coefficients() | Fitted weights (last element is the intercept when included in the design). |
| double intercept() | The intercept term. |
| double RSquared() | Coefficient of determination R². |
| double adjustedRSquared() | R² adjusted for the number of predictors. |
| double RSS() | Residual sum of squares. |
| double TSS() | Total sum of squares. |
| double error() | Root mean square error of the residuals (σ̂). |
| double[][] ttest() | t-tests of coefficients: estimate, std error, t-statistic, p-value. |
| double F() | F-statistic for overall significance. |
| double pvalue() | p-value of the F-test. |
| double[] fittedValues() | Training fitted values. |
| double[] residuals() | Training residuals. |
| double predict(Tuple x) | Predict from a Tuple. |
| double predict(double[] x) | Predict from a raw feature array. |
| double[] predict(DataFrame df) | Batch prediction. |
| void update(double[] x, double y) | Recursive Least Squares (RLS) update when enabled. |
| String toString() | Human-readable summary resembling R's lm output. |

Recursive Least Squares (online updates) is available when OLS.Options.recursive = true (the default). Each call to update(x, y) adjusts the model using the Sherman–Morrison–Woodbury formula; no matrix factorization is required per update.
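
For reference, the textbook RLS recursion that the Sherman–Morrison–Woodbury identity yields (generic notation, not SMILE's internal variable names): maintaining P = (XᵀX)⁻¹ between calls, each new sample (x, y) is absorbed as

    k = P x / (1 + xᵀ P x)
    β ← β + k (y − xᵀβ)
    P ← P − k xᵀ P

so each update costs O(p²) rather than a full refit.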


3. Linear Models

3.1 OLS — Ordinary Least Squares

Ordinary least squares minimises the sum of squared residuals ‖y - Xβ‖². The design matrix is built automatically from the Formula — you do not add an intercept column manually.

java
LinearModel model = OLS.fit(formula, data);
LinearModel model = OLS.fit(formula, data, options);

Options

| Option | Default | Description |
|--------|---------|-------------|
| method | QR | Factorization method: QR (default, fast, numerically stable) or SVD (handles rank-deficient X). |
| stderr | true | Whether to compute standard errors, t-statistics, and p-values for all coefficients. |
| recursive | true | Whether to enable recursive least squares via model.update(x, y). |

java
// Properties API
Properties props = new Properties();
props.setProperty("smile.ols.method", "QR");
props.setProperty("smile.ols.standard_error", "true");
props.setProperty("smile.ols.recursive", "false");
LinearModel model = OLS.fit(formula, data, OLS.Options.of(props));

Full diagnostics example:

java
LinearModel model = OLS.fit(Formula.lhs("y"), data);
System.out.println(model);  // prints coefficients, std errors, t-tests, R², F-stat

double r2   = model.RSquared();
double rmse = model.error();

double[][] ttest = model.ttest();
// ttest[i][0] = coefficient estimate
// ttest[i][1] = standard error
// ttest[i][2] = t-statistic
// ttest[i][3] = two-sided p-value

RLS update:

java
LinearModel model = OLS.fit(formula, data);  // recursive=true by default
model.update(newFeatures, newTarget);         // Sherman-Morrison-Woodbury

Limitations. Fails when n ≤ p (no more observations than predictors) — use RidgeRegression or LASSO in that regime. With multicollinearity the QR solver automatically falls back to SVD.

3.2 RidgeRegression

Ridge regression adds an L2 penalty λ‖β‖² to the least-squares objective, shrinking all coefficients toward zero. Features are standardised internally so the penalty is applied fairly across different scales.

java
LinearModel model = RidgeRegression.fit(formula, data, lambda);
LinearModel model = RidgeRegression.fit(x, y, lambda);

// Via Options record
LinearModel model = RidgeRegression.fit(formula, data, options);

Parameters.

| Parameter | Description |
|-----------|-------------|
| lambda | Regularization strength (λ > 0). Larger = more shrinkage. Typical range: 0.001–1000. Select via cross-validation. |

java
// Convenience: fit with explicit lambda
LinearModel model = RidgeRegression.fit(formula, boston, 0.1);

double[] coef = model.coefficients();
double   rmse = model.error();

When to use. When you have many moderately-relevant predictors and need stable estimation under multicollinearity. Ridge does not zero any coefficient exactly, so it does not perform variable selection.

3.3 LASSO

LASSO (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty λ‖β‖₁, producing sparse solutions where some coefficients are exactly zero. It is solved via an interior-point PCG algorithm on the standardised design matrix.

java
LinearModel model = LASSO.fit(formula, data, lambda);
LinearModel model = LASSO.fit(x, y, lambda);
LinearModel model = LASSO.fit(x, y, lambda, tol, maxIter);

Parameters.

| Parameter | Default | Description |
|-----------|---------|-------------|
| lambda | | L1 penalty. Larger = sparser solution. |
| tol | 1E-4 | Convergence tolerance for the duality gap. |
| maxIter | 500 | Maximum interior-point iterations. |

When to use. When you believe only a subset of predictors is truly relevant (sparse signal), or when you need an interpretable model with automatic variable selection.
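
A brief usage sketch (the λ value is illustrative; in practice choose it by cross-validation):

java
// Fit LASSO and list the coefficients that survive the L1 penalty.
LinearModel model = LASSO.fit(Formula.lhs("y"), data, 0.5);

double[] beta = model.coefficients();
for (int i = 0; i < beta.length; i++) {
    if (beta[i] != 0.0) {
        System.out.printf("beta[%d] = %+.4f%n", i, beta[i]);
    }
}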

3.4 ElasticNet

Elastic net combines L1 and L2 penalties, expressed as λ₁‖β‖₁ + λ₂‖β‖². It is implemented by augmenting the design matrix and delegating to the LASSO solver.

java
LinearModel model = ElasticNet.fit(formula, data, lambda1, lambda2);
LinearModel model = ElasticNet.fit(x, y, lambda1, lambda2);
LinearModel model = ElasticNet.fit(x, y, lambda1, lambda2, tol, maxIter);

Parameters.

| Parameter | Description |
|-----------|-------------|
| lambda1 | L1 penalty coefficient (> 0). Controls sparsity. |
| lambda2 | L2 penalty coefficient (> 0). Controls grouping effect. |

When to use. When predictors are correlated in groups. LASSO tends to pick one from each group arbitrarily; elastic net tends to include or exclude the group as a whole. Also useful when p > n.

Choosing λ₁ and λ₂. A common approach is to fix the mixing ratio α = λ₁/(λ₁+λ₂) ∈ [0,1] (0 = pure ridge, 1 = pure LASSO) and select total penalty by cross-validation.
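
A sketch of that strategy (the grid values and the train/validation split are placeholders):

java
// Fix the mixing ratio and scan the total penalty; keep the pair with the lowest hold-out error.
double alpha = 0.8;                            // 0 = pure ridge, 1 = pure LASSO
for (double lambda : new double[] {0.01, 0.1, 1.0, 10.0}) {
    double lambda1 = alpha * lambda;           // L1 share
    double lambda2 = (1.0 - alpha) * lambda;   // L2 share
    LinearModel model = ElasticNet.fit(formula, trainData, lambda1, lambda2);
    double[] pred = model.predict(validationData);
    // compute RMSE of pred against the validation targets and remember the best (lambda1, lambda2)
}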


4. Generalized Linear Models

4.1 GLM — Generalized Linear Model

GLM extends linear regression to non-normal response distributions through a link function and a variance function, fitted by Iteratively Weighted Least Squares (IWLS).

java
GLM model = GLM.fit(formula, data, family);
GLM model = GLM.fit(formula, data, family, options);

Where family is a smile.regression.glm.Model instance (a distribution family with link function). See Section 4.2 for available families.

Options

| Option | Default | Description |
|--------|---------|-------------|
| tol | 1E-5 | Convergence tolerance on change in deviance between IWLS iterations. |
| maxIter | 50 | Maximum IWLS iterations. |

java
// Properties API
Properties props = new Properties();
props.setProperty("smile.glm.tolerance",  "1E-6");
props.setProperty("smile.glm.iterations", "100");
GLM model = GLM.fit(formula, data, Bernoulli.logit(), GLM.Options.of(props));

API.

| Method | Description |
|--------|-------------|
| double[] coefficients() | Fitted linear predictor coefficients (including intercept). |
| double[][] ztest() | Coefficient z-tests: estimate, std error, z-score, p-value. |
| double[] fittedValues() | Fitted mean values on the response scale (after invlink). |
| double[] devianceResiduals() | Deviance residuals for each observation. |
| double deviance() | Residual deviance of the fitted model. |
| double logLikelihood() | Log-likelihood of the fitted model. |
| double AIC() | Akaike information criterion 2k − 2ℓ. |
| double BIC() | Bayesian information criterion. |
| double predict(Tuple x) | Predicts on the response scale via invlink(Xβ). |
| double[] predict(DataFrame df) | Batch prediction. |
| String toString() | R-style summary with deviance table, coefficients, z-tests. |

Interpreting output:

java
GLM model = GLM.fit(Formula.lhs("y"), data, Bernoulli.logit());
System.out.println(model);        // prints summary

double[][] z = model.ztest();
// z[i][0] = coefficient  (log-odds scale for logit link)
// z[i][1] = std error
// z[i][2] = z-score
// z[i][3] = p-value (two-sided Wald test)

// Prediction returns probability (0–1) for Bernoulli/Binomial
double prob = model.predict(testTuple);

Requirements. n > p (more observations than predictors). Multinomial response is not supported; use smile.classification.LogisticRegression instead.

4.2 GLM Families (smile.regression.glm)

Each family is a Java interface (Bernoulli, Binomial, Poisson, Gaussian) exposing a static factory that returns a Model.

| Family | Factory | Link | Response type |
|--------|---------|------|---------------|
| Bernoulli | Bernoulli.logit() | logit | Binary (0/1) outcomes |
| Binomial | Binomial.logit(int[] n) | logit | Sample proportions (counts / n), each with trial count n[i] |
| Poisson | Poisson.log() | log | Non-negative count data |
| Gaussian | Gaussian.identity() | identity | Continuous with additive Gaussian noise |
| Gaussian | Gaussian.log() | log | Positive continuous; multiplicative errors |
| Gaussian | Gaussian.inverse() | inverse | Positive continuous; inverse relationship |

Bernoulli (binary logistic regression):

java
// y must be 0 or 1
GLM model = GLM.fit(Formula.lhs("default"), data, Bernoulli.logit());
double prob = model.predict(tuple);  // P(y = 1 | x)

Binomial (grouped binary outcomes):

java
// y[i] = k[i]/n[i] (proportion of successes), n[i] = number of trials
int[] n = {10, 20, 15, ...};
GLM model = GLM.fit(Formula.lhs("proportion"), data, Binomial.logit(n));

Poisson (count regression):

java
// y must be non-negative integers
GLM model = GLM.fit(Formula.lhs("count"), data, Poisson.log());
double rate = model.predict(tuple);  // estimated rate λ

Gaussian with non-default link (e.g. log):

java
// Use when the multiplicative model is more natural than additive
GLM model = GLM.fit(Formula.lhs("price"), data, Gaussian.log());

4.3 GAM — Generalized Additive Model

GAM models the response as the sum of smooth non-parametric functions of each predictor: g(μ) = β₀ + f₁(x₁) + f₂(x₂) + …. Each fᵢ is a P-spline fitted by penalized iteratively reweighted least squares (PIRLS) with backfitting.

java
GAM model = GAM.fit(formula, data);                    // Gaussian identity by default
GAM model = GAM.fit(formula, data, glmModel);           // custom family
GAM model = GAM.fit(formula, data, glmModel, options);  // with hyperparameters

Options

| Option | Default | Description |
|--------|---------|-------------|
| k | 5 | Number of B-spline knots per predictor. More knots = more flexible. |
| lambdas | null | Smoothing penalties per predictor. Estimated from data if null. |
| tol | 1E-4 | Backfitting convergence tolerance. |
| maxIter | 50 | Maximum backfitting iterations. |

API.

java
GAM model = GAM.fit(Formula.lhs("price"), data);

double[] edf   = model.effectiveDF();   // effective degrees of freedom per predictor
double   aic   = model.AIC();
double   bic   = model.BIC();
double   dev   = model.deviance();
double   logL  = model.logLikelihood();

double yhat = model.predict(tuple);
double[] yhats = model.predict(df);

Restrictions. Only numeric predictors are supported; categorical variables must be encoded before use. The formula must produce a purely numeric design matrix.


5. Tree and Ensemble Regressors

All three models implement DataFrameRegression and accept Formula + DataFrame.

5.1 RegressionTree

A single CART regression tree that minimises squared-error loss (or a custom Loss function when used as a boosting base learner).

java
RegressionTree model = RegressionTree.fit(formula, data);
RegressionTree model = RegressionTree.fit(formula, data, props);

Options (via Properties)

| Property | Default | Description |
|----------|---------|-------------|
| smile.cart.node.size | 5 | Minimum leaf node size. |
| smile.cart.max.depth | 20 | Maximum tree depth. |
| smile.cart.max.nodes | 0 (unlimited) | Maximum leaf nodes. |
| smile.regression_tree.bins | -1 (auto) | Number of histogram bins for continuous features. |
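
For example, a shallower tree can be requested by overriding the defaults above (the values are illustrative):

java
Properties props = new Properties();
props.setProperty("smile.cart.max.depth", "8");
props.setProperty("smile.cart.node.size", "20");
RegressionTree model = RegressionTree.fit(formula, data, props);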

API.

java
double   yhat   = model.predict(tuple);
double[] yhats  = model.predict(df);
double[] imp    = model.importance();   // impurity-based feature importance
String   text   = model.toString();     // printable tree
Formula  f      = model.formula();

Use cases. A quick non-parametric baseline. Prone to overfitting — prefer RandomForest or GradientTreeBoost in production.

5.2 RandomForest

An ensemble of regression trees, each trained on a bootstrap sample with random feature subsets at each split. Aggregates via the arithmetic mean.

java
RandomForest model = RandomForest.fit(formula, data);
RandomForest model = RandomForest.fit(formula, data, props);

Options (via Properties)

| Property | Default | Description |
|----------|---------|-------------|
| smile.random.forest.trees | 500 | Number of trees. |
| smile.random.forest.mtry | max(p/3, 1) | Feature count sampled per split. |
| smile.random.forest.max.depth | 20 | Maximum tree depth. |
| smile.random.forest.node.size | 5 | Minimum node size. |
| smile.random.forest.sampling.rate | 1.0 | Bootstrap fraction (< 1 for subsampling). |

API.

java
double   yhat    = model.predict(tuple);
double[] yhats   = model.predict(df);

RegressionMetrics metrics = model.metrics();  // OOB RMSE, MAD, etc.
double   oobRMSE = model.metrics().rmse();

double[] imp     = model.importance();         // feature importance
double[][] shap  = model.shap(df);             // SHAP values
Formula  formula = model.formula();

// Merge two forests (e.g. parallel training)
RandomForest merged = model1.merge(model2);

// Trim the least accurate trees
model.trim(200);  // keep best 200 trees

OOB estimation. Each tree is evaluated on the samples not used in its bootstrap, providing an almost-free unbiased generalization error estimate:

java
System.out.println("OOB RMSE: " + model.metrics().rmse());

5.3 GradientTreeBoost

Gradient boosting with regression trees: each tree fits the pseudo-residuals (negative gradient of the loss) of the current ensemble. Supports multiple loss functions.

java
GradientTreeBoost model = GradientTreeBoost.fit(formula, data);
GradientTreeBoost model = GradientTreeBoost.fit(formula, data, props);

Options (via Properties)

| Property | Default | Description |
|----------|---------|-------------|
| smile.gbt.trees | 500 | Number of boosting rounds. |
| smile.gbt.loss | LeastSquares | Loss function: LeastSquares, LeastAbsoluteDeviation, Huber. |
| smile.gbt.shrinkage | 0.05 | Learning rate. Smaller = more trees needed but better generalization. |
| smile.gbt.max.depth | 5 | Maximum tree depth. |
| smile.gbt.node.size | 5 | Minimum node size. |
| smile.gbt.sampling.rate | 0.7 | Subsample fraction per tree (stochastic GBT). |

API.

java
double   yhat   = model.predict(tuple);
double[] yhats  = model.predict(df);

double[] imp    = model.importance();   // split-gain feature importance
double[][] shap = model.shap(df);       // SHAP values

// Inspect per-stage test predictions as trees are added
double[][] test = model.test(testDf);   // [n_trees][n_samples]

Loss functions.

| Loss | Best for |
|------|----------|
| LeastSquares | Standard regression; assumes Gaussian noise |
| LeastAbsoluteDeviation | Robust to outliers; fits the conditional median |
| Huber | Combines advantages of both; needs a delta quantile threshold |

Tuning. Shrinkage and tree count are a trade-off: smaller shrinkage (0.01–0.05) usually gives the best results but requires more trees. Subsampling (sampling.rate < 1.0) adds stochasticity that reduces overfitting.
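
A configuration that follows this advice (the values are illustrative):

java
Properties props = new Properties();
props.setProperty("smile.gbt.trees", "1000");
props.setProperty("smile.gbt.shrinkage", "0.01");
props.setProperty("smile.gbt.sampling.rate", "0.7");
GradientTreeBoost model = GradientTreeBoost.fit(formula, data, props);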


6. Support Vector Regression

6.1 SVM — Epsilon-SVR

The SVM class is a static factory for epsilon-insensitive support vector regression. It wraps smile.model.svm.SVR and returns a Regression model. No penalty is imposed on training samples whose prediction error is within ±ε of the true value.

java
// Dense Euclidean / linear kernel
Regression<double[]>  model = SVM.fit(x, y, options);

// Dense arbitrary kernel
KernelMachine<T>      model = SVM.fit(x, y, kernel, options);

// Sparse input (SparseArray)
SparseLinearSVM       model = SVM.fit(x, y, p, options);   // x: SparseArray[]

// Binary sparse (0/1 feature arrays)
BinarySparseLinearSVM model = SVM.fit(x, y, p, options);   // x: int[][]

// From Properties (auto-selects kernel from smile.svm.kernel)
Regression<double[]>  model = SVM.fit(x, y, props);

SVM.Options record.

| Parameter | Default | Description |
|-----------|---------|-------------|
| eps | 1.0 | Epsilon tube half-width. Smaller = tighter fit. |
| C | 1.0 | Soft margin penalty. Larger = lower training error, less regularization. |
| tol | 1E-3 | SMO convergence tolerance. |

java
SVM.Options opts = new SVM.Options(0.5, 10.0);         // eps=0.5, C=10
SVM.Options opts = new SVM.Options(0.5, 10.0, 1E-4);  // explicit tolerance

// Round-trip via Properties
Properties props = opts.toProperties();
SVM.Options restored = SVM.Options.of(props);

Kernel selection.

| Kernel | Class | Typical use |
|--------|-------|-------------|
| Linear | LinearKernel | High-dimensional sparse / text data |
| Gaussian (RBF) | GaussianKernel(sigma) | General nonlinear; most popular |
| Polynomial | PolynomialKernel(d) | Polynomial-shaped responses |
| Laplacian | LaplacianKernel(sigma) | Robust to outliers |

java
// Nonlinear SVR with a Gaussian kernel
KernelMachine<double[]> model = SVM.fit(x, y,
        new GaussianKernel(5.0),
        new SVM.Options(1.5, 100.0));

double yhat = model.predict(sample);

Properties-driven workflow:

java
Properties props = new Properties();
props.setProperty("smile.svm.kernel",    "rbf");
props.setProperty("smile.svm.epsilon",   "0.5");
props.setProperty("smile.svm.C",         "100");
props.setProperty("smile.svm.tolerance", "1E-3");
Regression<double[]> model = SVM.fit(x, y, props);

6.2 LinearSVM, SparseLinearSVM, BinarySparseLinearSVM

Thin wrappers around a trained LinearKernelMachine that expose the weight vector and intercept. All three return the continuous regression score — not a classification label.

| Class | Input type | Use case |
|-------|------------|----------|
| LinearSVM | double[] | Dense linear SVR |
| SparseLinearSVM | SparseArray | Sparse feature vectors (NLP) |
| BinarySparseLinearSVM | int[] | Binary (0/1) sparse features |

java
LinearSVM model = (LinearSVM) SVM.fit(x, y, new SVM.Options(1.0, 10.0));

double[] w = model.weights();    // linear coefficient vector
double   b = model.intercept();  // bias term
double   y = model.predict(x);  // w·x + b (continuous regression output)

6.3 KernelMachine

A wrapper implementing Regression<T> around smile.model.svm.KernelMachine<T>, used as the return type for nonlinear SVR. predict(x) computes the kernel expansion:

ŷ = Σᵢ αᵢ · K(xᵢ, x) + b

7. Gaussian Process Regression

Gaussian Process Regression (GPR) provides Bayesian inference over functions: given training data it produces both a predictive mean and a predictive variance (uncertainty estimate). Predictions are exact (cubic time and quadratic space in n), or approximate via Subset of Regressors (SR) / Nyström for large datasets.

java
// Exact GP (n < ~5000)
GaussianProcessRegression<double[]> model =
        GaussianProcessRegression.fit(x, y, kernel, lambda);

// Exact with kernel hyperparameter optimization (BFGS)
GaussianProcessRegression<double[]> model =
        GaussianProcessRegression.fit(x, y, kernel, lambda, true);

// Subset of Regressors (SR) approximation
double[][] inducing = ...;  // m << n inducing points
GaussianProcessRegression<double[]> model =
        GaussianProcessRegression.fit(x, y, inducing, kernel, lambda);

// Nyström approximation
GaussianProcessRegression<double[]> model =
        GaussianProcessRegression.nystrom(x, y, inducing, kernel, lambda);

Parameters.

| Parameter | Description |
|-----------|-------------|
| kernel | A MercerKernel<T> (e.g., GaussianKernel, MaternKernel). |
| lambda | Noise variance σ² (regularization). Must be > 0. |
| optimize | If true, optimise kernel hyperparameters via marginal log-likelihood (BFGS). |

Prediction with uncertainty:

java
// Predictive mean only
double yhat = model.predict(query);

// Predictive mean + variance/std
double[] estimation = new double[2];  // [0] = mean, [1] = std dev
model.predict(query, estimation);
double mean    = estimation[0];
double stddev  = estimation[1];

// 95% prediction interval (assuming Gaussian posterior)
double lower = mean - 1.96 * stddev;
double upper = mean + 1.96 * stddev;

Joint predictive distribution over multiple test points:

java
MultivariateGaussianDistribution joint = model.predict(queries);
double[] means = joint.mean();
DenseMatrix cov = joint.cov();

Scaling. Exact GP is O(n³) to train and O(n) to predict. For n > 10,000, use SR or Nyström approximations. SR predictions are still exact functions of the inducing set; Nyström may produce negative variance — SR is generally preferred.
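
A sketch of the m ≈ √n rule (the random selection is illustrative; inducing points may also be chosen by k-means):

java
// Draw m ≈ sqrt(n) training points at random to serve as the inducing set for SR.
int n = x.length;
int m = (int) Math.ceil(Math.sqrt(n));

int[] index = new int[n];
for (int i = 0; i < n; i++) index[i] = i;
java.util.Random rng = new java.util.Random(42);
for (int i = n - 1; i > 0; i--) {              // Fisher-Yates shuffle
    int j = rng.nextInt(i + 1);
    int t = index[i]; index[i] = index[j]; index[j] = t;
}

double[][] inducing = new double[m][];
for (int i = 0; i < m; i++) inducing[i] = x[index[i]];

GaussianProcessRegression<double[]> sr =
        GaussianProcessRegression.fit(x, y, inducing, new GaussianKernel(1.0), 0.01);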

Kernel hyperparameter tuning. Pass optimize = true or explicitly call model.fit(kernel) with updated parameters:

java
GaussianProcessRegression<double[]> model =
        GaussianProcessRegression.fit(x, y, new GaussianKernel(1.0), 0.01, true);
// The kernel parameters are automatically optimised during fit.

8. Neural Network and RBF

8.1 MLP — Multilayer Perceptron

A fully connected feed-forward network trained by mini-batch stochastic back-propagation. Regression output is a single linear neuron (no activation on the output layer).

java
MLP model = MLP.fit(x, y,
        new Layer.ReLU(128),
        new Layer.ReLU(64)
);

// With optional output scaler (rescales targets for better training)
MLP model = MLP.fit(x, y, scaler,
        new Layer.ReLU(128),
        new Layer.ReLU(64)
);

// From Properties
MLP model = MLP.fit(x, y, props);

Layer builders (from smile.model.mlp):

| Type | Description |
|------|-------------|
| Layer.ReLU(units) | Rectified linear — most common for regression hidden layers |
| Layer.Sigmoid(units) | Logistic sigmoid |
| Layer.Tanh(units) | Hyperbolic tangent |
| Layer.Mish(units) | Smooth ReLU variant |
| Layer.SELU(units) | Self-normalizing; works well without batch norm |

Options (via Properties)

| Property | Default | Description |
|----------|---------|-------------|
| smile.mlp.learning.rate | 0.01 | SGD learning rate. |
| smile.mlp.momentum | 0.0 | Momentum coefficient. |
| smile.mlp.weight.decay | 0.0 | L2 weight regularization. |
| smile.mlp.epochs | 10 | Number of training epochs. |
| smile.mlp.mini.batch | 32 | Mini-batch size. |
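
A sketch of the Properties-driven overload (the values are illustrative):

java
Properties props = new Properties();
props.setProperty("smile.mlp.learning.rate", "0.005");
props.setProperty("smile.mlp.epochs", "50");
props.setProperty("smile.mlp.mini.batch", "64");
MLP model = MLP.fit(x, y, props);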

Online learning:

java
model.update(sample, target);           // single-sample SGD step
model.update(batchX, batchY);           // mini-batch step

8.2 RBFNetwork — Radial Basis Function Network

A two-layer network: the first layer computes RBF activations from a set of centres (typically found by k-means), the second layer is a linear least-squares fit.

java
// Automatic centres from k-means
RBFNetwork<double[]> model = RBFNetwork.fit(x, y, rbf);

// Pre-computed centres
RBFNetwork<double[]> model = RBFNetwork.fit(x, y, centers, rbf, normalized);

  • rbf: array of RadialBasisFunction objects (one per centre). Common choice: GaussianRadialBasis(sigma).
  • normalized: if true, activations are normalized to sum to 1 (partition-of-unity property; often improves extrapolation).

java
double yhat = model.predict(sample);

RBFNetwork does not support online updates.


9. Algorithm Comparison

| Algorithm | Input | Online | DataFrame | Notes |
|-----------|-------|--------|-----------|-------|
| OLS | double[] via Formula | ✓ (RLS) | ✓ | Full statistical diagnostics; exact solution |
| RidgeRegression | double[] / Formula | ✓ (RLS) | ✓ | L2 penalty; handles multicollinearity |
| LASSO | double[] / Formula | | ✓ | L1 penalty; sparse coefficients |
| ElasticNet | double[] / Formula | | ✓ | L1+L2 penalty; grouped sparsity |
| GLM | Tuple via Formula | | ✓ | Non-Gaussian families; IWLS |
| GAM | Tuple via Formula | | ✓ | Non-parametric smooth terms |
| RegressionTree | Tuple via Formula | | ✓ | Interpretable; baseline only |
| RandomForest | Tuple via Formula | | ✓ | Best general-purpose; OOB error |
| GradientTreeBoost | Tuple via Formula | | ✓ | Highest accuracy; robust losses |
| SVM (linear) | double[] | | | Margin-based; ε-insensitive |
| SVM (kernel) | T (generic) | | | Nonlinear kernel; support vectors |
| SparseLinearSVM | SparseArray | | | Sparse high-dim data |
| BinarySparseLinearSVM | int[] | | | Binary feature vectors |
| GaussianProcessRegression | T (generic) | | | Uncertainty estimates; O(n³) |
| MLP | double[] | ✓ | | Deep learning; highly expressive |
| RBFNetwork | T (generic) | | | Two-layer; k-means centres |

10. Common Patterns and Tips

Serialization

All models implement java.io.Serializable. Use smile.io.Read and smile.io.Write:

java
import smile.io.Write;
import smile.io.Read;

Write.object(model, Path.of("model.ser"));
Regression<?> loaded = (Regression<?>) Read.object(Path.of("model.ser"));

Configuration via Properties

Every algorithm with tunable hyperparameters exposes a fit(…, Properties) overload and a nested Options record with toProperties() / of(Properties) for serialization:

java
// Save configuration
Properties props = new SVM.Options(0.5, 10.0).toProperties();
try (FileWriter fw = new FileWriter("svm.properties")) { props.store(fw, null); }

// Restore and train
Properties loaded = new Properties();
try (FileReader fr = new FileReader("svm.properties")) { loaded.load(fr); }
Regression<double[]> model = SVM.fit(x, y, loaded);

Reproducibility

Random sampling (random forests, GBT subsampling, k-means RBF centres) depends on the global RNG. Seed it before training:

java
smile.math.MathEx.setSeed(12345);
RandomForest model = RandomForest.fit(formula, data);

Online (Incremental) Learning

Only OLS (via RLS), RidgeRegression (via RLS when V is set), and MLP support online updates. Always check before calling:

java
if (model.online()) {
    model.update(newX, newY);
}

Feature Scaling

Scale-invariant models (trees, random forests, GBT) work well without normalization. Scale-sensitive models require it:

| Scale-insensitive | Scale-sensitive (normalize first) |
|-------------------|-----------------------------------|
| RegressionTree, RandomForest, GradientTreeBoost | OLS, RidgeRegression, LASSO, ElasticNet |
| | SVM, GaussianProcessRegression, RBFNetwork, MLP |

java
// Standardize columns in-place before fitting SVM
MathEx.standardize(x);
Regression<double[]> model = SVM.fit(x, y, new SVM.Options(1.0, 10.0));

Choosing between linear regularisation methods

| Situation | Recommendation |
|-----------|----------------|
| All predictors relevant, multicollinearity present | RidgeRegression |
| Sparse signal — most predictors irrelevant | LASSO |
| Correlated predictor groups, sparse overall | ElasticNet |
| n < p (under-determined system) | LASSO or ElasticNet |

Choosing between ensemble methods

| Situation | Recommendation |
|-----------|----------------|
| Fast baseline, interpretable rules | RegressionTree |
| Balanced accuracy + speed + OOB estimate | RandomForest |
| Highest accuracy, can tune | GradientTreeBoost |
| Robustness to outliers | GradientTreeBoost with Huber or LeastAbsoluteDeviation loss |

GLM family selection

| Response type | Family | Link |
|---------------|--------|------|
| Continuous, additive noise | Gaussian | identity |
| Positive continuous, multiplicative noise | Gaussian | log |
| Proportions or rates | Gaussian | inverse |
| Binary 0/1 | Bernoulli | logit |
| Grouped binary (k successes out of n) | Binomial | logit |
| Non-negative integer counts | Poisson | log |

Gaussian Process: when to use exact vs approximate

| n | Recommendation |
|---|----------------|
| n ≤ 1,000 | Exact GP; optimise kernel hyperparameters |
| 1,000 < n ≤ 10,000 | Exact GP without hyperparameter optimization |
| n > 10,000 | Subset of Regressors (SR) approximation; choose m ≈ √n |
| n > 100,000 | Consider switching to GBT or MLP |

Inspecting GLM and OLS models programmatically

java
// Coefficient p-values (identify significant predictors)
LinearModel ols = OLS.fit(formula, data);
double[][] ttest = ols.ttest();
for (int i = 0; i < ttest.length; i++) {
    if (ttest[i][3] < 0.05) {
        System.out.printf("%-20s coef=%+.4f  p=%.4f%n",
                ols.schema().field(i).name(), ttest[i][0], ttest[i][3]);
    }
}

// GLM coefficient interpretation (logit scale for Bernoulli/Binomial)
GLM glm = GLM.fit(formula, data, Bernoulli.logit());
double[][] ztest = glm.ztest();
for (int i = 0; i < ztest.length; i++) {
    double logOddsRatio = ztest[i][0];
    double oddsRatio    = Math.exp(logOddsRatio);
    System.out.printf("OR=%.3f  p=%.4f%n", oddsRatio, ztest[i][3]);
}

Memory and scale considerations

| Concern | Guidance |
|---------|----------|
| n > 1M rows | RandomForest, GradientTreeBoost; avoid exact GP. |
| p > n (wide data) | LASSO, ElasticNet, SparseLinearSVM. |
| Streaming / incremental | MLP (online), OLS with RLS option. |
| Sparse high-dimensional text | SparseLinearSVM, BinarySparseLinearSVM. |
| Uncertainty quantification | GaussianProcessRegression (exact or SR). |
| Interpretable model | OLS, RidgeRegression, LASSO, RegressionTree. |

SMILE — Copyright © 2010–2026 Haifeng Li. GNU GPL licensed.