core/REGRESSION.md
The smile.regression package provides a comprehensive suite of supervised regression
algorithms for predicting a continuous response variable. This guide describes the core
abstractions, explains each algorithm with its API, key parameters, and usage patterns,
and ends with a comparative summary and practical tips. The package follows a
consistent design:
- fit(…) factory methods — every algorithm exposes one or more fit methods; there is no mutable builder to misconfigure.
- Regression<T> interface — a single predict(T x) for scalar output, plus batch, list, and dataset overloads; optional online update.
- DataFrameRegression — models fitted on data frames implement this extension and accept Formula + DataFrame.
- java.io.Serializable for persistence.
- java.util.Properties via nested Options records, enabling configuration-file workflows.

The package also contains two sub-packages:

- smile.regression.glm — GLM distribution families (link functions, variance, deviance, log-likelihood).
- smile.regression.gam — B-spline and smoothing spline primitives for GAM.

## Regression Interface

public interface Regression<T> extends ToDoubleFunction<T>, Serializable
The central contract that every regression model implements.
| Method | Description |
|---|---|
| double predict(T x) | Predicts the response value for a single instance. |
| double[] predict(T[] x) | Batch prediction over an array. |
| double[] predict(List<T> x) | Batch prediction over a list. |
| double[] predict(Dataset<T,?> x) | Batch prediction over a Dataset. |
| boolean online() | Returns true if the model supports incremental updates. |
| void update(T x, double y) | Online update with one sample (default: throws). |
| void update(T[] x, double[] y) | Batch online update. |
| void update(Dataset<T,Double> batch) | Batch update from a Dataset. |
| double applyAsDouble(T x) | Alias for predict; satisfies ToDoubleFunction<T>. |
The interface also declares a nested Trainer<T, M>:
interface Trainer<T, M extends Regression<T>> {
M fit(T[] x, double[] y);
M fit(T[] x, double[] y, Properties params);
}
Use Trainer when writing generic code that works with any regression algorithm or
when hyperparameters come from a configuration file.
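For example, generic evaluation code can be written against Trainer. The trainAndReport helper below is an illustrative sketch, not a library method; it assumes x and y hold the training matrix and response and uses the SVM.fit overloads documented later in this guide:
// Illustrative generic helper: fit any algorithm through its Trainer and report training RMSE.
static <M extends Regression<double[]>> M trainAndReport(
        Regression.Trainer<double[], M> trainer, double[][] x, double[] y) {
    M model = trainer.fit(x, y);
    double[] yhat = model.predict(x);            // batch prediction from the interface
    double sse = 0.0;
    for (int i = 0; i < y.length; i++) {
        double r = y[i] - yhat[i];
        sse += r * r;
    }
    System.out.printf("training RMSE = %.4f%n", Math.sqrt(sse / y.length));
    return model;
}
// A Trainer backed by epsilon-SVR; the Properties overload defers to SVM's own keys.
Regression.Trainer<double[], Regression<double[]>> svr =
        new Regression.Trainer<double[], Regression<double[]>>() {
    @Override
    public Regression<double[]> fit(double[][] x, double[] y) {
        return SVM.fit(x, y, new SVM.Options(1.0, 10.0));
    }
    @Override
    public Regression<double[]> fit(double[][] x, double[] y, Properties params) {
        return SVM.fit(x, y, params);
    }
};
Regression<double[]> model = trainAndReport(svr, x, y);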
Ensemble average. The static factory method Regression.ensemble(models...) returns
a model whose prediction is the arithmetic mean of all constituent models. The ensemble
is an online learner only if every base model is.
Regression<double[]> avg = Regression.ensemble(model1, model2, model3);
double y = avg.predict(sample); // mean of three predictions
Online learning check.
if (model.online()) {
model.update(newX, newY);
}
## DataFrameRegression

public interface DataFrameRegression extends Regression<Tuple>
An extension for models trained on DataFrame objects via a Formula. It adds:
| Method | Description |
|---|---|
| Formula formula() | The formula used at training time. |
| StructType schema() | The feature schema of the design matrix. |
| double predict(Tuple x) | Predicts from a named-column Tuple. |
| double[] predict(DataFrame df) | Predicts all rows of a DataFrame. |
All tree-based models (RegressionTree, RandomForest, GradientTreeBoost), as well as
GLM, GAM, and LinearModel, implement DataFrameRegression.
The static helper DataFrameRegression.of(formula, data, trainer) trains any
Regression<double[]> algorithm on the design matrix derived from the formula and wraps
the result as a DataFrameRegression, applying the formula's feature extraction
automatically:
DataFrameRegression dfModel =
        DataFrameRegression.of(formula, data, (x, y) -> RidgeRegression.fit(x, y, 0.1));
## LinearModel

The shared result type produced by OLS, RidgeRegression, LASSO, and ElasticNet.
Implements DataFrameRegression and provides comprehensive statistical diagnostics.
Key accessors:
| Method | Description |
|---|---|
| double[] coefficients() | Fitted weights (last element is the intercept when included in the design). |
| double intercept() | The intercept term. |
| double RSquared() | Coefficient of determination R². |
| double adjustedRSquared() | R² adjusted for the number of predictors. |
| double RSS() | Residual sum of squares. |
| double TSS() | Total sum of squares. |
| double error() | Root mean square error of the residuals (σ̂). |
| double[][] ttest() | t-tests of coefficients: estimate, std error, t-statistic, p-value. |
| double F() | F-statistic for overall significance. |
| double pvalue() | p-value of the F-test. |
| double[] fittedValues() | Training fitted values. |
| double[] residuals() | Training residuals. |
| double predict(Tuple x) | Predict from a Tuple. |
| double predict(double[] x) | Predict from a raw feature array. |
| double[] predict(DataFrame df) | Batch prediction. |
| void update(double[] x, double y) | Recursive Least Squares (RLS) update when enabled. |
| String toString() | Human-readable summary resembling R's lm output. |
Recursive Least Squares (online updates) is available when OLS.Options.recursive = true (the default). Each call to update(x, y) adjusts the model using the Sherman–
Morrison–Woodbury formula; no matrix factorization is required per update.
## OLS — Ordinary Least Squares

Ordinary least squares minimises the sum of squared residuals ‖y - Xβ‖². The design
matrix is built automatically from the Formula — you do not add an intercept column
manually.
LinearModel model = OLS.fit(formula, data);
LinearModel model = OLS.fit(formula, data, options);
Options
| Option | Default | Description |
|---|---|---|
| method | QR | Factorization method: QR (default, fast, numerically stable) or SVD (handles rank-deficient X). |
| stderr | true | Whether to compute standard errors, t-statistics, and p-values for all coefficients. |
| recursive | true | Whether to enable recursive least squares via model.update(x, y). |
// Properties API
Properties props = new Properties();
props.setProperty("smile.ols.method", "QR");
props.setProperty("smile.ols.standard_error", "true");
props.setProperty("smile.ols.recursive", "false");
LinearModel model = OLS.fit(formula, data, OLS.Options.of(props));
Full diagnostics example:
LinearModel model = OLS.fit(Formula.lhs("y"), data);
System.out.println(model); // prints coefficients, std errors, t-tests, R², F-stat
double r2 = model.RSquared();
double rmse = model.error();
double[][] ttest = model.ttest();
// ttest[i][0] = coefficient estimate
// ttest[i][1] = standard error
// ttest[i][2] = t-statistic
// ttest[i][3] = two-sided p-value
RLS update:
LinearModel model = OLS.fit(formula, data); // recursive=true by default
model.update(newFeatures, newTarget); // Sherman-Morrison-Woodbury
Limitations. Fails when n ≤ p (at least as many predictors as observations) — use
RidgeRegression or LASSO in that regime. With multicollinearity the QR solver
automatically falls back to SVD.
## RidgeRegression

Ridge regression adds an L2 penalty λ‖β‖² to the least-squares objective, shrinking
all coefficients toward zero. Features are standardised internally so the penalty is
applied fairly across different scales.
LinearModel model = RidgeRegression.fit(formula, data, lambda);
LinearModel model = RidgeRegression.fit(x, y, lambda);
// Via Options record
LinearModel model = RidgeRegression.fit(formula, data, options);
Parameters.
| Parameter | Description |
|---|---|
| lambda | Regularization strength (λ > 0). Larger = more shrinkage. Typical range: 0.001–1000. Select via cross-validation. |
// Convenience: fit with explicit lambda
LinearModel model = RidgeRegression.fit(formula, boston, 0.1);
double[] coef = model.coefficients();
double rmse = model.error();
When to use. When you have many moderately-relevant predictors and need stable estimation under multicollinearity. Ridge does not zero any coefficient exactly, so it does not perform variable selection.
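The shrinkage effect is easy to check empirically. A small illustrative sketch, assuming x and y already hold the design matrix and response, and MathEx.norm for the Euclidean norm:
LinearModel weak   = RidgeRegression.fit(x, y, 0.01);   // mild penalty
LinearModel strong = RidgeRegression.fit(x, y, 100.0);  // heavy penalty
// The coefficient vector of the heavily penalised model has a smaller L2 norm.
System.out.printf("||beta||: lambda=0.01 -> %.3f, lambda=100 -> %.3f%n",
        MathEx.norm(weak.coefficients()), MathEx.norm(strong.coefficients()));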
## LASSO

LASSO (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty λ‖β‖₁,
producing sparse solutions where some coefficients are exactly zero. It is solved via
an interior-point PCG algorithm on the standardised design matrix.
LinearModel model = LASSO.fit(formula, data, lambda);
LinearModel model = LASSO.fit(x, y, lambda);
LinearModel model = LASSO.fit(x, y, lambda, tol, maxIter);
Parameters.
| Parameter | Default | Description |
|---|---|---|
| lambda | — | L1 penalty. Larger = sparser solution. |
| tol | 1E-4 | Convergence tolerance for the duality gap. |
| maxIter | 500 | Maximum interior-point iterations. |
When to use. When you believe only a subset of predictors is truly relevant (sparse signal), or when you need an interpretable model with automatic variable selection.
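A quick way to see the selection effect is to count the near-zero coefficients after fitting; a minimal sketch assuming x and y are already prepared:
LinearModel lasso = LASSO.fit(x, y, 0.5);       // moderately strong L1 penalty
double[] beta = lasso.coefficients();
long selected = java.util.Arrays.stream(beta)
        .filter(b -> Math.abs(b) > 1E-6)        // treat tiny values as zero
        .count();
System.out.println(selected + " of " + beta.length + " coefficients survive the penalty");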
## ElasticNet

Elastic net combines L1 and L2 penalties, expressed as λ₁‖β‖₁ + λ₂‖β‖². It is
implemented by augmenting the design matrix and delegating to the LASSO solver.
LinearModel model = ElasticNet.fit(formula, data, lambda1, lambda2);
LinearModel model = ElasticNet.fit(x, y, lambda1, lambda2);
LinearModel model = ElasticNet.fit(x, y, lambda1, lambda2, tol, maxIter);
Parameters.
| Parameter | Description |
|---|---|
| lambda1 | L1 penalty coefficient (> 0). Controls sparsity. |
| lambda2 | L2 penalty coefficient (> 0). Controls grouping effect. |
When to use. When predictors are correlated in groups. LASSO tends to pick one from
each group arbitrarily; elastic net tends to include or exclude the group as a whole.
Also useful when p > n.
Choosing λ₁ and λ₂. A common approach is to fix the mixing ratio α = λ₁/(λ₁+λ₂) ∈ [0,1] (0 = pure ridge, 1 = pure LASSO) and select total penalty by cross-validation.
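A minimal hold-out sketch of that procedure (trainX/trainY and testX/testY are assumed to be pre-split; in practice use proper k-fold cross-validation):
double alpha = 0.5;                            // fixed L1/L2 mixing ratio
double bestLambda = Double.NaN;
double bestRMSE = Double.POSITIVE_INFINITY;
for (double lambda : new double[] {0.01, 0.1, 1.0, 10.0, 100.0}) {
    // Split the total penalty according to the mixing ratio.
    LinearModel m = ElasticNet.fit(trainX, trainY, alpha * lambda, (1 - alpha) * lambda);
    double sse = 0.0;
    for (int i = 0; i < testY.length; i++) {
        double r = testY[i] - m.predict(testX[i]);
        sse += r * r;
    }
    double rmse = Math.sqrt(sse / testY.length);
    if (rmse < bestRMSE) { bestRMSE = rmse; bestLambda = lambda; }
}
System.out.printf("best total penalty %.2f (hold-out RMSE %.4f)%n", bestLambda, bestRMSE);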
## GLM — Generalized Linear Model

GLM extends linear regression to non-normal response distributions through a link function and a variance function, fitted by Iteratively Weighted Least Squares (IWLS).
GLM model = GLM.fit(formula, data, family);
GLM model = GLM.fit(formula, data, family, options);
Where family is a smile.regression.glm.Model instance (a distribution family with
its link function). See the table of available families below.
Options
| Option | Default | Description |
|---|---|---|
| tol | 1E-5 | Convergence tolerance on change in deviance between IWLS iterations. |
| maxIter | 50 | Maximum IWLS iterations. |
// Properties API
Properties props = new Properties();
props.setProperty("smile.glm.tolerance", "1E-6");
props.setProperty("smile.glm.iterations", "100");
GLM model = GLM.fit(formula, data, Bernoulli.logit(), GLM.Options.of(props));
API.
| Method | Description |
|---|---|
| double[] coefficients() | Fitted linear predictor coefficients (including intercept). |
| double[][] ztest() | Coefficient z-tests: estimate, std error, z-score, p-value. |
| double[] fittedValues() | Fitted mean values on the response scale (after invlink). |
| double[] devianceResiduals() | Deviance residuals for each observation. |
| double deviance() | Residual deviance of the fitted model. |
| double logLikelihood() | Log-likelihood of the fitted model. |
| double AIC() | Akaike information criterion 2k − 2ℓ. |
| double BIC() | Bayesian information criterion. |
| double predict(Tuple x) | Predicts on the response scale via invlink(Xβ). |
| double[] predict(DataFrame df) | Batch prediction. |
| String toString() | R-style summary with deviance table, coefficients, z-tests. |
Interpreting output:
GLM model = GLM.fit(Formula.lhs("y"), data, Bernoulli.logit());
System.out.println(model); // prints summary
double[][] z = model.ztest();
// z[i][0] = coefficient (log-odds scale for logit link)
// z[i][1] = std error
// z[i][2] = z-score
// z[i][3] = p-value (two-sided Wald test)
// Prediction returns probability (0–1) for Bernoulli/Binomial
double prob = model.predict(testTuple);
Requirements. n > p (more observations than predictors). Multinomial response is
not supported; use smile.classification.LogisticRegression instead.
## Distribution Families (smile.regression.glm)

Each family is a Java interface (Bernoulli, Binomial, Poisson, Gaussian)
exposing a static factory that returns a Model.
| Family | Factory | Link | Response type |
|---|---|---|---|
| Bernoulli | Bernoulli.logit() | logit | Binary (0/1) outcomes |
| Binomial | Binomial.logit(int[] n) | logit | Sample proportions (counts / n), each with trial count n[i] |
| Poisson | Poisson.log() | log | Non-negative count data |
| Gaussian | Gaussian.identity() | identity | Continuous with additive Gaussian noise |
| Gaussian | Gaussian.log() | log | Positive continuous; multiplicative errors |
| Gaussian | Gaussian.inverse() | inverse | Positive continuous; inverse relationship |
Bernoulli (binary logistic regression):
// y must be 0 or 1
GLM model = GLM.fit(Formula.lhs("default"), data, Bernoulli.logit());
double prob = model.predict(tuple); // P(y = 1 | x)
Binomial (grouped binary outcomes):
// y[i] = k[i]/n[i] (proportion of successes), n[i] = number of trials
int[] n = {10, 20, 15, ...};
GLM model = GLM.fit(Formula.lhs("proportion"), data, Binomial.logit(n));
Poisson (count regression):
// y must be non-negative integers
GLM model = GLM.fit(Formula.lhs("count"), data, Poisson.log());
double rate = model.predict(tuple); // estimated rate λ
Gaussian with non-default link (e.g. log):
// Use when the multiplicative model is more natural than additive
GLM model = GLM.fit(Formula.lhs("price"), data, Gaussian.log());
## GAM — Generalized Additive Model

GAM models the response as the sum of smooth non-parametric functions of each predictor:
g(μ) = β₀ + f₁(x₁) + f₂(x₂) + …. Each fᵢ is a P-spline fitted by penalized
iteratively reweighted least squares (PIRLS) with backfitting.
GAM model = GAM.fit(formula, data); // Gaussian identity by default
GAM model = GAM.fit(formula, data, glmModel); // custom family
GAM model = GAM.fit(formula, data, glmModel, options); // with hyperparameters
Options
| Option | Default | Description |
|---|---|---|
| k | 5 | Number of B-spline knots per predictor. More knots = more flexible. |
| lambdas | null | Smoothing penalties per predictor. Estimated from data if null. |
| tol | 1E-4 | Backfitting convergence tolerance. |
| maxIter | 50 | Maximum backfitting iterations. |
API.
GAM model = GAM.fit(Formula.lhs("price"), data);
double[] edf = model.effectiveDF(); // effective degrees of freedom per predictor
double aic = model.AIC();
double bic = model.BIC();
double dev = model.deviance();
double logL = model.logLikelihood();
double yhat = model.predict(tuple);
double[] yhats = model.predict(df);
Restrictions. Only numeric predictors are supported; categorical variables must be encoded before use. The formula must produce a purely numeric design matrix.
The three tree-based models below (RegressionTree, RandomForest, GradientTreeBoost) all implement DataFrameRegression and accept Formula + DataFrame.
## RegressionTree

A single CART regression tree that minimises squared-error loss (or a custom Loss
function when used as a boosting base learner).
RegressionTree model = RegressionTree.fit(formula, data);
RegressionTree model = RegressionTree.fit(formula, data, props);
Options (via Properties)
| Property | Default | Description |
|---|---|---|
| smile.cart.node.size | 5 | Minimum leaf node size. |
| smile.cart.max.depth | 20 | Maximum tree depth. |
| smile.cart.max.nodes | 0 (unlimited) | Maximum leaf nodes. |
| smile.regression_tree.bins | -1 (auto) | Number of histogram bins for continuous features. |
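For instance, a shallower tree with larger leaves (often easier to read) can be requested through the properties listed above; the values here are illustrative:
Properties props = new Properties();
props.setProperty("smile.cart.max.depth", "6");    // shallower tree
props.setProperty("smile.cart.node.size", "20");   // larger leaves, smoother fit
RegressionTree model = RegressionTree.fit(formula, data, props);
System.out.println(model);                         // printable tree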
API.
double yhat = model.predict(tuple);
double[] yhats = model.predict(df);
double[] imp = model.importance(); // impurity-based feature importance
String text = model.toString(); // printable tree
Formula f = model.formula();
Use cases. A quick non-parametric baseline. Prone to overfitting — prefer
RandomForest or GradientTreeBoost in production.
## RandomForest

An ensemble of regression trees, each trained on a bootstrap sample with random feature subsets at each split. Aggregates via the arithmetic mean.
RandomForest model = RandomForest.fit(formula, data);
RandomForest model = RandomForest.fit(formula, data, props);
Options (via Properties)
| Property | Default | Description |
|---|---|---|
| smile.random.forest.trees | 500 | Number of trees. |
| smile.random.forest.mtry | max(p/3, 1) | Feature count sampled per split. |
| smile.random.forest.max.depth | 20 | Maximum tree depth. |
| smile.random.forest.node.size | 5 | Minimum node size. |
| smile.random.forest.sampling.rate | 1.0 | Bootstrap fraction (< 1 for subsampling). |
API.
double yhat = model.predict(tuple);
double[] yhats = model.predict(df);
RegressionMetrics metrics = model.metrics(); // OOB RMSE, MAD, etc.
double oobRMSE = model.metrics().rmse();
double[] imp = model.importance(); // feature importance
double[][] shap = model.shap(df); // SHAP values
Formula formula = model.formula();
// Merge two forests (e.g. parallel training)
RandomForest merged = model1.merge(model2);
// Trim the least accurate trees
model.trim(200); // keep best 200 trees
OOB estimation. Each tree is evaluated on the samples not used in its bootstrap, providing an almost-free unbiased generalization error estimate:
System.out.println("OOB RMSE: " + model.metrics().rmse());
## GradientTreeBoost

Gradient boosting with regression trees: each tree fits the pseudo-residuals (negative gradient of the loss) of the current ensemble. Supports multiple loss functions.
GradientTreeBoost model = GradientTreeBoost.fit(formula, data);
GradientTreeBoost model = GradientTreeBoost.fit(formula, data, props);
Options (via Properties)
| Property | Default | Description |
|---|---|---|
| smile.gbt.trees | 500 | Number of boosting rounds. |
| smile.gbt.loss | LeastSquares | Loss function: LeastSquares, LeastAbsoluteDeviation, Huber. |
| smile.gbt.shrinkage | 0.05 | Learning rate. Smaller = more trees needed but better generalization. |
| smile.gbt.max.depth | 5 | Maximum tree depth. |
| smile.gbt.node.size | 5 | Minimum node size. |
| smile.gbt.sampling.rate | 0.7 | Subsample fraction per tree (stochastic GBT). |
API.
double yhat = model.predict(tuple);
double[] yhats = model.predict(df);
double[] imp = model.importance(); // split-gain feature importance
double[][] shap = model.shap(df); // SHAP values
// Inspect cumulative test error as trees are added
double[][] test = model.test(testDf); // [n_trees][n_samples]
Loss functions.
| Loss | Best for |
|---|---|
| LeastSquares | Standard regression; assumes Gaussian noise |
| LeastAbsoluteDeviation | Robust to outliers; fits the conditional median |
| Huber | Combines advantages of both; needs a delta quantile threshold |
Tuning. Shrinkage and tree count are a trade-off: smaller shrinkage (0.01–0.05)
usually gives the best results but requires more trees. Subsampling
(sampling.rate < 1.0) adds stochasticity that reduces overfitting.
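A typical tuned configuration along those lines (smaller shrinkage, more rounds, subsampling) expressed through the properties above; the values are illustrative starting points:
Properties props = new Properties();
props.setProperty("smile.gbt.shrinkage", "0.02");      // smaller learning rate...
props.setProperty("smile.gbt.trees", "1000");          // ...compensated by more rounds
props.setProperty("smile.gbt.sampling.rate", "0.7");   // stochastic GBT
GradientTreeBoost model = GradientTreeBoost.fit(formula, data, props);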
## SVM — Epsilon-SVR

The SVM class is a static factory for epsilon-insensitive support vector regression.
It wraps smile.model.svm.SVR and returns a Regression model. No penalty is imposed
on training samples whose prediction error is within ±ε of the true value.
// Dense Euclidean / linear kernel
Regression<double[]> model = SVM.fit(x, y, options);
// Dense arbitrary kernel
KernelMachine<T> model = SVM.fit(x, y, kernel, options);
// Sparse (SparseArray)
SparseLinearSVM model = SVM.fit(x, y, p, options); // x: SparseArray[]
// Binary sparse (0/1 feature arrays)
BinarySparseLinearSVM model = SVM.fit(x, y, p, options); // x: int[][]
// From Properties (auto-selects kernel from smile.svm.kernel)
Regression<double[]> model = SVM.fit(x, y, props);
SVM.Options record.
| Parameter | Default | Description |
|---|---|---|
| eps | 1.0 | Epsilon tube half-width. Smaller = tighter fit. |
| C | 1.0 | Soft margin penalty. Larger = lower training error, less regularization. |
| tol | 1E-3 | SMO convergence tolerance. |
SVM.Options opts = new SVM.Options(0.5, 10.0); // eps=0.5, C=10
SVM.Options opts = new SVM.Options(0.5, 10.0, 1E-4); // explicit tolerance
// Round-trip via Properties
Properties props = opts.toProperties();
SVM.Options restored = SVM.Options.of(props);
Kernel selection.
| Kernel | Class | Typical use |
|---|---|---|
| Linear | LinearKernel | High-dimensional sparse / text data |
| Gaussian (RBF) | GaussianKernel(sigma) | General nonlinear; most popular |
| Polynomial | PolynomialKernel(d) | Polynomial-shaped responses |
| Laplacian | LaplacianKernel(sigma) | Robust to outliers |
// Nonlinear SVR with a Gaussian kernel
KernelMachine<double[]> model = SVM.fit(x, y,
new GaussianKernel(5.0),
new SVM.Options(1.5, 100.0));
double yhat = model.predict(sample);
Properties-driven workflow:
Properties props = new Properties();
props.setProperty("smile.svm.kernel", "rbf");
props.setProperty("smile.svm.epsilon", "0.5");
props.setProperty("smile.svm.C", "100");
props.setProperty("smile.svm.tolerance", "1E-3");
Regression<double[]> model = SVM.fit(x, y, props);
## LinearSVM, SparseLinearSVM, BinarySparseLinearSVM

Thin wrappers around a trained LinearKernelMachine that expose the weight vector and
intercept. All three return the continuous regression score — not a classification
label.
| Class | Input type | Use case |
|---|---|---|
| LinearSVM | double[] | Dense linear SVR |
| SparseLinearSVM | SparseArray | Sparse feature vectors (NLP) |
| BinarySparseLinearSVM | int[] | Binary (0/1) sparse features |
LinearSVM model = (LinearSVM) SVM.fit(x, y, new SVM.Options(1.0, 10.0));
double[] w = model.weights();          // linear coefficient vector
double b = model.intercept();          // bias term
double yhat = model.predict(sample);   // w·sample + b (continuous regression output)
## KernelMachine

A wrapper implementing Regression<T> around smile.model.svm.KernelMachine<T>, used
as the return type for nonlinear SVR. predict(x) computes the kernel expansion:
ŷ = Σᵢ αᵢ · K(xᵢ, x) + b
## GaussianProcessRegression

Gaussian Process Regression (GPR) provides Bayesian inference over functions: given
training data it produces both a predictive mean and a predictive variance (uncertainty
estimate). Exact inference costs cubic time and quadratic space in n; the Subset of
Regressors (SR) and Nyström approximations handle large datasets.
// Exact GP (n < ~5000)
GaussianProcessRegression<double[]> model =
GaussianProcessRegression.fit(x, y, kernel, lambda);
// Exact with kernel hyperparameter optimization (BFGS)
GaussianProcessRegression<double[]> model =
GaussianProcessRegression.fit(x, y, kernel, lambda, true);
// Subset of Regressors (SR) approximation
double[][] inducing = ...; // m << n inducing points
GaussianProcessRegression<double[]> model =
GaussianProcessRegression.fit(x, y, inducing, kernel, lambda);
// Nyström approximation
GaussianProcessRegression<double[]> model =
GaussianProcessRegression.nystrom(x, y, inducing, kernel, lambda);
Parameters.
| Parameter | Description |
|---|---|
| kernel | A MercerKernel<T> (e.g., GaussianKernel, MaternKernel). |
| lambda | Noise variance σ² (regularization). Must be > 0. |
| optimize | If true, optimise kernel hyperparameters via marginal log-likelihood (BFGS). |
Prediction with uncertainty:
// Predictive mean only
double yhat = model.predict(query);
// Predictive mean + variance/std
double[] estimation = new double[2]; // [0] = mean, [1] = std dev
model.predict(query, estimation);
double mean = estimation[0];
double stddev = estimation[1];
// 95% prediction interval (assuming Gaussian posterior)
double lower = mean - 1.96 * stddev;
double upper = mean + 1.96 * stddev;
Joint predictive distribution over multiple test points:
MultivariateGaussianDistribution joint = model.predict(queries);
double[] means = joint.mean();
DenseMatrix cov = joint.cov();
Scaling. Exact GP is O(n³) to train and O(n) to predict. For n > 10,000, use
SR or Nyström approximations. SR predictions are still exact functions of the inducing
set; Nyström may produce negative variance — SR is generally preferred.
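A sketch of the SR workflow with m ≈ √n inducing points drawn at random (assuming MathEx.permutate for the random permutation; the kernel bandwidth and noise value are illustrative):
int n = x.length;
int m = (int) Math.ceil(Math.sqrt(n));          // m << n inducing points
int[] perm = MathEx.permutate(n);               // random ordering of sample indices
double[][] inducing = new double[m][];
for (int i = 0; i < m; i++) inducing[i] = x[perm[i]];
GaussianProcessRegression<double[]> sr =
        GaussianProcessRegression.fit(x, y, inducing, new GaussianKernel(2.0), 0.01);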
Kernel hyperparameter tuning. Pass optimize = true or explicitly call
model.fit(kernel) with updated parameters:
GaussianProcessRegression<double[]> model =
GaussianProcessRegression.fit(x, y, new GaussianKernel(1.0), 0.01, true);
// The kernel parameters are automatically optimised during fit.
## MLP — Multilayer Perceptron

A fully connected feed-forward network trained by mini-batch stochastic back-propagation. Regression output is a single linear neuron (no activation on the output layer).
MLP model = MLP.fit(x, y,
new Layer.ReLU(128),
new Layer.ReLU(64)
);
// With optional output scaler (rescales targets for better training)
MLP model = MLP.fit(x, y, scaler,
new Layer.ReLU(128),
new Layer.ReLU(64)
);
// From Properties
MLP model = MLP.fit(x, y, props);
Layer builders (from smile.model.mlp):
| Type | Description |
|---|---|
| Layer.ReLU(units) | Rectified linear — most common for regression hidden layers |
| Layer.Sigmoid(units) | Logistic sigmoid |
| Layer.Tanh(units) | Hyperbolic tangent |
| Layer.Mish(units) | Smooth ReLU variant |
| Layer.SELU(units) | Self-normalizing; works well without batch norm |
Options (via Properties)
| Property | Default | Description |
|---|---|---|
| smile.mlp.learning.rate | 0.01 | SGD learning rate. |
| smile.mlp.momentum | 0.0 | Momentum coefficient. |
| smile.mlp.weight.decay | 0.0 | L2 weight regularization. |
| smile.mlp.epochs | 10 | Number of training epochs. |
| smile.mlp.mini.batch | 32 | Mini-batch size. |
Online learning:
model.update(sample, target); // single-sample SGD step
model.update(batchX, batchY); // mini-batch step
## RBFNetwork — Radial Basis Function Network

A two-layer network: the first layer computes RBF activations from a set of centres (typically found by k-means), the second layer is a linear least-squares fit.
// Automatic centres from k-means
RBFNetwork<double[]> model = RBFNetwork.fit(x, y, rbf);
// Pre-computed centres
RBFNetwork<double[]> model = RBFNetwork.fit(x, y, centers, rbf, normalized);
- rbf: array of RadialBasisFunction objects (one per centre). Common choice: GaussianRadialBasis(sigma).
- normalized: if true, activations are normalized to sum to 1 (partition-of-unity property; often improves extrapolation).

double yhat = model.predict(sample);
RBFNetwork does not support online updates.
## Comparative Summary

| Algorithm | Input | Online | DataFrame | Notes |
|---|---|---|---|---|
| OLS | double[] via Formula | ✓ (RLS) | ✓ | Full statistical diagnostics; exact solution |
| RidgeRegression | double[] / Formula | ✓ (RLS) | ✓ | L2 penalty; handles multicollinearity |
| LASSO | double[] / Formula | ✗ | ✓ | L1 penalty; sparse coefficients |
| ElasticNet | double[] / Formula | ✗ | ✓ | L1+L2 penalty; grouped sparsity |
| GLM | Tuple via Formula | ✗ | ✓ | Non-Gaussian families; IWLS |
| GAM | Tuple via Formula | ✗ | ✓ | Non-parametric smooth terms |
| RegressionTree | Tuple via Formula | ✗ | ✓ | Interpretable; baseline only |
| RandomForest | Tuple via Formula | ✗ | ✓ | Best general-purpose; OOB error |
| GradientTreeBoost | Tuple via Formula | ✗ | ✓ | Highest accuracy; robust losses |
| SVM (linear) | double[] | ✗ | ✗ | Margin-based; ε-insensitive |
| SVM (kernel) | T (generic) | ✗ | ✗ | Nonlinear kernel; support vectors |
| SparseLinearSVM | SparseArray | ✗ | ✗ | Sparse high-dim data |
| BinarySparseLinearSVM | int[] | ✗ | ✗ | Binary feature vectors |
| GaussianProcessRegression | T (generic) | ✗ | ✗ | Uncertainty estimates; O(n³) |
| MLP | double[] | ✓ | ✗ | Deep learning; highly expressive |
| RBFNetwork | T (generic) | ✗ | ✗ | Two-layer; k-means centres |
## Practical Tips

All models implement java.io.Serializable. Use smile.io.Read and smile.io.Write:
import smile.io.Write;
import smile.io.Read;
Write.object(model, Path.of("model.ser"));
Regression<?> loaded = (Regression<?>) Read.object(Path.of("model.ser"));
Every algorithm with tunable hyperparameters exposes a fit(…, Properties) overload
and a nested Options record with toProperties() / of(Properties) for serialization:
// Save configuration
Properties props = new SVM.Options(0.5, 10.0).toProperties();
try (FileWriter fw = new FileWriter("svm.properties")) { props.store(fw, null); }
// Restore and train
Properties loaded = new Properties();
try (FileReader fr = new FileReader("svm.properties")) { loaded.load(fr); }
Regression<double[]> model = SVM.fit(x, y, loaded);
Random sampling (random forests, GBT subsampling, k-means RBF centres) depends on the global RNG. Seed it before training:
smile.math.MathEx.setSeed(12345);
RandomForest model = RandomForest.fit(formula, data);
Only OLS (via RLS), RidgeRegression (via RLS when V is set), and MLP support
online updates. Always check before calling:
if (model.online()) {
model.update(newX, newY);
}
Scale-invariant models (trees, random forests, GBT) work well without normalization. Scale-sensitive models require it:
| Scale-insensitive | Scale-sensitive (normalize first) |
|---|---|
| RegressionTree, RandomForest, GradientTreeBoost | OLS, RidgeRegression, LASSO, ElasticNet |
| — | SVM, GaussianProcessRegression, RBFNetwork, MLP |
// Standardize columns in-place before fitting SVM
MathEx.standardize(x);
Regression<double[]> model = SVM.fit(x, y, new SVM.Options(1.0, 10.0));
Choosing a linear model:

| Situation | Recommendation |
|---|---|
| All predictors relevant, multicollinearity present | RidgeRegression |
| Sparse signal — most predictors irrelevant | LASSO |
| Correlated predictor groups, sparse overall | ElasticNet |
| n < p (under-determined system) | LASSO or ElasticNet |
Choosing a tree ensemble:

| Situation | Recommendation |
|---|---|
| Fast baseline, interpretable rules | RegressionTree |
| Balanced accuracy + speed + OOB estimate | RandomForest |
| Highest accuracy, can tune | GradientTreeBoost |
| Robustness to outliers | GradientTreeBoost with Huber or LeastAbsoluteDeviation loss |
Choosing a GLM family:

| Response type | Family | Link |
|---|---|---|
| Continuous, additive noise | Gaussian | identity |
| Positive continuous, multiplicative noise | Gaussian | log |
| Positive continuous, inverse relationship | Gaussian | inverse |
| Binary 0/1 | Bernoulli | logit |
| Grouped binary (k successes out of n) | Binomial | logit |
| Non-negative integer counts | Poisson | log |
Choosing a Gaussian process strategy by training-set size n:

| n | Recommendation |
|---|---|
| n ≤ 1,000 | Exact GP; optimise kernel hyperparameters |
| 1,000 < n ≤ 10,000 | Exact GP without hyperparameter optimization |
| n > 10,000 | Subset of Regressors (SR) approximation; choose m ≈ √n |
| n > 100,000 | Consider switching to GBT or MLP |
// Coefficient p-values (identify significant predictors)
LinearModel ols = OLS.fit(formula, data);
double[][] ttest = ols.ttest();
for (int i = 0; i < ttest.length; i++) {
if (ttest[i][3] < 0.05) {
System.out.printf("%-20s coef=%+.4f p=%.4f%n",
ols.schema().field(i).name(), ttest[i][0], ttest[i][3]);
}
}
// GLM coefficient interpretation (logit scale for Bernoulli/Binomial)
GLM glm = GLM.fit(formula, data, Bernoulli.logit());
double[][] ztest = glm.ztest();
for (int i = 0; i < ztest.length; i++) {
double logOddsRatio = ztest[i][0];
double oddsRatio = Math.exp(logOddsRatio);
System.out.printf("OR=%.3f p=%.4f%n", oddsRatio, ztest[i][3]);
}
| Concern | Guidance |
|---|---|
| n > 1M rows | RandomForest, GradientTreeBoost; avoid exact GP. |
| p > n (wide data) | LASSO, ElasticNet, SparseLinearSVM. |
| Streaming / incremental | MLP (online), OLS with RLS option. |
| Sparse high-dimensional text | SparseLinearSVM, BinarySparseLinearSVM. |
| Uncertainty quantification | GaussianProcessRegression (exact or SR). |
| Interpretable model | OLS, RidgeRegression, LASSO, RegressionTree. |
SMILE — Copyright © 2010–2026 Haifeng Li. GNU GPL licensed.