base/README.md
smile-base is the foundation module of the SMILE
(Statistical Machine Intelligence and Learning Engine) library.
It provides all the data structures, mathematical primitives,
statistical utilities, and I/O facilities that the rest of SMILE
is built upon.
Package: smile.data, smile.data.type, smile.data.measure, smile.data.vector
Guide: DATA_FRAME.md
DataFrame is SMILE's primary in-memory tabular data structure — a typed,
column-oriented table with named fields, rich type metadata, and a streaming
API. Tuple is a single row of a DataFrame. The smile.data.type package
defines a complete type system (DataType, StructType, …), and
smile.data.measure provides nominal, ordinal, interval, and ratio measurement
scales. Strongly-typed column vectors (IntVector, DoubleVector,
StringVector, …) live in smile.data.vector.
DataFrame iris = Read.arff(Paths.getTestData("weka/iris.arff"));
DataFrame subset = iris.select("sepallength", "sepalwidth", "class");
double[] lengths = iris.column("sepallength").toDoubleArray();
Package: smile.data.formula
Guide: FORMULA.md
A compact, symbolic language for specifying model matrices (design matrices)
from a DataFrame. Inspired by R's formula notation.
// All predictors except class
Formula f = Formula.lhs("class");
// Explicit terms with an interaction
Formula f = Formula.of("y", "x1", "x2", "x1:x2");
double[][] X = f.x(dataFrame).toArray();
double[] y = f.y(dataFrame).toDoubleArray();
Package: smile.data.transform
Guide: DATA_TRANSFORMATION.md
Composable, serializable feature-transformation pipelines. Each transformer
is fitted on training data and then applied consistently to new data.
Includes scaling (Standardizer, MaxAbsScaler, MinMaxScaler),
encoding (OneHotEncoder, KBinsDiscretizer), imputation, and more.
var scaler = Standardizer.fit(trainDF);
DataFrame normalized = scaler.apply(trainDF);
DataFrame normalizedTest = scaler.apply(testDF);
Package: smile.datasets
Guide: DATASET.md
Ready-to-use loaders for over 30 standard machine-learning benchmark datasets (Iris, MNIST, USPS, Breast Cancer, CPU, Abalone, …). Each loader returns structured data suitable for immediate use with SMILE classifiers, regressors, or clustering algorithms.
var iris = Iris.load();
var mnist = MNIST.load();
var breastCancer = BreastCancer.load();
Package: smile.io
Guide: DATA_IO.md
Unified read/write API for many tabular and hierarchical formats:
| Format | Read | Write |
|---|---|---|
| CSV / TSV / DSV | ✓ | ✓ |
| JSON (row-oriented) | ✓ | ✓ |
| Apache Parquet | ✓ | ✓ |
| Apache Avro | ✓ | — |
| Apache Arrow | ✓ | ✓ |
| Weka ARFF | ✓ | — |
| LibSVM sparse | ✓ | — |
| Matrix Market | ✓ | — |
DataFrame df = Read.csv("data.csv");
DataFrame df = Read.parquet("data.parquet");
Write.csv(df, Paths.get("out.csv"));
Package: smile.tensor, smile.linalg
Guide: TENSOR.md
Core numerical data structures used throughout SMILE:
DenseMatrix — double-precision dense matrix with BLAS/LAPACK backendSparseMatrix — CSR compressed sparse matrixBandMatrix, SymmMatrix, Cholesky, LU, QR, SVD, EVD — decompositionsIMatrix / FloatMatrix — integer and single-precision variantsTensor — general multi-dimensional array (float32 / float64)DenseMatrix A = DenseMatrix.of(new double[][]{{1,2},{3,4}});
DenseMatrix B = A.mm(A); // matrix multiply
var svd = A.svd(); // full SVD
double[] x = A.solve(b); // least-squares solve
Package: smile.math
MathEx is a large static utility class with fast implementations of
common mathematical operations: log-sum-exp, softmax, entropy, distance
helpers, combinatorics, special functions (gamma, beta, erf, …), and
more. smile.math.special provides the underlying special-function
implementations.
double lse = MathEx.logSumExp(logProbs);
MathEx.softmax(scores);
double h = MathEx.entropy(probs);
Package: smile.math.distance
Guide: DISTANCES.md
A rich collection of distance and similarity metrics implementing the
Distance<T> interface:
| Metric | Class |
|---|---|
| Euclidean | EuclideanDistance |
| Manhattan (L1) | ManhattanDistance |
| Chebyshev (L∞) | ChebyshevDistance |
| Minkowski (Lp) | MinkowskiDistance |
| Cosine | CosineDistance |
| Mahalanobis | MahalanobisDistance |
| Hamming | HammingDistance |
| Levenshtein (edit) | EditDistance |
| Jaccard | JaccardDistance |
| Dynamic Time Warping | DynamicTimeWarping |
| Sparse vectors | SparseEuclideanDistance, SparseManhattanDistance, … |
Package: smile.math.kernel
Guide: KERNELS.md
Mercer kernel functions used in support vector machines, Gaussian processes,
and kernel methods. All implement MercerKernel<T>.
| Kernel | Class |
|---|---|
| Linear | LinearKernel |
| Polynomial | PolynomialKernel |
| Gaussian (RBF) | GaussianKernel |
| Laplacian | LaplacianKernel |
| Matérn 3/2, 5/2 | Matern32Kernel, Matern52Kernel |
| Hyperbolic Tangent | HyperbolicTangentKernel |
| Sparse variants | SparseGaussianKernel, SparseLaplacianKernel, … |
Package: smile.math.rbf
Guide: RBF.md
Radial basis functions used for interpolation, RBF networks, and other kernel methods.
| RBF | Class |
|---|---|
| Gaussian | GaussianRadialBasis |
| Multiquadric | MultiquadricRadialBasis |
| Inverse Multiquadric | InverseMultiquadricRadialBasis |
| Thin-plate spline | ThinPlateSplineRadialBasis |
Package: smile.math.random
Guide: RNG.md
High-quality pseudorandom number generators and distributions:
MersenneTwister — MT19937, the default general-purpose PRNGXoShiRo256StarStar, XoRoShiRo128StarStar — fast modern RNGsHalton, SobolSequence — low-discrepancy quasi-random sequencesvar rng = new MersenneTwister(42);
double u = rng.nextDouble();
int[] perm = rng.permutation(100);
Package: smile.math
Guide: BFGS.md
Quasi-Newton BFGS and L-BFGS unconstrained optimisers. The BFGS class
minimises a smooth, differentiable objective function given its gradient.
L-BFGS uses a limited-memory approximation suitable for high-dimensional
problems.
double[] x0 = {0.0, 0.0};
double[] xMin = BFGS.minimize(f, g, x0, 1e-6, 200);
Package: smile.cs
Guide: COMPRESSED_SENSING.md
Sparse signal recovery from underdetermined linear measurements using the Basis Pursuit (BP) and LASSO formulations. Useful for compressive imaging, sparse regression, and dictionary learning.
Package: smile.stat.distribution
Guide: DISTRIBUTIONS.md
A comprehensive library of univariate and multivariate distributions, each
implementing Distribution (or DiscreteDistribution). Every distribution
supports PDF/PMF, CDF, quantile (inverse CDF), mean, variance, entropy, and
random sampling.
Continuous: GaussianDistribution, ExponentialDistribution,
GammaDistribution, BetaDistribution, WeibullDistribution,
LogNormalDistribution, TDistribution, FDistribution,
ChiSquaredDistribution, CauchyDistribution, LogisticDistribution,
UniformDistribution, EmpiricalDistribution, KernelDensityEstimation, …
Discrete: BinomialDistribution, PoissonDistribution,
NegativeBinomialDistribution, GeometricDistribution,
HypergeometricDistribution, DiscreteUniformDistribution, …
Multivariate: MultivariateGaussianDistribution,
DirichletDistribution, MultivariateExponentialFamilyMixture, …
var gauss = new GaussianDistribution(0, 1);
double p = gauss.cdf(1.96); // ≈ 0.975
double x = gauss.quantile(0.975); // ≈ 1.96
double[] samples = gauss.rand(1000);
Package: smile.stat.hypothesis
Guide: HYPOTHESIS_TESTING.md
Parametric and non-parametric tests for comparing means, variances, distributions, and correlations.
| Test | Class / Method |
|---|---|
| One-sample t-test | TTest.test(double[] x, double μ) |
| Two-sample t-test | TTest.test(double[] x, double[] y) |
| Paired t-test | TTest.pairedTest(double[] x, double[] y) |
| F-test (variance) | FTest.test(double[] x, double[] y) |
| Chi-squared goodness-of-fit | ChiSqTest.test(int[] counts, double[] prob) |
| Chi-squared independence | ChiSqTest.test(int[][] table) |
| Kolmogorov-Smirnov | KSTest.test(double[] x, Distribution d) |
| Two-sample KS | KSTest.test(double[] x, double[] y) |
| Spearman correlation | SpearmanTest |
| Kendall's τ | KendallTest |
All test objects expose a p-value field for decision making.
Package: smile.interpolation
Guide: INTERPOLATION.md
Smooth function reconstruction from a discrete set of sample points.
1-D: LinearInterpolation, PolynomialInterpolation,
SplineInterpolation (natural cubic spline), CubicSplineInterpolation,
RBFInterpolation1D, KrigingInterpolation1D
2-D: BilinearInterpolation, BicubicInterpolation,
CubicSplineInterpolation2D, RBFInterpolation2D,
KrigingInterpolation2D, ShepardInterpolation2D
double[] x = {0, 1, 2, 3};
double[] y = {0, 1, 4, 9};
var spline = new CubicSplineInterpolation(x, y);
double v = spline.interpolate(1.5); // ≈ 2.25
Package: smile.neighbor
Guide: NEAREST_NEIGHBOR.md
Exact and approximate nearest-neighbor data structures.
| Structure | Class | Best for |
|---|---|---|
| KD-tree | KDTree | Low-dimensional Euclidean data |
| Ball tree | BallTree | Arbitrary metric spaces |
| Cover tree | CoverTree | Expandable, metric spaces |
| LSH | LSH | High-dimensional approximate NN |
| MPLSH | MPLSH | Multi-probe LSH |
All implement NearestNeighborSearch<K,V> and support k-NN and
range queries.
var kdtree = KDTree.of(data);
Neighbor<double[], double[]>[] nn = kdtree.knn(query, 5);
Package: smile.graph
Guide: GRAPH.md
Directed and undirected weighted graphs with adjacency-list and adjacency-matrix representations, plus classic graph algorithms.
AdjacencyList — sparse graph (recommended for most uses)AdjacencyMatrix — dense graphvar g = new AdjacencyList(6, false); // 6 vertices, undirected
g.addEdge(0, 1, 1.0);
int[] order = g.bfs(0);
double[] dist = g.dijkstra(0);
Package: smile.gap
Guide: GAP.md
A flexible genetic algorithm framework (GAP — Genetic Algorithm Platform)
for combinatorial and continuous optimization. Users implement a
Chromosome with fitness(), crossover(), and mutate() to define
the problem; the GeneticAlgorithm driver handles selection, recombination,
and termination.
var ga = new GeneticAlgorithm<>(population, 0.5, 0.01);
Chromosome best = ga.evolve(1000, 1e-4);
Package: smile.ica
Guide: ICA.md
FastICA for blind source separation. Recovers statistically independent components from a linear mixture of signals — widely used in signal processing, fMRI analysis, and feature extraction.
var ica = ICA.fit(X, 3); // extract 3 independent components
double[][] S = ica.transform(X);
Package: smile.hash
Guide: HASH.md
High-performance, non-cryptographic hash functions and locality-sensitive hashing (LSH) families.
| Hash | Class | Notes |
|---|---|---|
| MurmurHash3 | MurmurHash3 | 32-bit / 128-bit; fast, high quality |
| SHA-1 | SHA | Cryptographic (SHA-1 used for LSH) |
| MinHash | MinHash | Jaccard similarity LSH |
| SimHash | SimHash | Cosine similarity LSH |
| Random projection LSH | RandomProjectionHash | Euclidean LSH |
| Cross-polytope LSH | CrossPolytope | Efficient Euclidean LSH |
Package: smile.sort
Guide: SORT.md
Allocation-free, cache-friendly sorting and selection algorithms used internally throughout SMILE.
| Algorithm | Class / Method |
|---|---|
| Introsort (O(n log n)) | IntroSort |
| Heap sort | HeapSort |
| Shell sort | ShellSort |
| Quickselect (k-th element) | QuickSelect |
| Indexed sort | QuickSort (with index array) |
| Partition | QuickSelect.select(a, k) |
Package: smile.wavelet
Guide: WAVELET.md
Discrete Wavelet Transform (DWT) and Wavelet Packet Transform (WPT) for signal processing, compression, and multi-resolution analysis.
Families: Haar, Daubechies (D4–D20), Symlet (S8–S20), Coiflet (C6–C30), BestLocalized (BL14–BL20), Vaidyanathan.
double[] signal = ...;
var wavelet = new DaubechiesWavelet(4); // D4
wavelet.transform(signal); // in-place DWT
wavelet.inverse(signal); // reconstruction
# Build the module
./gradlew :base:build
# Run tests
./gradlew :base:test
# Generate Javadoc
./gradlew :base:javadoc
Dependencies: Java 25+, BLAS/LAPACK native libraries (optional — pure-Java fallback included).
SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.