Statistical Machine Intelligence & Learning Engine

SMILE (Statistical Machine Intelligence & Learning Engine) is a comprehensive, high-performance machine learning framework for the JVM. SMILE v5+ requires Java 25; v4.x requires Java 21; all previous versions require Java 8. SMILE also provides idiomatic APIs for Scala and Kotlin. With advanced data structures and algorithms, SMILE delivers state-of-the-art performance across every aspect of machine learning.
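The version rule above can be checked when an application starts. The following is a minimal sketch that uses only the JDK; `JavaRequirement` and its methods are hypothetical names for illustration and are not part of SMILE.

```java
/** Hypothetical helper (not part of SMILE) encoding the version rule stated above. */
final class JavaRequirement {
    /** Minimum Java feature release for a given SMILE major version. */
    static int requiredJava(int smileMajor) {
        if (smileMajor >= 5) return 25; // SMILE v5+ requires Java 25
        if (smileMajor == 4) return 21; // SMILE v4.x requires Java 21
        return 8;                       // all earlier versions require Java 8
    }

    /** True if the running JVM is new enough for the given SMILE major version. */
    static boolean isSupported(int smileMajor) {
        return Runtime.version().feature() >= requiredJava(smileMajor);
    }

    public static void main(String[] args) {
        int smileMajor = 6; // assumption: the 6.1.0 artifacts referenced in this README
        System.out.printf("Running Java %d; SMILE %d.x needs Java %d or later; supported: %b%n",
                Runtime.version().feature(), smileMajor,
                requiredJava(smileMajor), isSupported(smileMajor));
    }
}
```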


Table of Contents

  1. Features
  2. Module Map
  3. Installation
  4. Quick Start
  5. SMILE Studio & Shell
  6. Model Serialization
  7. Visualization
  8. License
  9. Issues & Discussions
  10. Contributing
  11. Maintainers
  12. Gallery

Features

| Area | Highlights |
|------|------------|
| LLM | LLaMA-3 inference, tiktoken BPE tokenizer, OpenAI-compatible REST server, SSE chat streaming |
| Deep Learning | LibTorch/GPU backend, EfficientNet-V2 image classification, custom layer API |
| Classification | SVM, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, Neural Networks, RBF Networks, MaxEnt, KNN, Naïve Bayes, LDA/QDA/RDA |
| Regression | SVR, Gaussian Process, Regression Trees, GBDT, Random Forest, RBF, OLS, LASSO, ElasticNet, Ridge |
| Clustering | BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical, SIB, SOM, Spectral, Min-Entropy |
| Manifold Learning | IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA |
| Feature Engineering | Genetic Algorithm selection, Ensemble selection, TreeSHAP, SNR, Sum-Squares ratio, data transformations, formula API |
| NLP | Sentence / word tokenization, Bigram test, Phrase & Keyword extraction, Stemmer, POS tagging, Relevance ranking |
| Association Rules | FP-growth frequent itemset mining |
| Sequence Learning | Hidden Markov Model, Conditional Random Field |
| Nearest Neighbor | BK-Tree, Cover Tree, KD-Tree, SimHash, LSH |
| Numerical Methods | Linear algebra, numerical optimization (BFGS, L-BFGS), interpolation, wavelets, RBF, distributions, hypothesis tests |
| Visualization | Swing plots (scatter, line, bar, box, histogram, surface, heatmap, contour, …) and declarative Vega-Lite charts |

Module Map

Each module has its own detailed user guide. Click the README link for the module overview, or drill into individual topic guides.

base/ — Foundation

Data structures, math, linear algebra, statistical utilities, I/O

| Document | Topics |
|----------|--------|
| README | Module overview and dependency setup |
| DATA_FRAME.md | DataFrame API — creation, selection, transformation |
| DATA_IO.md | CSV, JSON, Parquet, Arrow, JDBC, Avro readers/writers |
| DATA_TRANSFORMATION.md | Scalers, encoders, imputers, feature transforms |
| DATASET.md | Built-in benchmark and real-world datasets |
| FORMULA.md | R-style formula language for model matrices |
| DISTRIBUTIONS.md | Probability distributions (Normal, Poisson, Beta, …) |
| HYPOTHESIS_TESTING.md | t-test, chi-squared, ANOVA, KS-test, … |
| DISTANCES.md | Euclidean, Mahalanobis, Hamming, edit distance, … |
| NEAREST_NEIGHBOR.md | KD-Tree, Cover Tree, BK-Tree, LSH |
| KERNELS.md | Gaussian, polynomial, Laplacian, and other kernel functions |
| RBF.md | Radial basis function networks |
| INTERPOLATION.md | Linear, cubic spline, bilinear, bicubic |
| GRAPH.md | Adjacency list/matrix graph, BFS/DFS, spanning trees |
| SORT.md | Quick sort, heap sort, counting sort, index sort |
| HASH.md | Locality-sensitive hashing, SimHash |
| RNG.md | Random number generators, sampling, permutations |
| BFGS.md | L-BFGS and BFGS numerical optimizers |
| ICA.md | Independent Component Analysis |
| TENSOR.md | N-dimensional array (CPU tensor without LibTorch) |
| WAVELET.md | DWT, CWT, and wavelet families |
| GAP.md | GAP statistic for optimal cluster count estimation |
| COMPRESSED_SENSING.md | Compressed sensing and basis pursuit |

core/ — Machine Learning Algorithms

Classification, regression, clustering, manifold learning, and more

| Document | Topics |
|----------|--------|
| README | Module overview |
| CLASSIFICATION.md | SVM, Random Forest, AdaBoost, GBDT, KNN, Naïve Bayes, LDA, … |
| REGRESSION.md | SVR, Gaussian Process, LASSO, Ridge, ElasticNet, GBDT, … |
| CLUSTERING.md | K-Means, DBSCAN, BIRCH, SOM, Spectral Clustering, … |
| FEATURE_ENGINEERING.md | Feature selection, PCA, ICA, projection, encoding |
| MANIFOLD.md | t-SNE, UMAP, IsoMap, LLE, Laplacian Eigenmap |
| ANOMALY_DETECTION.md | IsolationForest, one-class SVM, local outlier factor |
| ASSOCIATION_RULE_MINING.md | FP-growth, association rules, frequent itemsets |
| SEQUENCE.md | HMM (Baum-Welch, Viterbi), CRF |
| TIME_SERIES.md | ARIMA, box-plots, autocorrelation |
| REGRESSION.md | Full regression API reference |
| TRAINING.md | Cross-validation, bootstrap, hyper-parameter search |
| VALIDATION.md | Hold-out, k-fold, leave-one-out evaluation |
| VALIDATION_METRICS.md | Accuracy, AUC, F1, RMSE, MAE, confusion matrix |
| HYPER_PARAMETER_OPTIMIZATION.md | Grid search, random search, Bayesian optimization |
| VECTOR_QUANTIZATION.md | LVQ, Neural Gas, SOM as vector quantizers |
| ONNX.md | Exporting and importing models via ONNX |

deep/ — Deep Learning & LLMs

LibTorch-backed GPU/CPU tensor operations, neural network layers, LLaMA-3 inference, EfficientNet

| Document | Topics |
|----------|--------|
| README | Full deep-learning & LLM user guide (tensors, layers, loss, optimizer, EfficientNet, LLaMA) |

The deep/README.md covers:

  • smile.deep.tensor — Tensor factory, indexing, arithmetic, AutoScope memory management, dtype/device
  • smile.deep.layer — Linear, Conv2d, pooling, normalization (BN/GN/RMS), dropout, embedding, sequential blocks
  • smile.deep.activation — ReLU, GELU, SiLU, Tanh, Sigmoid, Softmax, GLU, HardShrink, …
  • smile.deep.Loss — MSE, cross-entropy, BCE, Huber, KL, hinge, and more
  • smile.deep.Optimizer — SGD, Adam, AdamW, RMSprop
  • smile.deep.Model — Abstract base class + training loop
  • smile.deep.metric — Accuracy, Precision, Recall, F1Score with macro/micro/weighted averaging
  • smile.llm — Message, Role, FinishReason, ChatCompletion records; sinusoidal & RoPE positional encodings
  • smile.llm.tokenizer — Tokenizer interface, Tiktoken BPE implementation (LLaMA-3 compatible)
  • smile.llm.llama — Full LLaMA-3 stack: Llama.build(), generate(), chat(), streaming via SubmissionPublisher
  • smile.vision — VisionModel, ImageDataset, EfficientNet.V2S/M/L() pretrained models, ImageNet labels
  • smile.vision.transform — Transform interface, ImageClassification pipeline, resize/crop/toTensor helpers
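
The list above mentions streaming via SubmissionPublisher, which is the JDK class `java.util.concurrent.SubmissionPublisher`. The sketch below shows only the JDK side of that pattern: a consumer collecting text chunks as they are published. The chunks are submitted by hand in place of a model, and `StreamingSketch` is a hypothetical name; how SMILE wires its publisher is described in deep/README.md and is not reproduced here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class; only java.util.concurrent APIs are used. */
final class StreamingSketch {
    /** Publishes the chunks one by one and returns what the consumer received. */
    static String collect(List<String> chunks) {
        StringBuilder received = new StringBuilder();
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        // consume() subscribes a consumer; the future completes once the
        // publisher has been closed and all items have been delivered.
        CompletableFuture<Void> done = publisher.consume(received::append);
        chunks.forEach(publisher::submit); // stands in for a model emitting text
        publisher.close();                 // signals onComplete to the subscriber
        try {
            done.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("stream did not complete", e);
        }
        return received.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("Hello", ", ", "world")));
    }
}
```

A single subscriber receives items in submission order, so the collected text matches the concatenation of the chunks.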

nlp/ — Natural Language Processing

Text normalization, tokenization, POS tagging, stemming, relevance ranking

| Document | Topics |
|----------|--------|
| README | Module overview |
| TOKENIZER.md | Sentence splitter, word tokenizer, regex tokenizer |
| POS.md | Part-of-speech tagging (Brill tagger, HMM tagger) |
| STEM.md | Porter, Lancaster, Lovins stemmers; lemmatization |
| COLLOCATION.md | Bigram/trigram statistical tests, phrase extraction |
| RELEVANCE.md | TF-IDF, BM25, keyword extraction |
| TAXONOMY.md | WordNet integration, synsets, hypernyms |

plot/ — Data Visualization

Swing-based interactive plots and declarative Vega-Lite charts

| Document | Topics |
|----------|--------|
| README | Swing plotting API — scatter, line, bar, box, histogram, heatmap, surface, contour, wireframe |
| VEGA.md | Declarative smile.plot.vega (Vega-Lite) — JSON spec generation, web/Jupyter rendering |

serve/ — Inference Server

Quarkus-based REST inference service with OpenAI-compatible API and SSE streaming

| Document | Topics |
|----------|--------|
| README | Building and running the server, /chat/completions endpoint, SSE streaming, configuration |
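
Because the server is described as OpenAI-compatible, a client can be sketched with the JDK's `java.net.http` API. The example below builds a chat request without sending it and extracts the payload from an SSE `data:` line. The base URL, the `/v1` prefix, and the model name are assumptions for illustration; the request body follows the OpenAI chat-completions shape, and the fields the SMILE server actually accepts are documented in serve/README.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;

/** Hypothetical client-side sketch; not part of SMILE. */
final class ChatRequestSketch {
    /** Escapes backslashes, quotes, and newlines for use inside a JSON string. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds a JSON body in the OpenAI chat-completions shape. */
    static String body(String model, String userMessage, boolean stream) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\""
             + jsonEscape(userMessage) + "\"}],"
             + "\"stream\":" + stream + "}";
    }

    /** Builds (but does not send) a POST to {baseUrl}/chat/completions. */
    static HttpRequest request(String baseUrl, String model, String userMessage, boolean stream) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/chat/completions"))
                .header("Content-Type", "application/json")
                .header("Accept", stream ? "text/event-stream" : "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(model, userMessage, stream)))
                .build();
    }

    /** Returns the payload of an SSE "data:" line, or empty for any other line. */
    static Optional<String> ssePayload(String line) {
        if (!line.startsWith("data:")) return Optional.empty();
        String value = line.substring("data:".length());
        // The SSE format strips one leading space after the colon.
        return Optional.of(value.startsWith(" ") ? value.substring(1) : value);
    }

    public static void main(String[] args) {
        // Assumed address and model name; adjust to your deployment.
        HttpRequest req = request("http://localhost:8080/v1", "demo-model", "Hello", true);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(body("demo-model", "Hello", true));
    }
}
```

To send the request, pass it to `HttpClient.send` with `HttpResponse.BodyHandlers.ofLines()` and feed each line through `ssePayload`.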

studio/ — Interactive Shell & Desktop IDE

REPL / notebook environment for Java, Scala, and Kotlin

| Document | Topics |
|----------|--------|
| README.md | Desktop Studio notebook UI, cell types, output rendering |
| CLI | CLI entry points (smile, smile shell, smile scala, smile kotlin, smile serve) |

scala/ — Scala API

Idiomatic Scala shim — concise wrappers, symbolic operators, Scala collections integration

| Document | Topics |
|----------|--------|
| README | API overview, smile.classification, smile.regression, smile.clustering, smile.plot in Scala |

kotlin/ — Kotlin API

Idiomatic Kotlin shim — extension functions, named parameters, builder DSLs

| Document | Topics |
|----------|--------|
| README | API overview, extension functions, Kotlin-style builders |
| packages.md | Full package-by-package listing of all Kotlin extension functions |

json/ — JSON Library (Scala)

Lightweight zero-dependency JSON library for Scala with a clean DSL

| Document | Topics |
|----------|--------|
| README | Parsing, building, pattern matching, path navigation, serialization |

spark/ — Apache Spark Integration

Use SMILE models inside Spark ML pipelines

| Document | Topics |
|----------|--------|
| README | SmileTransformer, SmileClassifier, SmileRegressor; training and scoring in Spark DataFrames |

Installation

Maven

```xml
<!-- Core ML algorithms -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-core</artifactId>
  <version>6.1.0</version>
</dependency>

<!-- Deep learning + LLMs (requires LibTorch) -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-deep</artifactId>
  <version>6.1.0</version>
</dependency>

<!-- Natural language processing -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-nlp</artifactId>
  <version>6.1.0</version>
</dependency>

<!-- Data visualization -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>6.1.0</version>
</dependency>
```

SBT (Scala)

```scala
libraryDependencies += "com.github.haifengl" %% "smile-scala" % "6.1.0"
```

Gradle (Kotlin)

```kotlin
dependencies {
    implementation("com.github.haifengl:smile-kotlin:6.1.0")
}
```

Native Libraries (BLAS / LAPACK)

Several algorithms (manifold learning, Gaussian Process, MLP, some clustering) require BLAS and LAPACK.

Linux (Ubuntu / Debian)

```shell
sudo apt update
sudo apt install libopenblas-dev libarpack2-dev
```

macOS (Homebrew)

```shell
brew install arpack
# If macOS SIP strips DYLD_LIBRARY_PATH, copy the dylib to your working dir:
cp /opt/homebrew/lib/libarpack.dylib .
```

Windows — pre-built DLLs are included in the bin/ directory of the release package. Add that directory to PATH.

GPU (CUDA) — make sure the LibTorch CUDA native libraries are on java.library.path and that your Bytedeco pytorch classifier matches your CUDA version (e.g., linux-x86_64-gpu-cuda12.4).
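
When a native library fails to load, it can help to see which directories on `java.library.path` contain a file with the platform-specific name. The sketch below is a JDK-only diagnostic; `NativeLibCheck` is a hypothetical name, and the library name `arpack` is taken from the package names above rather than from SMILE's loader. It matches exact file names only, so versioned files such as `libarpack.so.2` are not detected.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of SMILE. */
final class NativeLibCheck {
    /** Candidate file locations for a library across a path-separated directory list. */
    static List<Path> candidates(String libName, String libraryPath) {
        // mapLibraryName yields e.g. libarpack.so, libarpack.dylib, or arpack.dll
        String fileName = System.mapLibraryName(libName);
        List<Path> result = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (!dir.isEmpty()) result.add(Path.of(dir, fileName));
        }
        return result;
    }

    /** The subset of candidates that exist on disk as regular files. */
    static List<Path> found(String libName, String libraryPath) {
        List<Path> result = new ArrayList<>();
        for (Path candidate : candidates(libName, libraryPath)) {
            if (Files.isRegularFile(candidate)) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        String lib = args.length > 0 ? args[0] : "arpack"; // assumed library name
        System.out.println("Searching for " + System.mapLibraryName(lib));
        for (Path p : candidates(lib, libraryPath)) {
            System.out.println((Files.isRegularFile(p) ? "  found   " : "  missing ") + p);
        }
    }
}
```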


Quick Start

```java
import smile.classification.RandomForest;
import smile.data.formula.Formula;
import smile.io.Read;

// Load data
var data = Read.csv("src/test/resources/iris.csv");

// Train a random forest
var forest = RandomForest.fit(Formula.lhs("species"), data);

// Predict
int label = forest.predict(data.get(0));
System.out.println("Predicted class: " + label);
```

For deep learning and LLM examples, see deep/README.md. For visualization examples, see plot/README.md.


SMILE Studio & Shell

SMILE ships with an interactive desktop Studio (notebook-style) and a set of CLI shells. See studio/README.md for full documentation.

Download a pre-packaged release from the releases page, then:

```shell
cd bin
path/to/smile/bin/setup      # install required native dependencies
path/to/smile/bin/smile      # launch SMILE Studio from your project directory
```

Other entry points:

| Command | Description |
|---------|-------------|
| smile | Desktop notebook IDE |
| smile shell | Java REPL with all SMILE packages pre-imported |
| smile scala | Scala REPL |
| smile train | Train a supervised learning model |
| smile predict | Predict on a file using a saved model |
| smile serve | Start the LLM inference server |

To increase the JVM heap:

```shell
path/to/smile/bin/smile -J-Xmx30G
```
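
To confirm that the heap option took effect, the JVM can report its maximum heap. This is a small JDK-only sketch (`HeapCheck` is a hypothetical name); run it inside the JVM that the launcher started, for example from the Java shell. The reported value is typically close to, but not always exactly, the `-Xmx` setting.

```java
import java.util.Locale;

/** Hypothetical helper; not part of SMILE. */
final class HeapCheck {
    /** Formats a byte count as gibibytes with one decimal place. */
    static String gib(long bytes) {
        return String.format(Locale.ROOT, "%.1f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        // maxMemory() is the most memory the JVM will attempt to use for the heap
        System.out.println("Max heap: " + gib(Runtime.getRuntime().maxMemory()));
    }
}
```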

Model Serialization

Most SMILE models implement java.io.Serializable. You can serialize a trained model to disk and load it in a production environment or inside a Spark job:

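```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import smile.classification.RandomForest;

// Save the forest trained in the Quick Start
try (var out = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
    out.writeObject(forest);
}

// Load it back; readObject() returns Object, so a cast is required
try (var in = new ObjectInputStream(new FileInputStream("model.ser"))) {
    var loaded = (RandomForest) in.readObject();
}
```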
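
The same round trip can be tried without SMILE on the classpath. The sketch below uses an in-memory byte array and a hypothetical `LinearModel` class standing in for a trained model; it illustrates the standard Java serialization mechanism only and says nothing about the internals of SMILE's model classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class RoundTripSketch {
    /** Hypothetical stand-in for a trained model; not a SMILE class. */
    static final class LinearModel implements Serializable {
        private static final long serialVersionUID = 1L; // pins the serialized form's version
        private final double[] weights;
        private final double bias;

        LinearModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        double predict(double[] x) {
            double sum = bias;
            for (int i = 0; i < weights.length; i++) sum += weights[i] * x[i];
            return sum;
        }
    }

    /** Serializes any Serializable object to a byte array. */
    static byte[] save(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes and checks the type, failing fast on an unexpected class. */
    static <T> T load(byte[] data, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("model class missing from classpath", e);
        }
    }

    public static void main(String[] args) {
        LinearModel model = new LinearModel(new double[] {2.0, -1.0}, 0.5);
        LinearModel copy = load(save(model), LinearModel.class);
        double[] x = {1.0, 1.0};
        System.out.println(model.predict(x) + " == " + copy.predict(x));
    }
}
```

Declaring `serialVersionUID` explicitly keeps previously saved files loadable after compatible changes to the class. The loading side needs the same class on its classpath, which is why `load` reports a missing class separately from I/O errors.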

Visualization

SMILE provides two visualization layers:

  • smile.plot.swing — Swing-based interactive 2D/3D plots. See plot/README.md.
  • smile.plot.vega — Declarative Vega-Lite charts for browsers and Jupyter. See plot/VEGA.md.

Both layers ship in the smile-plot artifact:

```xml
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>6.1.0</version>
</dependency>
```

License

SMILE employs a dual license model designed to meet the development and distribution needs of both commercial distributors (OEMs, ISVs, VARs) and open source projects. For details, see LICENSE. To acquire a commercial license, contact [email protected].


Issues & Discussions

| Channel | Purpose |
|---------|---------|
| GitHub Discussions | Questions, ideas, show-and-tell |
| Stack Overflow [smile] | Technical Q&A |
| Issue Tracker | Bug reports and feature requests |
| Online Docs | Tutorials and programming guides |
| Java API · Scala API · Kotlin API · Clojure API | API Javadoc |

Contributing

Please read CONTRIBUTING.md for build and test instructions.


Maintainers


Gallery

<table class="center" style="width:100%;"> <tr> <td colspan="3"> <figure> <a href="/website/src/images/splom.png"></a> <figcaption style="text-align: center;"><h3>Scatterplot Matrix</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/images/pca.png"></a> <figcaption style="text-align: center;"><h3>Scatter Plot</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/heart.png"></a> <figcaption style="text-align: center;"><h3>Line Plot</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/surface.png"></a> <figcaption style="text-align: center;"><h3>Surface Plot</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/images/bar.png"></a> <figcaption style="text-align: center;"><h3>Bar Plot</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/box.png"></a> <figcaption style="text-align: center;"><h3>Box Plot</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/histogram2d.png"></a> <figcaption style="text-align: center;"><h3>Histogram Heatmap</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/images/rolling.png"></a> <figcaption style="text-align: center;"><h3>Rolling Average</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/map.png"></a> <figcaption style="text-align: center;"><h3>Geo Map</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/umap.png"></a> <figcaption style="text-align: center;"><h3>UMAP</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/images/text.png"></a> <figcaption style="text-align: center;"><h3>Text Plot</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/contour.png"></a> <figcaption style="text-align: center;"><h3>Heatmap with Contour</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/hexmap.png"></a> <figcaption style="text-align: 
center;"><h3>Hexmap</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/images/isomap.png"></a> <figcaption style="text-align: center;"><h3>IsoMap</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/umap.png"></a> <figcaption style="text-align: center;"><h3>LLE</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/gallery/smile-demo-kpca.png"></a> <figcaption style="text-align: center;"><h3>Kernel PCA</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/gallery/smile-demo-ann.png"></a> <figcaption style="text-align: center;"><h3>Neural Network</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/gallery/smile-demo-svm.png"></a> <figcaption style="text-align: center;"><h3>SVM</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/gallery/smile-demo-agglomerative-clustering.png"></a> <figcaption style="text-align: center;"><h3>Hierarchical Clustering</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/gallery/smile-demo-som.png"></a> <figcaption style="text-align: center;"><h3>SOM</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/gallery/smile-demo-dbscan.png"></a> <figcaption style="text-align: center;"><h3>DBSCAN</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/gallery/smile-demo-neural-gas.png"></a> <figcaption style="text-align: center;"><h3>Neural Gas</h3></figcaption> </figure> </td> </tr> <tr> <td> <figure> <a href="/website/src/gallery/smile-demo-wavelet.png"></a> <figcaption style="text-align: center;"><h3>Wavelet</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/gallery/smile-demo-mixture.png"></a> <figcaption style="text-align: center;"><h3>Exponential Family Mixture</h3></figcaption> </figure> </td> <td> <figure> <a href="/website/src/images/teapot.png"></a> <figcaption style="text-align: center;"><h3>Teapot Wireframe</h3></figcaption> 
</figure> </td> </tr> <tr> <td colspan="3"> <figure> <a href="/website/src/images/grid-interpolation2d.png"></a> <figcaption style="text-align: center;"><h3>Grid Interpolation</h3></figcaption> </figure> </td> </tr> </table>