SMILE — Vector Quantization

Overview
The VectorQuantizer Interface
BIRCH — CF-Tree Quantizer
Neural Gas
Growing Neural Gas
NeuralMap
Self-Organizing Map (SOM)
The smile.vq.hebb Package
Neighborhood Functions (SOM)
Algorithm Comparison and Selection Guide
Common Patterns and Tips

1. Overview

Vector quantization (VQ) is a lossy compression technique that maps an input vector to the nearest element of a finite set of code vectors (also called prototypes or centroids). Each algorithm in smile.vq learns these prototypes from data through online, competitive learning: prototypes compete for each incoming signal, and the winner (or winners) are updated to better represent that signal.

All algorithms in this package implement the VectorQuantizer interface and support incremental (online) learning via update(double[] x). This makes them suitable for streaming data, large datasets that do not fit in memory, or scenarios where the data distribution may shift over time.

The package contains five quantizer implementations:

Class	Topology	Fixed size?	Key strength
`BIRCH`	CF-tree (hierarchical)	No	Fast, memory-bounded, handles very large datasets
`NeuralGas`	None (soft competition)	Yes	Robust convergence, good coverage
`GrowingNeuralGas`	Graph (dynamic)	No	Adapts topology and size to data
`NeuralMap`	Graph (dynamic, radius-gated)	No	Memory-efficient, noise-resistant
`SOM`	2-D rectangular lattice	Yes	Topology-preserving, visual interpretation

2. The `VectorQuantizer` Interface

java

public interface VectorQuantizer extends Serializable {
    int OUTLIER = Integer.MAX_VALUE;

    void update(double[] x);

    double[] quantize(double[] x);
}

`update(double[] x)`

Presents a new data point to the model. The model updates its internal prototype(s) using competitive learning. Calling update repeatedly over the training set (typically for multiple epochs in random order) trains the model.

`quantize(double[] x)`

Returns the weight vector of the nearest prototype for a given input. For all implementations this is an internal reference — do not modify the returned array, or you will corrupt the model state. If you need the value for further computation, copy it first:

java

double[] proto = quantizer.quantize(x);
double[] copy = proto.clone();

`OUTLIER`

The constant Integer.MAX_VALUE is reserved as a label for outliers or noise in higher-level clustering algorithms that build on top of VQ models. The VQ algorithms themselves do not use it internally.

Serialization

All implementations are Serializable, so a trained model can be saved and restored with standard Java serialization.

3. BIRCH — CF-Tree Quantizer

Algorithm

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) builds a CF-tree — a height-balanced tree of Clustering Features (CF). Each CF is a compact statistical summary (N, LS, SS):

N: number of data points in the sub-cluster
LS: linear sum of the points (sum of each coordinate)
SS: square sum of the points (sum of squared coordinates)

From these three statistics the centroid and radius of a sub-cluster can be computed exactly and cheaply. When a new data point arrives, it is absorbed into the closest leaf CF if the resulting radius stays within the threshold T; otherwise a new CF is created. Leaves that exceed capacity L are split; internal nodes that exceed branching factor B are likewise split, growing the tree.

quantize descends the tree to the nearest leaf CF and returns its centroid.

Construction

java

BIRCH(int d, int B, int L, double T)

Parameter	Meaning	Typical range
`d`	Dimensionality of data	—
`B`	Maximum children per internal node	5–50
`L`	Maximum CFs per leaf node	5–50
`T`	Radius threshold for absorbing a point into an existing CF	data-dependent

Constraints: B ≥ 2, L ≥ 2, T > 0.

The radius threshold T is the most important parameter. A smaller T creates more, finer sub-clusters; a larger T creates fewer, coarser ones. Inspect your data's pairwise distances to calibrate: T should be on the order of the expected intra-cluster spread.

Usage

java

// 256-dimensional data (e.g. image patches), branching=5, leaf capacity=5, radius=6.0
BIRCH birch = new BIRCH(256, 5, 5, 6.0);

// Online ingestion — single pass over training data
for (double[] xi : trainingData) {
    birch.update(xi);
}

// Quantize new points
double[] prototype = birch.quantize(queryPoint);

// Retrieve all leaf-level sub-cluster centroids for downstream clustering
double[][] centroids = birch.centroids();

`centroids()`

Returns the centroid of every CF at the leaf level. These centroids can be fed into a hierarchical clustering algorithm (e.g., smile.clustering.HierarchicalClustering) to produce the final cluster assignments in a second pass — this is phase 2 of the original BIRCH paper.

Key properties

Single-pass: one scan through the data is enough to build the tree.
Memory-bounded: tree depth is controlled by T, B, and L; BIRCH will never create more leaf CFs than the data variance requires given the radius constraint.
Insertions are O(log n) in the height of the tree.
quantize returns a new array (the centroid is computed on the fly from the CF statistics), so it is safe to modify.

4. Neural Gas

Algorithm

Neural Gas (Martinetz & Schulten, 1991–1994) uses soft competition: for every input signal x, all neurons are ranked by distance and updated, with adaptation strength decaying exponentially with rank. The neuron closest to x (rank 0) moves the most; the neuron furthest away (rank N−1) moves the least. Over time, the prototypes distribute themselves to match the data density.

Additionally, a competitive Hebbian edge is drawn between the best-matching unit (BMU) and the second-best-matching unit at each step. These edges record which neurons are topologically adjacent in data space. Old edges can be pruned via network().

The adaptation rule for rank k at iteration t is:

Δw_k = α(t) · exp(−k / θ(t)) · (x − w_k)

where α(t) is the learning rate and θ(t) is the neighbourhood width, both typically decaying over time.

Construction

java

NeuralGas(double[][] neurons, TimeFunction alpha, TimeFunction theta, TimeFunction lifetime)

Parameter	Meaning
`neurons`	Initial weight vectors (at least 2); commonly drawn with `CentroidClustering.seeds`
`alpha`	Learning-rate schedule `α(t)`
`theta`	Neighbourhood-width schedule `θ(t)`
`lifetime`	Edge lifetime: edges older than `lifetime(t)` steps are zeroed by `network()`

TimeFunction is a @FunctionalInterface in smile.util.function. Common schedules:

java

TimeFunction.constant(value)              // fixed value
TimeFunction.exp(initialValue, decayTime) // exponential decay: v0 * exp(-t / T)

Usage

java

double[][] x = trainingData;
int epochs = 20;
int T = x.length * epochs;

// 400 prototypes seeded by k-means++
double[][] seeds = CentroidClustering.seeds(x, 400);
NeuralGas gas = new NeuralGas(
        seeds,
        TimeFunction.exp(0.3, T / 2.0),   // alpha: 0.3 → ~0 over half the run
        TimeFunction.exp(30,  T / 8.0),   // theta: 30 → ~0 over one eighth of run
        TimeFunction.constant(x.length * 2) // edges alive for ~2 epochs
);

for (int epoch = 0; epoch < epochs; epoch++) {
    for (int j : MathEx.permutate(x.length)) {
        gas.update(x[j]);
    }
}

// Access trained prototypes in original index order
double[][] prototypes = gas.neurons();

// Retrieve topology graph (prunes expired edges)
Graph graph = gas.network();

// Quantize
double[] nearest = gas.quantize(queryPoint);

`neurons()`

Returns the weight vectors sorted by original index (the order in which they were passed to the constructor). This ordering is stable across calls.

`network()`

Returns the AdjacencyMatrix of competitive Hebbian edges. Edges whose timestamp is more than lifetime(t) steps old have their weight zeroed. The returned graph can be used for topology analysis or to identify disconnected clusters.

Key properties

Fixed number of prototypes: set at construction time.
Robust, smooth convergence due to soft competition (no sharp winner-takes-all).
Requires schedule tuning: alpha and theta must decay to near-zero over the training run for good results.
Parallelism: distance computation uses IntStream.parallel() internally.

5. Growing Neural Gas

Algorithm

Growing Neural Gas (GNG, Fritzke 1995) extends Neural Gas with the ability to dynamically add and remove neurons during training. It uses hard competition (only BMU and second-best are updated) combined with competitive Hebbian edges and age-based edge pruning.

Key additions over Neural Gas:

Local error accumulation: each neuron tracks a running sum of its squared distances to signals it wins.
Edge aging: every edge emanating from the BMU is aged after each step; the edge to the second-best unit is refreshed (age=0).
Edge pruning: edges older than edgeLifetime are removed. Neurons that become isolated (no edges) are also removed.
Neuron insertion: every lambda signals, a new neuron is inserted halfway between the neuron with the highest accumulated error (q) and its neighbor with the highest error (f). The edge (q, f) is replaced by edges (q, r) and (r, f).
Global error decay: all error variables are multiplied by beta after every signal.

Construction

Two constructors are provided:

java

// Default hyperparameters
GrowingNeuralGas(int d)

// Full control
GrowingNeuralGas(int d, double epsBest, double epsNeighbor,
                 int edgeLifetime, int lambda, double alpha, double beta)

Parameter	Default	Meaning
`d`	—	Dimensionality
`epsBest`	0.2	Learning rate for BMU
`epsNeighbor`	0.006	Learning rate for BMU's neighbors
`edgeLifetime`	50	Maximum edge age before pruning
`lambda`	100	Neuron insertion interval (signals between insertions)
`alpha`	0.5	Error reduction factor for `q` and `f` after insertion
`beta`	0.995	Global error decay factor (applied after every signal)

Bootstrap behavior

The first two update calls do not trigger learning — they simply add the first two neurons. Learning begins from the third call onward. A neuron insertion at a lambda boundary is skipped if the highest-error neuron has no topological neighbors (a rare but safe condition).

Usage

java

GrowingNeuralGas model = new GrowingNeuralGas(x[0].length);

for (int epoch = 0; epoch < 10; epoch++) {
    for (int j : MathEx.permutate(x.length)) {
        model.update(x[j]);
    }
    System.out.printf("%d neurons after epoch %d%n", model.neurons().length, epoch + 1);
}

// Inspect the learned prototypes and topology
Neuron[] neurons = model.neurons();
for (Neuron n : neurons) {
    double[] w = n.w;        // prototype weight vector
    double error = n.counter; // accumulated local error
    List<Edge> nbrs = n.edges; // topological neighbors
}

double[] nearest = model.quantize(queryPoint);

Hyperparameter guidance

Goal	Adjustment
Fewer, coarser prototypes	Increase `lambda`, decrease `edgeLifetime`
More, finer prototypes	Decrease `lambda`, increase `edgeLifetime`
Faster convergence	Increase `epsBest` and `epsNeighbor`
Smoother topology	Decrease `epsBest`, increase `edgeLifetime`
Adapt to drifting distributions	Smaller `beta` (faster error decay), smaller `edgeLifetime`

Key properties

No need to specify the number of prototypes in advance.
Topology is encoded in the edge graph: disconnected components correspond to separate clusters.
Uses HeapSelect for efficient O(n) BMU+second-best search.

6. NeuralMap

Algorithm

NeuralMap is a hybrid algorithm combining ideas from GNG and BIRCH. Like GNG it uses competitive Hebbian learning and edge aging. Unlike GNG, neuron creation is controlled by a distance threshold r rather than a fixed insertion interval:

If the nearest existing neuron is farther than r, a new neuron is created at x.
If only the nearest is within r but not the second-nearest, a new neuron is created at x and connected to the nearest.
If both the nearest and second-nearest are within r, the normal Hebbian update runs: the nearest neuron is moved toward x, its neighbors are lightly updated, the edge to the second-nearest is refreshed, and stale edges are pruned.

A freshness counter (updated by counter += 1 for the winning neuron, decayed by counter *= beta for all neurons after each step) tracks how recently each neuron has been active. The clear(double eps) method removes neurons whose freshness falls below eps and also prunes any remaining stale edges.

Construction

java

NeuralMap(int d, double r, double epsBest, double epsNeighbor, int edgeLifetime, double beta)

Parameter	Meaning
`d`	Dimensionality
`r`	Distance threshold for neuron activation
`epsBest`	Learning rate for the nearest neuron
`epsNeighbor`	Learning rate for topological neighbors
`edgeLifetime`	Maximum edge age before pruning
`beta`	Global freshness decay per step (0 < beta ≤ 1)

Usage

java

// 256-dimensional signals; neurons must be within radius 8 to be activated
NeuralMap model = new NeuralMap(256, 8.0, 0.01, 0.002, 50, 0.995);

for (int epoch = 0; epoch < 5; epoch++) {
    for (int j : MathEx.permutate(x.length)) {
        model.update(x[j]);
    }
    // Prune stale neurons (freshness < 1e-7) and old edges
    model.clear(1E-7);
    System.out.printf("%d neurons after epoch %d%n", model.neurons().length, epoch + 1);
}

double[] nearest = model.quantize(queryPoint);

`clear(double eps)`

Removes all neurons whose counter < eps AND whose edge list is empty (either because they were never connected, or all their edges were pruned). Neurons with counter ≥ eps also have their edges pruned of any that exceed edgeLifetime. Call clear between epochs to remove noise and keep the model compact.

Important: eps must be non-negative. Passing a negative value throws IllegalArgumentException.

Key properties

The radius r provides natural noise resistance: isolated spurious signals far from any existing neuron create their own neuron, but that neuron quickly becomes stale and is removed by clear.
Unlike GNG, the network size is bounded by how many distinct r-balls cover the data — not by a fixed insertion rate.
Suitable for streaming/online settings where the data distribution may have outliers.

7. Self-Organizing Map (SOM)

Algorithm

The Self-Organizing Map (Kohonen, 2000) organizes a fixed set of neurons on a 2-D rectangular lattice. Each neuron has a grid position (i, j) and a weight vector w in input space. Training proceeds as follows:

Find the best matching unit (BMU): the neuron whose weight is closest to the input x.
Update all neurons toward x, with the update magnitude controlled by both the learning rate α(t) and a neighborhood function θ(di, dj, t) where (di, dj) is the lattice displacement from the BMU:

Δw = α(t) · θ(i − i_bmu, j − j_bmu, t) · (x − w)

As training progresses, α and the neighborhood width shrink, causing the map to first organize globally and then fine-tune locally.

The result is a map where nearby neurons respond to similar inputs — a topology-preserving projection of the input space onto the 2-D lattice.

Construction

java

SOM(double[][][] neurons, TimeFunction alpha, Neighborhood theta)

Parameter	Meaning
`neurons`	Initial `[nrow][ncol][d]` weight array
`alpha`	Learning-rate schedule `α(t)`
`theta`	Neighborhood function

All rows must have the same number of columns; all weight vectors must have the same length d. The lattice must be non-empty and have non-zero dimensionality.

Lattice initialization with `SOM.lattice`

A good initialization seeds the lattice by projecting k-means++ seeds onto a 2-D MDS embedding, which preserves the global data structure from the start:

java

double[][][] lattice = SOM.lattice(nrow, ncol, trainingData);

This uses CentroidClustering.seeds for the initial seeds, then a pairwise distance matrix and MDS.fit to compute 2-D coordinates. The MDS coordinates are sorted so that the row and column grid positions respect the embedded distances. This is the recommended way to initialize large SOMs.

Usage

java

int epochs = 20;
int nrow = 20, ncol = 20; // 400-neuron map

double[][][] lattice = SOM.lattice(nrow, ncol, trainingData);

SOM som = new SOM(
        lattice,
        TimeFunction.constant(0.1),
        Neighborhood.Gaussian(1.0, trainingData.length * epochs / 4.0)
);

for (int epoch = 0; epoch < epochs; epoch++) {
    for (int j : MathEx.permutate(trainingData.length)) {
        som.update(trainingData[j]);
    }
}

// Access the learned lattice (internal references, do not modify)
double[][][] learned = som.neurons(); // [nrow][ncol][d]

// Compute the U-matrix for visualization
double[][] umatrix = som.umatrix(); // [nrow][ncol]

// Quantize
double[] nearest = som.quantize(queryPoint);

U-matrix

umatrix() computes the Unified Distance Matrix: for each neuron, its value is the maximum Euclidean distance to its orthogonal (4-connected) grid neighbors. Large U-matrix values indicate cluster boundaries; low values indicate dense regions. The U-matrix is the standard way to visualize SOM cluster structure as a heat map.

umatrix[i][j] = max distance from neuron (i,j) to its grid neighbors

Corner and edge cells only consider their existing neighbors (the grid is not toroidal).

Neighborhood functions

Two built-in neighborhood functions are provided (see Section 9):

Function	Behavior
`Neighborhood.bubble(radius)`	Flat 1.0 within Chebyshev distance < `radius`, 0 outside
`Neighborhood.Gaussian(sigma, T)`	Gaussian, width decays from `sigma` to ~0 over `T` iterations

The Gaussian neighborhood is recommended for good maps; the bubble function is faster and a reasonable approximation.

Key properties

Fixed-size 2-D topology: the map structure is determined at construction time.
Topology-preserving: similar inputs activate neighboring neurons.
The neurons() method returns internal weight references — do not modify the returned arrays.
Neurons are only skipped in update if α(t) * θ < 1E-5 (the internal tol threshold), making updates efficient for large lattices.
Update is parallelized over the full neuron array via Arrays.stream(neurons).parallel().

8. The `smile.vq.hebb` Package

The sub-package smile.vq.hebb provides the graph primitives used by GNG and NeuralMap.

`Neuron`

java

public class Neuron implements Comparable<Neuron>, Serializable {
    public final double[] w;        // weight / prototype vector
    public final List<Edge> edges;  // topological neighbors (undirected edges stored bidirectionally)
    public transient double distance; // last computed distance to an input signal
    public double counter;          // local counter (error accumulation or freshness)
}

Key methods:

Method	Description
`update(double[] x, double eps)`	`w += eps * (x − w)` — moves the prototype toward `x`
`distance(double[] x)`	Sets `this.distance = MathEx.distance(w, x)`
`addEdge(Neuron neighbor)`	Adds an undirected edge with age 0
`addEdge(Neuron neighbor, int age)`	Adds an edge with a specified initial age
`removeEdge(Neuron neighbor)`	Removes the edge to `neighbor`
`setEdgeAge(Neuron neighbor, int age)`	Updates the age of the edge to `neighbor`
`age()`	Increments the age of all emanating edges by 1
`compareTo(Neuron o)`	Orders by `distance` (ascending) — used with `HeapSelect`

distance is declared transient so it is not serialized (it is a working variable, not persistent state).

`Edge`

java

public class Edge implements Serializable {
    public final Neuron neighbor; // the connected neuron
    public int age;               // number of steps since this edge was last refreshed
}

Edges are stored bidirectionally: if neuron A has an edge to B, B also has an edge to A. The algorithms maintain this invariant themselves (addEdge/removeEdge must be called on both endpoints when creating or removing a connection).

9. Neighborhood Functions (SOM)

Neighborhood is a @FunctionalInterface:

java

double of(int i, int j, int t)

i and j are the row/column displacements from the BMU (signed); t is the current iteration count.

Bubble neighborhood

java

Neighborhood.bubble(int radius)
// Returns 1.0 if |i| < radius AND |j| < radius, else 0.0

All neurons within the Chebyshev square of side 2*radius - 1 centered on the BMU are updated equally. The boundary itself (distance exactly radius) is not included (strict inequality).

java

Neighborhood bubble = Neighborhood.bubble(2);
bubble.of(0, 0, t)  // → 1.0 (winner)
bubble.of(1, 1, t)  // → 1.0 (inside)
bubble.of(2, 0, t)  // → 0.0 (on boundary, excluded)

Gaussian neighborhood

java

Neighborhood.Gaussian(double sigma, double T)
// Returns exp(−0.5 * (i² + j²) / s²) where s = sigma * exp(−t / T)

The neighbourhood width shrinks exponentially from sigma to near zero over T iterations. The winner always evaluates to exactly 1.0 (displacement 0,0 gives exponent 0).

java

// Gaussian that shrinks over 8000 iterations
Neighborhood gauss = Neighborhood.Gaussian(2.0, 8000.0);
gauss.of(0, 0, 0)    // → 1.0
gauss.of(1, 0, 0)    // → exp(-0.5 / 4) ≈ 0.882
gauss.of(1, 0, 4000) // → narrower, smaller value

Custom neighborhood functions

Because Neighborhood is a functional interface, you can provide any lambda:

java

// Linearly decaying step function
int initialRadius = 5;
Neighborhood custom = (i, j, t) -> {
    int radius = Math.max(1, initialRadius - t / 1000);
    return Math.abs(i) < radius && Math.abs(j) < radius ? 1.0 : 0.0;
};

10. Algorithm Comparison and Selection Guide

Quantization error on USPS handwritten digits (256-dimensional, ~7,000 training, ~2,000 test)

These figures are from the integration test suite with fixed seed 19650218:

Algorithm	Config	Train error	Test error
K-Means (baseline)	k=400	5.829	6.631
Neural Gas	400 neurons, 20 epochs	5.699	6.531
Growing Neural Gas	default params, 10 epochs	5.593	6.432
NeuralMap	r=8, 5 epochs + clear	6.023	6.939
SOM	20×20 grid, 20 epochs	—	6.582
BIRCH	B=5, L=5, T=6	5.902	7.350

Lower is better. Errors are mean Euclidean distances.

Decision guide

Use BIRCH when:

You have very large datasets and need a single fast pass.
Memory is constrained (the CF-tree is compact).
You plan to run hierarchical or k-means clustering on the resulting centroids (two-phase BIRCH).
You don't need to control the exact number of output centroids ahead of time.

Use Neural Gas when:

You know the number of prototypes you need.
You want reliable, smooth convergence without topology constraints.
You need the competition Hebbian topology graph for downstream analysis.
You have sufficient time for multi-epoch training.

Use Growing Neural Gas when:

You don't know the right number of prototypes.
The data has unknown or complex cluster structure.
You want the topology graph to emerge naturally.
Prototype count flexibility is more important than training speed.

Use NeuralMap when:

Data arrives as a stream with possible outliers or noise.
You want radius-based noise rejection (sporadic far-outlier signals don't corrupt existing prototypes).
You need a compact, periodically pruned model (call clear between epochs).
Memory footprint matters more than training throughput.

Use SOM when:

Topology preservation is a goal — you want similar inputs to map to adjacent neurons.
You want to visualize the data structure via the U-matrix.
You have a fixed budget for the number of neurons arranged in a 2-D grid.
The downstream task benefits from the ordered, regular layout (e.g., navigable maps, colorable grids).

11. Common Patterns and Tips

Multi-epoch training with random shuffling

All online VQ algorithms benefit from processing data in a different random order each epoch, which avoids artifacts from fixed ordering:

java

for (int epoch = 0; epoch < numEpochs; epoch++) {
    for (int j : MathEx.permutate(x.length)) {
        model.update(x[j]);
    }
}

MathEx.permutate(n) returns a random permutation of 0..n-1.

Computing mean quantization error

java

double error = 0.0;
for (double[] xi : testData) {
    double[] proto = model.quantize(xi);
    error += MathEx.distance(xi, proto);
}
error /= testData.length;
System.out.printf("Mean quantization error = %.4f%n", error);

Do not modify the returned prototype

quantize and neurons() return internal references for GNG, NeuralMap, NeuralGas, and SOM. Modifying them corrupts the model:

java

// Dangerous — modifies the internal weight vector:
double[] proto = gas.quantize(x);
proto[0] = 0.0; // BUG

// Safe:
double[] proto = gas.quantize(x).clone();

BIRCH's quantize is safe: it returns a freshly computed centroid array.

Seeding prototypes intelligently

For Neural Gas and SOM, random uniform initialization wastes early training iterations on global reorganization. Using k-means++ seeds (CentroidClustering.seeds) or the MDS-based SOM.lattice helper gives a much better starting point:

java

// Neural Gas: seed with 200 representative points
double[][] seeds = CentroidClustering.seeds(trainingData, 200);
NeuralGas gas = new NeuralGas(seeds, alpha, theta, lifetime);

// SOM: MDS-ordered lattice
double[][][] lattice = SOM.lattice(10, 20, trainingData);
SOM som = new SOM(lattice, alpha, Neighborhood.Gaussian(1.0, T));

Tuning the learning-rate schedule for Neural Gas and SOM

A common rule of thumb for exponential schedules over N total update steps:

java

int N = trainingData.length * numEpochs;
TimeFunction alpha = TimeFunction.exp(0.3,  N / 2.0); // learning rate
TimeFunction theta = TimeFunction.exp(30.0, N / 8.0); // neighbourhood width

Both should reach near-zero by the end of training. Too large a final value leaves the map restless; too small wastes early training.

GNG / NeuralMap: monitor neuron count between epochs

java

for (int epoch = 0; epoch < 10; epoch++) {
    for (int j : MathEx.permutate(x.length)) { model.update(x[j]); }
    // NeuralMap only:
    ((NeuralMap) model).clear(1E-7);
    System.out.printf("Epoch %d: %d neurons%n", epoch, model.neurons().length);
}

If neuron count grows without bound, decrease edgeLifetime (GNG/NeuralMap) or decrease lambda (GNG) to trigger more pruning.

SOM U-matrix interpretation

java

double[][] u = som.umatrix();
// u[i][j] is large at cluster boundaries, small within dense regions.
// Use this as a heat map: high values → inter-cluster gaps.

Plot u as an image (nrow × ncol) using your preferred visualization library to identify cluster structure without labelling.

BIRCH radius threshold calibration

If you have no prior knowledge of the data scale, estimate a reasonable T from a sample:

java

double sampleDist = 0.0;
int pairs = 0;
for (int i = 0; i < Math.min(200, x.length); i++) {
    for (int j = i + 1; j < Math.min(200, x.length); j++) {
        sampleDist += MathEx.distance(x[i], x[j]);
        pairs++;
    }
}
double meanPairwise = sampleDist / pairs;
// Set T to roughly 10–20% of the mean pairwise distance
double T = meanPairwise * 0.15;
BIRCH birch = new BIRCH(d, 5, 5, T);

Serialization

All implementations are serializable. Save and restore a trained model with standard Java object streams:

java

// Save
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
    oos.writeObject(trainedModel);
}

// Restore
try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("model.ser"))) {
    VectorQuantizer restored = (VectorQuantizer) ois.readObject();
}

Note: Neuron.distance is transient and will be reset to Double.MAX_VALUE after deserialization. This is intentional — it is a working variable that is overwritten before use.

See also: smile.clustering.KMeans, smile.clustering.HierarchicalClustering, smile.manifold.MDS, smile.util.function.TimeFunction

SMILE — Vector Quantization

SMILE — Vector Quantization

Table of Contents

1. Overview

2. The VectorQuantizer Interface

update(double[] x)

quantize(double[] x)

OUTLIER

Serialization

3. BIRCH — CF-Tree Quantizer

Algorithm

Construction

Usage

centroids()

Key properties

4. Neural Gas

Algorithm

Construction

Usage

neurons()

network()

Key properties

5. Growing Neural Gas

Algorithm

Construction

Bootstrap behavior

Usage

Hyperparameter guidance

Key properties

6. NeuralMap

Algorithm

Construction

Usage

clear(double eps)

Key properties

7. Self-Organizing Map (SOM)

Algorithm

Construction

Lattice initialization with SOM.lattice

Usage

U-matrix

Neighborhood functions

Key properties

8. The smile.vq.hebb Package

Neuron

Edge

9. Neighborhood Functions (SOM)

Bubble neighborhood

Gaussian neighborhood

Custom neighborhood functions

10. Algorithm Comparison and Selection Guide

Quantization error on USPS handwritten digits (256-dimensional, ~7,000 training, ~2,000 test)

Decision guide

11. Common Patterns and Tips

Multi-epoch training with random shuffling

Computing mean quantization error

Do not modify the returned prototype

Seeding prototypes intelligently

Tuning the learning-rate schedule for Neural Gas and SOM

GNG / NeuralMap: monitor neuron count between epochs

SOM U-matrix interpretation

BIRCH radius threshold calibration

Serialization

2. The `VectorQuantizer` Interface

`update(double[] x)`

`quantize(double[] x)`

`OUTLIER`

`centroids()`

`neurons()`

`network()`

`clear(double eps)`

Lattice initialization with `SOM.lattice`

8. The `smile.vq.hebb` Package

`Neuron`

`Edge`