SMILE — Independent Component Analysis (ICA) User Guide

Overview
Quick Start
Mathematical Background
Contrast Functions
Hyperparameters
- Options Record
- Persisting Options with Properties
Input Data Layout
Working with the Result
Practical Guidance
Complete Examples
ICA vs PCA
References

Overview

The smile.ica package provides Independent Component Analysis (ICA) via the FastICA algorithm invented by Aapo Hyvärinen. ICA is a blind source-separation technique that decomposes a set of mixed observed signals into a set of maximally statistically independent components.

Class / Record	Role
`ICA`	The fitted model; holds the unmixing matrix
`ICA.Options`	Hyperparameters: contrast function, iteration limit, tolerance
`LogCosh`	Default contrast function — general purpose
`Exp`	Gaussian contrast function — super-Gaussian / robust
`Kurtosis`	Kurtosis contrast function — simple, sensitive to outliers

All contrast-function classes implement smile.util.function.DifferentiableFunction and java.io.Serializable, so custom implementations can be serialized alongside the model.

Quick Start

java

import smile.ica.*;
import smile.math.MathEx;

// data[i] is sample i, data[i][j] is the j-th mixed signal value.
// Rearrange to variables × samples before calling fit().
double[][] data = /* load your mixed-signal matrix (samples × variables) */;
double[][] X    = MathEx.transpose(data); // variables × samples

// Fit ICA — extract 2 independent components using default settings.
MathEx.setSeed(19650218);       // for reproducibility
ICA ica = ICA.fit(X, 2);

// Each row of components() is one unit-norm independent component vector.
double[][] components = ica.components();
System.out.printf("Component 0, sample 0: %.5f%n", components[0][0]);
System.out.printf("Component 1, sample 0: %.5f%n", components[1][0]);

Mathematical Background

The ICA Model

ICA assumes the observed signal vector x is a linear instantaneous mixture of statistically independent source signals s:

x = A · s

where:

x ∈ ℝᵐ is the vector of observed (mixed) signals at one time instant
s ∈ ℝᵖ is the vector of unknown independent source signals
A ∈ ℝ^{m×p} is the unknown mixing matrix

The goal is to estimate the unmixing matrix W such that:

ŝ = W · x

recovers estimates of the original sources s up to permutation and scaling.
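
The model can be made concrete with a toy two-source example. The following is a minimal sketch in plain Java, not SMILE code: the class name MixingModelSketch and its helpers are hypothetical, and the mixing matrix is assumed known purely for illustration (in practice ICA has to estimate W without seeing A).

```java
/** Toy illustration of x = A·s and ŝ = W·x for two sources (hypothetical helper, not SMILE code). */
class MixingModelSketch {

    /** Multiplies a 2×2 matrix by a length-2 vector. */
    static double[] apply(double[][] m, double[] v) {
        return new double[] {
            m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]
        };
    }

    /** Inverts a non-singular 2×2 matrix. */
    static double[][] inverse(double[][] a) {
        double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
        if (Math.abs(det) < 1e-12) {
            throw new IllegalArgumentException("Mixing matrix is singular");
        }
        return new double[][] {
            { a[1][1] / det, -a[0][1] / det},
            {-a[1][0] / det,  a[0][0] / det}
        };
    }

    public static void main(String[] args) {
        double[][] A = {{0.7, 0.3}, {0.4, 0.6}};   // mixing matrix (assumed known here)
        double[] s = {0.5, -1.0};                  // the two sources at one time instant
        double[] x = apply(A, s);                  // observed mixture

        // The ideal unmixing matrix is the inverse of A.
        double[][] W = inverse(A);
        double[] sHat = apply(W, x);
        System.out.printf("recovered:   %.4f, %.4f%n", sHat[0], sHat[1]);

        // Swapping and rescaling the rows of W is an equally valid answer:
        // the outputs are still the sources, only reordered and rescaled.
        double[][] W2 = {
            {-2.0 * W[1][0], -2.0 * W[1][1]},
            { 0.5 * W[0][0],  0.5 * W[0][1]}
        };
        double[] alt = apply(W2, x);
        System.out.printf("alternative: %.4f, %.4f%n", alt[0], alt[1]);
    }
}
```

The second unmixing matrix illustrates why the sources are only identifiable up to permutation and scaling: nothing in the observed mixtures distinguishes W from W2.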

Assumptions

ICA requires the following assumptions to hold:

Statistical independence — the source signals s₁, s₂, …, sₚ are mutually statistically independent.
Non-Gaussianity — at most one source may be Gaussian. By the Central Limit Theorem, a mixture of independent sources tends to be closer to Gaussian than the sources themselves; ICA reverses this by searching for the projections that are maximally non-Gaussian.
Linearity — the mixing is instantaneous and linear (no time delays or convolution).
Sufficient observations — at least as many observed signals as source signals (m ≥ p).

Non-Gaussianity as Independence Proxy

FastICA measures non-Gaussianity using negentropy approximations based on a non-quadratic, non-linear contrast function G(u). Negentropy is always non-negative and equals zero if and only if the variable is Gaussian.

The negentropy approximation is:

J(y) ≈ [E{G(y)} - E{G(ν)}]²

where ν is a standard Gaussian variable and the expectation is over samples. Maximizing this over unit-norm directions w gives the most non-Gaussian (most independent) projection.
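
As a rough numerical illustration of the approximation, the sketch below evaluates J(y) with G = log cosh for a Gaussian sample and for a uniform sample of equal variance. It is plain Java rather than SMILE code; the class name NegentropySketch, the sample size, and the choice to obtain E{G(ν)} by numerical integration are all assumptions made for the example.

```java
import java.util.Random;

/** Sketch of J(y) ≈ [E{G(y)} − E{G(ν)}]² with G = log cosh (hypothetical helper). */
class NegentropySketch {

    /** Overflow-safe log(cosh(u)). */
    static double logCosh(double u) {
        double a = Math.abs(u);
        return a + Math.log1p(Math.exp(-2.0 * a)) - Math.log(2.0);
    }

    /** E{G(ν)} for a standard Gaussian ν, by trapezoidal integration over [-10, 10]. */
    static double gaussianExpectation() {
        int steps = 20000;
        double h = 20.0 / steps;
        double sum = 0.0;
        for (int i = 0; i <= steps; i++) {
            double u = -10.0 + i * h;
            double weight = (i == 0 || i == steps) ? 0.5 : 1.0;
            sum += weight * logCosh(u) * Math.exp(-0.5 * u * u) / Math.sqrt(2.0 * Math.PI);
        }
        return sum * h;
    }

    /** Approximate negentropy of a sample assumed to have zero mean and unit variance. */
    static double negentropy(double[] y) {
        double mean = 0.0;
        for (double v : y) mean += logCosh(v);
        mean /= y.length;
        double diff = mean - gaussianExpectation();
        return diff * diff;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 100_000;
        double[] gaussian = new double[n];
        double[] uniform = new double[n];
        for (int i = 0; i < n; i++) {
            gaussian[i] = rng.nextGaussian();
            // Uniform on [-√3, √3] has zero mean and unit variance.
            uniform[i] = (2.0 * rng.nextDouble() - 1.0) * Math.sqrt(3.0);
        }
        System.out.printf("J(gaussian) = %.3e%n", negentropy(gaussian));
        System.out.printf("J(uniform)  = %.3e%n", negentropy(uniform));
    }
}
```

The Gaussian sample should score close to zero (only sampling noise remains), while the uniform sample scores noticeably higher, which is the property FastICA exploits.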

The FastICA Algorithm

FastICA uses a fixed-point iteration to find each unmixing vector w:

w ← (1/n) · X · g(Xᵀw) − mean(g′(Xᵀw)) · w
w ← w / ‖w‖

where:

X ∈ ℝ^{n×m} is the whitened data matrix (n samples, m variables)
g = G′ is the first derivative of the contrast function G
g′ = G″ is the second derivative

After convergence of each component, deflation orthogonalization removes its contribution so that subsequent components are orthogonal:

w ← w − Σₖ (wₖᵀ w) wₖ    for all previously found components wₖ
w ← w / ‖w‖

Convergence is declared when:

min(‖w − w_old‖, ‖w + w_old‖) < tol

The two-case test handles the sign ambiguity: a direction and its negative represent the same component.
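
The update, deflation, and convergence test can be sketched together in plain Java. This is a textbook-style illustration under stated assumptions, not SMILE's implementation: the class FastIcaSketch is hypothetical, the whitened data is assumed to be stored as variables × samples, g is fixed to tanh, the initial vectors are supplied by the caller instead of being drawn at random, and the deflation step is applied inside every iteration.

```java
import java.util.Random;

/** Textbook deflation FastICA with g = tanh on pre-whitened data (a sketch, not SMILE's code). */
class FastIcaSketch {

    /**
     * @param z    whitened data, variables × samples (z[i][t] is channel i at sample t)
     * @param p    number of components to extract (p must not exceed z.length)
     * @param init initial guesses, one row per component
     * @return p unit-norm, mutually orthogonal unmixing vectors
     */
    static double[][] fit(double[][] z, int p, double[][] init, int maxIter, double tol) {
        int m = z.length, n = z[0].length;
        double[][] W = new double[p][];
        for (int c = 0; c < p; c++) {
            double[] w = normalize(init[c].clone());
            for (int iter = 0; iter < maxIter; iter++) {
                double[] next = new double[m];
                double meanG2 = 0.0;
                for (int t = 0; t < n; t++) {
                    double u = 0.0;                       // projection of sample t onto w
                    for (int i = 0; i < m; i++) u += z[i][t] * w[i];
                    double g = Math.tanh(u);              // g = G′
                    meanG2 += 1.0 - g * g;                // g′ = G″
                    for (int i = 0; i < m; i++) next[i] += z[i][t] * g;
                }
                meanG2 /= n;
                // Fixed-point update: w ← (1/n)·X·g(Xᵀw) − mean(g′(Xᵀw))·w
                for (int i = 0; i < m; i++) next[i] = next[i] / n - meanG2 * w[i];
                // Deflation: remove projections onto components already found.
                for (int k = 0; k < c; k++) {
                    double d = dot(next, W[k]);
                    for (int i = 0; i < m; i++) next[i] -= d * W[k][i];
                }
                normalize(next);
                // Sign-insensitive convergence test.
                double minus = 0.0, plus = 0.0;
                for (int i = 0; i < m; i++) {
                    minus += (next[i] - w[i]) * (next[i] - w[i]);
                    plus  += (next[i] + w[i]) * (next[i] + w[i]);
                }
                w = next;
                if (Math.sqrt(Math.min(minus, plus)) < tol) break;
            }
            W[c] = w;
        }
        return W;
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Scales v to unit Euclidean norm in place and returns it. */
    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        for (int i = 0; i < v.length; i++) v[i] /= norm;
        return v;
    }

    public static void main(String[] args) {
        // Two unit-variance uniform sources rotated by 0.5 rad: approximately white already.
        Random rng = new Random(1);
        int n = 5000;
        double c = Math.cos(0.5), s = Math.sin(0.5);
        double[][] z = new double[2][n];
        for (int t = 0; t < n; t++) {
            double s1 = (2.0 * rng.nextDouble() - 1.0) * Math.sqrt(3.0);
            double s2 = (2.0 * rng.nextDouble() - 1.0) * Math.sqrt(3.0);
            z[0][t] = c * s1 - s * s2;
            z[1][t] = s * s1 + c * s2;
        }
        double[][] W = fit(z, 2, new double[][] {{1, 0}, {0, 1}}, 200, 1e-6);
        System.out.printf("w0 = (%.3f, %.3f)  w1 = (%.3f, %.3f)%n",
                W[0][0], W[0][1], W[1][0], W[1][1]);
    }
}
```

In the demo, each recovered vector should line up (up to sign) with one column of the rotation that mixed the sources, i.e. with (cos 0.5, sin 0.5) or (−sin 0.5, cos 0.5).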

Data Whitening

Before applying FastICA the data is pre-whitened (sphered):

Center — subtract the mean of each observed variable.
Eigendecompose — compute the eigendecomposition of the sample covariance matrix C = XᵀX / n = E D Eᵀ.
Scale — form the whitened data Z = X E D^{-1/2}.

After whitening, ZᵀZ / n = I, i.e. the sample covariance of the whitened data is the identity matrix. Whitening reduces the ICA problem to finding an orthogonal rotation, which simplifies the optimization considerably.

Numerical note: If any eigenvalue of the covariance matrix is smaller than 1e-8, an IllegalArgumentException is thrown — the data is nearly linearly dependent and ICA is not applicable without dimensionality reduction.
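
The three whitening steps can be sketched for the two-channel case, where the eigendecomposition has a closed form. This is an illustrative plain-Java sketch, not SMILE's routine: the class WhiteningSketch is hypothetical, the input is assumed to be samples × 2, and the eigenvalue guard only imitates the 1e-8 threshold described in the note.

```java
/** Whitening sketch for two channels: center, eigendecompose, scale (hypothetical helper). */
class WhiteningSketch {

    /**
     * @param x data as samples × 2 (x[t][j] is channel j at sample t)
     * @return whitened data Z = X·E·D^(-1/2) with the same shape
     */
    static double[][] whiten(double[][] x) {
        int n = x.length;
        // 1. Center each column.
        double m0 = 0.0, m1 = 0.0;
        for (double[] row : x) { m0 += row[0]; m1 += row[1]; }
        m0 /= n;
        m1 /= n;
        // 2. Sample covariance C = XᵀX / n = [[a, b], [b, d]].
        double a = 0.0, b = 0.0, d = 0.0;
        for (double[] row : x) {
            double u = row[0] - m0, v = row[1] - m1;
            a += u * u;
            b += u * v;
            d += v * v;
        }
        a /= n;
        b /= n;
        d /= n;
        // Closed-form eigendecomposition of a symmetric 2×2 matrix.
        double half = 0.5 * (a + d);
        double root = Math.sqrt(0.25 * (a - d) * (a - d) + b * b);
        double l1 = half + root, l2 = half - root;        // eigenvalues, l1 >= l2
        if (l2 < 1e-8) {
            throw new IllegalArgumentException("Covariance matrix is close to singular");
        }
        double theta = 0.5 * Math.atan2(2.0 * b, a - d);  // direction of the first eigenvector
        double c = Math.cos(theta), s = Math.sin(theta);  // e1 = (c, s), e2 = (-s, c)
        // 3. Project onto the eigenvectors and rescale each axis to unit variance.
        double[][] z = new double[n][2];
        for (int t = 0; t < n; t++) {
            double u = x[t][0] - m0, v = x[t][1] - m1;
            z[t][0] = ( c * u + s * v) / Math.sqrt(l1);
            z[t][1] = (-s * u + c * v) / Math.sqrt(l2);
        }
        return z;
    }
}
```

A quick sanity check is to compute ZᵀZ / n on the result: the diagonal should be 1 and the off-diagonal 0 up to rounding error.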

Contrast Functions

A contrast function G must be:

non-quadratic
non-linear
twice differentiable

The three built-in contrast functions cover the most common use cases.

LogCosh (default)

java

new LogCosh()   // or  new ICA.Options("LogCosh", 100)

Property	Value
G(u)	`log(cosh(u))` ≈ `|u| − log 2` for large |u|
G′(u) = g(u)	`tanh(u)`
G″(u) = g′(u)	`1 − tanh²(u)`
Implementation	Numerically stable: `|u| + log1p(exp(−2|u|)) − log 2`
Suitable for	General purpose — balances robustness and accuracy
Signal types	Sub-Gaussian and super-Gaussian sources

LogCosh is the recommended default. It is smooth, its derivative tanh(u) is bounded, and the implementation avoids the numerical overflow that a naive log(cosh(u)) would produce for |u| > 710.
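
The difference between the naive and the stable formula is easy to reproduce. The snippet below is a standalone sketch (the class LogCoshSketch is hypothetical and is not the library's LogCosh class); it only re-implements the formula from the table above.

```java
/** Compares a naive log(cosh(u)) with the stable form |u| + log1p(exp(−2|u|)) − log 2. */
class LogCoshSketch {

    static double naive(double u) {
        // Math.cosh overflows to Infinity once |u| exceeds roughly 710.
        return Math.log(Math.cosh(u));
    }

    static double stable(double u) {
        double a = Math.abs(u);
        return a + Math.log1p(Math.exp(-2.0 * a)) - Math.log(2.0);
    }

    public static void main(String[] args) {
        for (double u : new double[] {0.5, 20.0, 1000.0}) {
            System.out.printf("u = %6.1f  naive = %s  stable = %s%n", u, naive(u), stable(u));
        }
    }
}
```

For moderate inputs the two agree to rounding error; for u = 1000 the naive form returns Infinity while the stable form stays finite.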

Gaussian (Exp)

java

new Exp()   // or  new ICA.Options("Gaussian", 100)

Property	Value
G(u)	`−exp(−u²/2)`
G′(u) = g(u)	`u · exp(−u²/2)`
G″(u) = g′(u)	`(1 − u²) · exp(−u²/2)`
Suitable for	Super-Gaussian sources, robustness to outliers
Signal types	Sparse signals, impulsive noise

The Gaussian contrast function down-weights extreme values, making it more robust when the data contain outliers.
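
To see the down-weighting numerically, the sketch below evaluates g(u) = G′(u) for the three built-in contrast functions at a few values of u, using the formulas from the tables in this section. The class InfluenceSketch is a hypothetical standalone helper, not part of the library.

```java
/** Evaluates g(u) = G′(u) for the three contrast functions at a few values of u. */
class InfluenceSketch {

    static double logCoshG(double u)  { return Math.tanh(u); }
    static double gaussianG(double u) { return u * Math.exp(-0.5 * u * u); }
    static double kurtosisG(double u) { return u * u * u; }

    public static void main(String[] args) {
        for (double u : new double[] {0.5, 1.0, 3.0, 10.0}) {
            System.out.printf("u = %4.1f  tanh(u) = %.4f  u*exp(-u^2/2) = %.3e  u^3 = %.1f%n",
                    u, logCoshG(u), gaussianG(u), kurtosisG(u));
        }
    }
}
```

Because g enters the fixed-point update as a per-sample weight, a sample at u = 10 contributes almost nothing under the Gaussian contrast, at most 1 under LogCosh, and 1000 under Kurtosis.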

Kurtosis

java

new Kurtosis()   // or  new ICA.Options("Kurtosis", 100)

Property	Value
G(u)	`u⁴ / 4`
G′(u) = g(u)	`u³`
G″(u) = g′(u)	`3u²`
Suitable for	Simple / educational use, clean data
Signal types	Any, but sensitive to outliers

Kurtosis is a classical measure of non-Gaussianity. However, because G grows as u⁴, it is highly sensitive to large values (outliers). Prefer LogCosh or Exp for real data.
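
The outlier sensitivity can be demonstrated on synthetic data. The following sketch is an illustration only (the class KurtosisOutlierSketch and the data are made up for the example): it computes the sample excess kurtosis of a clean sub-Gaussian signal and of the same signal with one corrupted sample.

```java
/** Shows how a single outlier changes the sample excess kurtosis (hypothetical helper). */
class KurtosisOutlierSketch {

    /** Excess kurtosis E{(y−μ)⁴}/σ⁴ − 3, using population (1/n) moments. */
    static double excessKurtosis(double[] y) {
        double mean = 0.0;
        for (double v : y) mean += v;
        mean /= y.length;
        double m2 = 0.0, m4 = 0.0;
        for (double v : y) {
            double d = v - mean;
            m2 += d * d;
            m4 += d * d * d * d;
        }
        m2 /= y.length;
        m4 /= y.length;
        return m4 / (m2 * m2) - 3.0;
    }

    public static void main(String[] args) {
        // A clean sub-Gaussian signal: 1000 evenly spaced values in [-1, 1).
        double[] clean = new double[1000];
        for (int i = 0; i < clean.length; i++) {
            clean[i] = -1.0 + 2.0 * i / clean.length;
        }
        double[] dirty = clean.clone();
        dirty[0] = 50.0;   // one corrupted sample

        System.out.printf("clean: %.3f%n", excessKurtosis(clean));
        System.out.printf("dirty: %.3f%n", excessKurtosis(dirty));
    }
}
```

A single bad sample out of a thousand is enough to turn a clearly negative excess kurtosis into a large positive one, so a kurtosis-based contrast would treat the corrupted signal as strongly super-Gaussian.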

Choosing a Contrast Function

Scenario	Recommended
No prior knowledge	`LogCosh`
Super-Gaussian signals (e.g., speech, sparse)	`Exp`
Sub-Gaussian signals (e.g., uniform)	`LogCosh`
Clean data, no outliers, quick test	`Kurtosis`
Outlier-prone data	`Exp`

Custom Contrast Functions

Implement DifferentiableFunction and optionally Serializable:

java

import smile.util.function.DifferentiableFunction;
import java.io.Serializable;

public class MyContrast implements DifferentiableFunction, Serializable {
    @java.io.Serial
    private static final long serialVersionUID = 1L;

    @Override
    public double f(double x) {
        // G(u) — the contrast function itself (not required by FastICA
        // but required by the DifferentiableFunction interface)
        return Math.log1p(x * x);
    }

    @Override
    public double g(double x) {
        // G′(u) — first derivative
        return 2.0 * x / (1.0 + x * x);
    }

    @Override
    public double g2(double x) {
        // G″(u) — second derivative
        double d = 1.0 + x * x;
        return 2.0 * (1.0 - x * x) / (d * d);
    }
}

// Use it
ICA.Options opts = new ICA.Options(new MyContrast(), 200, 1E-5);
ICA ica = ICA.fit(X, p, opts);
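
Before plugging in a custom contrast function, it can be worth checking that g and g2 really are the first and second derivatives of f. The sketch below does this with central finite differences in plain Java; ContrastDerivativeCheck is a hypothetical helper that works on lambdas, so it does not depend on the DifferentiableFunction interface, and the formulas are copied from MyContrast above.

```java
import java.util.function.DoubleUnaryOperator;

/** Finite-difference check that a derivative formula matches its function (hypothetical helper). */
class ContrastDerivativeCheck {

    /** Central-difference approximation of the derivative of fn at x. */
    static double numericalDerivative(DoubleUnaryOperator fn, double x) {
        double h = 1e-5;
        return (fn.applyAsDouble(x + h) - fn.applyAsDouble(x - h)) / (2.0 * h);
    }

    /** Largest absolute gap between the claimed derivative and the finite difference of fn. */
    static double maxError(DoubleUnaryOperator fn, DoubleUnaryOperator derivative, double[] points) {
        double worst = 0.0;
        for (double x : points) {
            double gap = Math.abs(numericalDerivative(fn, x) - derivative.applyAsDouble(x));
            worst = Math.max(worst, gap);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Same formulas as MyContrast: f = log(1 + x²) with its two derivatives.
        DoubleUnaryOperator f  = x -> Math.log1p(x * x);
        DoubleUnaryOperator g  = x -> 2.0 * x / (1.0 + x * x);
        DoubleUnaryOperator g2 = x -> 2.0 * (1.0 - x * x) / ((1.0 + x * x) * (1.0 + x * x));
        double[] points = {-3.0, -1.0, -0.1, 0.0, 0.5, 2.0};

        System.out.printf("max |f' - g|  = %.2e%n", maxError(f, g, points));
        System.out.printf("max |g' - g2| = %.2e%n", maxError(g, g2, points));
    }
}
```

Both gaps should be tiny (far below 1e-6); a large value usually points to a sign error or a missing factor in one of the derivative formulas.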

Hyperparameters

Options Record

java

ICA.Options(DifferentiableFunction contrast, int maxIter, double tol)

Parameter	Type	Default	Description
`contrast`	`DifferentiableFunction`	`new LogCosh()`	Contrast function G
`maxIter`	`int`	`100`	Maximum fixed-point iterations per component
`tol`	`double`	`1e-4`	Convergence tolerance on min(‖w − w_old‖, ‖w + w_old‖)

Convenience constructors:

java

// Default tolerance 1e-4
new ICA.Options(new LogCosh(), 100)

// Custom tolerance
new ICA.Options(new LogCosh(), 200, 1E-6)

// By name
new ICA.Options("LogCosh",  100)
new ICA.Options("Gaussian", 100)
new ICA.Options("Kurtosis", 100)

Persisting Options with Properties

Options can be serialized to/from java.util.Properties, which is convenient for configuration files or command-line parameter passing.

java

// Save
ICA.Options opts = new ICA.Options(new Exp(), 150, 1E-5);
Properties props = opts.toProperties();
// props contains:
//   smile.ica.contrast   = Gaussian
//   smile.ica.iterations = 150
//   smile.ica.tolerance  = 1.0E-5

// Restore
ICA.Options restored = ICA.Options.of(props);

Property keys:

Key	Value
`smile.ica.contrast`	`"LogCosh"`, `"Gaussian"`, `"Kurtosis"`, or fully-qualified class name
`smile.ica.iterations`	integer string
`smile.ica.tolerance`	double string

When a fully-qualified class name is stored, Options.of() instantiates it via reflection using its no-argument constructor — so custom contrast classes must have a public no-arg constructor.
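
Since the options are plain string properties, they can be written to and read from a standard .properties file. The sketch below uses only java.util.Properties and the three keys listed above; the class IcaPropertiesSketch and the in-memory round trip (a String instead of a file) are assumptions made to keep the example self-contained. Turning the restored Properties back into options would still go through ICA.Options.of(props).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Round-trips the documented ICA property keys through the .properties text format. */
class IcaPropertiesSketch {

    /** Serializes properties to the standard .properties text format. */
    static String write(Properties props) {
        try {
            StringWriter out = new StringWriter();
            props.store(out, "ICA options");
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses .properties text back into a Properties object. */
    static Properties read(String text) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(text));
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("smile.ica.contrast", "Gaussian");
        props.setProperty("smile.ica.iterations", "150");
        props.setProperty("smile.ica.tolerance", "1.0E-5");

        String text = write(props);            // could be written to a file instead
        Properties restored = read(text);

        int maxIter = Integer.parseInt(restored.getProperty("smile.ica.iterations"));
        double tol = Double.parseDouble(restored.getProperty("smile.ica.tolerance"));
        System.out.printf("%s, %d iterations, tol %.1e%n",
                restored.getProperty("smile.ica.contrast"), maxIter, tol);
    }
}
```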

Input Data Layout

ICA.fit() expects data in variables × samples layout:

data[i][j]  →  value of the i-th observed signal (variable) at time j (sample)

Rows: observed signals / channels (dimension = m)
Columns: time steps / observations (dimension = n)

If your data is in the conventional samples × variables layout (each row is one observation), transpose it first:

java

double[][] samplesXvars = /* ... */;              // shape: n × m
double[][] X = MathEx.transpose(samplesXvars);    // shape: m × n
ICA ica = ICA.fit(X, p);

The number of components p must satisfy 1 ≤ p ≤ m (number of signals).
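
A layout mistake is easy to make and hard to spot afterwards, so a small pre-flight check can help. The sketch below is plain Java with hypothetical names (LayoutSketch, checkComponents); it shows what the transpose does to the shape and validates the constraint on p before any fitting is attempted. In real code MathEx.transpose can be used in place of the hand-written transpose.

```java
/** Plain-Java helper for converting samples × variables data to variables × samples. */
class LayoutSketch {

    /** Returns the transpose of a rectangular matrix. */
    static double[][] transpose(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            if (a[i].length != cols) {
                throw new IllegalArgumentException("Row " + i + " has a different length");
            }
            for (int j = 0; j < cols; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    /** Checks 1 <= p <= m, where m is the number of rows (signals) of a variables × samples matrix. */
    static void checkComponents(double[][] x, int p) {
        int m = x.length;
        if (p < 1 || p > m) {
            throw new IllegalArgumentException("p = " + p + " must be between 1 and " + m);
        }
    }

    public static void main(String[] args) {
        double[][] samplesXvars = {{1, 10}, {2, 20}, {3, 30}};   // n = 3 samples, m = 2 signals
        double[][] x = transpose(samplesXvars);                  // 2 × 3
        checkComponents(x, 2);
        System.out.println(x.length + " signals x " + x[0].length + " samples");
    }
}
```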

Working with the Result

ICA is a Java record with a single field:

java

double[][] components = ica.components();

components.length == p (number of independent components extracted)
components[i].length == n (number of samples)
Each row components[i] is a unit-norm vector in the whitened sample space representing the i-th independent component.
Rows are mutually orthogonal: components[i] · components[j] ≈ 0 for i ≠ j.

ICA implements java.io.Serializable (serialVersionUID = 2), so models can be saved and loaded with standard Java object serialization or any compatible framework.
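
The unit-norm and orthogonality properties listed above can be verified in one pass by comparing the Gram matrix of the rows with the identity. The helper below is a hypothetical plain-Java sketch; the small matrix in main is a stand-in for the array returned by ica.components().

```java
/** Checks that the rows of a matrix are unit-norm and mutually orthogonal (hypothetical helper). */
class OrthonormalCheck {

    /** Largest entry-wise deviation of the Gram matrix of the rows from the identity. */
    static double maxDeviation(double[][] rows) {
        double worst = 0.0;
        for (int i = 0; i < rows.length; i++) {
            for (int j = i; j < rows.length; j++) {
                double dot = 0.0;
                for (int k = 0; k < rows[i].length; k++) {
                    dot += rows[i][k] * rows[j][k];
                }
                double target = (i == j) ? 1.0 : 0.0;
                worst = Math.max(worst, Math.abs(dot - target));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double r = Math.sqrt(0.5);
        double[][] components = {{r, r, 0.0}, {r, -r, 0.0}};   // stand-in for ica.components()
        System.out.printf("max deviation = %.2e%n", maxDeviation(components));
    }
}
```

A value near machine precision confirms both properties at once; a value near 1 indicates duplicated or non-normalized rows.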

Practical Guidance

Number of Components

Set p to the number of source signals you believe are present. Some guidelines:

Start with p = m (full decomposition) to see all components, then keep only the most interpretable ones.
Use domain knowledge or cross-validation to select a smaller p.
p must not exceed m (the number of observed signals).

Convergence and Iteration Limit

If a component does not converge within maxIter iterations, a WARN-level SLF4J message is emitted:

Component 2 did not converge in 100 iterations.

Suggested remedies:

Increase maxIter — try 200 or 500.
Loosen tol — e.g., 1e-3 if a less precise solution is acceptable.
Change contrast function — Exp sometimes converges faster for super-Gaussian sources.
Check data quality — near-collinear signals or strong outliers can prevent convergence.

Reproducibility and Seeding

FastICA initializes each w with a random Gaussian vector, so results vary from run to run unless the random seed is fixed. For reproducible output, set the seed before fitting:

java

MathEx.setSeed(19650218);
ICA ica = ICA.fit(X, p);

Sign and Ordering Ambiguity

ICA has two fundamental ambiguities that cannot be resolved algorithmically:

Sign — w and −w define the same component. The sign of each extracted component is arbitrary.
Order — there is no canonical ordering of the components. The order depends on the random initialization, so it can change whenever the seed changes.

If consistent ordering matters, sort the components by a domain-specific criterion (e.g., variance of the recovered source, frequency content, etc.) after fitting.
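
One possible post-processing step is sketched below: flip each component so that its largest-magnitude entry is positive, then sort by a score supplied by the caller. Everything here is an assumption for illustration (the class CanonicalOrderSketch, the sign convention, and the scores array); the library itself does not impose any such convention.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Post-processing sketch: fix the sign of each component and impose a repeatable order. */
class CanonicalOrderSketch {

    /** Flips a vector in place so that its largest-magnitude entry is positive. */
    static void fixSign(double[] v) {
        int arg = 0;
        for (int i = 1; i < v.length; i++) {
            if (Math.abs(v[i]) > Math.abs(v[arg])) arg = i;
        }
        if (v[arg] < 0.0) {
            for (int i = 0; i < v.length; i++) v[i] = -v[i];
        }
    }

    /** Returns sign-fixed copies of the rows, sorted by descending score. */
    static double[][] canonicalize(double[][] components, double[] scores) {
        Integer[] order = new Integer[components.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        double[][] result = new double[components.length][];
        for (int r = 0; r < order.length; r++) {
            result[r] = components[order[r]].clone();
            fixSign(result[r]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] components = {{0.6, -0.8}, {-0.8, -0.6}};   // stand-in for ica.components()
        double[] scores = {0.2, 1.5};                          // e.g. variance of each recovered source
        System.out.println(Arrays.deepToString(canonicalize(components, scores)));
    }
}
```

The input array is left untouched; the returned copy has the higher-scoring component first and a deterministic sign for every row.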

Limitations

Limitation	Details
Linear mixing only	ICA does not handle nonlinear or convolutive mixtures
No temporal structure	FastICA ignores time-ordering; for time-series use SOBI or TDSEP
Gaussian sources	Cannot separate more than one Gaussian source
Square or over-determined systems	Requires `m ≥ p` (at least as many sensors as sources)
Outlier sensitivity (`Kurtosis`)	Use `LogCosh` or `Exp` for noisy real data

Complete Examples

Cocktail Party Problem

The classic motivating example: separate two mixed signals. Here a sawtooth and a square wave stand in for the two speakers.

java

import smile.ica.*;
import smile.math.MathEx;

MathEx.setSeed(12345);

int T = 2000;

// Two non-Gaussian source signals
double[] s1 = new double[T];   // sawtooth
double[] s2 = new double[T];   // square wave
for (int t = 0; t < T; t++) {
    s1[t] = 2.0 * ((t % 50) / 50.0) - 1.0;
    s2[t] = Math.signum(Math.sin(2 * Math.PI * t / 30.0));
}

// Linear mixing: two microphones
double[][] mixed = new double[T][2];
for (int t = 0; t < T; t++) {
    mixed[t][0] = 0.7 * s1[t] + 0.3 * s2[t];   // microphone 1
    mixed[t][1] = 0.4 * s1[t] + 0.6 * s2[t];   // microphone 2
}

// Fit ICA (must transpose to variables × samples layout)
ICA ica = ICA.fit(MathEx.transpose(mixed), 2);

double[][] w = ica.components();
System.out.printf("‖w₀‖ = %.6f (should be 1.0)%n", MathEx.norm(w[0]));
System.out.printf("‖w₁‖ = %.6f (should be 1.0)%n", MathEx.norm(w[1]));
System.out.printf("w₀ · w₁ = %.6f (should be ≈ 0)%n", MathEx.dot(w[0], w[1]));

Feature Extraction

ICA can be used for feature extraction as an alternative to PCA. Unlike PCA, ICA components are statistically independent (not just uncorrelated), which makes them more meaningful for non-Gaussian data such as natural images or EEG signals.

java

import smile.ica.*;
import smile.math.MathEx;

// Suppose eegData is channels × timepoints
double[][] eegData = loadEEG(); // shape: 64 × 10000

MathEx.setSeed(42);

// Extract 20 independent components from 64-channel EEG
ICA.Options opts = new ICA.Options(new LogCosh(), 300, 1E-5);
ICA ica = ICA.fit(eegData, 20, opts);

// The components matrix: 20 × 10000
double[][] icComponents = ica.components();

// Inspect convergence via SLF4J logging output.
// Each row icComponents[i] is a unit-norm vector in the whitened sample space.

Configuring via Properties

Useful for externalizing configuration in applications or ML pipelines:

java

import smile.ica.*;
import java.util.Properties;

// ---- at configuration time ----
Properties props = new Properties();
props.setProperty("smile.ica.contrast",   "Gaussian");
props.setProperty("smile.ica.iterations", "200");
props.setProperty("smile.ica.tolerance",  "1E-5");

// ---- at fit time ----
ICA.Options opts = ICA.Options.of(props);  // throws ReflectiveOperationException
ICA ica = ICA.fit(X, p, opts);

// ---- serialize the options for later ----
Properties savedProps = opts.toProperties();
// "Gaussian".equals(savedProps.getProperty("smile.ica.contrast"))

ICA vs PCA

Both ICA and PCA decompose a multivariate signal, but with different goals:

Aspect	PCA	ICA
Criterion	Maximum variance	Maximum statistical independence
Output	Uncorrelated components	Independent components
Gaussian data	Optimal	Undefined (see Limitations)
Non-Gaussian data	Sub-optimal	Optimal
Ordering	Decreasing variance	Arbitrary
Sign	Consistent (largest projection positive)	Arbitrary
Preprocessing	No	Requires whitening (done automatically)
Typical use	Dimensionality reduction, compression	Blind source separation, artifact removal

In practice, PCA whitening is applied as a pre-processing step inside FastICA — the two methods are complementary rather than competing.
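
The difference between "uncorrelated" and "independent" in the table can be made concrete with a two-variable example. The sketch below is plain Java with a hypothetical class name; it builds a variable y that is a deterministic function of x and yet has zero correlation with it.

```java
/** Zero correlation without independence: y = x² is fully determined by x. */
class UncorrelatedNotIndependent {

    /** Pearson correlation coefficient of two equally long samples. */
    static double correlation(double[] a, double[] b) {
        int n = a.length;
        double ma = 0.0, mb = 0.0;
        for (int i = 0; i < n; i++) {
            ma += a[i];
            mb += b[i];
        }
        ma /= n;
        mb /= n;
        double sab = 0.0, saa = 0.0, sbb = 0.0;
        for (int i = 0; i < n; i++) {
            sab += (a[i] - ma) * (b[i] - mb);
            saa += (a[i] - ma) * (a[i] - ma);
            sbb += (b[i] - mb) * (b[i] - mb);
        }
        return sab / Math.sqrt(saa * sbb);
    }

    public static void main(String[] args) {
        int n = 1001;
        double[] x = new double[n], y = new double[n], absX = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = -1.0 + 2.0 * i / (n - 1);   // symmetric grid on [-1, 1]
            y[i] = x[i] * x[i];                // a deterministic function of x
            absX[i] = Math.abs(x[i]);
        }
        System.out.printf("corr(x, y)   = %.3f%n", correlation(x, y));
        System.out.printf("corr(|x|, y) = %.3f%n", correlation(absX, y));
    }
}
```

The first correlation vanishes because x is symmetric around zero, so a PCA-style decorrelation would consider x and y already separated; the second correlation exposes the dependence that an independence criterion is designed to detect.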

References

Aapo Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.
Aapo Hyvärinen and Erkki Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.
Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Wiley, 2001.
Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.
