SMILE — Independent Component Analysis (ICA) User Guide


Table of Contents

  1. Overview
  2. Quick Start
  3. Mathematical Background
  4. Contrast Functions
  5. Hyperparameters
  6. Input Data Layout
  7. Working with the Result
  8. Practical Guidance
  9. Complete Examples
  10. ICA vs PCA
  11. References

Overview

The smile.ica package provides Independent Component Analysis (ICA) via the FastICA algorithm invented by Aapo Hyvärinen. ICA is a blind source-separation technique that decomposes a set of mixed observed signals into a set of maximally statistically independent components.

| Class / Record | Role |
| --- | --- |
| ICA | The fitted model; holds the unmixing matrix |
| ICA.Options | Hyperparameters: contrast function, iteration limit, tolerance |
| LogCosh | Default contrast function (general purpose) |
| Exp | Gaussian contrast function (super-Gaussian / robust) |
| Kurtosis | Kurtosis contrast function (simple, sensitive to outliers) |

All built-in contrast-function classes implement smile.util.function.DifferentiableFunction and java.io.Serializable. A custom implementation can be serialized in the same way provided it also implements Serializable.


Quick Start

java
import smile.ica.*;
import smile.math.MathEx;

// data[i] is sample i, data[i][j] is the j-th mixed signal value.
// Rearrange to variables × samples before calling fit().
double[][] data = /* load your mixed-signal matrix (samples × variables) */;
double[][] X    = MathEx.transpose(data); // variables × samples

// Fit ICA — extract 2 independent components using default settings.
MathEx.setSeed(19650218);       // for reproducibility
ICA ica = ICA.fit(X, 2);

// Each row of components() is one unit-norm independent component vector.
double[][] components = ica.components();
System.out.printf("Component 0, sample 0: %.5f%n", components[0][0]);
System.out.printf("Component 1, sample 0: %.5f%n", components[1][0]);

Mathematical Background

The ICA Model

ICA assumes the observed signal vector x is a linear instantaneous mixture of statistically independent source signals s:

x = A · s

where:

  • x ∈ ℝᵐ is the vector of observed (mixed) signals at one time instant
  • s ∈ ℝᵖ is the vector of unknown independent source signals
  • A ∈ ℝ^{m×p} is the unknown mixing matrix

The goal is to estimate the unmixing matrix W such that:

ŝ = W · x

recovers estimates of the original sources s up to permutation and scaling.
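
The model can be illustrated with a minimal sketch that assumes the mixing matrix A is known and square, so that W is simply its inverse. In a real problem A is unknown and W has to be estimated; the class MixingDemo and its helpers are illustrative names, not part of Smile.

```java
import java.util.Arrays;

/** Illustrates x = A·s and s-hat = W·x for a known 2x2 mixing matrix. */
final class MixingDemo {
    /** Multiplies a 2x2 matrix by a length-2 vector. */
    static double[] multiply(double[][] m, double[] v) {
        return new double[] {
            m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]
        };
    }

    /** Inverts a non-singular 2x2 matrix. */
    static double[][] invert(double[][] a) {
        double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
        if (Math.abs(det) < 1e-12) {
            throw new IllegalArgumentException("singular mixing matrix");
        }
        return new double[][] {
            { a[1][1] / det, -a[0][1] / det},
            {-a[1][0] / det,  a[0][0] / det}
        };
    }

    public static void main(String[] args) {
        double[][] A = {{0.7, 0.3}, {0.4, 0.6}}; // mixing matrix (unknown in practice)
        double[] s = {1.0, -1.0};                // source values at one time instant
        double[] x = multiply(A, s);             // what the sensors observe
        double[][] W = invert(A);                // ideal unmixing matrix
        double[] sHat = multiply(W, x);          // recovered sources
        System.out.println("x     = " + Arrays.toString(x));
        System.out.println("s-hat = " + Arrays.toString(sHat));
    }
}
```

ICA has to estimate W from the observations alone, which is why the recovered sources are only defined up to permutation and scaling.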

Assumptions

ICA requires the following assumptions to hold:

  1. Statistical independence: the source signals s₁, s₂, …, sₚ are mutually statistically independent.
  2. Non-Gaussianity: at most one source may be Gaussian. By the Central Limit Theorem, a mixture of independent sources tends to be closer to Gaussian than the sources themselves; ICA reverses this by maximizing non-Gaussianity.
  3. Linearity: the mixing is instantaneous and linear (no time delays or convolution).
  4. Sufficient observations: at least as many observed signals as source signals (m ≥ p).

Non-Gaussianity as Independence Proxy

FastICA measures non-Gaussianity using negentropy approximations based on a non-quadratic, non-linear contrast function G(u). Negentropy is always non-negative and equals zero if and only if the variable is Gaussian.

The negentropy approximation is:

J(y) ≈ [E{G(y)} - E{G(ν)}]²

where ν is a standard Gaussian variable and the expectation is over samples. Maximizing this over unit-norm directions w gives the most non-Gaussian (most independent) projection.
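
As a rough numerical illustration of this approximation, the sketch below evaluates J(y) with G(u) = log cosh(u) for a Gaussian sample and a unit-variance uniform sample. It assumes the samples are already standardized, and it obtains E{G(ν)} by simple trapezoidal integration; NegentropySketch is a hypothetical name and none of this is Smile code.

```java
import java.util.Random;

/** Estimates J(y) ≈ [E{G(y)} − E{G(ν)}]² with G(u) = log cosh(u). */
final class NegentropySketch {
    /** G(u) = log cosh(u), evaluated in an overflow-safe form. */
    static double logCosh(double u) {
        double a = Math.abs(u);
        return a + Math.log1p(Math.exp(-2.0 * a)) - Math.log(2.0);
    }

    /** E{G(ν)} for ν ~ N(0, 1), by trapezoidal integration over [-10, 10]. */
    static double gaussianExpectation() {
        int steps = 20_000;
        double h = 20.0 / steps;
        double sum = 0.0;
        for (int i = 0; i <= steps; i++) {
            double v = -10.0 + i * h;
            double pdf = Math.exp(-0.5 * v * v) / Math.sqrt(2.0 * Math.PI);
            double weight = (i == 0 || i == steps) ? 0.5 : 1.0;
            sum += weight * logCosh(v) * pdf;
        }
        return sum * h;
    }

    /** Negentropy approximation for a sample assumed to have zero mean and unit variance. */
    static double negentropy(double[] y) {
        double mean = 0.0;
        for (double v : y) mean += logCosh(v);
        mean /= y.length;
        double d = mean - gaussianExpectation();
        return d * d;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 100_000;
        double[] gaussian = new double[n];
        double[] uniform = new double[n];
        for (int i = 0; i < n; i++) {
            gaussian[i] = rng.nextGaussian();
            uniform[i] = (2.0 * rng.nextDouble() - 1.0) * Math.sqrt(3.0); // unit variance
        }
        System.out.printf("J(gaussian) = %.3e%n", negentropy(gaussian));
        System.out.printf("J(uniform)  = %.3e%n", negentropy(uniform));
    }
}
```

The Gaussian sample should score close to zero, while the uniform sample scores noticeably higher, which is the property FastICA exploits.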

The FastICA Algorithm

FastICA uses a fixed-point iteration to find each unmixing vector w:

w ← (1/n) · X · g(Xᵀw) − mean(g′(Xᵀw)) · w
w ← w / ‖w‖

where:

  • X ∈ ℝ^{n×m} is the whitened data matrix (n samples, m variables)
  • g = G′ is the first derivative of the contrast function G
  • g′ = G″ is the second derivative

After convergence of each component, deflation orthogonalization removes its contribution so that subsequent components are orthogonal:

w ← w − Σₖ (wₖᵀ w) wₖ    for all previously found components wₖ
w ← w / ‖w‖

Convergence is declared when:

min(‖w − w_old‖, ‖w + w_old‖) < tol

The two-case test handles the sign ambiguity: a direction and its negative represent the same component.
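
The pieces above (fixed-point update, deflation, and the sign-insensitive convergence test) can be sketched in plain Java. This is a textbook-style variant written for illustration: it stores samples as rows of an already whitened matrix z, uses g = tanh, and applies deflation inside every iteration. It is not Smile's implementation, and OneUnitFastIca with all of its methods is a hypothetical name.

```java
import java.util.Random;

/** Textbook one-unit FastICA with g = tanh; an illustrative sketch, not Smile's code. */
final class OneUnitFastIca {
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += a[j] * b[j];
        return s;
    }

    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        double[] u = new double[v.length];
        for (int j = 0; j < v.length; j++) u[j] = v[j] / norm;
        return u;
    }

    /** Deflation: subtract projections onto previously found unit vectors, then renormalize. */
    static double[] deflate(double[] w, double[][] found) {
        double[] r = w.clone();
        for (double[] wk : found) {
            double proj = dot(wk, w);
            for (int j = 0; j < r.length; j++) r[j] -= proj * wk[j];
        }
        return normalize(r);
    }

    /** Fixed-point update: mean(z * g(w.z)) - mean(g'(w.z)) * w over the rows z of the data. */
    static double[] step(double[][] z, double[] w) {
        int n = z.length, m = w.length;
        double[] next = new double[m];
        double meanG2 = 0.0;
        for (double[] row : z) {
            double t = Math.tanh(dot(w, row));   // g(u) = tanh(u)
            for (int j = 0; j < m; j++) next[j] += row[j] * t / n;
            meanG2 += (1.0 - t * t) / n;         // g'(u) = 1 - tanh^2(u)
        }
        for (int j = 0; j < m; j++) next[j] -= meanG2 * w[j];
        return next;
    }

    /** Sign-insensitive test: min(||w - old||, ||w + old||) < tol. */
    static boolean converged(double[] w, double[] old, double tol) {
        double minus = 0.0, plus = 0.0;
        for (int j = 0; j < w.length; j++) {
            minus += (w[j] - old[j]) * (w[j] - old[j]);
            plus += (w[j] + old[j]) * (w[j] + old[j]);
        }
        return Math.sqrt(Math.min(minus, plus)) < tol;
    }

    /** Estimates one unit-norm direction, kept orthogonal to the rows of found. */
    static double[] fit(double[][] z, double[] init, double[][] found, int maxIter, double tol) {
        double[] w = deflate(init, found);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = deflate(step(z, w), found);
            boolean done = converged(next, w, tol);
            w = next;
            if (done) break;
        }
        return w;
    }

    /** Roughly white test data: two independent unit-variance uniform sources, rotated by angle. */
    static double[][] rotatedUniform(int n, double angle, long seed) {
        Random rng = new Random(seed);
        double c = Math.cos(angle), s = Math.sin(angle), scale = Math.sqrt(3.0);
        double[][] z = new double[n][];
        for (int i = 0; i < n; i++) {
            double s1 = scale * (2.0 * rng.nextDouble() - 1.0);
            double s2 = scale * (2.0 * rng.nextDouble() - 1.0);
            z[i] = new double[] {c * s1 - s * s2, s * s1 + c * s2};
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] z = rotatedUniform(10_000, Math.PI / 6, 7L);
        double[] w1 = fit(z, new double[] {1.0, 0.0}, new double[0][], 200, 1e-6);
        double[] w2 = fit(z, new double[] {0.3, 0.8}, new double[][] {w1}, 200, 1e-6);
        System.out.printf("w1 = (%.3f, %.3f)%n", w1[0], w1[1]);
        System.out.printf("w2 = (%.3f, %.3f), w1.w2 = %.2e%n", w2[0], w2[1], dot(w1, w2));
    }
}
```

In this toy setup the true source directions are the columns of the rotation matrix, so each estimated vector should line up with one of them up to sign.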

Data Whitening

Before applying FastICA the data is pre-whitened (sphered):

  1. Center: subtract the mean of each observed variable.
  2. Eigendecompose: compute the eigendecomposition of the sample covariance matrix C = XᵀX / n = E D Eᵀ.
  3. Scale: form the whitened data Z = X E D^{-1/2}.

After whitening, ZᵀZ / n = I, i.e. the sample covariance of Z is the identity matrix. Whitening reduces the ICA problem to finding an orthogonal rotation, which simplifies the optimization considerably.

Numerical note: If any eigenvalue of the covariance matrix is smaller than 1e-8, an IllegalArgumentException is thrown — the data is nearly linearly dependent and ICA is not applicable without dimensionality reduction.
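
The three whitening steps can be sketched for the special case of two observed variables, where the eigendecomposition of the 2×2 covariance matrix has a closed form. The sketch assumes samples are stored as rows; Whitening2D is a hypothetical helper, and the 1e-8 check merely mirrors the threshold described in the note above rather than reproducing Smile's code.

```java
/** Whitening for two observed variables; rows of x are samples. Illustrative only. */
final class Whitening2D {
    static double[][] whiten(double[][] x) {
        int n = x.length;

        // 1. Center: subtract the mean of each variable.
        double m0 = 0.0, m1 = 0.0;
        for (double[] r : x) { m0 += r[0]; m1 += r[1]; }
        m0 /= n;
        m1 /= n;

        // 2. Eigendecompose the sample covariance C = XᵀX / n of the centered data.
        double c00 = 0.0, c01 = 0.0, c11 = 0.0;
        for (double[] r : x) {
            double a = r[0] - m0, b = r[1] - m1;
            c00 += a * a / n;
            c01 += a * b / n;
            c11 += b * b / n;
        }
        // For a symmetric 2x2 matrix, E = [[cos, -sin], [sin, cos]] with this angle.
        double theta = 0.5 * Math.atan2(2.0 * c01, c00 - c11);
        double cos = Math.cos(theta), sin = Math.sin(theta);
        double d0 = c00 * cos * cos + 2.0 * c01 * cos * sin + c11 * sin * sin;
        double d1 = c00 * sin * sin - 2.0 * c01 * cos * sin + c11 * cos * cos;
        if (Math.min(d0, d1) < 1e-8) {
            throw new IllegalArgumentException("covariance matrix is close to singular");
        }

        // 3. Scale: Z = X E D^{-1/2}.
        double[][] z = new double[n][2];
        for (int i = 0; i < n; i++) {
            double a = x[i][0] - m0, b = x[i][1] - m1;
            z[i][0] = (a * cos + b * sin) / Math.sqrt(d0);
            z[i][1] = (-a * sin + b * cos) / Math.sqrt(d1);
        }
        return z;
    }

    /** ZᵀZ / n for data that is already centered. */
    static double[][] covariance(double[][] z) {
        double[][] c = new double[2][2];
        for (double[] r : z) {
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) c[i][j] += r[i] * r[j] / z.length;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] x = {{2.0, 1.0}, {0.0, 0.5}, {-1.0, -2.0}, {3.0, 2.5}, {1.0, -0.5}};
        double[][] c = covariance(whiten(x));
        System.out.printf("%.6f %.6f%n%.6f %.6f%n", c[0][0], c[0][1], c[1][0], c[1][1]);
    }
}
```

For more than two variables a general symmetric eigensolver is needed in place of the closed-form rotation.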


Contrast Functions

A contrast function G must be:

  • non-quadratic
  • non-linear
  • twice differentiable

The three built-in contrast functions cover the most common use cases.

LogCosh (default)

java
new LogCosh()   // or  new ICA.Options("LogCosh", 100)

| Property | Value |
| --- | --- |
| G(u) | log(cosh(u)) ≈ \|u\| − log 2 for large \|u\| |
| G′(u) = g(u) | tanh(u) |
| G″(u) = g′(u) | 1 − tanh²(u) |
| Implementation | Numerically stable: \|u\| + log1p(exp(−2\|u\|)) − log 2 |
| Suitable for | General purpose; balances robustness and accuracy |
| Signal types | Sub-Gaussian and super-Gaussian sources |

LogCosh is the recommended default. It is smooth, bounded in derivative, and avoids the numerical overflow that log(cosh(x)) would produce for |x| > 710.
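
The difference between the naive and the stable formula is easy to check in isolation. The following sketch uses only java.lang.Math; StableLogCosh is a hypothetical class name and the code is not taken from Smile.

```java
/** Compares the naive and the overflow-safe evaluation of log(cosh(u)). */
final class StableLogCosh {
    /** Overflows to Infinity once cosh(u) exceeds the double range. */
    static double naive(double u) {
        return Math.log(Math.cosh(u));
    }

    /** |u| + log1p(exp(-2|u|)) - log 2, which stays finite for large |u|. */
    static double stable(double u) {
        double a = Math.abs(u);
        return a + Math.log1p(Math.exp(-2.0 * a)) - Math.log(2.0);
    }

    public static void main(String[] args) {
        for (double u : new double[] {0.5, 50.0, 1000.0}) {
            System.out.printf("u = %6.1f  naive = %s  stable = %s%n", u, naive(u), stable(u));
        }
    }
}
```

For moderate arguments both forms agree to rounding error; only the stable form remains usable when |u| is in the hundreds.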

Gaussian (Exp)

java
new Exp()   // or  new ICA.Options("Gaussian", 100)

| Property | Value |
| --- | --- |
| G(u) | −exp(−u²/2) |
| G′(u) = g(u) | u · exp(−u²/2) |
| G″(u) = g′(u) | (1 − u²) · exp(−u²/2) |
| Suitable for | Super-Gaussian sources, robustness to outliers |
| Signal types | Sparse signals, impulsive noise |

The Gaussian contrast function down-weights extreme values, making it more robust when the data contain outliers.

Kurtosis

java
new Kurtosis()   // or  new ICA.Options("Kurtosis", 100)

| Property | Value |
| --- | --- |
| G(u) | u⁴ / 4 |
| G′(u) = g(u) | u³ |
| G″(u) = g′(u) | 3u² |
| Suitable for | Simple / educational use, clean data |
| Signal types | Any, but sensitive to outliers |

Kurtosis is a classical measure of non-Gaussianity. However, because G grows as u⁴, it is highly sensitive to large values (outliers). Prefer LogCosh or Exp for real data.
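
To make the outlier sensitivity concrete, the sketch below evaluates the first derivative g(u) of each built-in contrast function, using the formulas from the tables above, at a typical value and at an outlier. ContrastDerivatives is a hypothetical class written for this comparison and does not call Smile.

```java
/** First derivatives g(u) of the three contrast functions, per the tables above. */
final class ContrastDerivatives {
    static double logCoshG(double u) { return Math.tanh(u); }
    static double gaussianG(double u) { return u * Math.exp(-0.5 * u * u); }
    static double kurtosisG(double u) { return u * u * u; }

    public static void main(String[] args) {
        System.out.println("     u    LogCosh    Gaussian    Kurtosis");
        for (double u : new double[] {0.5, 2.0, 10.0}) {
            System.out.printf("%6.1f  %9.4f  %10.3e  %10.1f%n",
                    u, logCoshG(u), gaussianG(u), kurtosisG(u));
        }
    }
}
```

A single sample at u = 10 contributes at most 1 to the LogCosh update and almost nothing to the Gaussian update, but 1000 to the Kurtosis update, so a few such samples can dominate the average.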

Choosing a Contrast Function

| Scenario | Recommended |
| --- | --- |
| No prior knowledge | LogCosh |
| Super-Gaussian signals (e.g., speech, sparse) | Exp |
| Sub-Gaussian signals (e.g., uniform) | LogCosh |
| Clean data, no outliers, quick test | Kurtosis |
| Outlier-prone data | Exp |

Custom Contrast Functions

Implement DifferentiableFunction and optionally Serializable:

java
import smile.util.function.DifferentiableFunction;
import java.io.Serializable;

public class MyContrast implements DifferentiableFunction, Serializable {
    @java.io.Serial
    private static final long serialVersionUID = 1L;

    @Override
    public double f(double x) {
        // G(u) — the contrast function itself (not required by FastICA
        // but required by the DifferentiableFunction interface)
        return Math.log1p(x * x);
    }

    @Override
    public double g(double x) {
        // G′(u) — first derivative
        return 2.0 * x / (1.0 + x * x);
    }

    @Override
    public double g2(double x) {
        // G″(u) — second derivative
        double d = 1.0 + x * x;
        return 2.0 * (1.0 - x * x) / (d * d);
    }
}

// Use it
ICA.Options opts = new ICA.Options(new MyContrast(), 200, 1E-5);
ICA ica = ICA.fit(X, p, opts);
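
Before plugging in a custom contrast function it can be worth checking that g and g2 really are the derivatives of f, since a mistake there silently degrades convergence. The sketch below compares analytic derivatives with central finite differences. It uses java.util.function.DoubleUnaryOperator instead of Smile's interface so that it runs on its own; DerivativeCheck is a hypothetical helper.

```java
import java.util.function.DoubleUnaryOperator;

/** Checks an analytic derivative against a central finite difference. */
final class DerivativeCheck {
    /** Largest absolute gap between df and the numeric derivative of f on a grid over [lo, hi]. */
    static double maxError(DoubleUnaryOperator f, DoubleUnaryOperator df,
                           double lo, double hi, int points) {
        double h = 1e-5;
        double worst = 0.0;
        for (int i = 0; i < points; i++) {
            double x = lo + (hi - lo) * i / (points - 1);
            double numeric = (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2.0 * h);
            worst = Math.max(worst, Math.abs(numeric - df.applyAsDouble(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Same formulas as MyContrast above.
        DoubleUnaryOperator f = x -> Math.log1p(x * x);
        DoubleUnaryOperator g = x -> 2.0 * x / (1.0 + x * x);
        DoubleUnaryOperator g2 = x -> {
            double d = 1.0 + x * x;
            return 2.0 * (1.0 - x * x) / (d * d);
        };
        System.out.printf("max |f' - g|  = %.2e%n", maxError(f, g, -3.0, 3.0, 61));
        System.out.printf("max |g' - g2| = %.2e%n", maxError(g, g2, -3.0, 3.0, 61));
    }
}
```

Both gaps should be tiny; a gap of order one points to an error in the hand-derived formula.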

Hyperparameters

Options Record

java
ICA.Options(DifferentiableFunction contrast, int maxIter, double tol)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| contrast | DifferentiableFunction | new LogCosh() | Contrast function G |
| maxIter | int | 100 | Maximum fixed-point iterations per component |
| tol | double | 1e-4 | Convergence tolerance on ‖w − w_old‖ |

Convenience constructors:

java
// Default tolerance 1e-4
new ICA.Options(new LogCosh(), 100)

// Custom tolerance
new ICA.Options(new LogCosh(), 200, 1E-6)

// By name
new ICA.Options("LogCosh",  100)
new ICA.Options("Gaussian", 100)
new ICA.Options("Kurtosis", 100)

Persisting Options with Properties

Options can be serialized to/from java.util.Properties, which is convenient for configuration files or command-line parameter passing.

java
// Save
ICA.Options opts = new ICA.Options(new Exp(), 150, 1E-5);
Properties props = opts.toProperties();
// props contains:
//   smile.ica.contrast   = Gaussian
//   smile.ica.iterations = 150
//   smile.ica.tolerance  = 1.0E-5

// Restore
ICA.Options restored = ICA.Options.of(props);

Property keys:

| Key | Value |
| --- | --- |
| smile.ica.contrast | "LogCosh", "Gaussian", "Kurtosis", or fully-qualified class name |
| smile.ica.iterations | integer string |
| smile.ica.tolerance | double string |

When a fully-qualified class name is stored, Options.of() instantiates it via reflection using its no-argument constructor — so custom contrast classes must have a public no-arg constructor.
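
Because the options are plain string properties, they can be written to and read from any text source with the standard java.util.Properties API. The sketch below round-trips the three keys listed above through an in-memory string; it does not touch ICA.Options, and PropertiesRoundTrip is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Stores ICA-related properties as text and reads them back. */
final class PropertiesRoundTrip {
    /** Serializes the properties in the standard .properties text format. */
    static String store(Properties props) {
        StringWriter writer = new StringWriter();
        try {
            props.store(writer, "ICA settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    /** Parses text in the .properties format. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("smile.ica.contrast", "Gaussian");
        props.setProperty("smile.ica.iterations", "150");
        props.setProperty("smile.ica.tolerance", "1E-5");

        String text = store(props);
        System.out.print(text);
        System.out.println("round trip equal: " + load(text).equals(props));
    }
}
```

The same text could equally be kept in a file and loaded with a FileReader before being handed to ICA.Options.of.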


Input Data Layout

ICA.fit() expects data in variables × samples layout:

data[i][j]  →  value of the i-th observed signal (variable) at time j (sample)
  • Rows: observed signals / channels (dimension = m)
  • Columns: time steps / observations (dimension = n)

If your data is in the conventional samples × variables layout (each row is one observation), transpose it first:

java
double[][] samplesXvars = /* ... */;              // shape: n × m
double[][] X = MathEx.transpose(samplesXvars);    // shape: m × n
ICA ica = ICA.fit(X, p);

The number of components p must satisfy 1 ≤ p ≤ m (number of signals).
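
The layout convention and the constraint on p can be made explicit with a small plain-Java sketch. LayoutHelper and its methods are hypothetical helpers written for illustration; in Smile the transposition itself is done by MathEx.transpose as shown above.

```java
/** Makes the variables-by-samples layout and the bound on p explicit. */
final class LayoutHelper {
    /** Turns an n x m (samples x variables) array into m x n (variables x samples). */
    static double[][] transpose(double[][] samplesByVars) {
        int n = samplesByVars.length;
        int m = samplesByVars[0].length;
        double[][] varsBySamples = new double[m][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                varsBySamples[j][i] = samplesByVars[i][j];
            }
        }
        return varsBySamples;
    }

    /** Rejects component counts outside 1 <= p <= m. */
    static void checkComponentCount(int p, int m) {
        if (p < 1 || p > m) {
            throw new IllegalArgumentException("p must be in [1, " + m + "], got " + p);
        }
    }

    public static void main(String[] args) {
        double[][] samplesByVars = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}}; // n = 3, m = 2
        double[][] x = transpose(samplesByVars);
        checkComponentCount(2, x.length);
        System.out.println("signals m = " + x.length + ", samples n = " + x[0].length);
    }
}
```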


Working with the Result

ICA is a Java record with a single field:

java
double[][] components = ica.components();
  • components.length == p (number of independent components extracted)
  • components[i].length == n (number of samples)
  • Each row components[i] is a unit-norm vector in the whitened sample space representing the i-th independent component.
  • Rows are mutually orthogonal: components[i] · components[j] ≈ 0 for i ≠ j.

ICA implements java.io.Serializable (serialVersionUID = 2), so models can be saved and loaded with standard Java object serialization or any compatible framework.
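
A round trip through standard Java serialization can be sketched as follows. To keep the example runnable without Smile on the classpath it uses a hypothetical stand-in class, Model, that holds a component matrix; the same toBytes and fromBytes helpers (also hypothetical) would accept any Serializable object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Saves and restores a Serializable object through an in-memory byte array. */
final class SerializationRoundTrip {
    /** Hypothetical stand-in for a fitted model. */
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[][] components;

        Model(double[][] components) {
            this.components = components;
        }
    }

    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Model model = new Model(new double[][] {{0.6, 0.8}, {-0.8, 0.6}});
        Model copy = (Model) fromBytes(toBytes(model));
        System.out.println(Arrays.deepToString(copy.components));
    }
}
```

Replacing the byte array streams with file streams gives on-disk persistence.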


Practical Guidance

Number of Components

Set p to the number of source signals you believe are present. In the absence of domain knowledge:

  • Start with p = m (full decomposition) to see all components, then keep only the most interpretable ones.
  • Use domain knowledge or cross-validation to select a smaller p.
  • p must not exceed m (the number of observed signals).

Convergence and Iteration Limit

If a component does not converge within maxIter iterations a WARN-level SLF4J message is emitted:

Component 2 did not converge in 100 iterations.

Suggested remedies:

  1. Increase maxIter: try 200 or 500.
  2. Loosen tol: e.g., 1e-3 if a less precise solution is acceptable.
  3. Change the contrast function: Exp sometimes converges faster for super-Gaussian sources.
  4. Check data quality: near-collinear signals or strong outliers can prevent convergence.

Reproducibility and Seeding

FastICA initializes w with random Gaussian vectors. Results are non-deterministic across runs. For reproducible output call:

java
MathEx.setSeed(19650218);
ICA ica = ICA.fit(X, p);

Sign and Ordering Ambiguity

ICA has two fundamental ambiguities that cannot be resolved algorithmically:

  1. Sign: w and −w define the same independent subspace, so the sign of each extracted component is arbitrary.
  2. Order: there is no canonical ordering of the components. The order may vary between runs even with the same seed.

If consistent ordering matters, sort the components by a domain-specific criterion (e.g., variance of the recovered source, frequency content, etc.) after fitting.
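
One possible post-processing convention is sketched below: flip each component so that its largest-magnitude entry is positive, then sort by decreasing absolute excess kurtosis. Both choices are assumptions made for illustration (any domain-specific criterion could replace the kurtosis), and CanonicalOrder is a hypothetical helper, not a Smile class.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Applies a fixed sign and ordering convention to a set of component vectors. */
final class CanonicalOrder {
    /** Excess kurtosis of a sample: E{(y - mean)^4} / variance^2 - 3. */
    static double excessKurtosis(double[] y) {
        double mean = 0.0;
        for (double v : y) mean += v / y.length;
        double m2 = 0.0, m4 = 0.0;
        for (double v : y) {
            double d = v - mean;
            m2 += d * d / y.length;
            m4 += d * d * d * d / y.length;
        }
        return m4 / (m2 * m2) - 3.0;
    }

    /** Returns a copy whose largest-magnitude entry is positive. */
    static double[] fixSign(double[] y) {
        int arg = 0;
        for (int i = 1; i < y.length; i++) {
            if (Math.abs(y[i]) > Math.abs(y[arg])) arg = i;
        }
        double sign = y[arg] < 0 ? -1.0 : 1.0;
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) out[i] = sign * y[i];
        return out;
    }

    /** Sign-fixed copies of the rows, sorted by decreasing |excess kurtosis|. */
    static double[][] canonicalize(double[][] components) {
        return Arrays.stream(components)
                .map(CanonicalOrder::fixSign)
                .sorted(Comparator.comparingDouble((double[] c) -> -Math.abs(excessKurtosis(c))))
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        double[][] components = {
            {1, -1, 1, -1, 1, -1, 1, -1, 1, -1},   // flat, sub-Gaussian
            {0, 0, 0, -5, 0, 0, 0, 0, 0, 0}        // sparse, super-Gaussian
        };
        for (double[] row : canonicalize(components)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Applying the same convention after every fit makes results from different runs directly comparable.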

Limitations

| Limitation | Details |
| --- | --- |
| Linear mixing only | ICA does not handle nonlinear or convolutive mixtures |
| No temporal structure | FastICA ignores time-ordering; for time-series use SOBI or TDSEP |
| Gaussian sources | Cannot separate more than one Gaussian source |
| Square or over-determined systems | Requires m ≥ p (at least as many sensors as sources) |
| Outlier sensitivity (Kurtosis) | Use LogCosh or Exp for noisy real data |

Complete Examples

Cocktail Party Problem

The classic motivating example: separate two mixed signals recorded by two microphones. Here the sources are synthetic stand-ins for speech, a sawtooth and a square wave.

java
import smile.ica.*;
import smile.math.MathEx;

MathEx.setSeed(12345);

int T = 2000;

// Two non-Gaussian source signals
double[] s1 = new double[T];   // sawtooth
double[] s2 = new double[T];   // square wave
for (int t = 0; t < T; t++) {
    s1[t] = 2.0 * ((t % 50) / 50.0) - 1.0;
    s2[t] = Math.signum(Math.sin(2 * Math.PI * t / 30.0));
}

// Linear mixing: two microphones
double[][] mixed = new double[T][2];
for (int t = 0; t < T; t++) {
    mixed[t][0] = 0.7 * s1[t] + 0.3 * s2[t];   // microphone 1
    mixed[t][1] = 0.4 * s1[t] + 0.6 * s2[t];   // microphone 2
}

// Fit ICA (must transpose to variables × samples layout)
ICA ica = ICA.fit(MathEx.transpose(mixed), 2);

double[][] w = ica.components();
System.out.printf("‖w₀‖ = %.6f (should be 1.0)%n", MathEx.norm(w[0]));
System.out.printf("‖w₁‖ = %.6f (should be 1.0)%n", MathEx.norm(w[1]));
System.out.printf("w₀ · w₁ = %.6f (should be ≈ 0)%n", MathEx.dot(w[0], w[1]));

Feature Extraction

ICA can be used for feature extraction as an alternative to PCA. Unlike PCA, ICA components are statistically independent (not just uncorrelated), which makes them more meaningful for non-Gaussian data such as natural images or EEG signals.

java
import smile.ica.*;
import smile.math.MathEx;

// Suppose eegData is channels × timepoints
double[][] eegData = loadEEG(); // shape: 64 × 10000

MathEx.setSeed(42);

// Extract 20 independent components from 64-channel EEG
ICA.Options opts = new ICA.Options(new LogCosh(), 300, 1E-5);
ICA ica = ICA.fit(eegData, 20, opts);

// The components matrix: 20 × 10000
double[][] icComponents = ica.components();

// Inspect convergence via SLF4J logging output.
// Each row icComponents[i] is a unit-norm vector in the whitened sample space.

Configuring via Properties

Useful for externalizing configuration in applications or ML pipelines:

java
import smile.ica.*;
import java.util.Properties;

// ---- at configuration time ----
Properties props = new Properties();
props.setProperty("smile.ica.contrast",   "Gaussian");
props.setProperty("smile.ica.iterations", "200");
props.setProperty("smile.ica.tolerance",  "1E-5");

// ---- at fit time ----
ICA.Options opts = ICA.Options.of(props);  // throws ReflectiveOperationException
ICA ica = ICA.fit(X, p, opts);

// ---- serialize the options for later ----
Properties savedProps = opts.toProperties();
// savedProps.getProperty("smile.ica.contrast") returns "Gaussian"

ICA vs PCA

Both ICA and PCA decompose a multivariate signal, but with different goals:

| Aspect | PCA | ICA |
| --- | --- | --- |
| Criterion | Maximum variance | Maximum statistical independence |
| Output | Uncorrelated components | Independent components |
| Gaussian data | Optimal | Undefined (see Limitations) |
| Non-Gaussian data | Sub-optimal | Optimal |
| Ordering | Decreasing variance | Arbitrary |
| Sign | Consistent (largest projection positive) | Arbitrary |
| Preprocessing | None | Requires whitening (done automatically) |
| Typical use | Dimensionality reduction, compression | Blind source separation, artifact removal |

In practice, PCA whitening is applied as a pre-processing step inside FastICA — the two methods are complementary rather than competing.


References

  1. Aapo Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.

  2. Aapo Hyvärinen and Erkki Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.

  3. Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Wiley, 2001.

  4. Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.


SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.