base/ICA.md
The smile.ica package provides Independent Component Analysis (ICA) via the
FastICA algorithm invented by Aapo Hyvärinen. ICA is a blind source-separation
technique that decomposes a set of mixed observed signals into a set of maximally
statistically independent components.
| Class / Record | Role |
|---|---|
| ICA | The fitted model; holds the unmixing matrix |
| ICA.Options | Hyperparameters: contrast function, iteration limit, tolerance |
| LogCosh | Default contrast function; general purpose |
| Exp | Gaussian contrast function; super-Gaussian / robust |
| Kurtosis | Kurtosis contrast function; simple, sensitive to outliers |
All contrast-function classes implement smile.util.function.DifferentiableFunction
and java.io.Serializable, so custom implementations can be serialized alongside
the model.
```java
import smile.ica.*;
import smile.math.MathEx;

// data[i] is sample i; data[i][j] is the j-th mixed signal value.
double[][] data = /* load your mixed-signal matrix (samples × variables) */;

// ICA.fit() expects variables × samples, so transpose first.
double[][] X = MathEx.transpose(data);

MathEx.setSeed(19650218); // for reproducibility

// Fit ICA: extract 2 independent components using default settings.
ICA ica = ICA.fit(X, 2);

// Each row of components() is one unit-norm independent component vector.
double[][] components = ica.components();
System.out.printf("Component 0, sample 0: %.5f%n", components[0][0]);
System.out.printf("Component 1, sample 0: %.5f%n", components[1][0]);
```
ICA assumes the observed signal vector x is a linear instantaneous mixture of statistically independent source signals s:
x = A · s
where:

- x ∈ ℝᵐ is the vector of observed (mixed) signals,
- s ∈ ℝᵖ is the vector of unknown sources, and
- A is the unknown m × p mixing matrix.

The goal is to estimate the unmixing matrix W such that:

ŝ = W · x

recovers estimates of the original sources s up to permutation and scaling.
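The algebra above can be made concrete with a tiny plain-Java sketch (illustrative only, not the smile API): with a known 2 × 2 mixing matrix A, the ideal unmixing matrix is W = A⁻¹. FastICA's task is to estimate such a W from the mixtures alone, up to permutation and scaling.

```java
public class MixDemo {
    // y = M · v for a small dense matrix M.
    static double[] mul(double[][] M, double[] v) {
        double[] y = new double[M.length];
        for (int i = 0; i < M.length; i++)
            for (int j = 0; j < v.length; j++)
                y[i] += M[i][j] * v[j];
        return y;
    }

    public static void main(String[] args) {
        double[][] A = {{0.7, 0.3}, {0.4, 0.6}};            // mixing matrix
        double det = A[0][0] * A[1][1] - A[0][1] * A[1][0]; // = 0.30
        double[][] W = {{ A[1][1] / det, -A[0][1] / det},
                        {-A[1][0] / det,  A[0][0] / det}};  // W = A⁻¹
        double[] s = {1.0, -2.0};  // true sources
        double[] x = mul(A, s);    // observed mixture x = A · s
        double[] sHat = mul(W, x); // recovered ŝ = W · x
        System.out.printf("sHat = [%.3f, %.3f]%n", sHat[0], sHat[1]); // [1.000, -2.000]
    }
}
```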
ICA requires the following assumptions to hold:

- The sources s₁, s₂, …, sₚ are mutually statistically independent.
- At most one source is Gaussian.
- There are at least as many observed signals as sources (m ≥ p).

FastICA measures non-Gaussianity using negentropy approximations based on a non-quadratic, non-linear contrast function G(u). Negentropy is always non-negative and equals zero if and only if the variable is Gaussian.
The negentropy approximation is:
J(y) ≈ [E{G(y)} - E{G(ν)}]²
where ν is a standard Gaussian variable and the expectation is over samples. Maximizing this over unit-norm directions w gives the most non-Gaussian (most independent) projection.
FastICA uses a fixed-point iteration to find each unmixing vector w:
w ← (1/n) · X · g(Xᵀw) − mean(g′(Xᵀw)) · w
w ← w / ‖w‖
where:

- g = G′ is the first derivative of the contrast function G
- g′ = G″ is the second derivative

After each component converges, deflation orthogonalization removes its contribution so that subsequent components are orthogonal:

w ← w − Σₖ (wₖᵀ w) wₖ for all previously found components wₖ
w ← w / ‖w‖
Convergence is declared when:
min(‖w − w_old‖, ‖w + w_old‖) < tol
The two-case test handles the sign ambiguity: a direction and its negative represent the same component.
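The fixed-point update, deflation, and sign-insensitive convergence test above can be sketched as a compact educational re-implementation (plain Java with the LogCosh contrast g = tanh; the structure, names, and random initialization are assumptions, not the smile source):

```java
import java.util.Random;

public class FastIcaSketch {
    // Z: whitened data, variables × samples. Returns p unit-norm rows.
    static double[][] fastIca(double[][] Z, int p, int maxIter, double tol, long seed) {
        int m = Z.length, n = Z[0].length;
        Random rng = new Random(seed);
        double[][] W = new double[p][];
        for (int c = 0; c < p; c++) {
            double[] w = new double[m];
            for (int i = 0; i < m; i++) w[i] = rng.nextGaussian();
            normalize(w);
            for (int iter = 0; iter < maxIter; iter++) {
                double[] wNew = new double[m];
                double meanG2 = 0.0;
                for (int t = 0; t < n; t++) {
                    double u = 0.0;                      // u = wᵀz_t
                    for (int i = 0; i < m; i++) u += w[i] * Z[i][t];
                    double g = Math.tanh(u);             // g(u) for LogCosh
                    meanG2 += 1.0 - g * g;               // g′(u)
                    for (int i = 0; i < m; i++) wNew[i] += Z[i][t] * g;
                }
                meanG2 /= n;
                // w ← (1/n)·X·g(Xᵀw) − mean(g′(Xᵀw))·w
                for (int i = 0; i < m; i++) wNew[i] = wNew[i] / n - meanG2 * w[i];
                // Deflation: subtract projections onto previous components.
                for (int k = 0; k < c; k++) {
                    double dot = 0.0;
                    for (int i = 0; i < m; i++) dot += W[k][i] * wNew[i];
                    for (int i = 0; i < m; i++) wNew[i] -= dot * W[k][i];
                }
                normalize(wNew);
                // Sign-insensitive test: min(‖w − w_old‖, ‖w + w_old‖) < tol
                double dp = 0.0, dm = 0.0;
                for (int i = 0; i < m; i++) {
                    dp += (wNew[i] - w[i]) * (wNew[i] - w[i]);
                    dm += (wNew[i] + w[i]) * (wNew[i] + w[i]);
                }
                w = wNew;
                if (Math.min(Math.sqrt(dp), Math.sqrt(dm)) < tol) break;
            }
            W[c] = w;
        }
        return W;
    }

    static void normalize(double[] w) {
        double s = 0.0;
        for (double v : w) s += v * v;
        s = Math.sqrt(s);
        for (int i = 0; i < w.length; i++) w[i] /= s;
    }
}
```

By construction, every returned row is unit-norm and orthogonal to the rows found before it, regardless of whether the inner loop converged.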
Before applying FastICA, the data is pre-whitened (sphered):

1. Compute the eigendecomposition of the covariance matrix: C = XᵀX / n = E D Eᵀ.
2. Project and rescale: Z = X E D^{-1/2}.

After whitening, E{ZᵀZ} = I, i.e., the covariance is the identity matrix. Whitening reduces the ICA problem to finding an orthogonal rotation, which simplifies the optimization considerably.

Numerical note: if any eigenvalue of the covariance matrix is smaller than 1e-8, an IllegalArgumentException is thrown; the data is nearly linearly dependent and ICA is not applicable without dimensionality reduction.
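For intuition, here is a minimal sketch of the special case where the observed variables are already uncorrelated: E is then the identity, D holds the per-variable variances, and whitening reduces to centering and dividing by the standard deviation. The helper name and the placement of the 1e-8 guard are illustrative, not the library's code.

```java
public class WhitenDemo {
    // data: variables × samples; returns a standardized copy.
    static double[][] whitenDiagonal(double[][] data) {
        int m = data.length, n = data[0].length;
        double[][] z = new double[m][n];
        for (int i = 0; i < m; i++) {
            double mean = 0.0;
            for (double v : data[i]) mean += v;
            mean /= n;
            double var = 0.0;
            for (double v : data[i]) var += (v - mean) * (v - mean);
            var /= n;
            // Mirrors the library's guard against near-singular covariance.
            if (var < 1e-8)
                throw new IllegalArgumentException("Nearly constant variable " + i);
            double sd = Math.sqrt(var);
            for (int t = 0; t < n; t++) z[i][t] = (data[i][t] - mean) / sd;
        }
        return z;
    }
}
```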
A contrast function G must be non-quadratic and twice differentiable. The three built-in contrast functions cover the most common use cases.
```java
new LogCosh() // or new ICA.Options("LogCosh", 100)
```
| Property | Value |
|---|---|
| G(u) | log(cosh(u)) ≈ \|u\| − log 2 for large \|u\| |
| G′(u) = g(u) | tanh(u) |
| G″(u) = g′(u) | 1 − tanh²(u) |
| Implementation | Numerically stable: \|u\| + log1p(exp(−2\|u\|)) − log 2 |
| Suitable for | General purpose — balances robustness and accuracy |
| Signal types | Sub-Gaussian and super-Gaussian sources |
LogCosh is the recommended default. It is smooth, bounded in derivative, and
avoids the numerical overflow that log(cosh(x)) would produce for |x| > 710.
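A quick demonstration of why the stable form is used (plain Java, not the library code): cosh(x) exceeds Double.MAX_VALUE near |x| ≈ 710, so the naive formula returns Infinity while the rearranged one stays finite.

```java
public class StableLogCosh {
    // Algebraically equal to log(cosh(x)) but immune to overflow:
    // log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log 2
    static double logCosh(double x) {
        double a = Math.abs(x);
        return a + Math.log1p(Math.exp(-2.0 * a)) - Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(Math.log(Math.cosh(1000.0))); // Infinity: cosh overflows
        System.out.println(logCosh(1000.0));             // ≈ 999.307, finite
    }
}
```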
```java
new Exp() // or new ICA.Options("Gaussian", 100)
```
| Property | Value |
|---|---|
| G(u) | −exp(−u²/2) |
| G′(u) = g(u) | u · exp(−u²/2) |
| G″(u) = g′(u) | (1 − u²) · exp(−u²/2) |
| Suitable for | Super-Gaussian sources, robustness to outliers |
| Signal types | Sparse signals, impulsive noise |
The Gaussian contrast function down-weights extreme values, making it more robust when the data contain outliers.
```java
new Kurtosis() // or new ICA.Options("Kurtosis", 100)
```
| Property | Value |
|---|---|
| G(u) | u⁴ / 4 |
| G′(u) = g(u) | u³ |
| G″(u) = g′(u) | 3u² |
| Suitable for | Simple / educational use, clean data |
| Signal types | Any, but sensitive to outliers |
Kurtosis is a classical measure of non-Gaussianity. However, because G grows as
u⁴, it is highly sensitive to large values (outliers). Prefer LogCosh or
Exp for real data.
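To see the sensitivity concretely, compare the contrast values at a typical sample u = 1 and an outlier u = 10 (a standalone illustration): the quartic contrast lets one extreme sample dominate the sample average, while LogCosh grows only linearly.

```java
public class OutlierWeight {
    static double kurtosis(double u) { return Math.pow(u, 4) / 4.0; } // u⁴/4
    static double logCosh(double u)  { return Math.log(Math.cosh(u)); }

    public static void main(String[] args) {
        // kurtosis: G(1) = 0.25, G(10) = 2500 (a 10000× jump)
        System.out.printf("kurtosis: G(1)=%.2f  G(10)=%.2f%n", kurtosis(1), kurtosis(10));
        // logcosh:  G(1) ≈ 0.43, G(10) ≈ 9.31 (roughly linear growth)
        System.out.printf("logcosh:  G(1)=%.2f  G(10)=%.2f%n", logCosh(1), logCosh(10));
    }
}
```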
| Scenario | Recommended |
|---|---|
| No prior knowledge | LogCosh |
| Super-Gaussian signals (e.g., speech, sparse) | Exp |
| Sub-Gaussian signals (e.g., uniform) | LogCosh |
| Clean data, no outliers, quick test | Kurtosis |
| Outlier-prone data | Exp |
Implement DifferentiableFunction and optionally Serializable:
```java
import smile.util.function.DifferentiableFunction;
import java.io.Serializable;

public class MyContrast implements DifferentiableFunction, Serializable {
    @java.io.Serial
    private static final long serialVersionUID = 1L;

    @Override
    public double f(double x) {
        // G(u): the contrast function itself (not used by the FastICA
        // update, but required by the DifferentiableFunction interface).
        return Math.log1p(x * x);
    }

    @Override
    public double g(double x) {
        // G′(u): first derivative.
        return 2.0 * x / (1.0 + x * x);
    }

    @Override
    public double g2(double x) {
        // G″(u): second derivative.
        double d = 1.0 + x * x;
        return 2.0 * (1.0 - x * x) / (d * d);
    }
}
```

```java
// Use it
ICA.Options opts = new ICA.Options(new MyContrast(), 200, 1E-5);
ICA ica = ICA.fit(X, p, opts);
```
```java
ICA.Options(DifferentiableFunction contrast, int maxIter, double tol)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| contrast | DifferentiableFunction | new LogCosh() | Contrast function G |
| maxIter | int | 100 | Maximum fixed-point iterations per component |
| tol | double | 1e-4 | Convergence tolerance on ‖w − w_old‖ |
Convenience constructors:
```java
// Default tolerance 1e-4
new ICA.Options(new LogCosh(), 100)

// Custom tolerance
new ICA.Options(new LogCosh(), 200, 1E-6)

// By name
new ICA.Options("LogCosh", 100)
new ICA.Options("Gaussian", 100)
new ICA.Options("Kurtosis", 100)
```
Options can be serialized to/from java.util.Properties, which is convenient
for configuration files or command-line parameter passing.
```java
import java.util.Properties;

// Save
ICA.Options opts = new ICA.Options(new Exp(), 150, 1E-5);
Properties props = opts.toProperties();
// props contains:
//   smile.ica.contrast   = Gaussian
//   smile.ica.iterations = 150
//   smile.ica.tolerance  = 1.0E-5

// Restore
ICA.Options restored = ICA.Options.of(props);
```
Property keys:
| Key | Value |
|---|---|
| smile.ica.contrast | "LogCosh", "Gaussian", "Kurtosis", or fully-qualified class name |
| smile.ica.iterations | integer string |
| smile.ica.tolerance | double string |
When a fully-qualified class name is stored, Options.of() instantiates it via
reflection using its no-argument constructor — so custom contrast classes must
have a public no-arg constructor.
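The pattern boils down to standard no-arg reflective instantiation. A self-contained sketch (using a JDK class for portability; this is the general mechanism, not the actual Options.of() source):

```java
public class ReflectDemo {
    // Instantiate a class by fully-qualified name via its public no-arg
    // constructor: the same requirement Options.of() places on custom
    // contrast classes.
    static Object newInstanceByName(String fqcn) throws ReflectiveOperationException {
        return Class.forName(fqcn).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object list = newInstanceByName("java.util.ArrayList");
        System.out.println(list.getClass().getName()); // java.util.ArrayList
    }
}
```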
ICA.fit() expects data in variables × samples layout:
data[i][j] → value of the i-th observed signal (variable) at time j (sample)
If your data is in the conventional samples × variables layout (each row is one observation), transpose it first:
```java
double[][] samplesXvars = /* ... */;           // shape: n × m
double[][] X = MathEx.transpose(samplesXvars); // shape: m × n
ICA ica = ICA.fit(X, p);
```
The number of components p must satisfy 1 ≤ p ≤ m (number of signals).
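If MathEx is not available at the call site, the transpose is a few lines of plain Java (a hypothetical helper, not part of smile):

```java
public class Layout {
    // samples × variables  →  variables × samples
    static double[][] transpose(double[][] a) {
        int n = a.length, m = a[0].length;
        double[][] t = new double[m][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                t[j][i] = a[i][j];
        return t;
    }
}
```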
ICA is a Java record with a single field:

```java
double[][] components = ica.components();
```

- components.length == p (the number of independent components extracted)
- components[i].length == n (the number of samples)
- components[i] is a unit-norm vector in the whitened sample space representing the i-th independent component
- components[i] · components[j] ≈ 0 for i ≠ j

ICA implements java.io.Serializable (serialVersionUID = 2), so models can be saved and loaded with standard Java object serialization or any compatible framework.
Set p to the number of source signals you believe are present. In the absence of domain knowledge:

- Start with p = m (full decomposition) to see all components, then keep only the most interpretable ones.
- p must not exceed m (the number of observed signals).

If a component does not converge within maxIter iterations, a WARN-level SLF4J message is emitted:

```
Component 2 did not converge in 100 iterations.
```
Suggested remedies:

- Increase maxIter; try 200 or 500.
- Loosen tol, e.g., 1e-3 if sub-sample precision is acceptable.
- Try a different contrast function; Exp sometimes converges faster for super-Gaussian sources.

FastICA initializes w with random Gaussian vectors, so results are non-deterministic across runs. For reproducible output call:

```java
MathEx.setSeed(19650218);
ICA ica = ICA.fit(X, p);
```
ICA has two fundamental ambiguities that cannot be resolved algorithmically:

- Sign: w and −w define the same independent subspace, so the sign of each extracted component is arbitrary.
- Order: the components are returned in no particular order (permutation ambiguity).

If consistent ordering matters, sort the components by a domain-specific criterion (e.g., variance of the recovered source, frequency content, etc.) after fitting.
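For example, a post-hoc ordering by decreasing variance of the recovered source could look like this (the class and method names are illustrative, not part of the smile API):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortComponents {
    static double variance(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        return var / x.length;
    }

    // Reorder component rows by decreasing variance of the recovered source.
    static double[][] sortByVariance(double[][] components) {
        double[][] sorted = components.clone();
        Arrays.sort(sorted, Comparator.<double[]>comparingDouble(SortComponents::variance).reversed());
        return sorted;
    }
}
```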
| Limitation | Details |
|---|---|
| Linear mixing only | ICA does not handle nonlinear or convolutive mixtures |
| No temporal structure | FastICA ignores time-ordering; for time-series use SOBI or TDSEP |
| Gaussian sources | Cannot separate more than one Gaussian source |
| Square or over-determined systems | Requires m ≥ p (at least as many sensors as sources) |
| Outlier sensitivity (Kurtosis) | Use LogCosh or Exp for noisy real data |
The classic motivating example: separate two mixed speech-like signals.
```java
import smile.ica.*;
import smile.math.MathEx;

MathEx.setSeed(12345);
int T = 2000;

// Two non-Gaussian source signals
double[] s1 = new double[T]; // sawtooth
double[] s2 = new double[T]; // square wave
for (int t = 0; t < T; t++) {
    s1[t] = 2.0 * ((t % 50) / 50.0) - 1.0;
    s2[t] = Math.signum(Math.sin(2 * Math.PI * t / 30.0));
}

// Linear mixing: two microphones
double[][] mixed = new double[T][2];
for (int t = 0; t < T; t++) {
    mixed[t][0] = 0.7 * s1[t] + 0.3 * s2[t]; // microphone 1
    mixed[t][1] = 0.4 * s1[t] + 0.6 * s2[t]; // microphone 2
}

// Fit ICA (must transpose to variables × samples layout)
ICA ica = ICA.fit(MathEx.transpose(mixed), 2);
double[][] w = ica.components();

System.out.printf("‖w₀‖ = %.6f (should be 1.0)%n", MathEx.norm(w[0]));
System.out.printf("‖w₁‖ = %.6f (should be 1.0)%n", MathEx.norm(w[1]));
System.out.printf("w₀ · w₁ = %.6f (should be ≈ 0)%n", MathEx.dot(w[0], w[1]));
```
ICA can be used for feature extraction as an alternative to PCA. Unlike PCA, ICA components are statistically independent (not just uncorrelated), which makes them more meaningful for non-Gaussian data such as natural images or EEG signals.
```java
import smile.ica.*;
import smile.math.MathEx;

// Suppose eegData is channels × timepoints
double[][] eegData = loadEEG(); // shape: 64 × 10000

MathEx.setSeed(42);

// Extract 20 independent components from 64-channel EEG
ICA.Options opts = new ICA.Options(new LogCosh(), 300, 1E-5);
ICA ica = ICA.fit(eegData, 20, opts);

// The components matrix: 20 × 10000
double[][] icComponents = ica.components();

// Inspect convergence via SLF4J logging output.
// Each row icComponents[i] is a unit-norm vector in the whitened sample space.
```
Useful for externalizing configuration in applications or ML pipelines:
```java
import smile.ica.*;
import java.util.Properties;

// ---- at configuration time ----
Properties props = new Properties();
props.setProperty("smile.ica.contrast", "Gaussian");
props.setProperty("smile.ica.iterations", "200");
props.setProperty("smile.ica.tolerance", "1E-5");

// ---- at fit time ----
ICA.Options opts = ICA.Options.of(props); // throws ReflectiveOperationException
ICA ica = ICA.fit(X, p, opts);

// ---- serialize the options for later ----
Properties savedProps = opts.toProperties();
// savedProps.getProperty("smile.ica.contrast") == "Gaussian"
```
Both ICA and PCA decompose a multivariate signal, but with different goals:
| Aspect | PCA | ICA |
|---|---|---|
| Criterion | Maximum variance | Maximum statistical independence |
| Output | Uncorrelated components | Independent components |
| Gaussian data | Optimal | Undefined (see Limitations) |
| Non-Gaussian data | Sub-optimal | Optimal |
| Ordering | Decreasing variance | Arbitrary |
| Sign | Consistent (largest projection positive) | Arbitrary |
| Preprocessing | No | Requires whitening (done automatically) |
| Typical use | Dimensionality reduction, compression | Blind source separation, artifact removal |
In practice, PCA whitening is applied as a pre-processing step inside FastICA — the two methods are complementary rather than competing.
Aapo Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.
Aapo Hyvärinen and Erkki Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.
Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Wiley, 2001.
Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.
SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.