The smile.stat.hypothesis package provides parametric and non-parametric
statistical hypothesis tests. The smile.stat.Hypothesis interface acts as
a concise facade that groups all tests under a single import.
Every test returns a result record that bundles the test statistic, the degrees of freedom (where applicable), and the p-value:
| Test | Result type | Key fields |
|---|---|---|
| t-test | TTest | t(), df(), pvalue() |
| F-test / ANOVA | FTest | f(), df1(), df2(), pvalue() |
| KS test | KSTest | d(), pvalue() |
| Correlation | CorTest | cor(), t(), df(), pvalue() |
| Chi-square | ChiSqTest | chisq(), df(), pvalue(), CramerV() |
All records override toString() to return the full test summary.
The helper Hypothesis.significance(double pvalue) returns the conventional
significance annotation:
| Range | Code |
|---|---|
| p < 0.001 | *** |
| p < 0.01 | ** |
| p < 0.05 | * |
| p < 0.1 | . |
| p ≥ 0.1 | (blank) |
```java
import smile.stat.Hypothesis;

double p = 0.003;
System.out.println(Hypothesis.significance(p)); // "**"
```
## Hypothesis Facade

smile.stat.Hypothesis is an interface whose nested interfaces (t, F, KS, cor, chisq) expose the tests as static methods. Import it together with smile.stat.hypothesis.* (for the result types) and access all tests through a uniform API.
```java
import smile.stat.Hypothesis;
import smile.stat.hypothesis.*;

// Each statement is an independent alternative; x, y, dist, labels, values,
// bins, prob, bins1, bins2 and contingencyTable stand for your own data.

// t-test
TTest t1 = Hypothesis.t.test(x, 0.0);              // one-sample
TTest t2 = Hypothesis.t.test(x, y);                // two-sample (Welch)
TTest t3 = Hypothesis.t.test(x, y, "equal.var");   // two-sample, pooled variance
TTest t4 = Hypothesis.t.test(x, y, "paired");      // paired

// F-test / ANOVA
FTest f1 = Hypothesis.F.test(x, y);                // variance test
FTest f2 = Hypothesis.F.test(labels, values);      // one-way ANOVA

// KS test
KSTest ks1 = Hypothesis.KS.test(x, dist);          // one-sample
KSTest ks2 = Hypothesis.KS.test(x, y);             // two-sample

// Correlation tests
CorTest c1 = Hypothesis.cor.test(x, y);            // Pearson
CorTest c2 = Hypothesis.cor.test(x, y, "spearman");
CorTest c3 = Hypothesis.cor.test(x, y, "kendall");

// Chi-square tests
ChiSqTest cs1 = Hypothesis.chisq.test(bins, prob);        // one-sample
ChiSqTest cs2 = Hypothesis.chisq.test(bins1, bins2);      // two-sample
ChiSqTest cs3 = Hypothesis.chisq.test(contingencyTable);  // independence
```
Student's t-tests are used to compare means when the population is assumed
normally distributed. All variants are in TTest.
Null hypothesis H₀: the population mean equals a specified value μ₀.
```java
import smile.stat.Hypothesis;
import smile.stat.hypothesis.TTest;

double[] x = {2.1, 2.4, 1.9, 2.3, 2.0, 2.2, 1.8, 2.1, 2.0, 2.3};
TTest result = TTest.test(x, 2.0);  // H0: mean = 2.0
// or, equivalently, via the facade:
// TTest result = Hypothesis.t.test(x, 2.0);
System.out.println(result);
// sample mean = 2.11, so t ≈ 1.819, df = 9, two-sided p ≈ 0.102
System.out.printf("t=%.4f, df=%.0f, p=%.4f%n",
        result.t(), result.df(), result.pvalue());
if (result.pvalue() < 0.05) {
    System.out.println("Reject H0: mean significantly differs from 2.0");
} else {
    System.out.println("Fail to reject H0");  // printed for this data
}
```
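The one-sample statistic is the textbook t = (x̄ − μ₀)/(s/√n) with df = n − 1. Below is a minimal plain-Java sketch of that formula with no SMILE dependency. It computes only the statistic, not the p-value. OneSampleT and its Result record are hypothetical names for illustration and are not part of SMILE.

```java
/** Plain-Java sketch of the one-sample t statistic (illustration only, not SMILE's code). */
class OneSampleT {
    record Result(double t, double df) {}

    static Result test(double[] x, double mu0) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        // Sample variance with the n - 1 denominator.
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double s = Math.sqrt(ss / (n - 1));

        // Distance of the sample mean from mu0, in units of its standard error.
        double t = (mean - mu0) / (s / Math.sqrt(n));
        return new Result(t, n - 1);
    }

    public static void main(String[] args) {
        double[] x = {2.1, 2.4, 1.9, 2.3, 2.0, 2.2, 1.8, 2.1, 2.0, 2.3};
        Result r = test(x, 2.0);
        System.out.printf("t=%.4f, df=%.0f%n", r.t(), r.df());
    }
}
```

On the example data (sample mean 2.11, s ≈ 0.191) this gives t ≈ 1.819 with df = 9.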
Null hypothesis H₀: the two populations have equal means. By default the test uses Welch's formula, which does not assume equal variances.
```java
double[] x = {2.5, 2.7, 2.3, 2.6, 2.4};
double[] y = {3.0, 3.2, 3.1, 2.9, 3.0};
TTest result = TTest.test(x, y, false);  // false = unequal variances (Welch)
// or via the facade:
// TTest result = Hypothesis.t.test(x, y);
System.out.println(result);
```
```java
TTest pooled = TTest.test(x, y, true);  // true = equal variances (pooled)
// or via the facade:
// TTest pooled = Hypothesis.t.test(x, y, "equal.var");
```
Use "equal.var" when you have reason to believe the two populations have
the same variance (e.g. confirmed by an F-test).
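The two variants differ only in how they estimate the standard error and the degrees of freedom. Below is a minimal plain-Java sketch of both textbook formulas, with no SMILE dependency. Welch uses the Welch-Satterthwaite degrees of freedom, and the pooled test uses n₁ + n₂ − 2. TwoSampleT and its methods are hypothetical names, and the sketch does not compute p-values.

```java
/** Plain-Java sketch of Welch's and the pooled two-sample t statistics (illustration only). */
class TwoSampleT {
    record Result(double t, double df) {}

    static double mean(double[] a) {
        double s = 0.0;
        for (double v : a) s += v;
        return s / a.length;
    }

    static double variance(double[] a) {
        double m = mean(a), ss = 0.0;
        for (double v : a) ss += (v - m) * (v - m);
        return ss / (a.length - 1); // sample variance
    }

    /** Welch: no equal-variance assumption; df from the Welch-Satterthwaite formula. */
    static Result welch(double[] x, double[] y) {
        double vx = variance(x) / x.length; // squared standard error of mean(x)
        double vy = variance(y) / y.length; // squared standard error of mean(y)
        double t = (mean(x) - mean(y)) / Math.sqrt(vx + vy);
        double df = (vx + vy) * (vx + vy)
                / (vx * vx / (x.length - 1) + vy * vy / (y.length - 1));
        return new Result(t, df);
    }

    /** Pooled: assumes one common variance; df = n1 + n2 - 2. */
    static Result pooled(double[] x, double[] y) {
        int n1 = x.length, n2 = y.length;
        double sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2);
        double t = (mean(x) - mean(y)) / Math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2));
        return new Result(t, n1 + n2 - 2);
    }
}
```

With the x and y above, both statistics equal t ≈ −6.19, because equal sample sizes make them coincide. Welch, however, uses df ≈ 7.27 instead of 8, which yields a slightly larger p-value.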
Null hypothesis H₀: the mean difference between paired observations is zero. Suitable for before/after designs.
```java
double[] before = {5.0, 5.5, 4.8, 5.2, 5.1};
double[] after  = {5.8, 6.0, 5.5, 5.7, 5.6};
TTest result = TTest.testPaired(before, after);
// or via the facade:
// TTest result = Hypothesis.t.test(before, after, "paired");
System.out.println(result);
```
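A paired t-test is equivalent to a one-sample t-test of the per-pair differences against 0. Below is a minimal plain-Java sketch of that reduction, with no SMILE dependency. PairedT is a hypothetical name, and the sketch takes differences as x − y.

```java
/** Plain-Java sketch: a paired t-test is a one-sample t-test on the differences (illustration only). */
class PairedT {
    record Result(double t, double df) {}

    static Result test(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("samples must be paired");
        int n = x.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++) d[i] = x[i] - y[i]; // per-pair difference

        double mean = 0.0;
        for (double v : d) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : d) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1));

        // H0: the mean difference is 0.
        return new Result(mean / (sd / Math.sqrt(n)), n - 1);
    }
}
```

For the before/after data, the differences before − after average −0.6, giving t ≈ −9.49 with df = 4. The sign only reflects the argument order.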
Tests whether the Pearson correlation coefficient differs significantly from 0.
```java
import smile.math.MathEx;

// x and y are paired samples of equal length n
double r = MathEx.cor(x, y);
int df = x.length - 2;
TTest result = TTest.test(r, df);
// or via the facade:
// TTest result = Hypothesis.t.test(r, df);
```
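The statistic behind this test is t = r·√(df/(1 − r²)) with df = n − 2. Under H₀: ρ = 0 and bivariate normality, it follows a t distribution with df degrees of freedom. Below is a minimal plain-Java sketch of the formula. CorrelationT is a hypothetical name.

```java
/** Plain-Java sketch of the t statistic for H0: rho = 0 (illustration only). */
class CorrelationT {
    // t = r * sqrt(df / (1 - r^2)), where df = n - 2
    static double t(double r, int df) {
        return r * Math.sqrt(df / (1.0 - r * r));
    }

    public static void main(String[] args) {
        // r = 0.5711816 from a sample of n = 9 pairs, so df = 7
        System.out.printf("t=%.4f%n", t(0.5711816, 7));
    }
}
```

With r = 0.5711816 and df = 7 (the correlation of the Pearson example data, n = 9), this gives t ≈ 1.841.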
Null hypothesis H₀: the two populations have equal variances.
⚠️ The F-test is extremely sensitive to departures from normality. Use it only when the data is confirmed to be normally distributed.
```java
double[] x = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
double[] y = {1.1, 1.9, 2.8, 4.2, 5.1, 6.2};
FTest result = FTest.test(x, y);
// or via the facade:
// FTest result = Hypothesis.F.test(x, y);
System.out.println(result);
// sample variances 3.5 (x) and about 3.827 (y): f ≈ 1.093, df1 = 5, df2 = 5, p-value ≈ 0.92
System.out.printf("F=%.4f, df1=%d, df2=%d, p=%.4f%n",
        result.f(), result.df1(), result.df2(), result.pvalue());
```
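The variance test statistic is a ratio of the two sample variances. Below is a minimal plain-Java sketch, with no SMILE dependency. Placing the larger variance in the numerator, so that f ≥ 1, is this sketch's convention (the usual one for a two-sided variance test). VarianceRatio is a hypothetical name, and p-values are not computed.

```java
/** Plain-Java sketch of the variance-ratio F statistic (illustration only). */
class VarianceRatio {
    record Result(double f, int df1, int df2) {}

    static double variance(double[] a) {
        double m = 0.0;
        for (double v : a) m += v;
        m /= a.length;
        double ss = 0.0;
        for (double v : a) ss += (v - m) * (v - m);
        return ss / (a.length - 1); // sample variance
    }

    // Convention of this sketch: the larger variance goes in the numerator, so f >= 1.
    static Result test(double[] x, double[] y) {
        double vx = variance(x), vy = variance(y);
        return vx >= vy
                ? new Result(vx / vy, x.length - 1, y.length - 1)
                : new Result(vy / vx, y.length - 1, x.length - 1);
    }
}
```

For the example data, the sample variances are 3.5 and about 3.827, so f ≈ 1.093 with (5, 5) degrees of freedom. The two variances are nearly equal.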
Null hypothesis H₀: all group means are equal. Assumes equal variances across groups.
```java
int[] groups = {1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3};
double[] values = {6,8,4,5,3,4, 8,12,9,11,6,8, 13,9,11,8,7,12};
FTest result = FTest.test(groups, values);
// or via the facade:
// FTest result = Hypothesis.F.test(groups, values);
System.out.println(result);
// → F-test(f = 9.2647, df1 = 2, df2 = 15, p-value = 0.002399)
if (result.pvalue() < 0.05) {
    System.out.println("At least one group mean differs significantly.");
}
```
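One-way ANOVA compares between-group variation with within-group variation: F = (SSB/(k − 1)) / (SSW/(n − k)). Below is a minimal plain-Java sketch of that decomposition, with no SMILE dependency. OneWayAnova is a hypothetical name, and the sketch computes only F and its degrees of freedom.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the one-way ANOVA F statistic (illustration only). */
class OneWayAnova {
    record Result(double f, int df1, int df2) {}

    static Result test(int[] labels, double[] values) {
        int n = values.length;
        Map<Integer, double[]> groups = new TreeMap<>(); // label -> {sum, count}
        double grand = 0.0;
        for (int i = 0; i < n; i++) {
            double[] g = groups.computeIfAbsent(labels[i], key -> new double[2]);
            g[0] += values[i];
            g[1] += 1;
            grand += values[i];
        }
        grand /= n;

        // Between-group sum of squares: sum over groups of n_g * (mean_g - grand)^2.
        double ssb = 0.0;
        for (double[] g : groups.values()) {
            double m = g[0] / g[1];
            ssb += g[1] * (m - grand) * (m - grand);
        }
        // Within-group sum of squares: deviations from each observation's own group mean.
        double ssw = 0.0;
        for (int i = 0; i < n; i++) {
            double[] g = groups.get(labels[i]);
            double d = values[i] - g[0] / g[1];
            ssw += d * d;
        }
        int k = groups.size();
        int df1 = k - 1, df2 = n - k;
        return new Result((ssb / df1) / (ssw / df2), df1, df2);
    }
}
```

On the example data, the group means are 5, 9 and 10 (grand mean 8), giving SSB = 84 and SSW = 68. Hence F = (84/2)/(68/15) ≈ 9.265 with (2, 15) degrees of freedom. Because df1 = 2, the upper-tail p-value has the closed form (1 + 2F/df2)^(−df2/2) ≈ 0.0024.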
The KS test compares distributions via their empirical CDFs. It is non-parametric and sensitive to differences in both location and shape.
Note: Both test() overloads sort the input arrays in place. Pass copies (for example x.clone()) if the original order must be preserved.
Null hypothesis H₀: the data was drawn from the reference distribution.
```java
import smile.stat.distribution.GaussianDistribution;
import smile.stat.hypothesis.KSTest;

double[] x = /* observed data */;
KSTest result = KSTest.test(x, GaussianDistribution.getInstance());  // N(0,1) reference
// or via the facade:
// KSTest result = Hypothesis.KS.test(x, GaussianDistribution.getInstance());
System.out.println(result);
// e.g. One Sample Kolmogorov-Smirnov Test(d = 0.0930, p-value = 0.7598)
System.out.printf("D=%.4f, p=%.4f%n", result.d(), result.pvalue());
// Large p-value → fail to reject H0 (data are consistent with N(0,1))
```
Null hypothesis H₀: both samples are drawn from the same distribution.
```java
double[] x = /* sample 1 */;
double[] y = /* sample 2 */;
KSTest result = KSTest.test(x, y);  // sorts x and y in place
// or via the facade:
// KSTest result = Hypothesis.KS.test(x, y);
System.out.println(result);
// Small p-value → distributions differ significantly
```
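The two-sample statistic D is the largest vertical gap between the two empirical CDFs. Below is a minimal plain-Java sketch of D alone, with no p-value. It clones its inputs, so unlike the library call noted above it leaves the caller's arrays untouched. KsStatistic is a hypothetical name.

```java
import java.util.Arrays;

/** Plain-Java sketch of the two-sample Kolmogorov-Smirnov statistic D (illustration only). */
class KsStatistic {
    static double twoSampleD(double[] a, double[] b) {
        double[] x = a.clone(); // clone so the caller's arrays keep their order
        double[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < x.length && j < y.length) {
            double t = Math.min(x[i], y[j]);
            // Step past every value equal to t in both samples (handles ties).
            while (i < x.length && x[i] <= t) i++;
            while (j < y.length && y[j] <= t) j++;
            double f1 = (double) i / x.length; // empirical CDF of sample 1 at t
            double f2 = (double) j / y.length; // empirical CDF of sample 2 at t
            d = Math.max(d, Math.abs(f1 - f2));
        }
        // Once one sample is exhausted its CDF is 1, and the gap can only shrink.
        return d;
    }
}
```

Two fully separated samples give D = 1, while interleaved samples such as {1, 3, 5} and {2, 4, 6} give D = 1/3.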
CorTest provides three correlation measures, each with an associated
significance test.
Measures linear association. Assumes bivariate normality.
```java
double[] x = {44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1};
double[] y = { 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8};
CorTest result = CorTest.pearson(x, y);
// or via the facade (Pearson is the default method):
// CorTest result = Hypothesis.cor.test(x, y);
// CorTest result = Hypothesis.cor.test(x, y, "pearson");
System.out.println(result);
// → Pearson Correlation Test(cor = 0.57, t = 1.8411, df = 7, p-value = 0.1082)
System.out.printf("r=%.4f, t=%.4f, df=%.0f, p=%.4f%n",
        result.cor(), result.t(), result.df(), result.pvalue());
```
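Pearson's r is the covariance of x and y divided by the product of their standard deviations. The test statistic is then t = r·√((n − 2)/(1 − r²)). Below is a minimal plain-Java sketch of both, with no SMILE dependency. PearsonSketch is a hypothetical name.

```java
/** Plain-Java sketch of Pearson's r and its t statistic (illustration only). */
class PearsonSketch {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        // Centered cross-products and sums of squares.
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // t = r * sqrt(df / (1 - r^2)), where df = n - 2
    static double t(double r, int n) {
        int df = n - 2;
        return r * Math.sqrt(df / (1.0 - r * r));
    }
}
```

On the nine example pairs, this gives r ≈ 0.5712 and t ≈ 1.841 with df = 7.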
Measures monotone association. Distribution-free (non-parametric).
```java
CorTest result = CorTest.spearman(x, y);
// or via the facade:
// CorTest result = Hypothesis.cor.test(x, y, "spearman");
System.out.printf("ρ=%.4f, p=%.4f%n", result.cor(), result.pvalue());
```
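Spearman's ρ is Pearson's r computed on the ranks of the data, which is why it captures any monotone relationship. Below is a minimal plain-Java sketch. It gives tied values the average of the ranks they span, a common convention assumed here. SpearmanSketch is a hypothetical name, and the p-value is not computed.

```java
import java.util.Arrays;

/** Plain-Java sketch: Spearman's rho as Pearson's r of the ranks (illustration only). */
class SpearmanSketch {
    // 1-based ranks; tied values share the average of the ranks they span.
    static double[] ranks(double[] a) {
        int n = a.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (p, q) -> Double.compare(a[p], a[q]));
        double[] r = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && a[idx[end + 1]] == a[idx[start]]) end++;
            double avg = (start + end) / 2.0 + 1.0; // average 1-based rank of the tie group
            for (int k = start; k <= end; k++) r[idx[k]] = avg;
            start = end + 1;
        }
        return r;
    }

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }
}
```

The Pearson example data contain no ties, and the rank differences give Σd² = 48. So ρ = 1 − 6·48/(9·80) = 0.6.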
Measures concordance between rankings. More robust than Spearman for small samples with many ties.
```java
CorTest result = CorTest.kendall(x, y);
// or via the facade:
// CorTest result = Hypothesis.cor.test(x, y, "kendall");
System.out.printf("τ=%.4f, p=%.4f%n", result.cor(), result.pvalue());
```
Note: The Kendall test is non-parametric, so its df() accessor returns 0. The p-value uses a normal approximation, which is generally considered adequate for n ≥ 10.
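Kendall's τ counts concordant and discordant pairs. The normal approximation mentioned in the note standardizes τ by its null standard deviation √(2(2n + 5)/(9n(n − 1))). Below is a minimal plain-Java sketch of τ-a and that z score. It ignores ties, and whether SMILE's kendall() uses τ-a, τ-b or a different variance formula is an assumption not confirmed here. KendallSketch is a hypothetical name.

```java
/** Plain-Java sketch of Kendall's tau-a with its large-sample z score (illustration only). */
class KendallSketch {
    record Result(double tau, double z) {}

    static Result test(double[] x, double[] y) {
        int n = x.length;
        long concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
                // s == 0 means a tie in x or y; tau-a leaves such pairs out of the numerator.
            }
        }
        double pairs = n * (n - 1) / 2.0;
        double tau = (concordant - discordant) / pairs;
        // Under H0 with no ties, Var(tau) = 2(2n + 5) / (9n(n - 1)).
        double z = tau / Math.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)));
        return new Result(tau, z);
    }
}
```

On the Pearson example data, 26 of the 36 pairs are concordant and 10 are discordant. So τ = 16/36 ≈ 0.444 and z ≈ 1.67.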
| Scenario | Recommended method |
|---|---|
| Continuous, normally distributed data | Pearson |
| Ordinal data, or non-normal continuous | Spearman |
| Small samples with ties | Kendall |
Pearson's chi-square tests compare observed counts to expected counts.
⚠️ The chi-square approximation is only valid when expected counts are reasonably large (typically ≥ 5 per cell). With small counts, consider Fisher's exact test.
Null hypothesis H₀: the observed frequencies follow the expected distribution.
```java
import smile.stat.hypothesis.ChiSqTest;

// Observed die rolls
int[] bins = {20, 22, 13, 22, 10, 13};
double[] prob = {1.0/6, 1.0/6, 1.0/6, 1.0/6, 1.0/6, 1.0/6}; // fair die
ChiSqTest result = ChiSqTest.test(bins, prob);
// or via the facade:
// ChiSqTest result = Hypothesis.chisq.test(bins, prob);
System.out.println(result);
// → One Sample Chi-squared Test(t = 8.36, df = 5, p-value = 0.1370)
System.out.printf("χ²=%.4f, df=%.0f, p=%.4f%n",
        result.chisq(), result.df(), result.pvalue());
```
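The degrees of freedom equal the number of bins minus the number of constraints. The default of 1 accounts for the fixed total count. If you estimated m parameters of the expected distribution from the same data, pass 1 + m constraints: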
```java
// constraints = 1 (the default): only the total count is fixed, so df = 6 - 1 = 5
ChiSqTest fit = ChiSqTest.test(bins, prob, 1);
// or via the facade:
// ChiSqTest fit = Hypothesis.chisq.test(bins, prob, 1);
```
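The goodness-of-fit statistic sums (observed − expected)²/expected over the bins, with expected = N·prob[i]. Below is a minimal plain-Java sketch of the statistic and of how the constraints argument enters the degrees of freedom. It assumes df = number of bins − constraints, which is consistent with df = 5 for the six-bin die example. GoodnessOfFit is a hypothetical name.

```java
/** Plain-Java sketch of Pearson's chi-square goodness-of-fit statistic (illustration only). */
class GoodnessOfFit {
    record Result(double chisq, int df) {}

    static Result test(int[] bins, double[] prob, int constraints) {
        long total = 0;
        for (int b : bins) total += b;
        double chisq = 0.0;
        for (int i = 0; i < bins.length; i++) {
            double expected = total * prob[i]; // expected count under H0
            double diff = bins[i] - expected;
            chisq += diff * diff / expected;
        }
        // Assumption of this sketch: one degree of freedom is lost per constraint.
        return new Result(chisq, bins.length - constraints);
    }
}
```

For the die rolls, N = 100 and every expected count is 100/6. This gives χ² = 1806/(100/6) − 100 = 8.36 with df = 5.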
Null hypothesis H₀: both samples have the same underlying distribution.
```java
int[] bins1 = {8, 13, 16, 10, 3};
int[] bins2 = {4,  9, 14, 16, 7};
ChiSqTest result = ChiSqTest.test(bins1, bins2);
// or via the facade:
// ChiSqTest result = Hypothesis.chisq.test(bins1, bins2);
System.out.printf("χ²=%.4f, df=%.0f, p=%.4f%n",
        result.chisq(), result.df(), result.pvalue());
```
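Let the sample totals be N_R and N_S. A standard form of the binned two-sample statistic (as given in Numerical Recipes) is Σ (√(N_S/N_R)·R_i − √(N_R/N_S)·S_i)² / (R_i + S_i). With equal totals, it reduces to Σ (R_i − S_i)²/(R_i + S_i). Below is a minimal plain-Java sketch that also drops a degree of freedom for every bin empty in both samples. Whether SMILE uses exactly this form is an assumption. TwoSampleChiSq is a hypothetical name.

```java
/** Plain-Java sketch of the binned two-sample chi-square statistic (illustration only). */
class TwoSampleChiSq {
    record Result(double chisq, int df) {}

    static Result test(int[] r, int[] s, int constraints) {
        double nr = 0, ns = 0;
        for (int v : r) nr += v;
        for (int v : s) ns += v;
        // Rescaling factors for unequal totals; both are 1 when nr == ns.
        double kr = Math.sqrt(ns / nr), ks = Math.sqrt(nr / ns);
        double chisq = 0.0;
        int df = r.length - constraints;
        for (int i = 0; i < r.length; i++) {
            if (r[i] == 0 && s[i] == 0) {
                df--; // a bin empty in both samples carries no information
                continue;
            }
            double diff = kr * r[i] - ks * s[i];
            chisq += diff * diff / (r[i] + s[i]);
        }
        return new Result(chisq, df);
    }
}
```

Both example samples total 50, so the statistic is 16/12 + 16/22 + 4/30 + 36/26 + 16/10 ≈ 5.179 with df = 4.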
Null hypothesis H₀: the two categorical variables are independent.
The test also computes Cramér's V, a measure of association strength between 0 (no association) and 1 (perfect association). For 2×2 tables, V corresponds to the absolute value of the phi coefficient, and Yates' continuity correction is applied to the χ² statistic automatically.
```java
// 2×2 contingency table
int[][] table = {
    {12, 7},
    { 5, 7}
};
ChiSqTest result = ChiSqTest.test(table);
// or via the facade:
// ChiSqTest result = Hypothesis.chisq.test(table);
System.out.println(result);
// χ² ≈ 0.641 (Yates-corrected), df = 1, p-value ≈ 0.4233; Cramér's V is reported as well
System.out.printf("χ²=%.4f, df=%.0f, p=%.4f, V=%.4f%n",
        result.chisq(), result.df(), result.pvalue(), result.CramerV());
```
Larger tables (r × c) work the same way:
```java
int[][] table3x3 = {
    {50, 30, 20},
    {15, 45, 40},
    {35, 25, 40}
};
ChiSqTest result = ChiSqTest.test(table3x3);
```
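Under independence, the expected count of cell (i, j) is E_ij = R_i·C_j/N, and the degrees of freedom are (r − 1)(c − 1). Cramér's V = √(χ²/(N·(min(r, c) − 1))). Below is a minimal plain-Java sketch of these formulas. For 2×2 tables it applies Yates' correction (|O − E| − 0.5), as described for this test. Computing V from the uncorrected χ² is this sketch's choice and may differ from how SMILE's CramerV() is derived. Independence is a hypothetical name.

```java
/** Plain-Java sketch of the chi-square test of independence and Cramer's V (illustration only). */
class Independence {
    record Result(double chisq, int df, double cramerV) {}

    static Result test(int[][] table) {
        int rows = table.length, cols = table[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double n = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += table[i][j];
                colSum[j] += table[i][j];
                n += table[i][j];
            }
        }
        boolean yates = rows == 2 && cols == 2; // continuity correction for 2x2 tables only
        double chisq = 0.0, uncorrected = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / n; // E_ij under independence
                double diff = Math.abs(table[i][j] - expected);
                uncorrected += diff * diff / expected;
                double adj = yates ? Math.max(0.0, diff - 0.5) : diff;
                chisq += adj * adj / expected;
            }
        }
        int df = (rows - 1) * (cols - 1);
        // Textbook Cramer's V from the uncorrected statistic; equals |phi| for 2x2 tables.
        double v = Math.sqrt(uncorrected / (n * (Math.min(rows, cols) - 1)));
        return new Result(chisq, df, v);
    }
}
```

In the 3×3 table above, every row and column total is 100, so each E_ij = 100/3. This gives χ² = 33 with df = 4 and V ≈ 0.235. For the 2×2 example, it gives the Yates-corrected χ² ≈ 0.641 with df = 1 and V ≈ 0.210.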
The p-value is the probability of observing data at least as extreme as the sample, assuming the null hypothesis is true. A small p-value provides evidence against H₀.
| p-value | Interpretation |
|---|---|
| < 0.001 | Very strong evidence against H₀ |
| 0.001–0.01 | Strong evidence against H₀ |
| 0.01–0.05 | Moderate evidence against H₀ |
| 0.05–0.1 | Weak / suggestive evidence |
| ≥ 0.1 | Insufficient evidence to reject H₀ |
Common misconception: a large p-value does not "prove" H₀. It merely means the data are consistent with H₀.
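The t-tests, the correlation tests and the two-sample variance F-test report two-sided p-values. The ANOVA F-test and the chi-square tests report upper-tail p-values, as is standard for those statistics. To obtain a one-sided p-value for a t-test, divide the two-sided p-value by 2, provided the observed effect lies in the predicted direction (the t distribution is symmetric):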
```java
TTest t = TTest.test(x, 0.0);
double oneSidedP = t.pvalue() / 2.0;  // valid only if t.t() has the predicted sign
```
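Below is a small plain-Java helper that handles both directions. It halves the p-value when t has the predicted sign and returns 1 − p/2 otherwise. OneSided and its parameters are hypothetical names.

```java
/** Plain-Java sketch: converting a two-sided t-test p-value to a one-sided one (illustration only). */
class OneSided {
    /**
     * @param twoSidedP         two-sided p-value from a t-test
     * @param t                 observed t statistic
     * @param predictedPositive true if H1 predicts a positive effect (t > 0)
     */
    static double pvalue(double twoSidedP, double t, boolean predictedPositive) {
        boolean inPredictedDirection = predictedPositive ? t > 0 : t < 0;
        return inPredictedDirection ? twoSidedP / 2.0 : 1.0 - twoSidedP / 2.0;
    }
}
```

For example, a two-sided p = 0.10 becomes 0.05 when t points the predicted way, but 0.95 when it points the other way.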
If you run many tests simultaneously, consider correcting for multiple comparisons (Bonferroni, Benjamini-Hochberg, etc.) to control the family-wise error rate or false discovery rate.
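Both corrections named above are short enough to apply by hand to an array of p-values. Below is a minimal plain-Java sketch. Bonferroni multiplies each p-value by the number of tests m, capped at 1. Benjamini-Hochberg scales the k-th smallest p-value by m/k and then enforces monotonicity from the largest down. MultipleTesting and its methods are hypothetical names, not SMILE API.

```java
import java.util.Arrays;

/** Plain-Java sketch of Bonferroni and Benjamini-Hochberg p-value adjustments (illustration only). */
class MultipleTesting {
    /** Bonferroni (controls the family-wise error rate). */
    static double[] bonferroni(double[] p) {
        int m = p.length;
        double[] adj = new double[m];
        for (int i = 0; i < m; i++) adj[i] = Math.min(1.0, p[i] * m);
        return adj;
    }

    /** Benjamini-Hochberg step-up adjustment (controls the false discovery rate). */
    static double[] benjaminiHochberg(double[] p) {
        int m = p.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(p[a], p[b])); // ascending p-values
        double[] adj = new double[m];
        double running = 1.0;
        // Walk from the largest p-value down so adjusted values never decrease with rank.
        for (int k = m - 1; k >= 0; k--) {
            int i = order[k];
            running = Math.min(running, p[i] * m / (k + 1));
            adj[i] = running;
        }
        return adj;
    }
}
```

For p = {0.01, 0.04, 0.03, 0.005}, Bonferroni yields {0.04, 0.16, 0.12, 0.02}, while Benjamini-Hochberg yields the less conservative {0.02, 0.04, 0.04, 0.02}.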
The p-value tells you whether an effect exists, not how large it is. Always report an effect size alongside the p-value:
| Test | Suggested effect size |
|---|---|
| t-test | Cohen's d |
| F-test / ANOVA | η² (eta-squared) |
| Correlation | r (Pearson), ρ (Spearman), τ (Kendall) |
| Chi-square | Cramér's V (provided in CramerV()) |
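The result records listed at the top of this guide expose Cramér's V but not Cohen's d or η². Below is a minimal plain-Java sketch of both. Here d uses the pooled-standard-deviation definition (other variants exist), and η² = SSB/SST for a one-way layout. EffectSize and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of two common effect sizes (illustration only). */
class EffectSize {
    static double mean(double[] a) {
        double s = 0.0;
        for (double v : a) s += v;
        return s / a.length;
    }

    static double variance(double[] a) {
        double m = mean(a), ss = 0.0;
        for (double v : a) ss += (v - m) * (v - m);
        return ss / (a.length - 1); // sample variance
    }

    /** Cohen's d: difference in means divided by the pooled standard deviation. */
    static double cohensD(double[] x, double[] y) {
        int n1 = x.length, n2 = y.length;
        double pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2);
        return (mean(x) - mean(y)) / Math.sqrt(pooled);
    }

    /** Eta-squared for one-way ANOVA: between-group over total sum of squares. */
    static double etaSquared(int[] labels, double[] values) {
        double grand = mean(values);
        Map<Integer, double[]> groups = new HashMap<>(); // label -> {sum, count}
        for (int i = 0; i < values.length; i++) {
            double[] g = groups.computeIfAbsent(labels[i], key -> new double[2]);
            g[0] += values[i];
            g[1] += 1;
        }
        double ssb = 0.0;
        for (double[] g : groups.values()) {
            double m = g[0] / g[1];
            ssb += g[1] * (m - grand) * (m - grand);
        }
        double sst = 0.0;
        for (double v : values) sst += (v - grand) * (v - grand);
        return ssb / sst;
    }
}
```

For the Welch example data (means 2.5 and 3.04), d ≈ −3.92, a very large standardized difference. For the ANOVA example, η² = 84/152 ≈ 0.55, so group membership accounts for about 55% of the total variation.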
```java
import smile.stat.Hypothesis;
import smile.stat.hypothesis.*;

// Quick reference only: the variable name r is reused for brevity,
// so these lines are alternatives, not one compilable unit.

// ── Significance code ─────────────────────────────────────────────
Hypothesis.significance(0.003);                          // → "**"

// ── t-Tests ───────────────────────────────────────────────────────
TTest r = Hypothesis.t.test(x, mu0);                     // one-sample
TTest r = Hypothesis.t.test(x, y);                       // Welch two-sample
TTest r = Hypothesis.t.test(x, y, "equal.var");          // pooled two-sample
TTest r = Hypothesis.t.test(x, y, "paired");             // paired
TTest r = Hypothesis.t.test(corr, df);                   // correlation significance
r.t();        // t-statistic
r.df();       // degrees of freedom
r.pvalue();   // two-sided p-value

// ── F-Test / ANOVA ────────────────────────────────────────────────
FTest r = Hypothesis.F.test(x, y);                       // variance equality
FTest r = Hypothesis.F.test(labels, values);             // one-way ANOVA
r.f();        // F-statistic
r.df1();      // numerator df
r.df2();      // denominator df
r.pvalue();

// ── KS Test ───────────────────────────────────────────────────────
KSTest r = Hypothesis.KS.test(x, distribution);          // one-sample
KSTest r = Hypothesis.KS.test(x, y);                     // two-sample
r.d();        // KS statistic (max |F_n - F|)
r.pvalue();

// ── Correlation Tests ─────────────────────────────────────────────
CorTest r = Hypothesis.cor.test(x, y);                   // Pearson
CorTest r = Hypothesis.cor.test(x, y, "spearman");       // Spearman
CorTest r = Hypothesis.cor.test(x, y, "kendall");        // Kendall
r.cor();      // correlation coefficient
r.t();        // test statistic
r.df();       // degrees of freedom (0 for Kendall)
r.pvalue();

// ── Chi-Square Tests ──────────────────────────────────────────────
ChiSqTest r = Hypothesis.chisq.test(bins, prob);         // goodness-of-fit
ChiSqTest r = Hypothesis.chisq.test(bins, prob, k);      // with k constraints
ChiSqTest r = Hypothesis.chisq.test(bins1, bins2);       // two-sample
ChiSqTest r = Hypothesis.chisq.test(bins1, bins2, k);    // with k constraints
ChiSqTest r = Hypothesis.chisq.test(contingencyTable);   // independence
r.chisq();    // chi-square statistic
r.df();       // degrees of freedom
r.pvalue();
r.CramerV();  // effect size (contingency table only)
```
```java
// The same tests via static methods on the individual result classes
TTest.test(x, mean);
TTest.test(x, y, equalVariance);
TTest.testPaired(x, y);
TTest.test(r, df);
FTest.test(x, y);
FTest.test(labels, values);   // ANOVA
KSTest.test(x, distribution);
KSTest.test(x, y);
CorTest.pearson(x, y);
CorTest.spearman(x, y);
CorTest.kendall(x, y);
ChiSqTest.test(bins, prob);
ChiSqTest.test(bins, prob, constraints);
ChiSqTest.test(bins1, bins2);
ChiSqTest.test(bins1, bins2, constraints);
ChiSqTest.test(table);
```
SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.