docs/en/sql-reference/sql-functions/aggregate-functions/mann_whitney_u_test.md
mann_whitney_u_test performs the Mann-Whitney rank test on samples derived from two populations. The Mann-Whitney U test is a non-parametric test that can be used to determine if two populations were selected from the same distribution.
MANN_WHITNEY_U_TEST (sample_data, sample_treatment[, alternative[, continuity_correction]])
sample_data: the value of sample data. It must be of numeric data types.
sample_treatment: the index of sample data. where each element indicates the treatment group that the corresponding sample belongs to. The values should be of type boolean, with false representing the first group and true representing the second group.
alternative (optional): A const string specifying the alternative hypothesis. It can be one of the following:
continuity_correction (optional): A const boolean value indicating whether to apply a continuity correction. The continuity correction adjusts the U statistic by 0.5 towards the mean of the U distribution, which can improve the accuracy of the test for small sample sizes. The default value is true.
The function returns a json array containing the following two elements:
The Mann-Whitney U statistic and the p-value associated with the test.
This function ignores NULLs.
Suppose there is a table named testing_data with the following data.
create table testing_data (
id int,
score int,
treatment boolean
)
properties(
"replication_num" = "1"
);
insert into testing_data values
(1, 80, false),
(2, 100, false),
(3, NULL, false),
(4, 60, true),
(5, 70, true),
(6, 85, true);
select * from testing_data;
+------+-------+-----------+
| id | score | treatment |
+------+-------+-----------+
| 1 | 80 | 0 |
| 2 | 100 | 0 |
| 3 | NULL | 0 |
| 4 | 60 | 1 |
| 5 | 70 | 1 |
| 6 | 85 | 1 |
+------+-------+-----------+
Query:
SELECT MANN_WHITNEY_U_TEST(score, treatment) FROM testing_data;
Result:
+---------------------------------------+
| mann_whitney_u_test(score, treatment) |
+---------------------------------------+
| [5, 0.38647623077123283] |
+---------------------------------------+
Query:
SELECT MANN_WHITNEY_U_TEST(score, treatment, 'less') FROM testing_data;
Result:
+-----------------------------------------------+
| mann_whitney_u_test(score, treatment, 'less') |
+-----------------------------------------------+
| [5, 0.9255426634106172] |
+-----------------------------------------------+
Query:
SELECT MANN_WHITNEY_U_TEST(score, treatment, 'two-sided', 0) FROM testing_data;
Result:
+-------------------------------------------------------+
| mann_whitney_u_test(score, treatment, 'two-sided', 0) |
+-------------------------------------------------------+
| [5, 0.2482130789899235] |
+-------------------------------------------------------+