Back to Druid

Test stats aggregators

docs/development/extensions-core/test-stats.md

latest3.9 KB
Original Source
<!-- ~ Licensed to the Apache Software Foundation (ASF) under one ~ or more contributor license agreements. See the NOTICE file ~ distributed with this work for additional information ~ regarding copyright ownership. The ASF licenses this file ~ to you under the Apache License, Version 2.0 (the ~ "License"); you may not use this file except in compliance ~ with the License. You may obtain a copy of the License at ~ ~ http://www.apache.org/licenses/LICENSE-2.0 ~ ~ Unless required by applicable law or agreed to in writing, ~ software distributed under the License is distributed on an ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY ~ KIND, either express or implied. See the License for the ~ specific language governing permissions and limitations ~ under the License. -->

The druid-stats extension for Apache Druid incorporates aggregators to compute test statistics, including z-scores and p-values. Please refer to Democratizing Experimentation Data for Product Innovations for math background and details.

Make sure to include druid-stats extension in order to use these aggregators.

Z-Score for two sample ztests post aggregator

Please refer to Making Sense of the Two-Proportions Test and An Introduction to Statistics: Comparing Two Means for more details.

z = (p1 - p2) / S.E.  (assuming null hypothesis is true)

Please see below for p1 and p2. Please note S.E. stands for standard error where

S.E. = sqrt{ p1 * ( 1 - p1 )/n1 + p2 * (1 - p2)/n2) }

(p1 – p2) is the observed difference between two sample proportions.

zscore2sample post aggregator

  • zscore2sample: calculate the z-score using two-sample z-test while converting binary variables (e.g. success or not) to continuous variables (e.g. conversion rate).
json
{
  "type": "zscore2sample",
  "name": "<output_name>",
  "successCount1": <post_aggregator> success count of sample 1,
  "sample1Size": <post_aggregaror> sample 1 size,
  "successCount2": <post_aggregator> success count of sample 2,
  "sample2Size" : <post_aggregator> sample 2 size
}

Please note the post aggregator will be converting binary variables to continuous variables for two population proportions. Specifically

p1 = (successCount1) / (sample size 1)

p2 = (successCount2) / (sample size 2)

pvalue2tailedZtest post aggregator

  • pvalue2tailedZtest: calculate p-value of two-sided z-test from zscore
    • pvalue2tailedZtest(zscore) - the input is a z-score which can be calculated using the zscore2sample post aggregator
json
{
  "type": "pvalue2tailedZtest",
  "name": "<output_name>",
  "zScore": <zscore post_aggregator>
}

Example usage

In this example, we use zscore2sample post aggregator to calculate z-score, and then feed the z-score to pvalue2tailedZtest post aggregator to calculate p-value.

A JSON query example can be as follows:

json
{
  ...
    "postAggregations" : {
    "type"   : "pvalue2tailedZtest",
    "name"   : "pvalue",
    "zScore" :
    {
     "type"   : "zscore2sample",
     "name"   : "zscore",
     "successCount1" :
       { "type"   : "constant",
         "name"   : "successCountFromPopulation1Sample",
         "value"  : 300
       },
     "sample1Size" :
       { "type"   : "constant",
         "name"   : "sampleSizeOfPopulation1",
         "value"  : 500
       },
     "successCount2":
       { "type"   : "constant",
         "name"   : "successCountFromPopulation2Sample",
         "value"  : 450
       },
     "sample2Size" :
       { "type"   : "constant",
         "name"   : "sampleSizeOfPopulation2",
         "value"  : 600
       }
     }
    }
}