Validating Type I and II Errors in A/B Tests in R
In the work below, we will intentionally leave out statistical theory and instead try to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests.
One of the problems plaguing the analysis of A/B tests today is known as the "peeking problem." To better understand what "peeking" is, it helps to first understand how to properly run a test. We will focus on the case of testing whether there is a difference between the conversion rates cr_a and cr_b for groups A and B. We define conversion rate as the total number of conversions in a group divided by the total number of subjects. The basic idea is that we create two experiences, A and B, and give half of the randomly selected subjects experience A and the other half experience B. Then, after some number of users have gone through our test, we measure how many conversions happened in each group. The important question is: how many users do we need in groups A and B in order to measure a difference in conversion rates of a particular size?
In order to correctly run a test, one should first calculate the required sample size by doing a power calculation. This is easily done in R using the pwr library and requires a few parameters: the desired significance level (the false-positive rate), the desired statistical power (1 minus the false-negative rate), and the effect size we want to be able to detect.
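Here is a minimal sketch of such a power calculation with the pwr package. The baseline rate of 10%, the target rate of 12%, the 5% significance level, and 80% power are illustrative assumptions.

```r
library(pwr)

cr_a <- 0.10              # baseline conversion rate (assumed)
cr_b <- 0.12              # smallest rate we want to be able to detect (assumed)

# Cohen's effect size h for comparing two proportions
h <- ES.h(cr_b, cr_a)

# Leaving n unspecified asks pwr to solve for the required sample size
pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)
# The n in the output is the required sample size *per group*.
```

Running this tells us how many subjects each group needs before we look at the results; stopping the test early, before that sample size is reached, is what leads to the peeking problem.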