
A/B Testing for Data Science (Python and R) | June 30th, 2021

by Diligejy 2022. 8. 18.

https://www.youtube.com/watch?v=ZdC8dwL0rlI 

 

A/B test - What & Why?

1.  What is an A/B Test?

a.  An experiment to compare two competing options (A, B).

    i. Options : treatments (medical), designs (ad, web), products, prices, etc.

 

2. Why use an A/B Test?

a. To determine if the options are different

    i. Different in a statistical sense (Hypothesis testing, permutation test)

 

b. To determine what's the better option

    i. Better for the question/goal at hand (e.g., customer acquisition, profit)

 

 

A/B Test - Examples?

1. General Example

a. Two soil Treatments : which one promotes better seed germination?

 

b. Two web headlines : which one produces more clicks?

 

c. Two web ads : which one generates more conversions?

 

d. Two prices:

    i. Which one yields a higher net profit?

    ii. Which one leads to more new customers?

 

e. Two therapies : which one is more effective at suppressing cancer?

    i. Control Group : Subjects exposed to no treatment or standard treatment.

    ii. Treatment Group : Subjects exposed to the new treatment.

 

2. Specific Example

a. Microsoft(Bing) :

    i. one A/B test : changing the way the Bing search engine displayed ad headlines 

    ii. led to a 12% increase in revenue.

    iii. that's more than $100 million per year in the US alone.

 

b. Amazon

    i. moving credit card offers from its home page to the shopping cart page

    ii. boosted profits by tens of millions of dollars annually

 

c. Google & Bing 

    i. only 10% to 20% of experiments generate positive results

 

A/B Test - How? Steps

(0. Idea & Definition : question, goal, data/subjects, options, test statistic)

 

1. Subjects : Set of all subjects.

 

2. Randomization : Randomly assign subjects to the two groups (A, B)

 

3. Results : Expose subjects to options (A, B), measure results, and compute test statistic

 

4. Hypothesis testing : determine if the observed difference is statistically significant.

    - Can be done with a permutation test    

 

(5. Action/Decision : based on test results) 
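
The randomization step (step 2) can be sketched in a few lines of Python. This is a minimal illustration, not the talk's own code: the function name `assign_groups`, the subject IDs, and the even 50/50 split are all assumptions made for the example.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into two groups, A and B, of (nearly) equal size."""
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(subjects)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs, for illustration only.
subjects = [f"user_{i}" for i in range(20)]
group_a, group_b = assign_groups(subjects, seed=42)
print(len(group_a), len(group_b))
```

Shuffling once and cutting the list in half ensures every subject lands in exactly one group, and that group membership does not depend on anything about the subject.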

 

A/B Test - How? Didactic Example

1. Test statistic : Yes Rate Difference 

    a. (A - B) (%) = 70% - 40% = 30%

 

2. Is this a (statistically) significant difference?

    a. Is this difference just due to random chance?

    b. Or is it due to the different options (A, B)?
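
As a small sketch, the test statistic can be computed from 0/1 outcomes ("yes" = 1). The notes give only the rates (70% and 40%), so the group size of 10 subjects each is an assumption chosen to reproduce those rates, and `yes_rate` is a hypothetical helper name.

```python
def yes_rate(results):
    """Share of 1s ("yes") in a list of 0/1 outcomes, as a percentage."""
    return 100 * sum(results) / len(results)

# Hypothetical data: 10 subjects per group, chosen to match 70% and 40%.
group_a = [1] * 7 + [0] * 3
group_b = [1] * 4 + [0] * 6

observed_diff = yes_rate(group_a) - yes_rate(group_b)
print(observed_diff)  # 30.0
```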

 

A/B Test - How? - Hypothesis Testing 

1. Is the observed difference (30%) statistically significant?

 

2. Thanks to randomization, any observed difference between A and B must be due to either

    a. Null hypothesis : Random chance (subject assignment)

    b. Alternative hypothesis : Real difference between A & B

 

3. Hypothesis testing : Is random chance (Null hypothesis) a reasonable explanation for the observed difference?

    a. Assumes that the Null hypothesis is true

    b. Creates corresponding Null model (probability model)

    c. Tests whether the observed difference is a reasonable outcome of that Null model

    d. Is the observed difference within the random variability of the Null model?

 

A/B Test - How? Hypothesis Testing (p-value)

 

1. P-value:

    a. given a random chance (probability) model that embodies the Null hypothesis.

    b. the p-value is the probability of obtaining results at least as unusual / extreme as the observed result.

 

2. Significance level (alpha)

    a. the probability threshold of "unusualness" (e.g., 0.05)

    b. must be defined before the experiment

    c. the probability we accept for a type I error

 

3. Decision:

    a. p-value >= alpha : retain the null hypothesis

        i. observed difference due to random chance

    b. p-value < alpha : reject the null hypothesis

        i. observed difference attributed to a real difference between A & B

 

4. Type I error (false positive)

    a. Mistakenly concluding that an effect is real (when it is due to chance). Probability = alpha

 

5. Type II error (false negative)

    a. Mistakenly concluding that an effect is due to chance (when it is real)
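
The decision rule in item 3 is simple enough to write down directly. The sketch below assumes a function name `decide` and a default alpha of 0.05, neither of which comes from the talk; only the comparison itself is taken from the notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: p-value < alpha rejects the null, otherwise retain it."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be a probability between 0 and 1")
    if p_value < alpha:
        return "reject null hypothesis"
    return "retain null hypothesis"

print(decide(0.37))  # retain null hypothesis
print(decide(0.01))  # reject null hypothesis
```

Note that a p-value exactly equal to alpha retains the null hypothesis, matching the `>=` in the rule above.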

 

Permutation Test - What & Why?

 

1. Permutation test is a resampling procedure used for hypothesis testing.

 

2. Resampling : repeatedly sample values from the observed data to assess a statistic's random variability.

 

3. Two main types of resampling procedures:

    a. Bootstrap : resampling with replacement, used to assess reliability of an estimate.

    b. Permutation : resampling without replacement, used for hypothesis testing.

 

4. Permutation Test:

    a. A resampling procedure used for hypothesis testing

    b. Process for combining two (or more) data samples together, and randomly (or exhaustively) reallocating the observations to resamples to assess the random variability of the test statistic.

    c. A way to create the Null model and compute the p-value

    d. Advantage : Few distributional assumptions (creates Null model from the data itself)
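
To make the difference between the two resampling types concrete, here is a minimal sketch using only Python's standard `random` module. The helper names and the sample data are assumptions for illustration.

```python
import random

def bootstrap_resample(data, rng):
    """Draw len(data) values WITH replacement: values may repeat or be missing."""
    return rng.choices(data, k=len(data))

def permutation_resample(data, n_a, rng):
    """Reallocate all values WITHOUT replacement into two groups of size n_a and the rest."""
    shuffled = rng.sample(data, k=len(data))  # a shuffled copy; each value used exactly once
    return shuffled[:n_a], shuffled[n_a:]

rng = random.Random(0)
data = [1, 1, 1, 0, 0, 1, 0, 1]  # hypothetical 0/1 outcomes

boot = bootstrap_resample(data, rng)
perm_a, perm_b = permutation_resample(data, 4, rng)
print(boot)
print(perm_a, perm_b)
```

The permutation resample always contains exactly the original values, just reassigned between groups, whereas the bootstrap resample generally does not.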

 

Permutation Test - How? Steps

1. Step 0 : A/B test results

 

2. Step 1 : Put all A/B test results in a single dataset ("bag")

 

3. Step 2 to 5 : Do one permutation :

 

    a. Step 2 : Shuffle the "bag"

    b. Step 3 : Randomly draw (without replacement) a resample the size of group A

    c. Step 4 : Randomly draw (without replacement) a resample the size of group B

    d. Step 5 : Record the test statistic for resamples.

 

4. Step 6 : Do many permutations : to yield a permutation distribution of the test statistic (Null model)

 

5. Hypothesis testing : Use permutation results to compute the p-value as the proportion of permuted values that are as extreme as or more extreme than the observed value
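
The steps above can be sketched as one short function. This is an illustrative implementation under stated assumptions, not the talk's code: the data are hypothetical 0/1 outcomes (10 subjects per group, matching the 70% vs 40% example), the function names are made up, and the p-value shown is the two-sided one. `Fraction` is used so that ties with the observed difference are compared exactly rather than through floating-point rounding.

```python
import random
from fractions import Fraction

def rate_diff(a, b):
    """Yes-rate difference (A - B) as an exact fraction."""
    return Fraction(sum(a), len(a)) - Fraction(sum(b), len(b))

def permutation_test(a, b, n_permutations=10_000, seed=None):
    """Return (observed difference, permuted differences, two-sided p-value)."""
    rng = random.Random(seed)
    observed = rate_diff(a, b)                 # Step 0: A/B test result
    bag = list(a) + list(b)                    # Step 1: pool everything in one "bag"
    perm_diffs = []
    for _ in range(n_permutations):            # Step 6: repeat many times
        rng.shuffle(bag)                       # Step 2: shuffle the bag
        resample_a = bag[:len(a)]              # Step 3: resample the size of group A
        resample_b = bag[len(a):]              # Step 4: the rest, the size of group B
        perm_diffs.append(rate_diff(resample_a, resample_b))  # Step 5: record statistic
    # Proportion of permuted differences at least as extreme as the observed one.
    extreme = sum(1 for d in perm_diffs if abs(d) >= abs(observed))
    return observed, perm_diffs, extreme / n_permutations

# Hypothetical data: 7 of 10 "yes" in A, 4 of 10 "yes" in B.
group_a = [1] * 7 + [0] * 3
group_b = [1] * 4 + [0] * 6
observed, perm_diffs, p_value = permutation_test(group_a, group_b, seed=0)
print(float(observed), p_value)
```

Slicing the shuffled bag is equivalent to drawing the two resamples without replacement, since every observation ends up in exactly one of them.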

 

Permutation Test - How? - Hypothesis testing

 

1. A/B test observed difference : (A - B) (%) = 30%

 

2. Counts of results from 100 permutations

3. Two-sided test (Null: A = B, Alternative : A != B)

    a. Extreme values count : 37 = 4 + 12 + 16 + 5

    b. Extreme values ratio (p-value) : 0.37 = 37 / 100

 

4. One-sided test (Null : A <= B, Alternative : A > B)

    a. Extreme (positive) values count : 21 = 16 + 5

    b. Extreme (positive) values ratio (p-value) : 0.21 = 21 / 100

 

5. Decision : p-value >= alpha(0.05) : retain null hypothesis
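
The two p-values differ only in which tail(s) of the permutation distribution count as "extreme". The sketch below shows both calculations. The notes give only the counts (4 + 12 on the negative side, 16 + 5 on the positive side, out of 100), so the individual difference values in `perm_diffs` are placeholders chosen to be consistent with those counts, and `p_values` is a hypothetical helper name.

```python
def p_values(perm_diffs, observed):
    """Return (two-tail, one-tail) p-values for a positive observed difference A - B."""
    n = len(perm_diffs)
    # Both tails: differences at least as large in magnitude, in either direction.
    both_tails = sum(1 for d in perm_diffs if abs(d) >= abs(observed)) / n
    # Upper tail only: differences at least as large in the direction A > B.
    upper_tail = sum(1 for d in perm_diffs if d >= observed) / n
    return both_tails, upper_tail

# Placeholder permutation results (in percentage points) matching the counts
# in the notes; the exact values are assumptions made for this illustration.
perm_diffs = [-50] * 4 + [-30] * 12 + [-10] * 31 + [10] * 32 + [30] * 16 + [50] * 5

both_tails, upper_tail = p_values(perm_diffs, observed=30)
print(both_tails, upper_tail)  # 0.37 0.21
```

Both values are above alpha = 0.05, so either version of the test retains the null hypothesis.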

 

 

 
