A/B Testing Basics

What is A/B testing?

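A/B testing compares two variants of a page or feature, with users split randomly between them:

A = control (the current version)
B = treatment (the change being tested)

You measure a metric (conversion, revenue, retention) in each group and test whether B differs from A by more than chance alone would explain.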
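How users end up in A or B is not covered here. One common approach, shown below as a minimal sketch, is to hash a stable user ID so the same user always sees the same variant. The function name `assign_variant`, the experiment name `checkout_test`, and the 50/50 split are all assumptions made for illustration.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_test") -> str:
    """Deterministically map a user to 'A' or 'B' (hypothetical helper)."""
    # Including the experiment name keeps assignments independent across tests.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    return "A" if bucket < 50 else "B"  # assumed 50/50 split

# Sanity check on made-up user IDs: the split should be close to even.
variants = [assign_variant(f"user-{i}") for i in range(10_000)]
share_b = variants.count("B") / len(variants)
print("share assigned to B:", share_b)
```

Hashing avoids storing an assignment table, and the same ID always lands in the same bucket, which keeps a returning user's experience consistent.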

Choose the right metric

Primary metric (exactly one): the metric that decides whether B wins
Guardrail metrics: metrics that must not get worse (to prevent harm)

Examples:

Primary: conversion rate
Guardrail: refund rate, latency
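The split between a primary metric and guardrails can be sketched as a simple check: look at the primary metric for the lift, and flag any guardrail that moved too far in the bad direction. Everything in this sketch is an assumption for illustration: the `MetricResult` class, the 5% tolerance, and the metric values, which are made up.

```python
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str
    control: float
    treatment: float
    higher_is_better: bool

def relative_change(m: MetricResult) -> float:
    """Relative change of treatment versus control."""
    return (m.treatment - m.control) / m.control

def guardrail_ok(m: MetricResult, max_regression: float = 0.05) -> bool:
    """True if the metric did not worsen by more than max_regression (assumed 5%)."""
    change = relative_change(m)
    # A regression is movement in the "bad" direction for this metric.
    regression = -change if m.higher_is_better else change
    return regression <= max_regression

# Made-up illustrative values, not real experiment data.
primary = MetricResult("conversion_rate", 0.030, 0.036, higher_is_better=True)
guardrails = [
    MetricResult("refund_rate", 0.020, 0.0205, higher_is_better=False),
    MetricResult("latency_ms", 250.0, 280.0, higher_is_better=False),
]

violations = [g.name for g in guardrails if not guardrail_ok(g)]
print("primary lift:", relative_change(primary))
print("guardrail violations:", violations)
```

A real decision would also account for statistical noise in each guardrail. This sketch only compares point estimates.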

Example: compare conversion rates

Two-proportion z-test (approx)

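import numpy as np
from math import erf, sqrt

# A (control): 120 conversions out of 4000 users
x1, n1 = 120, 4000
# B (treatment): 150 conversions out of 4100 users
x2, n2 = 150, 4100

# observed conversion rates
p1 = x1 / n1
p2 = x2 / n2

# pooled rate and standard error under the null hypothesis (no difference)
p_pool = (x1 + x2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))

z = (p2 - p1) / se

def norm_cdf(z):
    # standard normal CDF expressed through the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# two-sided p-value using the normal approximation
p_value = 2 * (1 - norm_cdf(abs(z)))

print("p1:", p1)
print("p2:", p2)
print("z:", z)
print("p-value:", p_value)

With these counts, z is about 1.65 and the two-sided p-value is about 0.099, so the difference is not significant at the 5% level.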
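A p-value alone does not say how large the difference is. As a sketch, the same counts can be turned into an effect size with an approximate 95% confidence interval. This uses the unpooled (Wald) standard error, which is one common choice and not the only one. The critical value 1.96 is the usual normal approximation.

```python
from math import sqrt

# Same counts as the z-test example.
x1, n1 = 120, 4000   # A (control)
x2, n2 = 150, 4100   # B (treatment)

p1 = x1 / n1
p2 = x2 / n2

# Effect size: absolute and relative difference in conversion rate.
diff = p2 - p1
relative_lift = diff / p1

# Unpooled standard error: each arm keeps its own variance estimate.
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_crit = 1.96  # approx. 97.5th percentile of the standard normal
ci_low = diff - z_crit * se_unpooled
ci_high = diff + z_crit * se_unpooled

print("absolute difference:", diff)
print("relative lift:", relative_lift)
print("95% CI for the difference:", (ci_low, ci_high))
```

For these counts the interval spans zero, which agrees with a two-sided test that is not significant at the 5% level.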

Practical guidance

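Run long enough to cover weekly seasonality, ideally a whole number of weeks.
Decide the sample size or end date in advance and don’t stop at the first significant result; repeated peeking increases false positives.
Report the effect size with a confidence interval, not just the p-value.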
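The warning about peeking can be checked with a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares testing once at the end against testing after every batch and stopping at the first p < 0.05. All parameters (10 looks, 500 users per look, a 3% rate, 2000 simulated experiments) are assumptions chosen for illustration.

```python
import numpy as np
from math import erf, sqrt

def two_sided_p(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all, so nothing to detect
    z = (x2 / n2 - x1 / n1) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def false_positive_rate(peek, n_sims=2000, n_looks=10, users_per_look=500,
                        rate=0.03, alpha=0.05, seed=0):
    """Share of A/A experiments declared significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        # Cumulative conversions per arm after each batch of users.
        a = rng.binomial(users_per_look, rate, size=n_looks).cumsum()
        b = rng.binomial(users_per_look, rate, size=n_looks).cumsum()
        # Peeking tests after every batch; otherwise only after the last one.
        looks = range(n_looks) if peek else [n_looks - 1]
        for k in looks:
            n = users_per_look * (k + 1)
            if two_sided_p(a[k], n, b[k], n) < alpha:
                hits += 1
                break  # stop at the first "significant" look
    return hits / n_sims

fpr_final = false_positive_rate(peek=False)
fpr_peeking = false_positive_rate(peek=True)
print("false positive rate, single final test:", fpr_final)
print("false positive rate, peeking at every look:", fpr_peeking)
```

Both calls use the same seed and therefore the same simulated data, so the gap between the two rates comes only from the extra looks.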
