
A/B test gotchas

Correlation does not imply causation

  • Even a low p-value from a well-designed experiment does not imply causation!
    • It could still be random chance
    • Other factors could be at play
    • It's your duty to ensure business owners understand this
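To see how often random chance alone produces a "significant" result, here is a minimal Python sketch. It assumes conversions are independent coin flips compared with a pooled two-proportion z-test (a common choice, not necessarily what your platform uses). It simulates A/A tests, where both arms share the same true conversion rate, so every "significant" outcome is a false positive. The function names and default parameters are hypothetical.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no (or only) conversions in both arms: no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_false_positive_rate(n_tests=500, n_users=1000, rate=0.05,
                                    alpha=0.05, seed=0):
    """Simulate A/A tests (both arms share the same true conversion rate) and
    return the fraction that come out 'significant' at the given alpha."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        conv_a = sum(rng.random() < rate for _ in range(n_users))
        conv_b = sum(rng.random() < rate for _ in range(n_users))
        if two_proportion_p_value(conv_a, n_users, conv_b, n_users) < alpha:
            false_positives += 1
    return false_positives / n_tests
```

With alpha = 0.05, `simulate_aa_false_positive_rate()` should come out close to 0.05: roughly one in twenty experiments with no real effect still clears the significance bar.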

Novelty effects

  • Changes to a website will catch the attention of returning users who are used to the old design
    • They might click on something simply because it's new
    • But this attention won't last forever
  • Good idea to re-run experiments much later and validate their impact
    • Often the "old" website will outperform the new one after a while, simply because switching back to it is now the change
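If you keep daily results, one rough way to look for a novelty effect within a single run is to break the lift out by week since launch and see whether it shrinks. The sketch below assumes a hypothetical row format of (day index, arm, visitors, conversions), with day 0 as the launch day; `weekly_lift` is an illustrative name, not an existing API.

```python
from collections import defaultdict


def weekly_lift(daily_rows):
    """daily_rows: iterable of (day_index, arm, visitors, conversions) with arm "A" or "B",
    where day 0 is the launch day. Returns {week_index: relative lift of B over A}."""
    # week -> arm -> [visitors, conversions]
    totals = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for day, arm, visitors, conversions in daily_rows:
        bucket = totals[day // 7][arm]
        bucket[0] += visitors
        bucket[1] += conversions
    lifts = {}
    for week in sorted(totals):
        (v_a, c_a), (v_b, c_b) = totals[week]["A"], totals[week]["B"]
        if v_a == 0 or v_b == 0 or c_a == 0:
            continue  # lift is undefined for this week
        lifts[week] = (c_b / v_b) / (c_a / v_a) - 1
    return lifts
```

A lift that starts high and decays toward zero over successive weeks is the pattern a novelty effect would produce, but it is only a hint; re-running the experiment later remains the real check.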

Seasonal effects

  • An experiment run over a short period of time may only be valid for that period of time
    • Example: Consumer behaviour near Christmas is very different from behaviour at other times of the year
    • An experiment run near Christmas may not represent behaviour during the rest of the year

Selection Bias

  • Sometimes your random selection of customers for A or B isn't really random
    • For example: assignment is somehow based on customer ID
    • But customers with low IDs are better customers than ones with high IDs
  • Run an A/A test (both groups get the identical experience) periodically to check
  • Audit your segment assignment algorithms
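One common way to avoid ID-based assignment bias is to hash the customer ID together with an experiment-specific key, so neighbouring IDs land in arms essentially at random and different experiments get independent splits. The sketch below assumes SHA-256 bucketing and a 50/50 split; `assign_variant` and `sample_ratio_chi2` are hypothetical names, not a platform API.

```python
import hashlib


def assign_variant(customer_id, experiment_key, variants=("A", "B")):
    """Deterministically map a customer to a variant by hashing the ID with an
    experiment-specific key, rather than using ID ranges or parity."""
    digest = hashlib.sha256(f"{experiment_key}:{customer_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def sample_ratio_chi2(count_a, count_b):
    """Chi-square statistic (1 degree of freedom) for an expected 50/50 split.
    Values above about 3.84 indicate a mismatch at the 5% significance level."""
    expected = (count_a + count_b) / 2
    if expected == 0:
        return 0.0  # no users assigned yet
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
```

Feeding the per-arm counts to `sample_ratio_chi2` only catches groups of the wrong size. Groups of the right size that differ in quality, as in the low-ID versus high-ID example, still need an A/A test or a comparison of pre-experiment metrics between the groups.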

Data Pollution

  • Are robots (both self-identified and malicious) affecting your experiment?
    • Good reason to measure conversion based on something that requires spending real money
  • More generally, are outliers skewing the result?
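A minimal sketch of both checks, assuming sessions carry a user-agent string and per-user revenue is a list of numbers. The marker list is hypothetical and incomplete: it only catches self-identified bots, not malicious ones that disguise their user agent. Capping values at an upper percentile (winsorizing) is one standard way to keep a handful of extreme orders from dominating the comparison; the 99th-percentile default is an assumption to tune.

```python
import math

# Hypothetical, incomplete list of substrings that self-identified bots put in their user agent
BOT_MARKERS = ("bot", "crawler", "spider")


def is_self_identified_bot(user_agent):
    """True if the user-agent string contains a known bot marker (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def winsorize(values, upper_pct=0.99):
    """Cap every value at the nearest-rank upper percentile so a few extreme
    outliers cannot dominate the mean."""
    if not values:
        return []
    ordered = sorted(values)
    rank = math.ceil(upper_pct * len(ordered))  # nearest-rank percentile, 1-based
    cap = ordered[max(rank, 1) - 1]
    return [min(v, cap) for v in values]
```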

Attribution Errors

  • Often there are errors in how conversion is attributed to an experiment
  • Using a widely used A/B test platform can help mitigate that risk
    • If yours is home-grown, it deserves auditing
  • Watch for "gray areas"
    • Are you counting purchases toward an experiment within some given time frame of exposure to it? Is that time too large?
    • Could other changes downstream from the change you're measuring affect your results?
    • Are you running multiple experiments at once?
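One way to make the attribution window explicit, so it can be reviewed and argued about, is to count a purchase toward an experiment only if it happens after the user's first exposure and within a fixed window. This is a sketch under assumed inputs: the exposure and purchase formats are hypothetical, and the 7-day default is an arbitrary example, not a recommendation.

```python
from datetime import datetime, timedelta


def attributed_purchases(exposures, purchases, window=timedelta(days=7)):
    """exposures: {user_id: datetime of first exposure to the experiment}
    purchases: list of (user_id, purchase datetime)
    Returns only the purchases made after exposure and within the window."""
    result = []
    for user_id, ts in purchases:
        exposed_at = exposures.get(user_id)
        if exposed_at is None:
            continue  # user never saw the experiment
        if exposed_at <= ts <= exposed_at + window:
            result.append((user_id, ts))
    return result
```

Shrinking or widening `window` and checking how much the result moves is a quick way to see whether the chosen time frame is driving the conclusion.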