How to ... Run a Proper A/B Test

So you'll know a clear winner and loser

Everyone's all about the A/B test these days, whether it's a landing page, an email, or a piece of direct mail. And the concept is quite simple: pit two different creatives against each other and see which one wins. But is it REALLY that simple? Turns out, if you don't have your measurement plan set up correctly in the beginning, your results will tell you nothing substantial and the only thing you'll get is wasted time and money.

What are you testing?

Before you begin your A/B test, make sure that the marketing aspect you are testing is highly differentiated. That is to say, the aspect you are testing is both clear and unique to the piece. Let's go back to my post about Optimizing Digital Ads and use the awesome graphics for a visual reference:

[Image: the three ad variations, A, B, and C, from the Optimizing Digital Ads post]

Here we see an A/B/C test. The difference between A and B is clear and unique - the test is about the color of the button. However, if we look at the difference between A/B and C, the difference is clear but not unique - the test compares a functional shopping cart image against a satirical pop-culture image. The point is, if C performs better than A or B, you can't specifically say WHY. Was it the humor? Was it the pop culture reference? Was it just more visually interesting?

Herein lies the rub: if you can't pinpoint the single, definitive difference between your A/B test creatives, you can't determine the reason for the winner.

Based on this principle, I have put together a reference list of valid elements to test by channel. It's at the bottom of the post for you to download. And it's free. Yay!

What does the winner look like?

Now that we know WHAT we're going to A/B test, we need to see HOW we're going to test and measure. Imagine that - I'm talking about measuring stuff again!

If you have a competition of any kind, there is a score that determines the winner, like race time or touchdowns. Your A/B test has to have a score that determines the winner as well. You can use the same reference list to see valid measurements for test elements, but I'll give you a couple examples here:

  • If you're testing subject lines on an email, the resulting measurement would be open rate. Subject line impacts opens.
  • If you're testing landing pages, the resulting measurement is conversion rate. The effectiveness of the landing page impacts its ability to get a customer to convert, i.e., do what you want them to do.

Without a valid measurement, there is no A/B test. Period.

How can you prove a winner?

In math, and specifically statistics, you need a large enough sample size to reliably measure success against non-success. This basically means you have enough responses to smooth out any fluky behavior. For example, imagine you have a class of 10-year-old kids and you're trying to determine how many kids like pepperoni pizza. There are 20 students in the class and you take a simple poll: Do you like pepperoni pizza? Yes/No. The results come back that 12 students DO NOT like pepperoni pizza and 8 students DO like pepperoni pizza. Does that seem right? What if I told you that 12 of the students in this class are Muslim, and therefore do not eat pork? Well, then you have some severe issues with your sample, and your results are not good enough to mean anything. BUT, if you gave the same survey to 10-year-olds across your state, the results would be much closer to the true average and therefore meaningful.

"So how do I know what my sample size SHOULD be?" you're probably thinking. Great news! There's a bunch of calculators out there that can do the magic math for you. All you need to know are the following answers:

  • What is my previous conversion rate for this channel/item?
  • How big of a difference do I need to detect? (Generally a 20% relative change is a safe bet; the smaller the difference you want to detect, the larger the sample size you'll need.)
  • How accurate do I want my test results to be? (This is your statistical significance level; 90, 95, and 99% are the standard choices. Don't go below 90%.)

Now let's go use a great calculator to determine my sample size for an email subject line I want to test.

  • My previous email open rate has been around 20%
  • I want a 20% change in opens
  • I want a 95% significance level

Here's a sample size calculator from Optimizely

The calculator tells me I have to have at LEAST 1,100 email addresses in each of the A and B lists in order to accurately determine a winner.
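
If you'd rather sanity-check the math yourself, here's a minimal sketch (in Python) of what these calculators are doing under the hood, using the textbook two-proportion sample-size formula. The function name and the assumed 80% statistical power are my own illustrative choices, not something pulled from any specific calculator:

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(baseline_rate, relative_mde, significance=0.95, power=0.80):
    """Estimate the sample size each variant needs (textbook two-proportion formula).

    baseline_rate -- your historical rate for this channel/item (e.g. 0.20 for a 20% open rate)
    relative_mde  -- the relative change you want to detect (e.g. 0.20 for a 20% lift)
    significance  -- how confident you want to be in the result (two-sided)
    power         -- chance of catching the lift if it's really there (my assumed default)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)          # the rate you hope to see
    z_alpha = norm.ppf(1 - (1 - significance) / 2)   # e.g. 1.96 for 95%
    z_beta = norm.ppf(power)                         # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# The example from this post: 20% open rate, 20% change, 95% significance.
print(sample_size_per_variant(0.20, 0.20, 0.95))     # roughly 1,700 per list with these assumptions
```

With these textbook assumptions the formula asks for a somewhat larger list than the Optimizely calculator reports (roughly 1,700 per variant instead of 1,100). Different calculators bake in different assumptions about power and one- vs. two-sided testing, so pick one tool and use it consistently.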

Math yourself a winner

Finally, once the test is complete, you need to be able to determine the winner. Again, we have a fun calculator (is that an oxymoron?) that will help us do exactly that.

Let's say that my email list was 10,000, and I pulled out 1,100 for my subject line test. The remaining 8,900 got my regular subject line, and now my results are in. My new subject line got 250 opens while my regular one got 1,675. Now I have:

A: Test Subject Line: 250/1,100 = 23%
B: Regular Subject Line: 1,675/8,900 = 19%

By entering those numbers into my A/B test calculator, I get that version A did 21% better than B with 100% certainty. (Any certainty above the significance level you chose, 95% in my case, means the result is statistically significant: the difference is very unlikely to be down to random chance.) Done!

Here's an A/B test calculator from KissMetrics
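
If you'd like to check the significance yourself rather than relying on a calculator, here's a minimal sketch of a standard two-proportion z-test using the numbers from this example. The function name is just illustrative, and the calculator may use a slightly different method under the hood, but the conclusion comes out the same:

```python
from math import sqrt
from scipy.stats import norm

def ab_test_result(opens_a, size_a, opens_b, size_b):
    """Two-proportion z-test: relative lift of A over B, and the confidence
    (1 minus the two-sided p-value) that the difference isn't just chance."""
    rate_a = opens_a / size_a
    rate_b = opens_b / size_b
    pooled = (opens_a + opens_b) / (size_a + size_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    z = (rate_a - rate_b) / std_err
    confidence = 1 - 2 * (1 - norm.cdf(abs(z)))      # 1 - two-sided p-value
    lift = (rate_a - rate_b) / rate_b                # relative improvement of A over B
    return lift, confidence

# Numbers from this post: test subject line (A) vs. regular subject line (B).
lift, confidence = ab_test_result(250, 1_100, 1_675, 8_900)
print(f"A beat B by {lift:.0%} with {confidence:.1%} confidence")
# Prints roughly: A beat B by 21% with 99.8% confidence
```

That confidence figure is so close to 1 that the calculator presumably just rounds it up to 100%. Either way, it clears the 95% significance level we picked, so the test subject line is a genuine winner.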

All of the information presented here can be applied to direct mail, SEM, display ads, emails, landing pages... you name it! As long as you have the three core principles down - making the test element clear and unique, setting a winning measurement, and using a valid sample size - you're on the path to success!

Happy Analyzing!

As promised.... My Awesome Reference List