The 21 Most Frequently Asked Questions About A/B testing

Anders Nordling-Danils & Sofia Staaf

17 February, 2020

As conversion rate optimization experts, we get a boatload of questions about A/B testing. That is not surprising: done right, A/B testing is a very efficient way to drive growth in your company.

In this blog post, you’ll find the most common questions we get from our clients about A/B testing – and how to answer them.

What you will learn:

✅How A/B testing will help you grow your business
✅How to handle the most common A/B testing challenges
✅What to really avoid when doing A/B tests

Off we go!

1. Do we need to do A/B testing?

Yes, yes, YES! How else will you know if you are earning or losing money? Or what your customers truly care about? Harvard Business Review put it like this, and we totally agree:

Controlled experiments can transform decision making into a scientific, evidence-driven process—rather than an intuitive reaction

A/B testing is a great method to validate the effect of a certain change that you want to do, even when the effect is small. If you implement several changes to your website without testing them first, you might end up with a negative effect on your KPIs, without knowing what causes the negative effect.

Still sceptical? Maybe question number 11 can convince you.

2. How much traffic do you need to A/B test?

Let’s put it this way: you need conversions to be able to A/B test! A good rule of thumb is a couple of hundred conversions per variant (variant = the page with the change you want to test).

Ways to check if you can A/B test

So before you start your test, pre-calculate the sample size needed for each variation, meaning the number of visitors each variation needs before the test can reliably detect the effect you are looking for. Use an A/B test sample size calculator to calculate the minimum sample size you need. Then, when you run your test, don’t stop as soon as you see significance; make sure to run it until you have reached the required sample size.


There are a handful of calculators out there. Here are two good ones:

An easy-to-use A/B test sample size calculator from Optimizely
and a more customizable sample size calculator from CXL when you also want to calculate duration
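If you are curious about what such a calculator does under the hood, here is a minimal sketch. It assumes a classic two-sided two-proportion z-test, which is not necessarily the method either of the calculators above uses, and the function name, the 3% baseline and the 20% lift are made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed in EACH variant for a two-sided two-proportion z-test.

    baseline_rate: conversion rate of the original, e.g. 0.03 for 3%.
    relative_lift: smallest relative improvement worth detecting, e.g. 0.20.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical shop: 3% baseline conversion rate, 20% relative lift to detect.
visitors = sample_size_per_variant(0.03, 0.20)
print(f"Visitors needed per variant: {visitors}")
print(f"Expected conversions in the original: {visitors * 0.03:.0f}")
```

With these made-up inputs the answer lands at several hundred conversions per variant, in the same ballpark as the rule of thumb above. Smaller lifts need a lot more traffic, so try a few values before you commit to a test.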

Not enough conversions?

Don’t panic! There are other great ways to optimize your site and beat your competitors. This blog post will help you 🖖


3. For how long should you run an A/B test?

The not so fun answer is – it depends. The most crucial factors are:

👥 Traffic volume
🎯 Goals
📅 Business cycle

In general, you need to run your A/B test for a minimum of 2 weeks and a maximum of 6 weeks. Shorter than 2 weeks might result in not reaching the minimum sample size or you stopping your test in the middle of your business cycle. That could lead to the test results being off and you will make decisions based on the wrong data.


Running experiments for longer than 6 weeks increases the risk of data pollution: external and internal factors such as campaigns, holidays and cookie deletion can affect your data during part of the test period.

Here are three guidelines to follow when it comes to testing duration:

Run your test until you have reached your minimum sample size, so that the result is statistically reliable.
Run your test for full weeks. If you start your test on a Tuesday, end it on a Tuesday to rule out any effect from regular weekday variation.
Run your test for complete business cycles, and don’t stop in the middle of a cycle. Your online customers may not shop as soon as they enter your site. Instead, they might visit your site several times before making a purchase, so make sure to run your test for at least one full business cycle.


Use our own A/B test duration calculator to calculate your test duration before you start your test. Here is also a more customizable calculator.
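The arithmetic behind a duration estimate can be sketched in a few lines. This is only an illustration of the guidelines above (full weeks, at least 2 weeks, a warning beyond 6 weeks) and not how any particular calculator works. The function name and the traffic numbers are hypothetical, and the length of your business cycle is not modelled.

```python
import math


def plan_duration(sample_size_per_variant, variants, daily_visitors,
                  min_weeks=2, max_weeks=6):
    """Return (weeks to run, whether that fits within max_weeks)."""
    total_needed = sample_size_per_variant * variants
    days_needed = math.ceil(total_needed / daily_visitors)
    # Round up to full weeks and never go below the minimum duration.
    weeks = max(min_weeks, math.ceil(days_needed / 7))
    return weeks, weeks <= max_weeks


# Hypothetical: 14,000 visitors per variant, 2 variants, 1,500 visitors a day.
weeks, feasible = plan_duration(14_000, 2, 1_500)
print(f"Run for {weeks} full weeks; within the 6-week limit: {feasible}")
```

If the second value comes back as False, the test would have to run longer than 6 weeks, which is a hint to test a bolder change or a page with more traffic.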


4. Why are we not stopping the A/B test after three days when we see it’s a loser?

Because what you see is not a representation of reality. It’s fake news! The algorithm in the A/B testing tool needs more data to be able to do a correct calculation.


So if you stop too early, you have not reached the traffic volume and the minimum number of conversions you need. The recommendation is to run your test for a minimum of 2 weeks, and for full weeks. Early on in an experiment, you may randomly get more people who are willing to make a purchase in one group than in the other. With small sample sizes, you are a lot more likely to see a random result that does not reflect the truth. Acting on reliable results instead of random results is the whole point of doing experiments in the first place. The same goes for winners: do not stop your test as soon as you see significance, because there are other things to pay attention to as well.


Also, run your test for a full business cycle even if you have reached the minimum sample size. Otherwise, you risk ending up with a convenience sample rather than a representative sample.
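To see why early results are so treacherous, you can simulate tests where the variant is identical to the original (so every “winner” or “loser” is pure chance) and compare checking every day with checking only at the planned end. The sketch below assumes a pooled two-proportion z-test and invented traffic numbers. Your testing tool may use a different statistical method, so read it as an illustration of the principle only.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(runs=300, days=14, visitors_per_arm_per_day=200, rate=0.05, seed=1):
    """Simulate A/A tests: both arms share the same true conversion rate."""
    rng = random.Random(seed)
    significant_on_any_day = 0  # would have been stopped by a daily peek
    significant_at_end = 0      # significant on the planned last day
    for _ in range(runs):
        conv_a = conv_b = n = 0
        peeked = False
        for _day in range(days):
            n += visitors_per_arm_per_day
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_arm_per_day))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_arm_per_day))
            p = p_value(conv_a, n, conv_b, n)
            if p < 0.05:
                peeked = True
        significant_on_any_day += peeked
        significant_at_end += p < 0.05  # p from the final day
    return significant_on_any_day / runs, significant_at_end / runs


peeking_rate, final_rate = simulate()
print(f"False alarms when peeking daily:      {peeking_rate:.0%}")
print(f"False alarms when waiting to the end: {final_rate:.0%}")
```

The daily-peeking rate can never be lower than the end-of-test rate, because the last day is one of the peeks. In simulations like this it usually ends up well above the 5% you signed up for.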

5. What is statistical significance and statistical power?

I thought you would never ask! This is an important one, so please concentrate and don’t stop reading. It will help you understand one of the most central parts of A/B testing.

Statistical significance.
It is a measure of how unlikely it is that the change you are observing can be explained by chance alone.

It is based on the confidence level. With a 95% confidence level, for example, a test is only declared a winner when a difference as large as the one you observed would show up less than 5% of the time if the change had no real effect. That still leaves a risk that the result is random and does not reflect the truth, which is called a type I error (false positive): you think your variation is a winner, but in reality it is not.


Why is it important to understand statistical significance?

It’s an important statistical fact you need to get your head around to be able to understand what kind of risks are associated with A/B testing and how it could affect your business. So, when choosing a confidence level of 95% you should understand that even if your experiment has no effect, in 5 cases out of 100, you’d get a significant result due to random chance (a false positive).
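For the curious, here is roughly how a significance check can be computed by hand. This is a sketch that assumes a pooled two-proportion z-test. Commercial testing tools often use other methods, and the visitor and conversion counts are invented.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Probability of a difference at least this large if there is no real effect.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical result: 200 of 10,000 convert on the original, 250 of 10,000 on the variant.
p = two_proportion_p_value(200, 10_000, 250, 10_000)
print(f"p-value: {p:.3f}")
print("Significant at 95% confidence" if p < 0.05 else "Not significant at 95% confidence")
```

A p-value below 0.05 corresponds to passing the 95% confidence level. It does not tell you how big the real effect is, only that chance alone would rarely produce a difference this large.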

Why 95% confidence level?

Because it’s a good balance between being able to detect winners and not declaring too many random positive results as winners (false positives).
Also, it’s a common confidence level used among behavioural scientists and A/B testing teams. So, it should work for your business as well 🙂

All right, phew, good job, almost done! Let’s move on to statistical power. Please concentrate again (and don’t stop reading).

Statistical power.
It is the likelihood that your test will detect a real change when there is one.

With a low level of statistical power, there is a higher chance of missing a positive change when it occurs, which is called a type II error (false negative). You conclude that your variation did not win over the original page when in reality it did. It often happens when tests are called too early without collecting enough data. Increasing the sample size of your test reduces the risk of a type II error.

Why is it important to understand statistical power?

Just like statistical significance, statistical power is a metric you need to understand to be able to evaluate the risks associated with your A/B testing and how it can affect your business. If your experiment has 80% statistical power, it means that the experiment has an 80% chance of finding a winner if there is one, but there is a 20% risk of not finding the winner. For example, if we re-ran the same winning experiment 5 times, we would expect it to show up as a winner 4 times (80%) and as inconclusive (not a winner) 1 time (20%). Important stuff 🤓
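To get a feel for how power depends on sample size, here is a small sketch. It uses the standard normal approximation for a two-sided two-proportion z-test, which is an assumption on our part, and the conversion rates and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def power_of_test(p1, p2, n_per_variant, alpha=0.05):
    """Approximate chance of detecting a true change from rate p1 to rate p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_variant)
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    return NormalDist().cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


# Hypothetical: the variant truly lifts conversion from 3.0% to 3.6%.
for n in (3_000, 14_000):
    print(f"{n:>6} visitors per variant -> power {power_of_test(0.03, 0.036, n):.0%}")
```

With the smaller sample, the very same real improvement would be missed most of the time. That is what an underpowered test looks like.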

Why a target power of 80%?

Same answer as for statistical significance, it is a good risk balance and it’s a common level used among behavioural scientists and A/B testing teams.

There you have it. Statistical significance and statistical power are important, and at first sight not that easy to understand. But once you understand them, you will understand a central mechanic behind A/B testing, which will radically increase the quality of your experiments!

6. When A/B testing, it’s a winner on mobile but loser on desktop. What should we do?

Congratulations! You have discovered a difference in user behaviour on mobile vs on desktop. Great insights 👍

You have two choices:

Make the change on mobile only
Make the change for all devices

If possible, make the change on mobile only. User behaviour can be different on mobile and desktop. If it is not possible to make the change only on mobile, you have to compromise (no, not again…). Calculate the risks and benefits based on the traffic split per device and where most of the conversions are being made, and then make a decision. To be able to make a good compromise, it’s important that you have a clear understanding of your goals. If you do, the compromise won’t be that difficult to make. If it is difficult, that’s an indication you need to clarify and/or prioritize your goals.
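The risk and benefit calculation can be as simple as weighing each device’s result by how many conversions it brings in. A minimal sketch with invented numbers follows. The function name is ours, and it counts conversions only, so swap in revenue if that is your goal.

```python
def net_change_in_conversions(segments):
    """Expected change in conversions if the variant ships to every segment.

    segments: name -> (baseline conversions per month, observed relative lift)
    """
    return sum(conversions * lift for conversions, lift in segments.values())


# Hypothetical numbers, not taken from any real test.
devices = {
    "mobile": (1_200, 0.08),   # the variant won on mobile: +8%
    "desktop": (800, -0.05),   # the variant lost on desktop: -5%
}
change = net_change_in_conversions(devices)
print(f"Expected change if shipped to all devices: {change:+.0f} conversions per month")
```

A positive number suggests that shipping to all devices beats keeping the original everywhere, but remember that the per-device lifts are themselves uncertain estimates.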

7. It’s an experiment winner in some markets but a loser in others. What should we do?

This one is a little tricky because there are many different scenarios. The short and “more right than wrong” answer is: if there is a small difference, it’s probably only due to random chance (see the question above about statistical significance). Then you can go ahead and implement it for all markets. If there is a big difference in one of the markets, your options are either 1) to explore further or 2) to simply keep the original in that market.

But I can’t keep the original version for just one market! 🤦

If that’s the case, you have to compromise again. Remember what we said about goals and implement what is best on average for all markets.

8. Can we run these two A/B tests at the same time or should we wait for the first one to finish?

As with many other (good) questions, it depends. For the majority of companies, the most important thing is to run a lot of experiments. So go ahead and run those two experiments.


There is a big BUT though. If they deal with the same KPI (key performance indicator) and are on the same part of the page, it is, in most cases, better to wait to launch the second one. If you have the traffic, you could run them at the same time and make the audiences mutually exclusive, meaning visitor 1 will only see test 1 and not test 2, and vice versa. That way you minimise the risk of unnatural user behaviour caused by exposure to both tests, which increases your chances of making accurate inferences.
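Most A/B testing tools have a setting for mutually exclusive audiences, so you rarely need to build this yourself. To show the idea, here is a sketch that uses hashing. The experiment names, the salt strings and the visitor ID format are all made up.

```python
import hashlib


def bucket(visitor_id, salt, buckets):
    """Deterministically map a visitor to an integer in [0, buckets)."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def assign(visitor_id):
    """Put each visitor in exactly one of two experiments, then split 50/50."""
    experiment = "test_1" if bucket(visitor_id, "exclusion-layer", 2) == 0 else "test_2"
    variant = "control" if bucket(visitor_id, experiment, 2) == 0 else "variant"
    return experiment, variant


for visitor in ("visitor-1", "visitor-2", "visitor-3"):
    print(visitor, assign(visitor))
```

Because the assignment depends only on the visitor ID, a returning visitor always lands in the same experiment and the same variant. The price is that each experiment only gets half of the traffic.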

9. What A/B test tool do you recommend?

An important question of course! Without knowing A LOT more about your company it’s impossible to answer though. For example, what’s your current technical platform, how’s your overall traffic, what’s your budget, how many A/B tests are you performing today and what number of tests are you planning on performing? There are plenty of tools out there so make sure to do your research before making a decision.


And don’t forget to ask yourself the (maybe most) important question: how are we going to work with A/B testing within the company going forward? Internal processes and level of expertise will affect the outcome of your A/B test program as much as (if not more than) the tool you choose.

10. What KPI should I measure against when A/B testing?

You always want to keep track of your end goal, probably conversions. On top of that, you will also want to measure a KPI (key performance indicator) that is immediately affected by your test, for example continuing to the next step in the purchase funnel or changes in sales of other products.

11. I want to improve CTR in the A/B test – why do you tell me to measure conversions?

Are you selling CTRs? Probably not. You want to optimize the KPI most closely connected to your business goal, in this case conversions, and keep track of your supporting metrics, in this case CTR.


More traffic does not always equal more leads and/or sales. If you acquire more traffic and your conversion rate doesn’t change or it decreases, your cost per sale/acquisition will increase (other things being equal). And in most companies, that’s a bad thing.

12. Can’t we just deploy instead of A/B testing and measure afterwards?

Not if you want to be sure of the effect. Correlation is not causation. Metrics on your site will fluctuate a lot over time (may depend on seasonality, campaigns etc). So if you are seeing a major effect it may actually just be because of normal fluctuation and not because of the changes you made. So if you want to be sure that the change you made is the CAUSE of the effect, you have to do an experiment.

Correlation does not imply causation 👩‍🔬
Here are some fun examples when correlation has nothing to do with causation.

Why is relying on correlation not “good enough”?

Because two variables can have a strong correlation without having anything to do with each other. Not knowing the difference between correlation and causation can get you into a lot of trouble. So, don’t assume correlation gives you a good representation of reality. Causation does.

13. I want to A/B test our campaign, how do we test it?

I like your thinking!

A good start is to decide:

👉what in the creative asset you want to A/B test
👉where in the funnel you want to perform the experiment

For example, you can test two different visuals or headlines (not at the same time!) and see how it affects your conversion rate. Just double-check that the campaign is valid long enough for enough traffic to come in.

When it comes to deciding where in the funnel you want to perform the A/B test, focus on the step where your target group makes decisions that have the biggest effect on your campaign goal. It can be on adverts off-site or landing pages on site. And only do one test at a time to know what caused the change.

14. How many A/B tests should I run?

As many as you can! Velocity is one of the keys to success in A/B testing. The breakpoint comes when the investment in running the experiments is higher than the return (we have never seen that happen). Continuously run experiments where your visitors make decisions that affect your business goals.


There are different strategies depending on the maturity of your company. If you are in the A/B testing starting blocks, there are some easy first steps and for companies focusing on increased velocity, there are other strategies.

15. Can I A/B test three different changes in one experiment? (Isolation vs bundling)

Yes and no 🙃 If the different changes have the same expected behavioral change, you can. Otherwise, isolate and test one thing at a time to be sure you know what caused the change. Isolating changes may give you more insights into the effect of each change.

16. When should I run an A/B/C test?

When you have A LOT of traffic. The benefit of an A/B/C test is that you get to test more variations at once. But A/B/C tests require much more traffic than an A/B test to reach a statistically significant result, and that will take more time. Use an A/B test sample size calculator to check whether you have enough traffic for an A/B/C test.
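To put a rough number on “much more traffic”, the sketch below compares an A/B test with an A/B/C test. It assumes a two-proportion z-test plus a Bonferroni correction (the significance threshold is split between the comparisons against the original). That is one common convention but not the only one, and your tool may handle it differently. The rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Visitors per variant for a two-sided two-proportion z-test."""
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def total_traffic(challengers, p1, p2, alpha=0.05):
    """Total visitors for one original plus `challengers` variations.

    Assumption: Bonferroni correction, so alpha is divided by the number
    of challenger-versus-original comparisons.
    """
    per_arm = sample_size_per_variant(p1, p2, alpha=alpha / challengers)
    return per_arm * (challengers + 1)


# Hypothetical: 3.0% baseline, and we want to detect a lift to 3.6%.
ab = total_traffic(1, 0.03, 0.036)
abc = total_traffic(2, 0.03, 0.036)
print(f"A/B test:   {ab} visitors in total")
print(f"A/B/C test: {abc} visitors in total ({abc / ab:.1f}x)")
```

Even without the correction, a third arm adds 50% to the traffic bill. With it, every arm needs more visitors as well.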


Also – complicated A/B tests run the risk of complicated analysis. Try to keep it simple, test one hypothesis at a time, and change only as much as is needed to test the hypothesis. It is better to run a series of experiments, gather insights and build on the results of previous experiments. So in most cases, run A/B tests and you will get results faster.

17. How do I know that the A/B test result will stay the same over time? Can’t people change and not react the same to our changes after a while?

You can’t know that, but an A/B test is still the best way to make a decision based on evidence. The real customer journey is messy and ever-changing. You need to continuously validate your customer insights to keep them up to date by doing a lot of experiments. If you have many returning users, it might make sense to keep a control group as a benchmark or do a retest later on. But in most cases, keeping up the velocity of experiments will be more important.

18. Where can I run A/B tests?

It’s up to you and your imagination! You can run them wherever you want, as long as you can measure the result and divide your target group into random samples of control vs variant. For example apps, emails, banners… even coffee machines ☕️ Start with a well-defined hypothesis based on data and you’re good to go.

What is a hypothesis?

A hypothesis is a structured idea that tells you three things: where the idea came from, how the idea should work and what the intended result of the idea is. Here is a good A/B test hypothesis generator to get you started.


19. Should we initially launch the A/B test with a lower % of traffic routed to the variant?

That. Is. Not. Recommended 🙅‍♀️ You should test with groups of the same size (%). Otherwise, you risk an uneven distribution of segments across the samples, which will skew the results. What you can do, for quality assurance purposes, is run the whole experiment on a lower % of the traffic, but still split 50/50 between the control and the variant.
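Here is one way such a setup could look: first decide whether a visitor is in the experiment at all, then split those who are 50/50. It is a sketch only. Most testing tools offer this as a traffic allocation setting, and the salts and visitor IDs below are made up.

```python
import hashlib


def bucket(visitor_id, salt, buckets=100):
    """Deterministically map a visitor to an integer in [0, buckets)."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def assign(visitor_id, exposure_percent):
    """Return None (not in the test), 'control' or 'variant'."""
    if bucket(visitor_id, "exposure") >= exposure_percent:
        return None
    # Everyone who is exposed is split 50/50, whatever the exposure level.
    return "control" if bucket(visitor_id, "split", 2) == 0 else "variant"


visitors = [f"visitor-{i}" for i in range(10_000)]
groups = [assign(v, exposure_percent=20) for v in visitors]
print("control:", groups.count("control"))
print("variant:", groups.count("variant"))
print("not in test:", groups.count(None))
```

A nice side effect of splitting this way is that raising the exposure from, say, 20% to 100% only adds new visitors. Nobody who already saw the variant is moved to the control, or the other way around.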

20. What should we do if an A/B test is insignificant? What factors do we consider for different cases?

If you are running a test with the goal of increasing conversion and the result is insignificant, document the learning but don’t implement the change. But if the goal is to validate, that is, to make sure a certain change doesn’t hurt conversion, the change can be implemented even when the result is insignificant.

Is it safe to start selling plants?

An example. If a furniture e-commerce site wants to start selling plants, will this affect furniture sales negatively? Instead of launching it for all customers they can do an A/B test (a validation test) with the variation promoting plants. If there is no decrease in furniture sales in the variation, it should be safe to start selling plants.

21. How do we know what to A/B test first and/or prioritize?

Do one of our favorite exercises, prioritize! Whichever A/B test has the highest potential and the lowest effort to test is the one you should start with.

But first…
…you have to understand “why” you are doing the experiment: how will the result affect your business? Will it increase lead volume, revenue from online sales or some other business-critical goal? When your business goal is set, you quantify it and define what result you expect from the experiment (your experiment goal). The last step is to estimate how difficult the test will be to implement (and maybe also how difficult the change will be to implement if you get a winner). Now you have a good understanding of which tests have the highest potential and the lowest effort. Start with these!

Try the PIE model to start prioritizing!

Tricky in reality? Try the PIE model! It is a useful tool that gives you and your team members a framework to discuss and decide how the project will affect business goals.
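As we understand the PIE model, each idea is scored on Potential, Importance and Ease, and the three scores are averaged. A spreadsheet works just as well, but a small sketch shows the mechanics. The 1 to 10 scale, the test ideas and their scores below are our own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    potential: int   # how much room for improvement is there?
    importance: int  # how valuable is the traffic that is affected?
    ease: int        # how easy is the test to build and run?

    @property
    def pie_score(self):
        return (self.potential + self.importance + self.ease) / 3


# Hypothetical backlog, scored 1 to 10 by the team.
ideas = [
    Idea("Footer: reorder links", potential=2, importance=3, ease=10),
    Idea("Checkout: simplify form", potential=8, importance=9, ease=5),
    Idea("Product page: new headline", potential=5, importance=7, ease=9),
]

ranked = sorted(ideas, key=lambda idea: idea.pie_score, reverse=True)
for idea in ranked:
    print(f"{idea.pie_score:.1f}  {idea.name}")
```

The scores are judgment calls, so the real value is in the discussion you have with your team while setting them.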

In summary

Hopefully, this answers some of your questions regarding A/B testing and helps you get your company’s A/B testing program up to speed!

Don’t know how to start?

Give us a call, send us an email or fill out a (short) form and we’ll get back to you 🙂

Happy testing!
