Editor’s Note: For more information on experiment design, check out “Beyond A vs. B: How to get better results with better experiment design“, published on March 21, 2017.
Conversion Optimization using A/B testing has become standard practice at virtually every high-traffic online business today. But not every company achieves the best results. Most are far from reaching the 8-times ROI that the average WiderFunnel client experiences.
So, why do some companies get great A/B test results while others end up disappointed?
If your results are disappointing, the problem may not only be what you are testing; it is almost certainly also how you are testing. While several factors contribute to success, one of the most important is Design of Experiments (DOE). DOE is rooted in statistical theory and defines how experiments are planned and analyzed.
Be warned. This post will get a bit technical, but this is an important part of any high-performance conversion optimization practice, and it is one of the main reasons WiderFunnel’s Infinity Optimization Process continually outperforms other agencies.
What is statistical theory and how can it help you?
The best conversion optimization strategies are developed from statistical theory. Statistical theory was first established for academia. Professors, researchers, and PhD students needed a way to prove their research was credible. Methods, or rules, for data collection, experiment creation, and data analysis had to be defined and regulated. Researchers began using proven methods for their experimentation to ensure their results and findings would be defensible to the rest of the community in their field.
If you follow statistical theory for your conversion optimization, you will have centuries of academia behind you to ensure your data, and the insights that will influence the rest of your marketing decisions, are accurate and statistically significant.
Statistical theory can be broken down into different processes for experiment creation, data collection, and results analysis. Each process ensures the data collected is valid, but the collection method, or the steps required, can vary. DOE is the method we use most often at WiderFunnel. I’d like to share it with you in more detail.
Design of experiments
DOE is a process — really a set of rules — to follow when designing, implementing and analyzing an experiment. With DOE, there are a variety of ways you can set up an experiment, including: sampling, probability, regression analysis, multivariate testing, and factorial design.
Having learned from tens of thousands of tests run for our clients, we can share with you how best to use DOE. More specifically, we use two methods of DOE: Multivariate Testing (MVT) and Factorial Design.
MVT & factorial design
MVT is a set of rules for collecting data. The two most notable rules are:
1. You can change as many elements within a single variation as you would like.
2. You MUST test each possible combination of the changes you made. For example, if you test three title options and two image options, you will need to create six variations, so that your test combines each title option with each image option:
- Variation 1: Title 1 x Image 1
- Variation 2: Title 2 x Image 1
- Variation 3: Title 3 x Image 1
- Variation 4: Title 1 x Image 2
- Variation 5: Title 2 x Image 2
- Variation 6: Title 3 x Image 2
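The full set of MVT combinations is simply the cross product of the options for each element. Here is a minimal Python sketch of the example above; the title and image names are placeholders, not real test content.

```python
# Enumerate every MVT combination: the cross product of all element options.
from itertools import product

titles = ["Title 1", "Title 2", "Title 3"]
images = ["Image 1", "Image 2"]

variations = list(product(titles, images))
for i, (title, image) in enumerate(variations, start=1):
    print(f"Variation {i}: {title} x {image}")

# 3 titles x 2 images = 6 variations; every added element multiplies the count
print(len(variations))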
With MVT, the number of variations you will need to deploy can easily get out of hand. The more variations you test, the more your traffic will be split while testing, and the longer it will take for your tests to reach statistical significance. Many companies simply can’t follow the principles of MVT because they don’t have enough traffic.
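To make the traffic math concrete, here is a rough sketch using a common rule-of-thumb sample size of roughly 16·p(1−p)/δ² visitors per variation (about 95% confidence and 80% power, where p is the baseline conversion rate and δ the absolute change you want to detect). The traffic and conversion numbers below are hypothetical, not WiderFunnel data.

```python
# Rough sketch of why more variations means longer tests.
# Rule of thumb: n ≈ 16·p(1−p)/δ² visitors per variation
# (~95% confidence, ~80% power). All numbers here are hypothetical.

def days_to_significance(daily_visitors, baseline_rate, min_detectable_lift, num_variations):
    """Estimate how many days a test needs to reach significance."""
    delta = baseline_rate * min_detectable_lift          # absolute change to detect
    n_per_variation = 16 * baseline_rate * (1 - baseline_rate) / delta ** 2
    total_needed = n_per_variation * num_variations      # traffic is split evenly
    return total_needed / daily_visitors

# Same site, same goal: a 6-variation MVT vs. a 3-variation factorial test
mvt_days = days_to_significance(2000, 0.03, 0.10, 6)
factorial_days = days_to_significance(2000, 0.03, 0.10, 3)
print(round(mvt_days), round(factorial_days))  # the MVT test takes twice as long
```

Because required traffic grows linearly with the number of variations, halving the variation count halves the test duration under these assumptions.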
Factorial Design is another method of Design of Experiments. Similar to MVT, factorial design allows you to test more than one element change within the same variation. The greatest difference is that factorial design doesn’t force you to test every possible combination of changes.
Compared to the MVT example shown above, instead of creating a variation for every combination of changed elements, you can design the test to focus on the specific isolations that you hypothesize will have the biggest impact or drive the most useful insights. In this (hypothetical) case, we just want to isolate two titles and two specific images:
- Variation 1: Title 1 x Image 1
- Variation 2: Title 1 x Image 2
- Variation 3: Title 2 x Image 2
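The value of these isolations is that comparing two variations that differ by only one element attributes the difference to that element. The sketch below uses invented conversion rates for the three variations above; it is an illustration, not real test data.

```python
# Hypothetical read-out of the three-variation factorial example:
# comparing variations that differ by one element isolates that element's effect.
# The conversion rates are invented for illustration.

rates = {
    ("Title 1", "Image 1"): 0.030,  # Variation 1
    ("Title 1", "Image 2"): 0.033,  # Variation 2
    ("Title 2", "Image 2"): 0.036,  # Variation 3
}

# Variations 1 and 2 share Title 1, so their difference isolates the image effect
image_effect = rates[("Title 1", "Image 2")] - rates[("Title 1", "Image 1")]
# Variations 2 and 3 share Image 2, so their difference isolates the title effect
title_effect = rates[("Title 2", "Image 2")] - rates[("Title 1", "Image 2")]

print(f"Image 2 lift: {image_effect:+.3f}, Title 2 lift: {title_effect:+.3f}")
```

Note what you give up versus full MVT: with no Title 2 x Image 1 variation, you cannot measure any interaction between title and image, only the individual effects.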
When you don’t have a lot of traffic or a high conversion rate, you need to test major changes – changes that you believe will have the greatest impact – to see the difference in conversion rates in a reasonable amount of time. There’s nothing like waiting months for tests to reach significance while your organizational momentum erodes. (Note: here’s more on how to create organizational momentum.) Two-level factorial design achieves positive ROI more quickly than MVT.
Getting started with factorial design
Start by making a cluster of changes in one variation (producing a variation that is significantly different from the control), and then isolate those changes within subsequent variations (to identify the elements that are having the greatest impact). This hybrid test, combining a “variable cluster” variation with “isolation” variations, gives you the best of both worlds: radical change options and the ability to gain insights from the results.
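As a rough illustration of this hybrid structure, the sketch below builds one cluster variation that changes every element at once, plus one isolation variation per element. The element names and values are hypothetical.

```python
# A minimal sketch of the "cluster + isolation" hybrid test structure.
# Element names and values are hypothetical.

control = {"headline": "A", "image": "A", "cta": "A"}

# One radical variation changes everything at once (the "variable cluster")...
cluster = {"headline": "B", "image": "B", "cta": "B"}

# ...and each subsequent variation isolates a single change from that cluster
isolations = []
for element in cluster:
    variation = dict(control)
    variation[element] = cluster[element]
    isolations.append(variation)

for v in [cluster] + isolations:
    changed = [k for k in v if v[k] != control[k]]
    print(v, "changed:", changed)
```

If the cluster wins, the isolations tell you which of its changes did the heavy lifting, which is where the insight comes from.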
Ensuring consistent, great results
The companies that gain continual conversion wins are using DOE, rooted soundly in academic theory, so their results, analyses, and business decisions are valid and defensible.
Using MVT and Factorial Design ensures that your experiment, data, and analysis are valid. The effect snowballs, as the decisions and insights you take from one valid test can influence multiple future experiments and conversion wins down the line.
What do you think?
Do you use DOE to plan your tests? Have experience with MVT or factorial design? Have a question for Alhan, or just want to share your thoughts?
Grab your complete copy of the new “State of Experimentation Maturity 2018” research report
What makes some organizations so successful when it comes to experimentation? This new 45-page report provides benchmarks for stages of experimentation maturity at leading North American brands.

Get Report