Search for the phrase “A/B testing” and you’ll find a plethora of ads, articles, how-to guides and long-tail queries filling your results. Unless you’re in the market to buy SaaS, finding trusted sources of information to beef up your experimentation skills and help you stand out in your organization requires nothing short of master detective work.
When it comes to A/B testing, best practices are difficult to define. You will get different answers depending on the expert you ask. Among the sea of figureheads, you will find prolific influencers who have shaped the experimentation industry, like Ronny Kohavi, Maria Stone, and our very own Chris Goward. These accomplished individuals should have a place in your roadmap for learning how to test with confidence and for creating a culture of experimentation.
A/B testing is one of the most prevalent testing methodologies, and its results have informed many crucial business decisions. But A/B tests are only as helpful as the skill of the practitioner executing them. Done poorly, A/B testing can be superficial and frustrating, and can even produce unreliable results.
“If your results are disappointing, it may not only be what you are testing – it is definitely how you are testing. While there are several factors for success, one of the most important to consider is Design of Experiments (DOE).”
– Alhan Keser, AI Product Manager
If you are a marketing professional who has been disappointed with your A/B testing results or are about to launch a new program, this article is for you. I’m James Flory, Director of Experimentation Strategy at Widerfunnel, and here’s my list of six best practices on navigating the pitfalls of A/B testing to help you increase your chances of success.
1. Pitfall: Not planning your optimization roadmap
Creating a roadmap for your A/B testing program is a crucial first step. It involves identifying and prioritizing the primary objectives you want your experimentation program to achieve. A roadmap will outline the benefits of the program for your business and make it easier for you to secure stakeholder buy-in, build a strong business case to secure more budget and achieve program visibility. It is also the foundation for an effective strategic testing schedule.
Unfortunately, many marketing professionals do not take the time to create a roadmap that will set themselves up for success. This absence of a roadmap makes it more challenging to make the business case for your program and will result in wasted resources, time and opportunity costs. Furthermore, if your tests are not aligned with business objectives, you will not be able to retain or grow your budget or meet short-term objectives. As a result, you risk working on things the business doesn’t care about.
Not having a roadmap can also hamper your ability to maximize the use of your traffic and resources. For example, you may accidentally end up with a backlog of homepage ideas that need to run one after another. This bottleneck can result in a high-stakes waiting game instead of using other testable areas of your site simultaneously. Alternatively, you may end up running multiple tests at the same time, unaware of any potential interaction effects between your simultaneous experiments, resulting in questionable results.
2. Pitfall: Testing too many elements together
The more precise you are when testing elements, the clearer and more insightful your results will be. Marketing professionals frequently make the mistake of testing a cluster of several elements at once. For example, it may seem efficient to test the copy and button placement at the same time. But while this may result in a win and an increase in your desired metrics, you’ll be stuck guessing what really caused that change. Was it the new copy that increased conversions, or the button placement, or both? You’ll know that the bundle of changes made a difference, but not which change was responsible, which you would have known had you isolated each one. In short, you gain very little insight from these types of tests. And without clear insight into what is working, you will not be able to apply your findings to other areas, which severely limits any further impact.
This is equally important for losing experiments. If an experiment with multiple changes results in a significant loss, understanding why it lost is crucial to turning that loss into an insight. This insight can result in a winning test down the line. Lastly, testing multiple elements together invites other stakeholders to challenge your results and ask questions you may be ill-equipped to answer, such as “Why did this happen?” or “What do we do next?”
3. Pitfall: Ignoring statistical significance
If statistics is not one of your strengths, this common A/B testing pitfall is a bit more challenging to avoid. At Widerfunnel, we recommend that you leverage your testing tool capabilities to their full potential and follow some simple rules such as doing pre-test calculations, letting tests run their course, and setting realistic expectations. Statistical significance is a key component of successful testing, and if you’re ignoring it, you’re essentially just guessing with extra (and more expensive) steps!
Pre-test calculations are often overlooked but are critical to ensuring you’ve got enough traffic, conversions, and expected effect from your variant(s) to hit an acceptable level of statistical significance in a reasonable amount of time. Skipping this first step in sizing experiments most commonly results in a high number of inconclusive results and disappointed stakeholders. Additionally, once you’ve sized your test using pre-test calculations and know your expected run times and minimum detectable effects, let the test run its course! Too often marketers get scared by early fluctuations in low-sample metrics and turn their tests off for fear of business impact. While it’s good to keep an eye out for any potentially serious negative business impacts, pausing your tests early may do more harm than good: it exposes you to the peeking problem and to regression to the mean, topics fit for their own blog posts.
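To make the pre-test calculation concrete, here is a minimal sketch of one common way to size a test: the standard normal-approximation formula for comparing two conversion rates. The function names, the default 5% significance level and 80% power, and the example traffic figures are illustrative assumptions, not Widerfunnel’s own method, and your testing tool may use a different calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH arm of a two-sided test.

    baseline_rate: current conversion rate of the control (e.g. 0.05 for 5%).
    relative_mde:  smallest relative lift worth detecting (e.g. 0.10 for +10%).
    alpha, power:  assumed defaults; adjust to your own risk tolerance.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def estimated_run_days(n_per_variant, daily_visitors, variants=2):
    """Days needed to fill every arm, given visitors entering the test per day."""
    return ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline conversion, +10% relative lift, 4,000 visitors/day.
    n = sample_size_per_variant(0.05, 0.10)
    print(f"Visitors per variant: {n}")
    print(f"Estimated run time:   {estimated_run_days(n, 4000)} days")
```

With these hypothetical inputs the requirement comes out at roughly 31,000 visitors per variant. Notice how quickly it grows as the minimum detectable effect shrinks: halving the lift you want to detect roughly quadruples the traffic you need, which is why unsized tests so often end inconclusive.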
Ultimately, statistics are hard and not very intuitive. Even statisticians have a hard time explaining p-values! You don’t want to completely invalidate your effort by failing to properly measure your results using valid statistical methods. False positives, and results that you cannot be confident in, are not only major time wasters but can lead you to make incorrect and costly business decisions. If you’re not measuring your results accurately, then why test?
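For readers who want to see what a significance check looks like under the hood, here is a minimal sketch of a two-sided, two-proportion z-test, one common way to compare conversion rates. The function name and the example counts are hypothetical, and your testing tool may rely on a different method (for example, Bayesian or sequential statistics), so treat this as an illustration rather than a replacement for your tool’s reporting.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, p_value) for the difference in conversion rate between B and A.

    conv_*: number of conversions in each arm; n_*: number of visitors in each arm.
    Uses the pooled standard error, which assumes no true difference under the
    null hypothesis.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control 500/10,000 vs. variant 600/10,000.
    z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
    print(f"z = {z:.2f}, p-value = {p:.4f}")
```

The p-value is only meaningful if you evaluate it once, at the sample size you planned in advance. Checking it repeatedly and stopping at the first “significant” reading is exactly the peeking problem.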
4. Pitfall: Using unbalanced traffic
When running experiments, you have to choose how much of your traffic to let into your test. The most common approach is to split your traffic 50/50 which means exposing 50% of your traffic to the control and 50% to your variant. Sometimes, in an effort to mitigate risk, experimenters will opt to split their traffic unevenly such as 80/20 or even 90/10 – meaning 90% of users are getting into the control and only 10% into the variant (or vice-versa).
This approach has two primary drawbacks. First, the comparison becomes less reliable because the sample sizes are uneven: the experience with the smaller share of traffic produces a noisier, less precise estimate, so the test has to run longer to reach the same statistical power as an even split. Second, uneven splits tempt experimenters into another common mistake, changing the traffic allocation mid-test, which is almost guaranteed to invalidate your results. When the split changes partway through and conversion rates also differ between periods, pooling the data can produce Simpson’s paradox, where the combined numbers point in the opposite direction from every individual period.
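A small worked example shows how Simpson’s paradox can creep in when the allocation changes partway through a test. All figures below are made up for illustration: week 1 runs at 90/10 during a promotion when everyone converts more readily, and week 2 runs at 50/50 under normal conditions.

```python
# Hypothetical data: (conversions, visitors) per arm, per period.
periods = {
    "week 1 (90/10 split, promo traffic)": {
        "control": (900, 9000),
        "variant": (110, 1000),
    },
    "week 2 (50/50 split, normal traffic)": {
        "control": (100, 5000),
        "variant": (125, 5000),
    },
}


def rate(conversions, visitors):
    """Conversion rate as a fraction."""
    return conversions / visitors


def pooled_rate(arm):
    """Conversion rate for one arm after naively adding all periods together."""
    conversions = sum(p[arm][0] for p in periods.values())
    visitors = sum(p[arm][1] for p in periods.values())
    return rate(conversions, visitors)


if __name__ == "__main__":
    for name, arms in periods.items():
        print(
            f"{name}: control {rate(*arms['control']):.2%}, "
            f"variant {rate(*arms['variant']):.2%}"
        )
    print(
        f"pooled: control {pooled_rate('control'):.2%}, "
        f"variant {pooled_rate('variant'):.2%}"
    )
```

In this made-up data the variant beats the control in both weeks, yet the pooled totals show the control far ahead. The control collected most of its visitors during the high-converting promo week, so its blended rate is inflated by the mix of traffic rather than by the experience itself.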
If your goal is to mitigate risk by reducing the number of users exposed to your variant, keep your split even (50/50) but limit the percentage of your overall traffic exposed to the test at the outset. For example, allow only 10% of your total users to enter the test, and split those users 50/50 between the control and the variant. This will avoid data pollution issues and enable you to slowly increase the traffic exposed to the test over time as you reevaluate the risk, while keeping the samples of both the control and variant fair and even.
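As an illustration of this “limit exposure, keep the split even” approach, here is a minimal sketch using deterministic hashing. The function names, the salt format and the experiment name are hypothetical; most testing tools handle this for you through their traffic allocation settings, so this is only meant to show the idea.

```python
import hashlib


def _bucket(user_id, salt):
    """Map a user to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def assign(user_id, experiment, exposure=0.10):
    """Return 'control', 'variant', or None if the user is not in the test.

    exposure: share of ALL traffic allowed into the test (assumed 10% here).
    Users who enter are always split 50/50, whatever the exposure is.
    """
    # Step 1: decide whether this user enters the test at all.
    if _bucket(user_id, f"{experiment}:exposure") >= exposure:
        return None
    # Step 2: split enrolled users evenly, using an independent hash.
    if _bucket(user_id, f"{experiment}:split") < 0.5:
        return "control"
    return "variant"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20_000)]
    arms = [assign(u, "homepage-hero") for u in users]
    print("control:", arms.count("control"))
    print("variant:", arms.count("variant"))
    print("not in test:", arms.count(None))
```

Because the enrollment decision and the 50/50 split use separate hashes, raising the exposure later (say from 10% to 50%) only lets new users in. Everyone already enrolled stays in the same arm, and the split between control and variant stays even throughout.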
Failure to use balanced traffic distribution can have both short-term and long-term impacts. Not only can it invalidate numbers but it can result in the loss of internal trust and buy-in.
5. Pitfall: Failing to follow an iterative process
It is absolutely critical to follow an iterative process with your A/B experimentation. Don’t just run a test and leave it. Take the time to understand why it won or lost, and apply those learnings to successive tests. If the test lost, can you apply the inverse and make it a winner? If it won, where else can you apply the learnings? Without doing this, you’re just throwing spaghetti at the wall and hoping it sticks. Typically, marketing professionals who don’t follow an iterative process also fail to create a roadmap and set themselves up for success. Iterative improvements have a snowball effect: insights build on one another and their impact accumulates, whereas disconnected tests leave isolated insights trapped in bubbles.
Ideas that are not based on insights can lead to fruitless and repetitive testing, which is wasteful, risky and less impactful. By connecting the dots between experiments, however, you can look back, follow threads of insight and use them to build new, more impactful tests. Building on what you have learned makes you look smart and adaptable, and shows you have a process and system in place.
6. Pitfall: Failing to consider external factors
During your planning and analysis, remember to take into account external factors that can influence test results. Think of seasonality, business cycles or external urgency such as Black Friday. External factors that create super-motivated customers can artificially amplify or dampen your test results. If people are motivated, they may purchase no matter what, meaning they’re less responsive to the tactics used in your experiments and more tolerant of poor user experiences. If you run a test without accounting for external factors, you could end up measuring a large change in user behaviour that won’t persist once the external factor is gone.
Inversely, you may test something that has no impact during a certain period due to an external factor, but would have worked very well under normal circumstances. Be aware of how your hypotheses and experiment cycles interact with the external environment beyond your control and adapt accordingly.
The big lesson
If there is one overriding lesson from the major pitfalls that can sabotage marketing professionals’ efforts when running their own A/B program, it is this: gaining robust business benefits from A/B experimentation requires combining technology and methodology with the power of human expertise and strategic thinking. This is the winning combination that separates run-of-the-mill A/B testing from programs that translate into solid business solutions and increased profits.