How to A/B test for long-term success (don’t underestimate insights!)

Imagine you’re a factory manager.

You’re under pressure from your new boss to produce big results this quarter. (Results were underwhelming last quarter). You have a good team with high-end equipment, and can meet her demands if you ramp up your production speed over the coming months.

You’re eager to impress her and you know if you reduce the time you spend on machine maintenance you can make up for the lacklustre results from last quarter.

Flash forward: The end of Q3 rolls around, and you’ve met your output goals! You hit your production targets by continuing to run the equipment during scheduled down-time periods. You’ve achieved numbers that impress your boss…

…but in order to maintain this level of output you will have to continue to sacrifice maintenance.

In Q4, disaster strikes! One of your three machines breaks down, leaving you with zero output and no way to move the needle forward for your department. Your boss gets on your back for your lack of foresight, and eventually your job is given to the young hot-shot on your team while you are left searching for a new gig.

A sad turn of events, right? Many people would label this a familiar tale of poor management (and correctly so!). Yet, when it comes to conversion optimization, there are many companies making the same mistake.

Optimizers are so often under pressure to satisfy the speed side of the equation that they are sacrificing its equally important counterpart…

Insights.

Consider the following graphic.

The spectrum ranges from straightforward growth-driving A/B tests to multivariate insight-driving tests.

If you’ve got Amazon-level traffic and proper Design of Experiments (DOE), you may not have to choose between growth and insights. But in smaller organizations this can be a zero-sum equation. If you want fast wins, you sacrifice insights, and if you want insights, you may have to sacrifice a win or two.

Sustainable, optimal progress for any organization will fall somewhere in the middle. Companies often put so much emphasis on reaching certain testing velocities that they shoot themselves in the foot for long-term success.

Maximum velocity does not equal maximum impact

Sacrificing insights in the short term may lead to higher testing output this quarter, but it will leave you at a roadblock later. (Sound familiar?) One 10% win without insights may turn heads in your direction now, but a test that delivers insights can turn into five 10% wins down the line. It’s similar to compound interest: collecting insights now can mean massive payouts over time.
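To put rough numbers on that comparison, here is a minimal Python sketch. It assumes, purely for illustration, that each win applies to the same conversion rate and that wins stack multiplicatively; real test results rarely combine this cleanly. The function name `compound_lift` is made up for this example.

```python
def compound_lift(lifts):
    """Combined relative lift from a series of sequential wins.

    Assumption (illustrative only): each win multiplies the
    conversion rate left behind by the previous one.
    """
    total = 1.0
    for lift in lifts:
        total *= 1.0 + lift
    return total - 1.0


single_win = compound_lift([0.10])       # one 10% win
five_wins = compound_lift([0.10] * 5)    # five 10% wins in a row

print(f"One 10% win:   {single_win:.1%}")
print(f"Five 10% wins: {five_wins:.1%}")
```

Under that assumption, five 10% wins add up to roughly a 61% lift rather than 50%, because each win builds on the one before it.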

As with factory production, the key to sustainable output is to find a balance between short-term (maximum testing speed) and long-term (data collection/insights).

Growth vs. Insights

Christopher Columbus had an exploration mindset.

He set sail looking to find a better trade route to India. He had no expectation of what that was going to look like, but he was open to anything he discovered, and his sense of adventure rewarded him with what is likely the largest geographical discovery in history.

Have a Christopher Columbus mindset: test in pursuit of unforeseeable insights.

Exploration often leads to the biggest discoveries. Yet this is not what most companies are doing when it comes to conversion optimization. Why not?

Organizations tend to view testing solely as a growth-driving process: a way of settling long-running debates between two firmly held opinions. No doubt growth is an important part of testing, but you can’t overlook exploration.

This is the testing that will propel your business forward and lead to the kind of conversion rate lift you keep reading about in case studies. Those companies aren’t achieving that level of lift on their first try; it’s typically the result of a series of insight-driving experiments that help the tester land on the big insight.

At WiderFunnel we classify A/B tests into two buckets: growth-driving and insight-driving…and we consider them equally important!

Growth-driving experiments (Case study here)

During our partnership with Annie Selke, a homeware retailer, we ran a test featuring a round of insight-driving variations. We were testing different sections of the product category page for sensitivity: Were users sensitive to changes to the left-hand filter? How might users respond to new ‘Sort By’ functionality?

Round I of testing for Annie Selke: Note the left-hand filter and ‘Sort By’ functionality.

Neither of our variations led to a conversion rate lift. In fact, both lost to the Control page. But the results of this first round of testing revealed key, actionable insights: namely, that the changes we had made to the left-hand filter might actually be worth a significant lift, had they not been dragged down by other changes.

We combined these insights with supplementary heatmap data and designed a follow-up experiment. We knew exactly what to test and we knew what the projected lift would be. And we were right. In the end, we turned insights into results, getting a 23.6% lift in conversion rate for Annie Selke.

In Round II of testing, we reverted to the original ‘Sort By’ functionality.

For more on the testing we did with Annie Selke, you should read this post >> “A-ha! Isolations turn a losing experiment into a winner”

This follow-up test is what we call a growth-driving experiment. We were armed with compelling evidence and we had a strong hypothesis which proved to be true.

But as any optimizer knows, it can be tough to gather compelling evidence to inform every hypothesis. And this is where a tester must be brave and turn their attention to exploration. Be like Christopher.

Insight-driving experiments

The initial round of testing we did for Annie Selke, where we were looking for sensitivities, is a perfect example of an insight-driving experiment. In insight-driving experiments, the primary purpose of your test is to answer a question, and lifting conversion rates is a secondary goal.

This doesn’t mean that the two cannot go hand in hand. They can. But when you’re conducting insight-driving experiments, you should be asking “Did we learn what we wanted to?” before asking “What was the lift?” This is your factory down-time, the time during which you restock the cupboard with ideas, and put those ideas into your testing piggy-bank.

We’ve seen entire organizations get totally caught up in the question “How is this test going to move the needle?”

But here’s the kicker: Often the right answer is “It’s not.”

At least not right away. This type of testing has a different purpose. With insight-driving experiments, you’re setting out on a quest for your unicorn insight.

What’s your unicorn insight?

These are the ideas that aren’t applicable to any other business. You can’t borrow them from industry-leading websites, and they’re not ideas a competitor can steal.

Your unicorn insight is unique to your business. It could be finding that magic word that helps users convert all over your site, or discovering that key value proposition that keeps customers coming back. Every business has a unicorn insight, but you are not going to find it by testing in your regular wheelhouse. It’s important to think differently, and approach problem solving in new ways.

We sometimes run a test for our clients where we take the homepage and isolate each section by removing it, one section per variation. Are we expecting this test to deliver a big lift? Nope, but we are expecting this test to teach us something.

We know that this is the fastest possible way to answer the question “What do users care about most on this page?” After this type of experiment, we suddenly have a lot of answers to our questions.

That’s right: no lift, but we have insights and clear next steps. We can then rank the importance of every element on the page and start applying what seems to matter to users on the homepage to other areas of the site. Does this sound like a losing test to you?
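As a rough illustration of that ranking step, here is a short Python sketch. The section names and conversion rates are entirely made up for the example (they are not client results), and `rank_sections` is a hypothetical helper, not part of any testing tool. The idea is simply that the sections whose removal hurts conversion the most are the ones users seem to care about most.

```python
def rank_sections(control_rate, rates_without):
    """Rank page sections by how much removing each one changes conversion.

    Returns (section, relative_change) pairs, most harmful removal first.
    """
    impact = {
        section: (rate - control_rate) / control_rate
        for section, rate in rates_without.items()
    }
    return sorted(impact.items(), key=lambda item: item[1])


# Hypothetical numbers: conversion rate of each variation with one section removed.
control_rate = 0.040
rates_without = {
    "hero_banner": 0.034,
    "testimonials": 0.041,
    "featured_products": 0.036,
    "newsletter_signup": 0.040,
}

ranking = rank_sections(control_rate, rates_without)
for section, change in ranking:
    print(f"{section:<20} {change:+.1%}")
```

In a real experiment you would also want to check that each difference is statistically meaningful before acting on the ranking; this sketch skips that step.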

Rather than guessing at what we think users are going to respond to best, we run an insight-driving test and let the users give us the insights that can then be applied all over a site.

The key is to manage your expectations during a test like this. This variation won’t be your homepage for eternity. Rather, it should be considered a temporary experiment to generate learning for your business. By definition it is an experiment.

Optimization is an infinite process, and what your page looks like today is not what it will look like in a few months.

Proper Design of Experiments (DOE)

It’s important to note that these experimental categories do have grey areas. With proper DOE and high enough traffic levels, both growth-driving and insight-driving strategies can be executed simultaneously. This is what we call “Factorial Design”.

Factorial design allows you to test with both growth and insights in mind.

Factorial design allows you to test more than one element change within the same experiment, without forcing you to test every possible combination of changes.

Rather than creating a variation for every combination of changed elements (as you would with multivariate testing), you can design a test to focus on specific isolations that you hypothesize will have the biggest impact or drive insights.

How to get started with Factorial Design

Start by making a cluster of changes in one variation (producing a variation that is significantly different from the control), and then isolate these changes within subsequent variations (to identify the elements that are having the greatest impact). This hybrid test, combining “variable cluster” and “isolation” variations, gives you the best of both worlds: radical change options and the ability to gain insights from the results.
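To see how much smaller a hybrid test can be than a full multivariate test, here is a small Python sketch. It rests on one possible reading of the approach: one cluster variation containing every change, plus one isolation variation per individual change. The change names are hypothetical, and both functions are illustrative helpers rather than features of any testing platform.

```python
from itertools import product


def multivariate_variations(changes):
    """Every on/off combination of the changes, excluding the untouched control."""
    combos = product([False, True], repeat=len(changes))
    return [
        {change for change, on in zip(changes, flags) if on}
        for flags in combos
        if any(flags)
    ]


def cluster_with_isolations(changes):
    """One cluster variation with all changes, then one variation per single change.

    Assumption: an "isolation" here means applying one change on its own.
    """
    cluster = [set(changes)]
    isolations = [{change} for change in changes]
    return cluster + isolations


# Hypothetical element changes for a category page.
changes = ["new_filter", "new_sort_by", "larger_images", "short_copy"]

full = multivariate_variations(changes)
hybrid = cluster_with_isolations(changes)

print(f"Full multivariate: {len(full)} variations plus control")
print(f"Cluster + isolations: {len(hybrid)} variations plus control")
```

With four changes, the full multivariate test needs 15 variations while the hybrid needs 5, so each variation gets a much larger share of your traffic. The trade-off is that you learn less about how the changes interact with one another.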

For more on proper Design of Experiments, you should read this post >> “Design your A/B tests to get consistently better results”

We see Optimization Managers make the same mistakes over and over again, discounting the future for results today. If you overlook testing “down-time” (those insight-driving experiments), you’ll prevent your testing program from reaching its full potential.

You wouldn’t run a factory without down-time, and you wouldn’t collect a paycheck without saving for the future, so why would you run a testing program without investing in insight exploration?

Instead, find the balance between speed and insights with proper factorial design, one that promises growth now as well as in the future.

How do you ensure your optimization program is testing for both growth and insights? Let us know in the comments!

Michael St Laurent

Optimization Strategist

Michael ensures WiderFunnel delivers the most accurate experiments in the shortest time possible. He strives to deliver A-ha moments for clients every day, and isn't satisfied until there are no more questions left to be answered.