How to be a heavy hitter in enterprise e-commerce CRO

9 min. read | Last updated: December 7th, 2017

There was a time when simply launching an A/B test was a big deal.

I remember my first test. It was a lead gen form. I completely redesigned it. I learned nothing. And it felt like I was on top of the world.

Today, things are different, especially if you’re a major e-commerce company doing high-volume conversion optimization in a team setting. The demands have shifted; the expectations are far greater. New tools are being created to solve new problems.

So what does it take to own enterprise e-commerce CRO today, compared to years past?

Make money during A/B tests

While “always be testing” is a great mantra, I have to ask: are you also “always be banking”?

Most of us have been running tests that inform us first and make money later. For example, you might run a test with a clear winner, but it’s one of 5 variations, so you only benefit from it in 20% of your traffic for the length of the experiment.

Furthermore, you may have 4 variations that are underperforming versus your Control, so you could even be losing money while you test. Imagine spending an entire year testing in that manner. You’d rarely be fully benefiting from your positive test results!

Of course, as part of a controlled experiment and in order to generate valid insights, it’s important to distribute traffic evenly and fairly between all variations (across multiple days of the week, etc).

But there also comes a time to be opportunistic.

Enter the multi-armed bandit (MAB) approach. MAB is an automated testing mechanism that diverts more traffic to better-performing variations. Thresholds can be set to control how much better a variation has to perform before the mechanism favors it.

Hold your horses: MAB sounds amazing, but it is not the solution to all of your problems. It’s best reserved for times when the potential revenue gains outweigh the potential insights, or when the test has little long-term value.

Say, for example, you’re running a pre-Labor Day promotion and you’ve got a site-wide banner. This banner’s only going to be around for 5-10 days before you switch to the next holiday. So really, you just want to make the most of the opportunity and not think about it again until next year.

A bandit algorithm applied to an A/B test of your banner will identify the best performer and generate the most revenue during the testing period.

While you may not be able to infer too many insights from the experiment, you should be able to generate more revenue than had you either not tested at all or gone with a traditional, even split test.
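To make that concrete, here’s a minimal epsilon-greedy sketch in JavaScript, one of the simpler bandit strategies (Thompson sampling and UCB are common alternatives). The variation names and conversion signal are hypothetical placeholders; in practice, your testing tool handles this allocation for you.

```js
// Minimal epsilon-greedy bandit (illustrative only).
// Variation names and the conversion signal are hypothetical placeholders.
const variations = {
  control:  { trials: 0, conversions: 0 },
  variantA: { trials: 0, conversions: 0 },
  variantB: { trials: 0, conversions: 0 },
};

const EPSILON = 0.1; // 10% of traffic keeps exploring; the rest exploits the current leader

const rate = (v) => (v.trials ? v.conversions / v.trials : 0);

function chooseVariation() {
  const names = Object.keys(variations);
  if (Math.random() < EPSILON) {
    // Explore: pick any variation at random
    return names[Math.floor(Math.random() * names.length)];
  }
  // Exploit: pick the variation with the best observed conversion rate so far
  return names.reduce((best, name) =>
    rate(variations[name]) > rate(variations[best]) ? name : best
  );
}

function recordResult(name, converted) {
  variations[name].trials += 1;
  if (converted) variations[name].conversions += 1;
}

// Usage: assign on page view, then record whether the visitor converts
const assigned = chooseVariation();
recordResult(assigned, /* converted? */ false);
```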

  • BEFORE: Test, analyze results, decide, implement, make money later.
  • TODAY: Test and make money while you’re at it.
  • When to do it: Best used in cases where what you learn is not that useful for the future.
  • When not to do it: Not necessarily the most useful for long-term testing programs.

Track long-term revenue gains

If you’ve been testing over the course of many months and years, accurately tracking and reporting your cumulative gains can become a serious challenge.

You’re most likely testing across different zones of your website – homepage, category page, product detail page, site-wide, checkout, etc. Multiply those zones by the number of viewport ranges you’re specifically testing on.

What do you do, sum up each individual increase and project out over the course of a year? Do you create an equation to calculate the combined effect of all of your tests? Do you avoid trying to report at all?

There isn’t one good solution, but rather a few options that all have their strengths and weaknesses:

The first, and easiest, is using a formula to determine combined results. You’ll want a strong mathematician to help you with this one. Personally, I always have a lingering doubt about whether any of what is being reported is accurate, even with conservative estimates. And as time goes on, things only get less accurate.
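For illustration, here’s the kind of back-of-the-envelope compounding such a formula might use, with an arbitrary discount factor to keep the estimate conservative. Treat it as a sketch of the idea, not a validated model; it ignores interaction effects entirely, which is exactly why the doubt lingers.

```js
// Back-of-the-envelope compounding of individual test lifts (illustrative only).
// The 50% discount is an arbitrary, conservative assumption — it does NOT model
// interaction effects between tests, which is the weakness of this whole approach.
const observedLifts = [0.08, 0.03, 0.05, 0.02]; // e.g. +8%, +3%, +5%, +2% from separate wins
const DISCOUNT = 0.5;

const combinedLift =
  observedLifts.reduce((total, lift) => total * (1 + lift * DISCOUNT), 1) - 1;

console.log(`Estimated combined lift: ${(combinedLift * 100).toFixed(1)}%`);
// → roughly +9.3% with the haircut, versus the +18% you'd claim by just summing the raw lifts
```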

The second is to periodically re-test your original Control from the moment at which you started testing. Say, every 6 months, test your best performing variation against the Control you had 6 months prior. If you’ve been testing across the funnel, test the entire funnel in one experiment.

Yes, it will be difficult. Yes, your developers will hate you. And yes, you will be able to prove the value of your work in a very confident manner.

It’s best to run these sorts of tests with a duplicate of each variation (2 “old” Controls vs 2 best performers) just to add an extra layer of certainty when you look at your results. It goes without saying that you should run these experiments for as long as reasonably possible.

Another option is to always be testing your “original” Control vs your most recent best performer in a side experiment. Take 10% of your total traffic and segment it to a constantly running experiment that pits the original control version of your site against your latest best performer.

It’s an experiment running in the background, not affected by what you are currently testing. It should serve as a constant benchmark to calculate the total effect of all your tests, combined.

Technically, this will be a challenge. You’ll be asking a lot of your developers and your analytics people, and at some point you may ask yourself if it’s all worth it. But in the end, you will have some awesome reports to show, demonstrating the ridiculous revenue you’ve generated through CRO.
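As a rough illustration, here’s one way that traffic carve-out might look. The cookie name, split sizes, and bucket labels are hypothetical, and most enterprise tools offer their own way to exclude a holdout group.

```js
// Sketch of carving out a small, persistent "benchmark" holdout that never enters
// regular experiments. Cookie name, split sizes, and bucket labels are hypothetical.
function getBenchmarkBucket() {
  const match = document.cookie.match(/(?:^|; )benchmark_bucket=([^;]+)/);
  if (match) return match[1];

  const roll = Math.random();
  let bucket;
  if (roll < 0.05) bucket = 'original_control';   // 5%: the site as it was when testing began
  else if (roll < 0.1) bucket = 'latest_best';    // 5%: everything you've shipped since
  else bucket = 'regular_testing';                // 90%: eligible for normal experiments

  document.cookie = `benchmark_bucket=${bucket}; path=/; max-age=${60 * 60 * 24 * 180}`;
  return bucket;
}

const bucket = getBenchmarkBucket();
if (bucket !== 'regular_testing') {
  // Exclude these visitors from all other experiments and report them in their own segment.
}
```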

  • BEFORE: Individual test gains, summed.
  • TODAY: Taking into consideration interaction effects, re-running Control vs combined new variations OR using a model to predict combined effect of tests.
  • When to do it: When you want to better estimate the combined effect of multiple testing wins.
  • When not to do it: When your tests are highly seasonal and can’t be combined OR when it becomes impossible from a technical perspective (hence the importance of doing so in a reasonable time frame—don’t wait 2 years to do it).

Track and distribute cumulative insights

If you do this right, you will learn a ton about your customers and how to increase your revenue in the future. Ideally, you should have a goody-bag of insights to look through whenever you’re in need of inspiration.

So, how do you track insights over time and revalidate them in subsequent experiments? Also, does Jenny in branding know about your latest insights into the importance of your product imagery? How do you get her on board and keep her up to date on a consistent basis?

Both of these challenges deserve attention.

The simplest “system” for tracking insights is a spreadsheet, with columns that codify insights by type, device, and any other criteria useful for browsing and grouping. But spreadsheets stop scaling once you’re testing at high velocity; that’s where a custom platform for tracking and sharing insights comes into play.

For example, the team at The Next Web created an internal tool for tracking tests and insights, and for easily sharing ideas via Slack. There are other publicly available options, most of which integrate with Optimizely or VWO.
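To give a sense of what “tracked and shared” can look like, here’s a hypothetical insight record plus a Slack share via an incoming webhook. The field names and webhook URL are placeholders; adapt them to whichever tool you adopt.

```js
// Hypothetical insight record plus a Slack share via an incoming webhook.
// Field names and the webhook URL are placeholders — adapt to whatever tool you use.
const insight = {
  id: 'INS-042',
  experiment: 'EXP-017-pdp-imagery',
  zone: 'product detail page',
  device: 'mobile',
  tags: ['imagery', 'social proof'],
  summary: 'Lifestyle photos outperformed studio shots on mobile product pages.',
  validatedBy: ['EXP-017', 'EXP-023'],
};

// Push it to the channel the rest of the company (Jenny included) already reads.
fetch('https://hooks.slack.com/services/YOUR/WEBHOOK/URL', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: `*New insight* (${insight.id}): ${insight.summary}` }),
});
```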

  • BEFORE: Excel sheets, Powerpoint presentations, word of mouth, or nothing at all.
  • TODAY: A shared, tagged database of insights that links back to the experiments that generated them and is updated on the fly. Tools such as Experiment Engine, Effective Experiments, Iridion and Liftmap are all solving some part of this puzzle.
  • When to do it: When you’re learning a lot of valuable things, but having trouble tracking or sharing what you learn. (BTW, if you’re not having this problem, you might be doing something wrong.)
  • When not to do it: When the future is of little importance.

Code implementation-ready variations

High-velocity testing doesn’t just mean quickly getting tests out the door; it means being able to implement winners immediately and move on. To make this possible, your test code has to be implementation-ready, meaning:

  • Code should be modularized, with scripts split into sections for functionality changes and design changes.
  • Style changes should be made by applying classes rather than manipulating styles with JavaScript. All CSS should live in one file, and class names should align with your website’s, ready to be added when your test is completed (see the sketch after this list).
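Here’s a rough sketch of what an implementation-ready variation might look like, with hypothetical selectors and class names. The point is one function per change, and no inline styles.

```js
// variation.js — one function per change, style changes applied by toggling classes
// that already exist in a separate variation.css (selectors and class names are hypothetical).

function moveReviewsAboveFold() {
  const reviews = document.querySelector('.product-reviews');
  const gallery = document.querySelector('.product-gallery');
  if (reviews && gallery) gallery.insertAdjacentElement('afterend', reviews);
}

function applyStickyCta() {
  // No inline styles here — the rules live in variation.css, ready to ship with the winner
  const cta = document.querySelector('.add-to-cart');
  if (cta) cta.classList.add('add-to-cart--sticky');
}

moveReviewsAboveFold();
applyStickyCta();
```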


  • BEFORE: Messy jQuery.
  • TODAY: Modularized experiment code, with separate CSS whose class names align with your site’s.
  • When to do it: When you wish to make the implementation process as painless as possible.
  • When not to do it: When you just don’t care.

Create FOOC-free variations

If your test variations “flicker” or “flash” as they load, you’re experiencing Flash of Original Content, or FOOC. Left untreated, it will affect your results. Some of the best ways to prevent it:

  • Place your code snippets as high as possible on the page.
  • Improve site load time in general (regardless of your testing tool).
  • Briefly hide the body or the div element being tested (see the sketch after this list).
  • Here are 8 more remedies to fight FOOC.
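As a minimal illustration of the “briefly hide the element” approach, here’s a hand-rolled anti-flicker sketch with a hypothetical selector and timeout. Most testing tools ship their own version of this, so treat it as a reference rather than a drop-in.

```js
// Hand-rolled anti-flicker sketch: hide the tested element until the variation runs,
// with a timeout failsafe so visitors never get stuck looking at a hidden element.
// The selector and timeout are hypothetical; most testing tools ship their own version.
(function hideUntilReady(selector, maxWaitMs) {
  const style = document.createElement('style');
  style.id = 'anti-flicker';
  style.textContent = `${selector} { opacity: 0 !important; }`;
  document.head.appendChild(style);

  const reveal = () => {
    const s = document.getElementById('anti-flicker');
    if (s) s.remove();
  };
  window.revealTestedElement = reveal; // call this at the end of your variation code
  setTimeout(reveal, maxWaitMs);       // failsafe: reveal even if the test never fires
})('.product-gallery', 1500);
```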
  • BEFORE: FOOC-galore.
  • TODAY: FOOC-free variations abound.
  • When to do it: Always.
  • When not to do it: Never.

Don’t test buttons, test business decisions

Some people think of A/B testing as a way to improve the look of their website, while others use it to test the fundamentals of their business. Take advantage of the tools at your disposal to get to the heart of what makes your business tick.

For example, we tested reducing the product range of one of our clients and discovered that they could save millions on manufacturing and marketing without losing revenue. What are the big lingering questions you could answer through A/B testing?

  • BEFORE: Most of us tested button colors at one point or another.
  • TODAY: Business decisions are being validated through A/B tests.
  • When to do it: When business decisions can be tested online, in a controlled manner.
  • When not to do it: When most factors cannot be controlled for online, during the length of an A/B test.

Use data science to test predictions, not ideas

It is highly likely that you are underutilizing the customer analytics that are available to you. Most of us don’t have the team in place or the time to dig through the data constantly. But this could be costing you dearly in missed opportunities.

If you have access to a data scientist, even on a project basis, you can uncover insights that will vastly improve the quality of your A/B test hypotheses.

  • BEFORE: Throwing spaghetti at the wall.
  • TODAY: Predictive analytics can uncover data-driven test hypotheses.
  • When to do it: When you’ve got lots of well-organized analytics data.
  • When not to do it: When you prefer the spaghetti method.

Optimize for volume of tests

There was a time when “always be testing” was enough. These days, it’s about “always be testing in 100 different places at once.” This creates new challenges:

  • How do you test multiple parts of the same funnel simultaneously without worrying about cross-pollination?
  • How do you organize your human resources so that all the work gets done?

This is the art of being a conversion optimization project manager: knowing how to juggle speed vs value of insights and considering resource availability. At WiderFunnel, we do a few things that help make sure we go as fast as possible without sacrificing insights:

  • We stagger “difficult” experiments with “easy” ones so that production can be completed on “difficult” ones while “easy” ones are running.
  • We integrate with testing tool APIs to quickly generate coding templates, so our developers don’t need to do any manual setup before they start coding variations.
  • We use detailed briefs to keep everyone on the same page and reduce gaps in communication.
  • We schedule experiments based on “insight flow” so that earlier experiments help inform subsequent ones.
  • We use algorithms to control for cross-pollination, so multiple tests can run within the same funnel while any cross-pollinated visitors can still be segmented out (see the sketch after this list).
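To illustrate the segmentation half of that last point (this is not WiderFunnel’s actual algorithm), here’s a sketch that records which experiments a visitor has entered so cross-pollinated visitors can be isolated at analysis time. The storage key and experiment IDs are hypothetical.

```js
// Sketch of recording which experiments a visitor has entered, so cross-pollinated
// visitors can be segmented out at analysis time. Not WiderFunnel's actual algorithm;
// the storage key and experiment IDs are hypothetical.
function enterExperiment(experimentId) {
  const key = 'experiments_entered';
  const entered = JSON.parse(localStorage.getItem(key) || '[]');
  if (!entered.includes(experimentId)) {
    entered.push(experimentId);
    localStorage.setItem(key, JSON.stringify(entered));
  }
  // Send this list with every conversion event so reporting can isolate visitors
  // who were exposed to more than one experiment in the same funnel.
  return entered;
}

// Example: a visitor hits a category-page test, then a checkout test
enterExperiment('EXP-031-category-filters');
const exposures = enterExperiment('EXP-034-checkout-trust');
const crossPollinated = exposures.length > 1; // report these in their own segment
```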


  • BEFORE: Running one experiment at a time.
  • TODAY: Running experiments across devices, segments, and funnels.
  • When to do it: When you’ve got the traffic, conversions and the team to make it happen.
  • When not to do it: When there aren’t enough conversions to go around for all of your tests.

Don’t get stuck in the optimization ways of the past. The industry is moving quickly, and the only way to stay ahead of your competitors (who are also testing) is to always be improving your conversion optimization program.

Bring your testing strategies into the modern era by mastering the 8 tactics outlined above. You’re an optimizer, after all―it’s only fitting that you optimize your optimization.

Do you agree with this list? Are there other aspects of modern-era CRO not listed here? Share your thoughts in the comments!

Author

Alhan Keser

Sr Manager, Conversion Optimization at American Express
