
Machine Learning and Experimentation

Words by: Michael St Laurent | Last updated: May 31st, 2022 | 11 min. read

  • A/B Testing
  • Conversion Rate Optimization

Machine Learning in experimentation: hype, strengths, and limitations

As it applies to experimentation, the term “machine learning” is used in different ways by different companies. For our purposes, we will use IBM’s definition of machine learning as “a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.”

As is often the case with new technology, it has been my experience that some professionals overvalue the future professional and societal implications of new tools. This pattern is consistent enough that it is charted by the Gartner Hype Cycle, and my observation is that machine learning, as it pertains to experimentation, currently sits near the “Peak of Inflated Expectations”. Clients have frequently asked whether they could focus their experimentation practice primarily on machine learning, without considering how machine learning fits into the larger picture of experimentation.

While Machine Learning has some incredible applications to enhance and complement traditional A/B tests and experimentation, it shouldn’t be viewed as a replacement. Even as machine learning technology continues to advance, it will not be a silver bullet. Rather than asking when or if Machine Learning will end up driving the bulk of experimentation, a more useful exercise is evaluating the strengths of the technology and considering how those strengths can enhance how experimentation is already done.
Ultimately, machine learning is still limited by its inability to understand and develop higher-level strategies. However, its benefits are abundant: notably, it can help serve personalized experiences, maximize revenue, predict the likelihood that a lead will convert to a paying customer, and more.

Why use Machine Learning in experimentation?

A/B testing and experimentation are evolving practices. Traditionally, an A/B test allocates 50% of your traffic to a control and 50% to a variation, and this traffic allocation stays consistent for the duration of the experiment. This is known as fixed traffic allocation, and it is the most common way testing is done. Given the advantages and disadvantages of machine learning algorithms, what do they have to offer over fixed-allocation methods?
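To make the fixed-allocation baseline concrete, here is a minimal Python sketch of deterministic 50/50 bucketing (the experiment name and helper function are illustrative, not from any particular testing tool). Hashing the user ID keeps assignment effectively random across users, but stable for any individual user across page loads:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "exp_homepage") -> str:
    """Deterministically bucket a user into control or variation (50/50).

    Hashing experiment + user ID spreads users randomly across buckets
    while guaranteeing the same user always sees the same experience.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "control" if bucket < 50 else "variation"

# The same user always lands in the same bucket:
assert assign_variant("user-42") == assign_variant("user-42")
```

Note that the 50/50 split never changes, no matter how the two experiences perform; that rigidity is exactly the inefficiency discussed next.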

A critical problem with fixed allocation methods is that they are performance-inefficient:

Suppose a test has two variants and one control. If one variant is performing well and the other poorly, then you will be sending an equal amount of traffic to a good and poor experience. If you are trying to optimize for performance, you would ideally send progressively more traffic to high-performing variants, and progressively less to low-performing ones, with eventually all of the traffic being sent to the best option.

If you test multiple variations and one of them reaches a statistically significant result more quickly than the others (answering your hypothesis), then you will have “wasted” traffic by continuing to explore that variant as much as the others. This “over-exploration” leads to experiments being run for longer than is necessary.

A core tenet of fixed-allocation testing is that randomization is critical to determining causality, so in a random-assignment system, experiences are served to users without any regard for their personal attributes. At the end of the test, we look at the results and determine which variation is “best”. But this depends on the assumption that there is one “best experience” for everyone, on average. In reality, each user is unique, so a more advanced outlook is that there is a “best experience” for each individual. The type of experience you want to provide might depend on a user’s geographical location, language spoken, device (PC, Mac, tablet, or smartphone), the time of day, and more. This is the basic theory behind personalization.

These are all examples of challenges that can’t be resolved at a rapid enough pace by humans, making them ideal candidates for the real-time problem solving abilities of a computer. It would be inefficient for a human to sit at a test all day constantly responding to performance changes between variations, not to mention all of the bias that would introduce. Similarly, there is no way that a human could assign a personalized experience to a particular user in the time it takes for a page load.

Algorithms can make these kinds of predictions orders of magnitude faster than a human can.

Machine Learning in experimentation: an overview

Machine learning coupled with experiment tracking tools can self-tune and adapt while an A/B test is still live, allocating traffic to the best-performing variant in order to increase conversions. The exact percentage of users sent to each variant depends on how “greedy” the algorithm is: that is, how much it wants to take advantage of the most promising variant versus how much it wants to explore the other options.

This kind of algorithm is known as a “multi-armed bandit.” Multi-armed bandit tests, combined with machine learning-based personalization, can wisely allocate users to better-performing variants, while at the same time leaving room for new experimentation to possibly gain new insights.
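As an illustration of how a bandit balances exploration and exploitation, here is a minimal epsilon-greedy sketch in Python (the function and the simulated conversion rates are invented for illustration; production tools typically use more sophisticated schemes, such as Thompson sampling):

```python
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Pick a variant index: explore with probability epsilon, else exploit.

    counts[i]  -- times variant i has been shown
    rewards[i] -- total conversions observed for variant i
    epsilon    -- the "greediness" knob: lower means more exploitation
    """
    if random.random() < epsilon:
        return random.randrange(len(counts))  # explore a random variant
    rates = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(rates)), key=rates.__getitem__)  # exploit the leader

# Simulate a test where variant 1 truly converts at 10% and variant 0 at 5%.
true_rates = [0.05, 0.10]
counts, rewards = [0, 0], [0, 0]
for _ in range(5000):
    arm = epsilon_greedy(counts, rewards)
    counts[arm] += 1
    rewards[arm] += random.random() < true_rates[arm]

# Over time, most traffic should drift toward the better-performing variant,
# while the epsilon fraction keeps exploring the weaker one.
```

In this sketch, epsilon plays exactly the “greediness” role described above: epsilon of 0 would send all traffic to the current leader, while a larger epsilon keeps more room for exploration.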

Another central machine learning concept is a random forest model, which businesses use to determine the best experience to show visitors. Random forests are ensemble models of decision trees (hence the name “forest”), merging many trees together to create a stable predictive model.

Random Forest decision tree.

The trees are trained using the bagging method: each tree learns from a different random bootstrap sample of the data, which increases the stability and quality of the overall model. Every time a user takes an action, the algorithm logs the result so that it knows whether to send more or fewer visitors down that branch next time.

The Random Forest: understanding the scope and limitations of Machine Learning

The random forest’s simplicity makes it one of the most widely used algorithms in machine learning. Even the most non-technical people can easily understand the concept of a decision tree.

For example, you can model the process of deciding what kind of pizza to order as a decision tree. Questions such as “What restaurant should I order from?”, “What toppings should I add?”, and “Should I dine in, or do I want delivery?” are all separate but overlapping decision trees that help you narrow down your selection.

Random forests are groups of many decision trees like those, each with a slightly different randomized structure. Different trees may therefore come to different decisions, even when given the same data. Random forests take advantage of the notion of the “wisdom of crowds”: the same input is fed to every tree in the forest, and the most-voted result is selected. This helps prevent errors in personalization and overfitting.
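To make the “wisdom of crowds” idea concrete, here is a toy Python sketch (all names and data are invented for illustration): each depth-one “tree” is trained on a bootstrap sample of the data with a randomly chosen feature, and the forest predicts by majority vote.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the toy example is reproducible

def train_stump(rows, features):
    """Train one depth-1 tree on a bootstrap sample with a random feature."""
    sample = [random.choice(rows) for _ in rows]  # bagging: sample with replacement
    feature = random.choice(features)             # each tree gets a random feature

    def vote(labels):  # majority label, defaulting to 0 for an empty side
        return Counter(labels).most_common(1)[0][0] if labels else 0

    yes_vote = vote([label for x, label in sample if x[feature]])
    no_vote = vote([label for x, label in sample if not x[feature]])
    return lambda x: yes_vote if x[feature] else no_vote

def forest_predict(stumps, x):
    """Wisdom of crowds: every tree votes, and the majority wins."""
    return Counter(stump(x) for stump in stumps).most_common(1)[0][0]

# Toy data: visitor attributes -> converted (1) or not (0).
rows = [({"weekend": True, "mobile": False}, 1)] * 20 + \
       [({"weekend": False, "mobile": True}, 0)] * 20
stumps = [train_stump(rows, ["weekend", "mobile"]) for _ in range(25)]
```

Any single stump is a weak, partially random model, but because each one is trained on a different resampling of the data, their combined vote is far more stable than any individual tree.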

Another real-life example would be a supermarket detecting non-identifiable attributes about customers on their way into the store and then logging what is on their receipts on the way out, looking for connections. A system like this may discover that customers who park a minivan and arrive on the weekend are more likely to buy children’s toys. The next time a minivan arrives, those customers could be handed a flyer featuring children’s toys.

Unfortunately for supermarkets, they cannot rearrange the store entirely to tailor the experience for every individual: many shoppers are in the store simultaneously (the experience is fixed), and the cost of changing the store around is enormous. A huge advantage of digital retail is that entirely different representations of your store can be served simultaneously, and the cost of customization is much lower.

Machine Learning and personalization: practical applications

Machine learning-based personalization can help determine the best experience for each visitor and offer them a truly unique experience. It is quickly becoming one of the go-to tools for personalization and user segmentation.

Unlike traffic allocation algorithms, there is no “winning variation” or best result in machine learning-based personalization. Rather, machine learning models for personalization can run in perpetuity, adapting to new user preferences and behaviors.

Instead, you measure the performance of the entire algorithm against not having that algorithm. Personalization-based machine learning algorithms can learn how to predict the most effective content and experiences for a particular user. This may be in service of a particular short-term goal, such as converting a user to a paying customer, or for the general goal of raising customer satisfaction rates. Any kind of data that can be fed into an algorithm in real time can then be used to build machine learning personalization models.
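A minimal sketch of this idea in Python, assuming a simple segment-based model (the segment names, experience names, and fully greedy selection rule are all illustrative; a real system would also keep exploring, bandit-style, and use far richer signals than a single segment label):

```python
from collections import defaultdict

class PersonalizedSelector:
    """Serve each visitor segment its historically best experience."""

    def __init__(self, experiences):
        self.experiences = experiences
        self.shown = defaultdict(int)      # (segment, experience) -> impressions
        self.converted = defaultdict(int)  # (segment, experience) -> conversions

    def choose(self, segment):
        """Greedily pick the experience with the best observed rate."""
        def rate(exp):
            shown = self.shown[(segment, exp)]
            return self.converted[(segment, exp)] / shown if shown else 0.0
        return max(self.experiences, key=rate)

    def record(self, segment, experience, converted):
        """Log a served experience and whether the visitor converted."""
        self.shown[(segment, experience)] += 1
        self.converted[(segment, experience)] += bool(converted)

selector = PersonalizedSelector(["hero_video", "hero_static"])
selector.record("mobile", "hero_static", True)
selector.record("mobile", "hero_video", False)
```

Note that there is no terminal “winner” here: the model keeps updating as new outcomes are recorded, and its value is measured by comparing overall performance with the selector against performance without it.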

The limitations of Machine Learning for CRO

While machine learning is a tremendously powerful technology, it’s far from a panacea for optimizing your digital experiences. In fact, there are several pitfalls of machine learning, and it is poorly suited for certain applications. For example, neural networks often struggle to deal with real-world content that deviates from the data they were trained on.

Another downside is the layers of complexity it creates. It is easy to plan new experiments on top of static experiences, but it becomes increasingly costly to plan and run experiments when you have to account for all of the algorithmically served variations and interactions that could occur. An argument could be made that adding more dynamically served content could actually limit your ability to be agile and innovate, because of the added complexity in your system.

If deployed correctly, machine learning tools allow for cutting-edge personalization, traffic allocation, and A/B testing. However, they cannot replace the expertise of a human-led conversion rate optimization (CRO) team. An experienced team can advise on strategy, methods, and best practices. By combining the knowledge of humans and machines, you can use the right tools and algorithms to further improve your conversion rates and maximize revenue.

Machine Learning and experimentation: the cutting edge

After building an effective team and experimentation program, layering in an additional tool like Machine Learning can be daunting: teams want to further maximize revenue, allocate scarce resources wisely, and minimize road bumps along the way.

The right partner can help you overcome the most frequent pitfalls and difficulties of the experimentation process, including making use of Machine Learning algorithms in your technologies.

As a leading CRO agency, we have the skills and experience to deliver powerful results for our clients. Ask us how you can use machine learning to supplement your experimentation practice today.
