Data visualization is the art of our age.
Just as Michelangelo approached that giant block of Carrara marble and said, “I saw the angel in the marble and carved it until I set him free,” analysts are approaching data with the same visionary and inquisitive mind.
In today’s age, where big data reigns, the art of data analysis is making sense of our world.
Analysts are chiseling the bulk of raw data to create meaning (patterns and associations, maps and models) to help us draw insights, understand trends, and even make decisions from the stories the data tell.
Data visualization is the graphical display of abstract information for two purposes: sense-making (also called data analysis) and communication. Important stories live in our data and data visualization is a powerful means to discover and understand these stories, and then to present them to others.
But complex analysis is not easy for the human brain to process until it is presented as a data visualization.
Tables, charts, and graphs provide powerful representations of numerous data points so that the insights and trends are easily understood by the human brain.
That’s why data visualization is one of the most persuasive techniques to evangelize experimentation today—particularly in an era of ever-decreasing attention spans.
On a slide. On a dashboard in Google Data Studio. Or simply something you plan to sketch on a whiteboard. How you present the data will decide whether your trends and insights are understood and accepted, and whether your audience can infer what action should be taken.
A thoughtfully crafted visualization leverages our visual system to convey an abundance of complex information in relatively little space, whether that’s the optimal number of lead generation form fields or the potential ROI of your program throughout the quarter.
In this post, we dig into the practice of designing data visualizations for your audience. You will learn:
- How your data visualizations can enhance the Executive decision-making process, using the guidelines of the Cynefin Framework
- Why data visualizations are the most powerful way for the human brain to compute complex information through dual processing theory
- What makes data visualizations effective using the five qualities defined by expert Alberto Cairo
- And a real-world example of how you can work through a problem to arrive at the most effective data visualization for your audience.
The Brain (Or, why we need data visualization)
You may be familiar with System 1 and System 2 thinking, known as dual processing theory. System 1 (or Type 1) is the predominant mode: fast, instinctual decision-making. System 2 (Type 2) is slow, rational decision-making.
We often relegate System 1 thinking to your audience’s emotions. (We talked about it in “Evangelizing experimentation: A strategy for scaling your test and learn culture” or in “I feel, therefore I buy: How your users make buying decisions.”)
But that immediate grasp over complex information in a data visualization is also related to System 1 thinking.
A large part of our brain is dedicated to visual processing. It’s instinctual. It’s immediate.
If you have a strong data visualization, every sighted person can understand the information at hand. A seemingly simple 5×5 chart can provide a snapshot of thousands of data points.
In other words, visualizing data with preattentive features in mind is akin to designing ergonomic objects: You know that a sofa is made for sitting. You know that a handle on a coffee mug is designed for your hand. (This is called preattentive processing.)
Preattentive processing occurs before conscious attention. Preattentive features are processed very quickly…within around 10 milliseconds.
When creating data visualizations, you are designing for human physiology. Any other method of translating that information is a disservice to your message and your audience.
When we consider the speed at which people understand the multiple data points in a problem through dual processing theory and preattentive processing, it’s almost foolish not to take advantage of data visualization.
When you design data visualizations, you are understanding your audience.
Understanding how Executives make decisions
A data visualization is a display of data designed to enable analysis, exploration, and discovery. Data visualizations aren’t intended mainly to convey messages that are predefined by their designers. Instead they are often conceived as tools that let people extract their own conclusions from the data.
Data analysis allows Executives to weigh the alternatives of different outcomes of their decisions.
And data visualizations can be the most powerful tool in your arsenal, because your audience can see thousands of data points on a simple chart.
Your data visualization allows your audience to gauge (in seconds!) a more complete picture so they can make sense of the story the data tell.
In Jeanne Moore’s article “Data Visualization in Support of Executive Decision Making,” the author explored the nature of strategic decision making through the Cynefin framework.
The Cynefin Framework
Created by David Snowden in 1999 when he worked for IBM Global Services, the Cynefin framework has since informed leadership decision making at countless organizations.
The five domains of the Cynefin Framework are:
- In the Simple Domain, there is a clear cause and effect. The results of the decision are easy to predict and can be based on processes, best practices, or historical knowledge. Leaders must sense, categorize and respond to issues.
- In the Complicated Domain, multiple answers exist. Though there is a relationship between cause and effect, it may not be clear at first (think known unknowns). Experts sense the situation, analyze it and respond to the situation.
- In the Complex Domain, decisions can be clarified by emerging patterns. That’s because issues in this domain are susceptible to the unknown unknowns of the business landscape. Leaders must act, sense and respond.
- In the Chaotic Domain, leaders must act to establish order in a chaotic situation (an organizational crisis!), and then gauge where stability exists and doesn’t exist to get a handle on the situation and move it into the complex or complicated domain.
- And in the Disorder Domain, the situation cannot be categorized in any of the other four domains. It is utterly unknown territory. Leaders can analyze the situation and categorize different parts of the problem into the other four domains.
In organizations, decision making is often related to the Complex Domain because business leaders are challenged to act in situations that are seemingly unclear or even unpredictable.
Leaders who try to impose order in a complex context will fail, but those who set the stage, step back a bit, allow patterns to emerge, and determine which ones are desirable will succeed. They will discern opportunities for innovation, creativity, and new business models.
Poor quarterly results, management shifts, and even a merger—these Complex Domain scenarios are unpredictable, with several methods of responding, according to David J. Snowden and Mary E. Boone.
In other words, Executives need to test and learn to gather data on how to best proceed.
“Leaders who don’t recognize that a complex domain requires a more experimental mode of management may become impatient when they don’t seem to be achieving the results they were aiming for. They may also find it difficult to tolerate failure, which is an essential aspect of experimental understanding,” explain David J. Snowden and Mary E. Boone.
A data analyst can assist in probing and sensing the scenario, working collaboratively with leaders to understand the historical and current information at hand and to guide the next course of action.
An organization should take little interest in evaluating — and even less in justifying — past decisions. The totality of its interest should rest with how its data can inform its understanding of what is likely to happen in the future.
Of course, there is always the threat of oversimplifying issues, treating scenarios like they have easy answers.
But even with situations in the other domains of the Cynefin Framework, data visualization can provide insight into next steps, if it meets certain criteria.
What makes an effective data visualization
The presenter of the visualization must also provide a guiding force to assist the executive in reaching a final decision, but not actually formulate the decision for the executive.
With data visualization, there will always be insightful examples and examples that clearly missed the mark.
Avinash Kaushik, in his Occam’s Razor article “Closing Data’s Last-Mile Gap: Visualizing For Impact!”, described the ability of data visualizations to influence the Executive’s decision-making process as closing the “last-mile” gap.
It can take an incredible effort to gather, sort, analyze and glean insights and trends from your data. If your analysis is solid, if your insights and trends are enlightening, you don’t want to muddle your audience with a confusing data visualization.
Remember: a data visualization is only as impactful as its design is effective for your audience.
In terms of the value in data visualization, it must provide simplicity, clarity, intuitiveness, insightfulness, gap, pattern and trending capability in a collaboration enabling manner, supporting the requirements and decision objectives of the executive.
Alberto Cairo’s Five Qualities of Great Data Visualizations
- Truthful: It should be based on thorough and objective research—just as a journalist is expected to represent the truth to the best of their abilities, so too is the data analyst.
- Functional: It should be accurate and allow your audience to act upon your information. For instance, they can perceive the incremental gains of your experimentation program over time in a sloping trendline.
- Beautiful: It needs to be well-designed. It needs to draw in your audience’s attention through an aesthetically pleasing display of information.
- Insightful: It needs to provide evidence that would be difficult to see otherwise. Trends, insights, and inferences must be drawn by the audience, in collaboration with the data analyst.
- Enlightening: It needs to illuminate your evidence. It needs to enlighten your audience with your information in a way that is easy to understand.
When you nail down all five of these criteria, your data visualization can shift your audience’s ways of thinking.
It can lead to those moments of clarity on what action to take next.
So, how are these design decisions made in data visualization?
Here’s an example.
How we make decisions about data visualization: An example in process
A note on framing: While the chart and data discussed below are real, the framing is artificial to protect confidentiality. The premise of this analysis is that we can generate better experiment ideas and prioritize future experiments by effectively communicating the insights available in the data.
Lead generation forms.
You probably come across these all the time in your web searches. Some forms have multiple fields and others have few—maybe enough for your name and email.
Suppose you manage thousands of sites, each with its own lead generation form, some long and some short. And you want to determine how many fields you should require from your prospects.
If you require too many form fields, you’ll lose conversions; too few, and you’ll lose information to qualify those prospects.
It’s a tricky situation to balance.
Like all fun data challenges, it’s best to pare the problem down into smaller, manageable questions. In this case, the first question you should explore is the relationship between the number of required fields and the conversion rate. The question is:
How do conversion rates change when we vary the number of required fields?
Unlike lead quality—which can be harder to measure and is appraised much further down the funnel—analyzing the relationship between the number of required fields and the number of submissions is relatively straightforward with the right data in hand. (Cajoling the analytics suite to provide that data can be an interesting exercise in itself—some will not do so willingly.)
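Once you have per-site data in hand, the aggregation itself is short. Here is a minimal sketch, assuming each site is recorded as a pair of (number of required fields, conversion rate). The records and the `summarize` helper are hypothetical, invented for illustration; they are not the data or tooling behind this analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-site records: (required_fields, conversion_rate_percent).
# These values are invented for illustration only.
sites = [
    (1, 12.0), (1, 10.0), (2, 9.0), (2, 8.0),
    (3, 7.5), (3, 6.5), (4, 6.0), (5, 7.0),
]

def summarize(records):
    """Return {required_fields: (average_conversion_rate, number_of_sites)}."""
    groups = defaultdict(list)
    for fields, rate in records:
        groups[fields].append(rate)
    return {
        fields: (mean(rates), len(rates))
        for fields, rates in sorted(groups.items())
    }

summary = summarize(sites)
for fields, (avg_rate, n_sites) in summary.items():
    print(f"{fields} required fields: avg {avg_rate:.1f}% over {n_sites} site(s)")
```

Keeping the site count next to each average matters later, because it tells you how much weight each average deserves.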
So, you query your analytics suite, and (assuming all goes well), you get back this summary table:
What’s the most effective way to convey the message in this data?
Most of you probably glossed over the table, and truth be told, I don’t blame you. It’s borderline rude to expect anyone to make sense of this many variables and numbers.
However, if you spend half a minute or so analyzing the table, you will make sense of what’s going on.
In this table format, you are processing the information using System 2 thinking—the cognitive way of understanding the data at hand.
On the other hand, note how immediate your understanding is with a simple data visualization…
The bar graph
In terms of grasping the relationship in the data, it was pretty effective for a rough-and-ready chart.
In less than a second, you were able to see that conversion rates go down as you increase the number of required fields—but only until you hit four required fields. At this point, average conversion rates (intriguingly!) start to increase.
But you can do better…
For a good data visualization, you want to gracefully straddle the line between complexity and understanding:
How can we add layers of information and aesthetics that enrich the data visualization, without compromising understanding?
No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails.
Adding layers of information can’t be at the expense of your message—rather, it has to be in service of that message and your audience. So, when you add anything to the chart above, the key question to keep in mind is:
Will this support or undermine making informed business decisions?
In this case, you can have some fun by going through a few iterations of the chart, to see if any visualization works better than the bar chart.
The dot plot
Compared to a bar chart, a dot plot encodes the same information, while using fewer pixels (which lowers visual load), and unshackles you from a y-axis starting at zero (which is sometimes controversial, according to this Junk Charts article and this Stephanie Evergreen article).
In the context of digital experimentation, not starting the y-axis at zero generally makes sense because even small differences between conversion rates often translate into significant business impact (depending on number of visitors, the monetary / lifetime value of each conversion, etc.).
In other words, you should design your visualization to make apparent small differences in conversion rates because these differences are meaningful—in this sense, you’re using the visualization like researchers use a microscope.
If you are still not convinced, an even better idea (especially for an internal presentation) would be to map conversion rate differences to revenue—in that case, these small differences would be amplified by your site’s traffic and conversion goal’s monetary value, which would make trends easier to spot even if you start at 0.
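To see why small differences matter, it can help to run the arithmetic once. The sketch below assumes the simplest possible model (revenue equals visitors times conversion rate times the value of one conversion). The traffic, rates, and dollar value are all hypothetical placeholders.

```python
def monthly_revenue(visitors, conversion_rate, value_per_conversion):
    """Expected revenue = visitors x conversion rate x value of one conversion."""
    return visitors * conversion_rate * value_per_conversion

# All inputs below are hypothetical, chosen only to illustrate the arithmetic.
visitors = 200_000            # assumed monthly visitors to the form
value_per_conversion = 50.0   # assumed value of one lead, in dollars

baseline = monthly_revenue(visitors, 0.050, value_per_conversion)  # 5.0% conversion
variant = monthly_revenue(visitors, 0.053, value_per_conversion)   # 5.3% conversion

print(f"Baseline:   ${baseline:,.0f}")
print(f"Variant:    ${variant:,.0f}")
print(f"Difference: ${variant - baseline:,.0f} per month")
```

Under these assumptions, a gap of 0.3 percentage points (barely visible on a chart that starts at zero) is worth roughly $30,000 a month.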
Either way, as long as the dots are distant enough, large enough to stand out but small enough to not overlap along any axis, reading the chart isn’t significantly affected.
More importantly (spoiler alert!), our newly-found real estate (after changing from bars to dots) allows you to add layers of information without cluttering the data visualization.
One such layer is the data’s density (or distribution), represented by a density plot.
A density plot
A density plot uses the height of the curve to show roughly how many data points (what percentage of sites) require how many fields. In this case, the density plot adds the third column (“Percent of Sites”) from the table you saw earlier.
That makes it easy to see (once you understand how density plots work) how much stock to place in those averages.
For example, an average that is calculated on a small number of sites (say, less than 1% of the available data) is not as important or informative as an average that represents a greater number of sites.
So, if an average was calculated based on a mere ten sites, we would be more wary of drawing any inferences pertaining to that average.
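If density plots are new to you, a text-only version shows what the layer encodes. This is a rough sketch, assuming you have a count of sites for each number of required fields. The counts and the `percent_of_sites` helper are invented for illustration, and a real density plot would smooth these bars into a curve.

```python
# Hypothetical site counts per number of required fields (invented values).
site_counts = {1: 300, 2: 250, 3: 150, 4: 100, 5: 80, 6: 60, 7: 30,
               8: 10, 9: 10, 10: 4, 11: 3, 12: 2, 13: 1}

def percent_of_sites(counts):
    """Convert raw site counts into each group's share of all sites (in %)."""
    total = sum(counts.values())
    return {fields: 100 * n / total for fields, n in counts.items()}

shares = percent_of_sites(site_counts)

# One row per group; bar length stands in for the height of the density curve.
for fields, share in shares.items():
    bar = "#" * round(share)
    print(f"{fields:>2} fields | {bar} {share:.1f}%")
```

Groups whose bars are nearly empty are the ones whose averages rest on very few sites.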
Visualizing uncertainty and confidence intervals
When we add the density plot, we see that most of our data comes from sites that require between one and four fields (80%, if you add the percentages in the table), the next big chunk (19%) comes from sites that require five to nine fields, and the remaining 1% (represented by the flat sections of the density curve) require more than nine. (The 80/20 rule strikes again!)
Another useful layer of information is the confidence interval for these averages. Given the underlying data (and how few data points go into some averages), how can we represent our confidence (or uncertainty) surrounding each average?
Explaining Confidence Intervals
Let’s say you’re taking a friend camping for three days, and you want to give them enough information so they can pack appropriately.
You check the forecast and see lows of 70°F, highs of 73°F, and an average of 72°F.
So, when you tell your friend “it’s going to be about 72°F“—you’re fairly confident that you’ve given them enough information to enjoy the trip (in terms of packing and preparing for the weather, of course).
On the other hand, suppose you’re camping in a desert that’s expecting lows of 43 °F, highs of 100°F, and (uh oh) an average of 72°F.
Assuming you want this person to travel with you again, you probably wouldn’t say, “it’s going to be about 72°F.” The information you provided would not support them in making an informed decision about what to bring.
That’s the idea behind confidence intervals: they represent uncertainty surrounding the average, given the range of the data, thereby supporting better decisions.
Visually, confidence intervals are represented as lines (error bars) that extend from the point estimate to the upper and lower bounds of our estimate: the longer the lines, the wider our interval, the more variability around the average.
When the data are spread out, confidence intervals are wider, and our point estimate is less representative of the individual points.
Conversely, when the data are closer together, confidence intervals are narrower, and the point estimate is more representative of the individual points.
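The camping example can be sketched in a few lines. The readings below are hypothetical, invented so that both trips average 72°F. The interval uses a normal approximation purely to keep the example dependency-free; for samples this small, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(values, level=0.95):
    """Return (mean, lower, upper) using a normal approximation.

    Assumption: the normal approximation is a simplification; a t-based
    interval is the more careful choice for small samples.
    """
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width

# Hypothetical temperature readings in degrees Fahrenheit (invented values).
mild_trip = [70, 72, 73, 73, 72, 72]
desert_trip = [43, 55, 72, 90, 100, 72]

for name, temps in [("mild", mild_trip), ("desert", desert_trip)]:
    m, lower, upper = confidence_interval(temps)
    print(f"{name}: mean {m:.0f} F, 95% interval [{lower:.1f}, {upper:.1f}]")
```

Both trips report the same average, but the desert interval is many times wider, which is exactly the information your friend needs for packing.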
At this point, there are two things to note: first, when you look at this chart, your attention will most likely be drawn to the points with the widest confidence intervals.
That is, the noisiest estimates (the ones with fewer data points and / or more variability) take up the most real estate and command the most attention.
Obviously, this is not ideal—you want to draw attention to the more robust and informative estimates: those with lots of data and narrower intervals.
Second, the absence of a confidence interval around thirteen required fields means that either there’s only one data point (which is likely the case, given the density curve we saw earlier), or all the points have the same average conversion rate (not very likely).
Luckily, both issues have the same solution: cut them out.
How to best handle outliers is a lively topic—especially since removing outliers can be abused to contort the data to fit our desired outcomes. In this case, however, there are several good reasons to do so.
The first two reasons have already been mentioned. First, these outliers come from less than 1% of our entire data set, so even after removing them we are still representing 99% of our data.
Second, they are not very reliable or representative, as evidenced by the density curve and the error bars.
Finally, and more importantly—we are not distorting the pattern in the data: we’re still showing the unexpected increase in the average conversion rate beyond four required fields.
We are doing so, however, using the more reliable data points, without giving undue attention to the lower quality ones.
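One way to keep this step honest is to write the trimming rule down and report how much data survives it. The sketch below assumes a cutoff of 1% of sites per group. The cutoff, the `trim_sparse_groups` helper, and every number in `groups` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical summary: required_fields -> (number_of_sites, avg_rate_percent).
# All numbers are invented for illustration.
groups = {
    1: (300, 10.0), 2: (250, 8.9), 3: (150, 7.5), 4: (100, 6.4),
    5: (80, 6.8), 6: (60, 7.1), 7: (30, 7.6), 8: (10, 7.9), 9: (10, 8.3),
    10: (4, 5.0), 11: (3, 12.0), 12: (2, 3.0), 13: (1, 9.0),
}

def trim_sparse_groups(groups, min_share=1.0):
    """Keep groups holding at least min_share percent of sites.

    Returns the kept groups and the percent of all sites they still cover.
    """
    total = sum(n for n, _ in groups.values())
    kept = {
        fields: (n, rate)
        for fields, (n, rate) in groups.items()
        if 100 * n / total >= min_share
    }
    coverage = 100 * sum(n for n, _ in kept.values()) / total
    return kept, coverage

kept, coverage = trim_sparse_groups(groups)
lowest = min(kept, key=lambda fields: kept[fields][1])

print(f"Kept groups: {sorted(kept)}")
print(f"Share of sites still represented: {coverage:.1f}%")
print(f"Lowest average among kept groups is at {lowest} required fields")
```

Printing the coverage and the location of the dip alongside the trimmed data makes it easy to confirm that the rule removed noise without changing the pattern.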
Lastly, to visualize and quantify our answer to the question that sparked the whole analysis (how do conversion rates change when we vary the number of required fields?), we can add two simple linear regressions: the first going from one to four required fields, the second from four to nine required fields.
Why two, instead of the usual one?
Because we saw from the density chart discussion that 80% of our data comes from sites requiring one to four fields, a subset that shows a strong downward trend.
Given the strength of that trend, and that it spans the bulk of our data, it’s worth quantifying and understanding, rather than diluting it with the upward trend from the other 20%.
That remaining 20%, then, warrants a deeper analysis: what’s going on there—why are conversion rates increasing?
The answer to that will not be covered in this article, but here’s something to consider: could there be qualitative differences between sites, beyond four required fields? Either way, the regression lines make the trends in the data clearer to spot.
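As a rough illustration of the two-segment idea, the sketch below fits one least-squares line to each range. It is a simplification under stated assumptions: it fits the group averages directly (unweighted), the averages are invented, and `fit_line` and `segment` are hypothetical helper names rather than part of any analytics suite.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical average conversion rates (%) by number of required fields.
avg_rate = {1: 10.0, 2: 8.9, 3: 7.5, 4: 6.4, 5: 6.8,
            6: 7.1, 7: 7.6, 8: 7.9, 9: 8.3}

def segment(first, last):
    """Fit a line to the averages for first..last required fields (inclusive)."""
    xs = [fields for fields in avg_rate if first <= fields <= last]
    return fit_line(xs, [avg_rate[fields] for fields in xs])

# The breakpoint (four fields) belongs to both segments.
down_slope, down_intercept = segment(1, 4)
up_slope, up_intercept = segment(4, 9)

print(f"1 to 4 fields: {down_slope:+.2f} points per additional field")
print(f"4 to 9 fields: {up_slope:+.2f} points per additional field")
```

A single line through all nine points would average the two opposing slopes together and hide both trends, which is the dilution the two-segment approach avoids.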
After adding the regression lines, you summarize the main take-away with a nice, succinct subtitle:
“Increasing the number of Required Fields from one to four decreases average conversion rate by 1.2% per additional field, for 80% of sites.”
This caption helps orient anyone looking at the chart for the first time—especially since we’ve added several elements to provide more context.
Note how the one statement spans the three main layers of information we’ve visualized:
- The average conversion rate (as point estimates)
- The distribution of the data (the density curve)
- The observed trend
Thus, we’ve taken a solid first pass at answering the question:
How do conversion rates change when we vary the number of required fields?
Does this mean that all sites in that 80% will lose ~1% conversion rate for every required field after the first?
Of course not.
As mentioned in the opening section, this is the simplest question that’ll provide some insight into the problem at hand. The lowest-hanging fruit, if you will.
However, it is far from a complete answer.
You’ve gently bumped into the natural limitation of bivariate analyses (an analysis with only two variables involved): you’re only looking at the change in conversion rate through the lens of the number of required fields, when there are obviously more variables at play (the type of site, the client base, etc.).
Before making any business decisions, you would need a deeper dive into those other variables and (ideally!) to incorporate lead quality metrics, to better understand how the number of required fields impacts total revenue.
And this is where you come back full circle to experimentation: you can use this initial incursion to start formulating and prioritizing better experiment ideas.
For example, a successful experimentation strategy in this context would have to, first, better understand the two groups of sites discussed earlier: those in the 80% and those in the other 20%.
Additionally, more specific tests (i.e., those targeting sub-domains) would have to consider whether a site belongs to the first group (where conversion rates decrease as the number of required fields increases) or the second group (where the inverse happens), and why.
Then, we can look at which variables might explain this difference, and what values these variables take for that site.
For example, are sites in the first group B2C or B2B? Do they sell more or less expensive goods? Do they serve different or overlapping geographic regions?
In short, you’ve used data visualization to illuminate a crucial relationship to stakeholders, and to identify knowledge gaps when considering customer behaviour across a range of sites.
Addressing these gaps would yield even more valuable insights in the iterative process of data analysis.
And these insights, in turn, can guide the experimentation process and improve business outcomes.
Your audience needs to trust your data visualization—and you.
When your experimentation team and Executives can get into the boardroom together, it’s disruptive to your business. It shakes your organization from the status quo, because it introduces new ways of making decisions.
Data-driven decisions are proven to be more effective.
In fact, researchers at the MIT Sloan School of Management surveyed 179 large publicly traded firms and found that those that used data to inform their decisions had productivity and output 5-6% higher.
And data analysts have the power to make decision-making among Executive teams more informed.
Data visualization does not rely solely on the Executive’s ability to reason through the five domains of the Cynefin Framework. It presents the true power of experimentation, and the ability of experimentation to solve real business problems.
But like any working dynamic, you need to foster trust—especially when you are communicating the insights and trends of data. You need to appear objective and informed.
You need to guide your audience through the avenues of action that are made clear by your analysis.
Of course, you can do this through speech. But you can also do this through the design of your data visualizations.
Whether you are presenting them in a dashboard where your team can keep a pulse on what’s happening with your experimentation program, or if it’s a simple bar graph or dot plot in your slide deck, your data visualizations matter.
Clear labeling and captions, graphic elements that showcase your data dimensions, lowering visual load, and even using color to distinguish elements in your data visualization—these help your audience see what possibilities exist.
They help your audience identify patterns and associations—and even ask questions that can be further validated through experimentation.
Because experimentation takes the guesswork out of decision making. Your data visualizations make it easier for the Executive to navigate the complexity of the situations they are challenged with today.
And that is, ultimately, the most persuasive way to evangelize experimentation at your organization.
How impactful have you found strong data visualizations on your team’s decision-making process? We’d love to hear about it.