Data visualization fundamentals

Data visualizations make complex concepts easier to understand. They can help users to explore, to monitor, and to explain data.

What is data visualization good for?#

Visualizations are tools that can make complex concepts easier for humans to understand. In the words of engineer and inventor Douglas Engelbart, “a tool doesn’t just make something easier—it allows for new, previously-impossible ways of thinking, of living, of being.”

The utility of data visualization can be divided into three main goals: to explore, to monitor, and to explain. While some visualizations can span more than one of these, most focus on a single goal.

To explore#

When users are looking for an open-ended tool that helps them to find patterns and insights in data, a data visualization focused on exploration and fast iteration can help. Exploration tools should have strong connections to other tools that collect (extract), clean (transform), and curate (load) data.

To monitor#

When users need to check on the performance of something, a data visualization focused on monitoring is best. Monitoring tools, such as dashboards, should focus on leading indicators and showing information that is connected to useful and direct actions.

To explain#

When users want to go beyond the “what” of a problem and dig into the “why,” a data visualization focused on explanation is ideal. Explanatory visualizations are often hand-crafted to help a broad audience understand a complex subject, and usually cannot be automated.

Glossary of terms#

The following are terms we use in Spectrum to talk about data visualization. Any commonly used synonyms for these are noted in-line in specific guidelines.

Diagram illustrating the relationship between data visualization terms. Data is divided into two types: dimensions and metrics. Dimensions use either categorical or ordinal scales, or can be counted to become a ratio scale. Metrics are either ratio or interval scales. Ratio and interval scales can be binned to become ordinal scales.

Metric#

A metric contains numeric, quantitative values that can be measured. Measures are continuous, and the difference between values can be quantified.

| Scale | Categorical | Ordinal | Interval | Ratio |
| --- | --- | --- | --- | --- |
| Example | Country (US, Japan, Mexico) | Status (Extinct, Endangered, Threatened) | Temperature (32°, 54°, 68°) | Height (1.65 m, 3.1 m, 2.01 m) |
| The order of the values is known | | x | x | x |
| Has a mode (most frequent value) | x | x | x | x |
| Has a median (middle value) | | x | x | x |
| Has a mean (average value) | | | x | x |
| Can quantify the difference between values | | | x | x |
| Can multiply and divide values | | | | x |
| Has a “true” zero | | | | x |

Dimension#

A dimension contains qualitative values such as names, types, or places. Dimensions can be used to categorize, segment, and reveal details in data. A dimension is discrete; each value is individually separate and distinct.
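The diagram description earlier in this glossary notes that a dimension can be counted to become a ratio scale. A minimal Python sketch of that idea follows. The list of countries is made-up sample data, and `count_by_category` is a hypothetical helper name, not something defined by Spectrum.

```python
from collections import Counter


def count_by_category(values):
    """Count how often each dimension value occurs."""
    return dict(Counter(values))


# Made-up sample data: one country label per record.
countries = ["US", "Japan", "US", "Mexico", "Japan", "US"]
counts = count_by_category(countries)

# Counts have a true zero, so a ratio between two counts is meaningful.
us_to_mexico = counts["US"] / counts["Mexico"]

print(counts)
print(us_to_mexico)
```

The country labels themselves remain categorical. Only the derived counts behave like a ratio-scale metric.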

Categorical scale#

In a categorical (nominal) scale, values are not associated with numeric values. Examples of this include locations (e.g., cities, states, countries) or scientific classification systems (e.g., kingdoms of animals or plants).

Ordinal scale#

In an ordinal (ordered) scale, values have an implicit order. Two common examples are a ranked list (e.g., 1st, 2nd, 3rd) and sentiment (e.g., strongly disagree, disagree, neutral, agree, strongly agree), in which the order the items appear in is a vital piece of information.
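Because the order of ordinal values carries meaning, a tool usually has to be told that order explicitly; sorting the labels alphabetically would scramble it. The sketch below is one possible way to do this in Python. The response list is made-up sample data, and the names `SENTIMENT_ORDER`, `sort_by_sentiment`, and `median_sentiment` are hypothetical.

```python
# The sentiment labels from the example above, listed in their intended order.
SENTIMENT_ORDER = [
    "strongly disagree",
    "disagree",
    "neutral",
    "agree",
    "strongly agree",
]
RANK = {label: position for position, label in enumerate(SENTIMENT_ORDER)}


def sort_by_sentiment(responses):
    """Sort responses by their position on the ordinal scale."""
    return sorted(responses, key=lambda label: RANK[label])


def median_sentiment(responses):
    """Return the middle response (the lower middle for an even count)."""
    ordered = sort_by_sentiment(responses)
    return ordered[(len(ordered) - 1) // 2]


# Made-up sample data.
responses = ["agree", "strongly disagree", "neutral", "agree", "disagree"]

print(sort_by_sentiment(responses))
print(median_sentiment(responses))
```

A median is available here because the values can be ordered. A mean is not, since the distance between "agree" and "strongly agree" is not defined.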

Ratio scale#

A ratio scale has a true zero, so values can be meaningfully multiplied and divided as well as compared. One example of a ratio scale is a ruler, where values are plotted at specific points on the scale to represent their exact measure. Things like height and age also use this scale. Ratio scales usually start at zero because zero is the most meaningful starting point.

Interval scale#

An interval scale has a lot in common with a ratio scale, but it lacks a meaningful zero or origin point. Examples include temperature (in degrees Celsius or Fahrenheit) and calendar time. In an interval scale, 20° is 15° hotter than 5°, but it would be misleading to claim that 20° is four times as hot as 5°.
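The temperature example can be checked with a little arithmetic. The sketch below assumes the readings are in degrees Celsius (the text does not name a unit) and converts them to kelvins, a scale that does have a true zero, to show why the "four times" ratio does not hold. `celsius_to_kelvin` is a helper written for this illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (zero kelvin is absolute zero)."""
    return celsius + 273.15


# Assumption: the 20° and 5° readings from the text are in degrees Celsius.
warm_c, cool_c = 20.0, 5.0

# Differences are meaningful on an interval scale and survive the conversion.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: dividing the Celsius readings gives 4.0, while the same
# two temperatures in kelvins differ by only about five percent.
naive_ratio = warm_c / cool_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c, round(difference_k, 2))
print(naive_ratio, round(kelvin_ratio, 3))
```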

Continuous data#

When values represent measurements (for example, height or age), the data is continuous.

Categorical data#

When values represent distinct entities (for example, names), they are discrete, and the data is categorical. The values themselves are often referred to as “categories.”

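Discrete data#

When each value is individually separate and distinct (for example, the values of a dimension, or the bins that continuous data has been divided into), the data is discrete.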

Bin#

Binning is a way of taking continuous data and making it discrete. When numerical values are divided into discrete sections, these sections are referred to as “bins.” Bins are usually equal in size.
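As a rough illustration of equal-size bins, the sketch below counts how many values fall into each bin. The ages are made-up sample data, `bin_values` is a hypothetical helper, and the choice of half-open bins (lower bound included, upper bound excluded) is an assumption rather than a Spectrum rule.

```python
def bin_values(values, start, width, count):
    """Count values in `count` equal-width bins beginning at `start`.

    Each bin covers lower <= value < upper. Values outside the overall
    range are ignored. Returns a list of (lower, upper, count) tuples.
    """
    counts = [0] * count
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < count:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(count)
    ]


# Made-up sample data: ages in years.
ages = [3, 17, 22, 25, 31, 38, 44, 59]

for lower, upper, total in bin_values(ages, start=0, width=20, count=3):
    print(f"{lower} to under {upper}: {total}")
```

Once binned, the ages are no longer continuous measurements. Each bin is a discrete, ordered category that can be plotted like any other ordinal value.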

Mean#

The average value: the sum of the values divided by the number of values.

Median#

The middle value when the values are sorted in order.

Mode#

The most frequent value.
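The three summaries can be computed with Python's standard `statistics` module, as in the short sketch below. The heights are borrowed from the height example earlier in this glossary, and the list of status labels is made-up sample data.

```python
import statistics

# Ratio-scale data: mean, median, and mode are all available.
heights_m = [1.65, 3.1, 2.01]
mean_height = statistics.mean(heights_m)
median_height = statistics.median(heights_m)

# Categorical or ordinal labels: the mode still works, because it only
# needs to count how often each value occurs.
statuses = ["Endangered", "Extinct", "Endangered", "Threatened"]
most_common_status = statistics.mode(statuses)

print(round(mean_height, 2))
print(median_height)
print(most_common_status)
```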