Graphics: Choose a Graph Type

Author

Stacy DeRuiter

Published

January 21, 2025

Section Learning Outcomes

After this tutorial, you will:

  1. Distinguish variable types: categorical (nominal, ordinal) and quantitative (interval, ratio); explanatory, response, covariate.
  2. Choose an appropriate graphical display for a specified combination of variables.
  3. (Continue to) critique statistical graphics based on design principles.

Note: You do NOT have to memorize all the information in this tutorial. Review it now, but know you will probably return to this tutorial for later reference. Your goal should be to finish with a basic idea of which graph types should be used for which variable types. Notice that the “Gallery” sections in the navigation bar are labeled by which variable types are to be shown!

At the end, you might want to finish with your own notes filling in a table like the one below:

Variables          Graphs
One Quantitative   histogram, density plot, …
One Categorical

Text Reference

Motivation: Imagine First!

Figures are a crucial tool for exploring your data and communicating what you learn from the data.

Whether you are doing a quick check to assess basic features of a dataset or creating a key figure for an important presentation, the best practice is to work thoughtfully. You have already learned to create graphics using I.C.E.E.:

The I.C.E.E. method:

  • Imagine how you want your graph to look, before you
  • Code. Once you have the basic starting point,
  • Evaluate your work, and
  • Elaborate (refine it).

Repeat until the figure is as awesome as it needs to be.

Limiting Your Imagination

There is really no limit to the creative data visualizations you might dream up.

But there is a set of basic, workhorse graphics that statisticians and data scientists use most frequently. What are the common options and how do you choose among them?

The best choice depends on what kind of data you have, and also on what you want to do with it: what question are you trying to answer? What story will you tell?

Goals

Specifically, you will now focus on choosing the right type of visualization for the task at hand.

Note that the graphs shown in this tutorial are oversimplified versions (icons, really) with missing labels, huge titles, and huge data elements. This is intentional: the aim is to evoke the look of each plot type rather than to present actual data.

Variable Types

Before designing a graphic, you need some data. Ideally, it will be in a tidy table, with one row per case and one column per variable.
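As a rough illustration of the tidy layout described above, the sketch below stores a small made-up table in Python as a list of dictionaries: each dictionary is one case (row) and each key is one variable (column). The data, the variable names, and the helper functions `has_consistent_columns` and `column` are all hypothetical, invented only for this example.

```python
# Hypothetical data: each dict is one case (row); each key is one variable (column).
students = [
    {"name": "Ana", "year": "first", "height_cm": 162.0},
    {"name": "Ben", "year": "second", "height_cm": 175.5},
    {"name": "Cy", "year": "first", "height_cm": 168.2},
]


def has_consistent_columns(rows):
    """Return True if every row records exactly the same set of variables."""
    if not rows:
        return True
    expected = set(rows[0])
    return all(set(row) == expected for row in rows)


def column(rows, name):
    """Pull out one variable (column) as a list, one value per case."""
    return [row[name] for row in rows]


heights = column(students, "height_cm")
```

With data in this shape, picking the variable (or variables) to plot is just a matter of naming the column.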

Different plots may be appropriate, depending on whether each variable is:

  • Categorical (either nominal or ordinal) or
  • Quantitative (interval or ratio)

A few cautions:

  • Beware categorical variables that are stored using numeric codes: they are still categorical!
  • Variables that take on discrete numeric values can be treated as either type, depending mainly on whether there are many possible values (treat as quantitative) or few (treat as categorical).
  • Other courses or disciplines may distinguish carefully between ordinal and nominal data. We often won’t: we don’t learn distinct methods for them, so we treat both as categorical.
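The rules of thumb above can be sketched as a small Python helper. This is only a heuristic under stated assumptions: the function name `guess_variable_type`, the `coded_categories` flag, and the cutoff `max_levels=10` are hypothetical choices, not values given in this tutorial. It also counts the distinct values actually observed as a stand-in for the number of possible values, so a very small sample can mislead it.

```python
def guess_variable_type(values, coded_categories=False, max_levels=10):
    """Guess 'categorical' or 'quantitative' for one column of values.

    coded_categories: set to True when the numbers are only labels
        (for example 1 = "yes", 2 = "no"); the code cannot detect this itself.
    max_levels: hypothetical cutoff for "few" distinct discrete values.
    """
    observed = [v for v in values if v is not None]

    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    # Numeric codes for categories, or any non-numeric values: categorical
    if coded_categories or not all(is_number(v) for v in observed):
        return "categorical"

    # Discrete numeric values with only a few distinct levels: treat as categorical
    discrete = all(float(v).is_integer() for v in observed)
    if discrete and len(set(observed)) <= max_levels:
        return "categorical"

    return "quantitative"
```

For example, `guess_variable_type([0, 1, 2, 1, 0, 2])` (say, number of siblings) gives `"categorical"`, while `guess_variable_type(list(range(100)))` gives `"quantitative"`. A human still has to make the final call, especially for coded categories.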

The video below gives a concise explanation of the different variable types you need to be able to recognize.

DISTRIBUTIONS

Sometimes, you need a plot that lets you see the distribution of a single variable:

  • What values does it take on?
  • How often does each value occur?
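For a categorical variable, these two questions amount to a frequency table. A minimal Python sketch, assuming made-up survey responses and a hypothetical helper named `frequency_table`, might look like this (a quantitative variable would first need its values grouped into bins, which is not shown here):

```python
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs, most common first."""
    return Counter(values).most_common()


# Hypothetical responses to a survey question
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

# Print a crude text bar chart: one '#' per occurrence
for value, count in frequency_table(responses):
    print(f"{value:<6}{'#' * count} ({count})")
```

The printed rows are essentially a bar chart drawn in text, which is the same information a graphical display of one categorical variable conveys.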

Sometimes these graphs present the answer to a scientific question of interest, but often they are used during exploration or model assessment to better understand a dataset and:

  • Check the data
    • Are there lots of missing values?
    • Are missing values encoded as 999 or -1000 or some other impossible value instead of being marked as “NA”?
  • Verify whether the variable’s distribution matches expectations (for example, whether it is roughly symmetric)
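The data checks above can also be done numerically before (or alongside) plotting. The Python sketch below is one possible approach, not a prescribed method: the helper names `count_missing` and `recode_sentinels` are hypothetical, the temperature values are made up, and Python's `None` stands in for “NA”. The default sentinel codes are the 999 and -1000 mentioned above; a real dataset may use others.

```python
def count_missing(values, sentinels=(999, -1000)):
    """Count true missing values (None) and suspicious sentinel codes separately."""
    n_none = sum(v is None for v in values)
    n_sentinel = sum(v in sentinels for v in values if v is not None)
    return {"missing": n_none, "sentinel": n_sentinel}


def recode_sentinels(values, sentinels=(999, -1000)):
    """Replace sentinel codes with None so they are not plotted as real data."""
    return [None if v in sentinels else v for v in values]


# Hypothetical temperature readings with two kinds of missing-value encoding
temps = [21.5, 999, None, 19.0, -1000, 22.1]
summary = count_missing(temps)
cleaned = recode_sentinels(temps)
```

A histogram of the raw `temps` would show isolated bars far from the rest of the data at 999 and -1000, which is often how such codes are first noticed.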

RELATIONSHIPS

Very often, we want to examine relationships between variables, not individual variables’ distributions. This means thinking carefully about what types of variables are in the (potential) relationship, and how we can best show their values graphically.
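One way to think about this choice is as a lookup keyed by the pair of variable types. The Python sketch below is illustrative only: the graph names are common conventions assumed for the example, not this tutorial's own gallery, and `SUGGESTIONS` and `suggest_graphs` are hypothetical names.

```python
# Common conventions only; these entries are assumptions, not this tutorial's list.
SUGGESTIONS = {
    ("categorical", "categorical"): ["stacked or side-by-side bar chart"],
    ("categorical", "quantitative"): ["side-by-side boxplots", "faceted histograms"],
    ("quantitative", "quantitative"): ["scatter plot"],
}


def suggest_graphs(type_a, type_b):
    """Look up candidate graph types for a pair of variable types (order-free)."""
    key = tuple(sorted((type_a, type_b)))
    return SUGGESTIONS[key]
```

Sorting the pair means `suggest_graphs("quantitative", "categorical")` and `suggest_graphs("categorical", "quantitative")` return the same suggestions. Which variable is explanatory and which is the response still matters when deciding what goes on each axis.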