LSS 2026 Workshop: Data Visualization

Dr. Grace Tompkins, Department of Statistics

Welcome!

Who am I?

  • My name is Grace
  • I’m a Biostatistician and Assistant Professor of Teaching (Dept. of Statistics)

Who is this workshop for?

  • If you’re new(ish) to R or relatively experienced, you’re in the right place!
  • This workshop will go at your pace - just keep in mind we only have an hour together

Goals of this Workshop

By the end of this workshop, attendees should be able to:

  • Interact with data in R more confidently
  • Create plots using ggplot2

How to Interact with R in This Workshop

  • These slides are built with Quarto + WebR, meaning you can code in them without downloading any software!

  • Your code should persist on this page, but to be safe I recommend saving a PDF copy by [right clicking] -> [Print…] -> [Save/Export to PDF] :)

  • For your own research/projects, I highly recommend checking out how to use RStudio! Here’s a video on how to use RStudio.

Data Visualization: Why Do We Use It?

  • ❌ Tables are hard to digest.
  • ✅ Visualizations can convey more complex ideas faster
  • ✅ Visualizations can help us answer descriptive and exploratory questions
    • ⚠️ They do not answer predictive, inferential, or causal questions

Great Data Visualizations

“A good visualization will clearly answer your question without distraction; a great visualization will suggest even what the question was itself without additional explanation.” - A First Introduction to Data Science


How do we accomplish this? Keep this SASS-y

  • Simple (plot is as simple as possible, minimizing distractions)

  • Accessible (colourblind-friendly palettes are used, text is human-readable)

  • Specific (the purpose of the plot is clear, and explores a specific research question)

  • Scaled (small differences are not blown up, proportionality is maintained)
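A minimal sketch of what some of these principles can look like in ggplot2 code, using the penguins data from the demo later in this workshop. The specific choices here (the viridis palette, theme_minimal(), the label text, and the object name sassy_plot) are illustrative assumptions, not the only way to satisfy each principle.

```r
library(ggplot2)
library(palmerpenguins)

sassy_plot <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, colour = species)
) +
  geom_point() +
  # Accessible: viridis palettes are designed to be colourblind-friendly
  scale_colour_viridis_d() +
  # Accessible: human-readable labels instead of raw column names
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    colour = "Species"
  ) +
  # Simple: a plain theme with few distractions and larger text
  theme_minimal(base_size = 14)

sassy_plot
```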

Data Visualizations

Which principles of good data visualization are met/violated here?

Data Visualizations

(Numbers based off previous image which is… questionably sourced :-) )

Making Nice Plots in R

A crowd favourite package for plotting in R is ggplot2 (which has the ggplot() function).

There are three key aspects of plots in ggplot2:

  1. aesthetic mappings: relate dataframe columns to visual properties

  2. geometric objects: dictate how to display those visual properties (type of plot, for example)

  3. scales: transform variables and set limits

We add these layers one by one using +
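As a rough sketch of how the three pieces fit together, the code below builds a plot from a tiny made-up data frame. The data frame toy and its columns hours and score are hypothetical, invented only to show the layering syntax.

```r
library(ggplot2)

# Hypothetical toy data, used only to illustrate the syntax
toy <- data.frame(
  hours = c(1, 2, 3, 4, 5),
  score = c(52, 60, 71, 75, 88)
)

layered_plot <- ggplot(toy, aes(x = hours, y = score)) + # 1. aesthetic mappings
  geom_point() +                                         # 2. geometric object
  scale_y_continuous(limits = c(0, 100))                 # 3. scale (sets y-axis limits)

layered_plot
```

Each + adds one more layer to the plot, so you can run the code after each line is added and watch the plot change.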


Important Note

Building plots is an iterative procedure. Try things, make mistakes, and refine!

Demo

Today’s Demo: Penguins!

We will explore the penguins data set from the palmerpenguins package in R.
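A quick way to get a feel for the data before plotting is to load the package and look at the first few rows and the column types. This is a minimal sketch that assumes the palmerpenguins package is installed.

```r
library(palmerpenguins)

# First few rows of the data set
head(penguins)

# Column names and types; the measurement columns include
# bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g
str(penguins)
```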

Research Questions

An interesting research question may be how some of these measurements within a penguin are related. Let’s investigate the following:


Descriptive Visualizations

  • What species of penguins are in my data set? How many of each sex?

  • What does the distribution of bill length measurements look like?

Exploratory Visualizations

  • Do penguins with longer bills also tend to have deeper bills?

  • Do certain species of penguin tend to have larger bills?

Choosing a Data Visualization

Typically, a data visualization is two-dimensional (think of drawing a plot on a piece of paper), so we can easily visualize the relationship between two variables.


What plot should I use?

  • Scatterplots: used to visualize two quantitative (numeric) variables
  • Line plots: used to visualize trends with respect to an independent quantity (like time)
  • Bar plots: used to visualize the comparison of amounts (categorical variables). Can be stacked or grouped to show the relationships across another categorical variable.
  • Box plots or histograms: used to visualize distributions, perhaps across groups.

Choosing a Data Visualization: Counting Penguin Species

Let’s visualize the types of penguins (species and sex) in our study.

If we wanted to show the counts of each penguin species, which plot should we use?


✅ A bar plot! We can plot this using geom_bar() in ggplot.

Bar Plot (geom_bar())

Let’s visualize how many penguins of each species we have in our data set
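One way this bar plot might be written is sketched below. The axis label text and the object name species_bar are illustrative choices.

```r
library(ggplot2)
library(palmerpenguins)

# geom_bar() counts the rows in each category, so only x is mapped
species_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar() +
  labs(x = "Species", y = "Number of penguins")

species_bar
```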

Bar Plot (geom_bar()): Elevated!

Let’s get the counts by sex, too!
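A sketch of one way to add sex to the bar plot: map it to the fill aesthetic and place the bars side by side. Using position = "dodge" (grouped bars) rather than the default stacked bars is an assumption about the intended look. Some penguins have no recorded sex, so an NA group will appear.

```r
library(ggplot2)
library(palmerpenguins)

# fill = sex splits each species bar by sex;
# position = "dodge" puts the bars next to each other instead of stacking
species_sex_bar <- ggplot(penguins, aes(x = species, fill = sex)) +
  geom_bar(position = "dodge") +
  labs(x = "Species", y = "Number of penguins", fill = "Sex")

species_sex_bar
```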

Choosing a Data Visualization

Let’s describe the distribution of the bill lengths in our data set. What visualization should we use?

✅ Histogram! (Or… a box plot)

Histogram (geom_histogram())
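A minimal sketch of a histogram of bill lengths. In ggplot2 the histogram geom is geom_histogram(). The bin width of 2 mm is an arbitrary starting point, so try a few values and see how the shape changes. R will likely warn that a couple of rows were removed, because some penguins have missing measurements.

```r
library(ggplot2)
library(palmerpenguins)

# Only x is mapped; geom_histogram() computes the counts per bin
bill_hist <- ggplot(penguins, aes(x = bill_length_mm)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Bill length (mm)", y = "Number of penguins")

bill_hist
```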

Choosing a Data Visualization

Let’s visualize bill length and bill depth, which are measured in millimetres.


What visualization should we use?

✅ Scatterplot! We plot this using the geom_point() object within ggplot.

Scatterplot (geom_point())
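One possible version of this scatterplot is sketched below. In the penguins data set the two bill measurements are stored in the columns bill_length_mm and bill_depth_mm. Which variable goes on which axis is a free choice here.

```r
library(ggplot2)
library(palmerpenguins)

# Two quantitative variables: one on each axis
bill_scatter <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")

bill_scatter
```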

Scatterplots: Grouping by a Categorical Variable

What about looking at this relationship by species? Do certain species of penguin tend to have larger bills?


We can group by colour/shape to see if there are trends within/between penguin species!
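A sketch of the grouped scatterplot: mapping species to both colour and shape means the groups can still be told apart without relying on colour alone. Mapping both aesthetics at once is an illustrative choice, and either one on its own also works.

```r
library(ggplot2)
library(palmerpenguins)

# colour and shape both vary by species, giving one combined legend
bill_by_species <- ggplot(
  penguins,
  aes(x = bill_length_mm, y = bill_depth_mm, colour = species, shape = species)
) +
  geom_point() +
  labs(
    x = "Bill length (mm)",
    y = "Bill depth (mm)",
    colour = "Species",
    shape = "Species"
  )

bill_by_species
```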

Try it yourself:

Are body mass and flipper lengths in penguins related? Does this vary by sex?
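If you get stuck, here is one possible starting point, offered as a sketch rather than the only answer. It assumes the relevant columns are flipper_length_mm, body_mass_g, and sex, and the choice of axes is arbitrary.

```r
library(ggplot2)
library(palmerpenguins)

# Scatterplot of two quantitative variables, coloured by sex
mass_flipper <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, colour = sex)
) +
  geom_point() +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", colour = "Sex")

mass_flipper
```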

Other Data Visualization Tips

  • Avoid pie charts.

  • Avoid 3D visualizations

  • ….. Absolutely avoid 3D pie charts

Ugly Pie Chart

Disgusting 3D Plot

Looking for more?

🛑 Before you click any links, save your work as a PDF!

[Right Click] -> [Print…] -> [Save/Export to PDF] (or right click to open links in a new tab)


Thank you!

Written materials from this presentation can be found here: