14 Intro to data visualization with Tableplot
This tutorial will guide us through an exploration of the classic Iris dataset using the Tableplot library in Clojure. We will demonstrate how to use Tableplot’s Plotly API to create various visualizations, while explaining the core ideas and functionality of the API.
14.1 Setup
(ns noj-book.tableplot-datavis-intro
  (:require [scicloj.tableplot.v1.plotly :as plotly]
            [tablecloth.api :as tc]
            [noj-book.datasets :as datasets]))
14.2 Introduction
Tableplot is a Clojure library for creating data visualizations using a functional grammar inspired by ggplot2 and the layered grammar of graphics. It allows for composable plots, where layers can be built up incrementally and data transformations can be seamlessly integrated.
In this tutorial, we will:
- Inspect the Iris dataset using Tablecloth.
- Create various types of plots using Tableplot’s Plotly API.
- Explore the relationships between different variables in the dataset.
- Demonstrate how to customize plots and use different features of the API.
14.3 Looking into the Iris Dataset
First, let’s look into the Iris dataset we have read in the datasets chapter.
datasets/iris
https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 6]:
:rownames | :sepal-length | :sepal-width | :petal-length | :petal-width | :species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
… | … | … | … | … | … |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
The Iris dataset contains measurements for 150 iris flowers from three species (setosa, versicolor, virginica). The variables are:
- sepal-length: Length of the sepal (cm)
- sepal-width: Width of the sepal (cm)
- petal-length: Length of the petal (cm)
- petal-width: Width of the petal (cm)
- species: Species of the iris flower
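Beyond printing the table, a few Tablecloth calls give a quick structural overview. The sketch below is only an illustration: to stay self-contained it rebuilds a six-row excerpt (rows 1-3 and 148-150 of the table above) under the hypothetical name iris-sample, rather than using datasets/iris from the datasets chapter. The same calls should apply to the full dataset.

```clojure
(require '[tablecloth.api :as tc])

;; Hypothetical six-row excerpt, copied from rows 1-3 and 148-150 above.
(def iris-sample
  (tc/dataset {:sepal-length [5.1 4.9 4.7 6.5 6.2 5.9]
               :sepal-width  [3.5 3.0 3.2 3.0 3.4 3.0]
               :petal-length [1.4 1.4 1.3 5.2 5.4 5.1]
               :petal-width  [0.2 0.2 0.2 2.0 2.3 1.8]
               :species      ["setosa" "setosa" "setosa"
                              "virginica" "virginica" "virginica"]}))

;; Number of rows in the excerpt.
(tc/row-count iris-sample)

;; Column names, as keywords.
(tc/column-names iris-sample)

;; Rows per species: group by :species, then count rows in each group.
(-> iris-sample
    (tc/group-by [:species])
    (tc/aggregate {:n tc/row-count}))
```

Replacing iris-sample with datasets/iris in the last form would give the number of flowers per species in the full dataset.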
14.4 Scatter Plot
Let’s start by creating a simple scatter plot to visualize the relationship between sepal-length and sepal-width.
(-> datasets/iris
    (plotly/layer-point
     {:=x :sepal-length
      :=y :sepal-width}))
This plot shows the distribution of sepal length and width for the flowers in the dataset.
14.4.1 Adding Color by Species
To distinguish between the different species, we can add color encoding based on the species column.
(-> datasets/iris
    (plotly/layer-point
     {:=x :sepal-length
      :=y :sepal-width
      :=color :species}))
Now, each species is represented by a different color, making it easier to see any patterns or differences between them.
14.5 Exploring Petal Measurements
Next, let’s explore how petal measurements vary across species.
(-> datasets/iris
    (plotly/layer-point
     {:=x :petal-length
      :=y :petal-width
      :=color :species}))
This plot shows a clearer separation between species based on petal measurements compared to sepal measurements.
14.6 Combining Sepal and Petal Measurements
We can create a scatter plot matrix (SPLOM) to visualize the relationships between all pairs of variables.
(-> datasets/iris
    (plotly/splom
     {:=colnames [:sepal-length :sepal-width :petal-length :petal-width]
      :=color :species
      :=height 600
      :=width 600}))
The SPLOM shows pairwise scatter plots for all combinations of the selected variables, with points colored by species.
14.7 Histograms
Let’s create histograms to explore the distribution of sepal-length.
(-> datasets/iris
    (plotly/layer-histogram
     {:=x :sepal-length
      :=histnorm "count"
      :=histogram-nbins 20}))
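To make the idea of a count histogram concrete, here is a minimal plain-Clojure sketch of equal-width binning. It is not how Tableplot computes its bins (how the library chooses bin edges is not covered here); the function bin-counts and the chosen lower edge and bin width are assumptions made for illustration. The input values are the sepal-length entries of rows 1-10 in the table above.

```clojure
(defn bin-counts
  "Counts how many values fall into each equal-width bin.
  Bin i covers the half-open interval [lo + i*width, lo + (i+1)*width).
  Returns a map from bin index to count."
  [lo width xs]
  (frequencies
   (map (fn [x]
          ;; Index of the bin that contains x.
          (long (Math/floor (/ (double (- x lo)) width))))
        xs)))

;; sepal-length of rows 1-10 from the table above
(def first-ten [5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9])

;; Bins of width 0.5 starting at 4.0: [4.0, 4.5), [4.5, 5.0), [5.0, 5.5)
(bin-counts 4.0 0.5 first-ten)
;; => {0 1, 1 5, 2 4} (map key order may differ)
```

A histogram with "count" normalization draws one bar per bin with these counts as bar heights; asking for more bins narrows each interval.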
14.7.1 Histograms by Species
To see how the distribution of sepal-length varies by species, we can add color encoding.
(-> datasets/iris
    (plotly/layer-histogram
     {:=x :sepal-length
      :=color :species
      :=histnorm "count"
      :=histogram-nbins 20
      :=mark-opacity 0.7}))
14.8 Box Plots
Box plots are useful for comparing distributions across categories.
(-> datasets/iris
    (plotly/layer-boxplot
     {:=y :sepal-length
      :=x :species}))
This box plot shows the distribution of sepal-length for each species.
14.9 Violin Plots
Violin plots provide a richer representation of the distribution.
(-> datasets/iris
    (plotly/layer-violin
     {:=y :sepal-length
      :=x :species
      :=box-visible true
      :=meanline-visible true}))
14.10 Scatter Plot with Trend Lines
We can add a smoothing layer to show trend lines in the data.
(-> datasets/iris
    (plotly/base
     {:=x :sepal-length
      :=y :sepal-width
      :=color :species})
    plotly/layer-point
    plotly/layer-smooth)
This plot shows a scatter plot of sepal measurements with trend lines added for each species.
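For intuition about what a straight trend line represents, the following plain-Clojure sketch computes an ordinary least-squares line by hand. It assumes the trend in question is a linear fit; which model plotly/layer-smooth actually applies should be checked in the Tableplot documentation. The function least-squares is an illustrative helper, not part of Tableplot, and the input values are rows 1-5 of the table above (all setosa).

```clojure
(defn least-squares
  "Fits y = intercept + slope * x by ordinary least squares.
  xs must contain at least two distinct values, otherwise the
  slope is undefined (division by zero)."
  [xs ys]
  (let [n      (count xs)
        mean-x (/ (reduce + xs) n)
        mean-y (/ (reduce + ys) n)
        ;; Sum of products of deviations from the means.
        sxy    (reduce + (map (fn [x y] (* (- x mean-x) (- y mean-y))) xs ys))
        ;; Sum of squared deviations of x from its mean.
        sxx    (reduce + (map (fn [x] (let [d (- x mean-x)] (* d d))) xs))
        slope  (/ sxy sxx)]
    {:slope     slope
     :intercept (- mean-y (* slope mean-x))}))

;; sepal-length and sepal-width of rows 1-5 from the table above
(least-squares [5.1 4.9 4.7 4.6 5.0]
               [3.5 3.0 3.2 3.1 3.6])
```

Fitting one such line per species, as the color grouping in the plot does, amounts to calling the function once on each species' rows.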
14.11 Customizing Plots
Plot aesthetics such as marker size, color, symbol, and opacity are controlled through additional parameters in the layer's map.
14.11.1 Changing Marker Sizes
(-> datasets/iris
    (plotly/layer-point
     {:=x :sepal-length
      :=y :sepal-width
      :=color :species
      :=symbol :species
      :=mark-size 15}))
14.11.2 Changing Marker Color (for all marks)
(-> datasets/iris
    (plotly/layer-point
     {:=x :sepal-length
      :=y :sepal-width
      :=symbol :species
      :=mark-size 15
      :=mark-color :darkblue}))
14.11.3 Adjusting Opacity
(-> datasets/iris
    (plotly/layer-point
     {:=x :sepal-length
      :=y :sepal-width
      :=color :species
      :=mark-size 15
      :=mark-opacity 0.6}))
14.12 3d Scatter Plot
We can create a 3d scatter plot to visualize relationships in three dimensions.
(-> datasets/iris
    (plotly/layer-point
     {:=x :sepal-length
      :=y :sepal-width
      :=z :petal-length
      :=color :species
      :=coordinates :3d
      :=mark-size 5}))
14.13 Conclusion
In this tutorial, we have explored the Iris dataset using the Tableplot library in Clojure. We demonstrated how to create various types of plots, customize them, and explore relationships in the data.
Tableplot’s API is designed to be intuitive and flexible, allowing for the creation of complex plots with simple, composable functions.
For more information and advanced usage, refer to the Tableplot documentation.
14.14 Appendix: Understanding the Tableplot API
The core idea of the Tableplot API is to build plots by composing layers. Each layer corresponds to a visual representation of data, such as points, lines, bars, etc.
14.14.1 Basic Functions
- plotly/layer-point: Adds a scatter plot layer with points.
- plotly/layer-line: Adds a line plot layer.
- plotly/layer-bar: Adds a bar plot layer.
- plotly/layer-boxplot: Adds a box plot layer.
- plotly/layer-violin: Adds a violin plot layer.
- plotly/layer-histogram: Adds a histogram layer.
- plotly/layer-smooth: Adds a smoothing layer (trend line).
- plotly/splom: Creates a scatter plot matrix (SPLOM).
14.14.2 Parameters
Parameters are provided as a map, with keys prefixed by := to distinguish them from dataset columns.
- :=x: The x-axis variable.
- :=y: The y-axis variable.
- :=z: The z-axis variable (for 3D plots).
- :=color: Variable used to color the data points.
- :=symbol: Variable used to determine marker symbols.
- :=mark-opacity: Opacity of the markers.
- :=mark-size: Size of the markers.
- :=mark-color: Color of the markers.
- :=histogram-nbins: Number of bins in the x-axis for histograms.
- :=box-visible: Whether to show box plot inside violin plots.
- :=meanline-visible: Whether to show mean line in violin plots.
14.14.3 Composing Plots
Plots are built by starting with a dataset and chaining layer functions.
(comment
  (-> dataset
      (plotly/layer-point
       {:=x :x-variable
        :=y :y-variable})))
Multiple layers can be added to create more complex plots, sharing parameters defined in base.
(comment
  (-> dataset
      (plotly/base
       {:=x :x-variable
        :=y :y-variable})
      (plotly/layer-point {... ...})
      (plotly/layer-smooth {... ...})))
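As a runnable counterpart to the schematic composition pattern, the sketch below layers points and a trend line over a small hand-made dataset. The dataset toy and its columns :x, :y, and :group are hypothetical and exist only for this illustration. The final step assumes that plotly/plot realizes the layered template as a plain Plotly.js specification map; treat that as our understanding of the API rather than a guarantee, and consult the Tableplot documentation for details.

```clojure
(require '[tablecloth.api :as tc]
         '[scicloj.tableplot.v1.plotly :as plotly])

;; Hypothetical toy data, unrelated to the Iris dataset.
(def toy
  (tc/dataset {:x     [1 2 3 4 5 6]
               :y     [2.1 3.9 6.2 8.1 9.8 12.2]
               :group ["a" "a" "a" "b" "b" "b"]}))

;; Shared mappings go into base; each layer adds only what is
;; specific to it (here, a marker size for the point layer).
(def layered
  (-> toy
      (plotly/base {:=x :x
                    :=y :y
                    :=color :group})
      (plotly/layer-point {:=mark-size 10})
      (plotly/layer-smooth)))

;; Assumption: plotly/plot turns the template into the final
;; Plotly.js spec, which can then be inspected as ordinary data.
(def spec (plotly/plot layered))

(keys spec)
```

Inspecting the realized specification this way can help when a plot does not look as expected, since it shows which traces and layout options the layers produced.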