16 Relationships
Scatter plots, regression, smoothing, density estimation, and heatmaps: revealing structure between two variables.
Scatter is the foundation. Each row becomes a point in the plane, and the eye reads structure off the cloud. Regression and smoothing draw trend lines through it; 2D density and contours reveal where the cloud is dense or sparse; the scatter-plot matrix (SPLOM) at the end shows every pair of columns at once.
(ns plotje-book.relationships
  (:require
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]
   ;; Fastmath -- random number generation
   [fastmath.random :as rng]))
Basic Scatter
Sepal dimensions, no color: the default mark.
(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width))
Colored by Species
Adding :color :species groups points by species with distinct colors.
(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:color :species}))
Petal Dimensions
Petal length vs width: a strongly correlated pair, set up here as the running example for the regression sections below.
(-> (rdatasets/datasets-iris)
    (pj/lay-point :petal-length :petal-width {:color :species}))
Linear Regression
A single regression line through all data.
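For intuition about what the fitted line represents, here is a minimal sketch of an ordinary least-squares fit in plain Clojure. It assumes the :linear-model stat fits a straight line by least squares, which this chapter does not spell out. The function fit-line and the toy data are hypothetical, introduced only for illustration; they are not part of Plotje.

```clojure
(defn fit-line
  "Ordinary least-squares fit of y = intercept + slope * x.
  Hypothetical helper for illustration; returns {:slope .. :intercept ..}."
  [xs ys]
  (let [n      (count xs)
        mean-x (/ (reduce + xs) n)
        mean-y (/ (reduce + ys) n)
        ;; sum of cross-deviations and sum of squared x-deviations
        sxy    (reduce + (map (fn [x y] (* (- x mean-x) (- y mean-y))) xs ys))
        sxx    (reduce + (map (fn [x] (let [d (- x mean-x)] (* d d))) xs))
        slope  (/ sxy sxx)]
    {:slope     slope
     :intercept (- mean-y (* slope mean-x))}))

;; Toy data lying exactly on y = 1 + 2x (not taken from iris)
(fit-line [0.0 1.0 2.0 3.0] [1.0 3.0 5.0 7.0])
;; => {:slope 2.0, :intercept 1.0}
```

In the chart, none of this is written by hand; the smooth layer does the fitting: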
(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width)
    (pj/lay-smooth {:stat :linear-model}))
Per-Group Regression
Fit a regression line per group.
(-> (rdatasets/datasets-iris)
    (pj/pose :petal-length :petal-width {:color :species})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))
Regression with Confidence Ribbon
Pass {:confidence-band true} to show a 95% confidence band around the line.
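For reference, a common definition of such a band is the pointwise confidence interval for the mean response of a simple linear regression. Whether Plotje computes exactly this interval is an assumption; the source does not say. Under that assumption, the band at a position $x_0$ is:

```latex
\hat{y}(x_0) \;\pm\; t_{1-\alpha/2,\;n-2}\; s
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \bigl(y_i - \hat{y}(x_i)\bigr)^2,
\qquad
\alpha = 1 - \text{level}
```

Here $n$ is the number of points in the fit and level is the coverage (0.95 for a 95% band). The band is narrowest at $\bar{x}$ and widens toward the edges of the data, and a higher level means a larger $t$ quantile and therefore a wider band.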
(-> (rdatasets/datasets-iris)
    (pj/pose :sepal-length :sepal-width {:color :species})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model :confidence-band true}))
Pass :level to widen or narrow the band. A 99% band is wider than the default 95% band because it must cover more of the regression's uncertainty; an 80% band is narrower:
(-> (rdatasets/datasets-iris)
    (pj/pose :sepal-length :sepal-width)
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model :confidence-band true :level 0.80}))
(-> (rdatasets/datasets-iris)
    (pj/pose :sepal-length :sepal-width)
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model :confidence-band true :level 0.99}))
Tips with Regression
Do smokers and non-smokers tip differently?
(-> (rdatasets/reshape2-tips)
    (pj/pose :total-bill :tip {:color :smoker})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))
LOESS Smoothing
A smooth curve through noisy data.
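To make the idea of a local smoother concrete, here is a minimal sketch of LOESS-style smoothing at a single position: a straight line is fitted to the nearest points, weighted by a tricube kernel, and evaluated at that position. This is a simplified illustration without robustness iterations, not Plotje's implementation. Reading the bandwidth as the fraction of points in each neighbourhood is an assumption, and tricube, loess-at, and span are hypothetical names.

```clojure
(defn tricube
  "Tricube weight: 1 at u = 0, falling to 0 at |u| >= 1."
  [u]
  (let [a (Math/abs (double u))]
    (if (< a 1.0)
      (Math/pow (- 1.0 (* a a a)) 3.0)
      0.0)))

(defn loess-at
  "Locally weighted linear fit evaluated at x0. `span` is the fraction of
  points forming the neighbourhood (an assumed reading of bandwidth).
  Assumes at least one point lies strictly inside the neighbourhood."
  [xs ys span x0]
  (let [n     (count xs)
        k     (min n (max 2 (long (Math/ceil (double (* span n))))))
        dists (sort (map #(Math/abs (double (- % x0))) xs))
        h     (max (nth dists (dec k)) 1e-12) ; distance to the k-th nearest point
        ws    (map #(tricube (/ (- % x0) h)) xs)
        sw    (reduce + ws)
        mx    (/ (reduce + (map * ws xs)) sw) ; weighted means
        my    (/ (reduce + (map * ws ys)) sw)
        sxy   (reduce + (map (fn [w x y] (* w (- x mx) (- y my))) ws xs ys))
        sxx   (reduce + (map (fn [w x] (* w (- x mx) (- x mx))) ws xs))]
    (if (zero? sxx)
      my                                       ; only one effective point: no slope
      (+ my (* (/ sxy sxx) (- x0 mx))))))

;; Exactly linear toy data (y = 1 + 2x) is reproduced up to rounding
(def lin-xs (mapv double (range 10)))
(def lin-ys (mapv #(+ 1.0 (* 2.0 %)) lin-xs))
(loess-at lin-xs lin-ys 0.5 4.5) ; close to 10.0
```

Evaluating such a fit at many positions and joining the results gives a smooth curve. In the chart, the smooth layer handles this: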
(-> (let [r (rng/rng :jdk 42)
          xs (vec (range 50))]
      {:x xs
       :y (mapv #(+ (Math/sin (* % 0.2))
                    (* 0.3 (- (rng/drandom r) 0.5)))
                xs)})
    (pj/lay-point :x :y)
    (pj/lay-smooth {:bandwidth 0.2}))
Heatmap (Auto-Binned)
Bin x and y into a grid, count points per cell.
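The binning step can be sketched in plain Clojure: map each point to the index of the cell it falls in, then count how many points share each cell. The function bin-counts, the fixed cell widths, and the four sample points are hypothetical; how Plotje chooses the grid is not stated here.

```clojure
(defn bin-counts
  "Counts points per grid cell. A point [x y] falls in the cell
  [floor(x / x-width), floor(y / y-width)]. Hypothetical helper."
  [points x-width y-width]
  (frequencies
   (map (fn [[x y]]
          [(long (Math/floor (double (/ x x-width))))
           (long (Math/floor (double (/ y y-width))))])
        points)))

;; Four made-up points, binned into 0.5 x 0.5 cells
(def sample-points [[5.1 3.5] [5.4 3.9] [4.9 3.0] [6.3 3.3]])
(def counts (bin-counts sample-points 0.5 0.5))

(get counts [10 7]) ; => 2, the first two points share a cell
(count counts)      ; => 3 occupied cells
```

A heatmap then colors each cell by its count. The tile layer does this directly from the raw columns: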
(-> (rdatasets/datasets-iris)
    (pj/lay-tile :sepal-length :sepal-width))
Heatmap (Pre-Computed)
Use a numeric column for tile color.
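The grid in this example is built with two nested for comprehensions, one per coordinate column. A 2x2 version of the same construction, shown here only as an illustration with hypothetical names xs-2x2 and ys-2x2, makes the layout easy to see:

```clojure
;; The first `for` repeats each x index once per inner step;
;; the second cycles through the y indices for every x.
(def xs-2x2 (for [i (range 2) _j (range 2)] i))  ; => (0 0 1 1)
(def ys-2x2 (for [_i (range 2) j (range 2)] j))  ; => (0 1 0 1)

;; Pairing the columns row by row yields every grid cell exactly once
(map vector xs-2x2 ys-2x2)
;; => ([0 0] [0 1] [1 0] [1 1])
```

The 5x5 grid follows the same pattern, with a third column of random values to drive the fill: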
(def grid-data
  (let [r (rng/rng :jdk 99)]
    {:x (for [i (range 5) _j (range 5)] i)
     :y (for [_i (range 5) j (range 5)] j)
     :value (repeatedly 25 #(rng/irandom r 100))}))
(-> grid-data
    (pj/lay-tile :x :y {:fill :value}))
Density 2D
KDE-smoothed 2D density heatmap.
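As a rough picture of what kernel density estimation (KDE) computes, the sketch below evaluates a 2D density at one location by summing a Gaussian bump centered on each data point. The product Gaussian kernel and the fixed bandwidths hx and hy are assumptions made for this sketch; the kernel and bandwidth rule Plotje uses are not stated here, and gaussian-kernel, kde-2d, and kde-points are hypothetical names.

```clojure
(defn gaussian-kernel
  "Standard normal density at u."
  [^double u]
  (/ (Math/exp (* -0.5 u u)) (Math/sqrt (* 2.0 Math/PI))))

(defn kde-2d
  "Density estimate at (x, y) from `points`, using a product Gaussian
  kernel with bandwidths hx and hy (assumed, fixed for this sketch)."
  [points hx hy x y]
  (let [n (count points)]
    (/ (reduce + (map (fn [[px py]]
                        (* (gaussian-kernel (/ (- x px) hx))
                           (gaussian-kernel (/ (- y py) hy))))
                      points))
       (* n hx hy))))

;; Made-up points: two close together, one further away
(def kde-points [[5.0 3.0] [5.2 3.1] [6.5 3.0]])

;; The estimate is higher near the pair than far from every point
(> (kde-2d kde-points 0.3 0.3 5.1 3.05)
   (kde-2d kde-points 0.3 0.3 8.0 5.0))
;; => true
```

Evaluating such an estimate over a grid and coloring by the result gives the heatmap produced by the density layer: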
(-> (rdatasets/datasets-iris)
    (pj/lay-density-2d :sepal-length :sepal-width))
Density 2D with Points
Overlay scatter points on the density heatmap.
(-> (rdatasets/datasets-iris)
    (pj/lay-density-2d :sepal-length :sepal-width)
    (pj/lay-point {:alpha 0.5}))
Contour Lines
Iso-density contour lines from 2D KDE.
(-> (rdatasets/datasets-iris)
    (pj/lay-contour :sepal-length :sepal-width))
Contour with Points
Contour lines overlaid on scatter points.
(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:alpha 0.3})
    (pj/lay-contour {:levels 8}))
Scatter Plot Matrix (SPLOM)
pj/cross generates all combinations of two lists. Passing column names produces a grid of scatter plots, one per pair of variables. The diagonal shows histograms (automatic inference for same-column pairs).
Start small: two variables crossed with themselves give a 2x2 grid. Off-diagonal cells (where the row and column variables differ) get scatter plots; diagonal cells (where they match) get histograms.
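The pairing itself can be pictured with a plain for comprehension. The function cross-pairs below is a hypothetical stand-in written for illustration, not Plotje's implementation, and the order in which pj/cross returns its pairs may differ.

```clojure
(defn cross-pairs
  "All [a b] combinations of two collections (illustrative stand-in)."
  [as bs]
  (for [a as, b bs] [a b]))

(def pairs
  (cross-pairs [:sepal-length :petal-length]
               [:sepal-length :petal-length]))
;; => ([:sepal-length :sepal-length] [:sepal-length :petal-length]
;;     [:petal-length :sepal-length] [:petal-length :petal-length])

;; Diagonal cells are the pairs whose two columns match
(filter (fn [[a b]] (= a b)) pairs)
;; => ([:sepal-length :sepal-length] [:petal-length :petal-length])
```

Two columns give four pairs, hence the 2x2 grid: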
(def small-cols [:sepal-length :petal-length])
(-> (rdatasets/datasets-iris)
    (pj/pose (pj/cross small-cols small-cols) {:color :species}))
The full 4x4 SPLOM follows the same pattern with iris's four numeric columns:
(def cols [:sepal-length :sepal-width :petal-length :petal-width])
(-> (rdatasets/datasets-iris)
    (pj/pose (pj/cross cols cols) {:color :species}))
Per-cell inference picks the layer type for each panel: diagonal cells (x = y) get histograms; off-diagonal cells get scatter plots. All panels share the color aesthetic set at the composite root.
See the Faceting chapter for more SPLOM variations, and the Customization chapter for brush selection.
See Also
- Composition: composite poses (the SPLOM is one) and shared scales
- Distributions: one-variable shape and spread
What's Next
- Faceting: split any chart into panels by category
- Polar Coordinates: radial charts and pie-style visualizations
- Customization: mark styling, palettes, and themes