17  Relationships

Regression, smoothing, density estimation, and heatmaps – revealing structure between two variables.

(ns plotje-book.relationships
  (:require
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]
   ;; Fastmath -- random number generation
   [fastmath.random :as rng]))

Linear Regression

Fit a single regression line through all the data by adding a pj/lay-smooth layer with {:stat :linear-model}.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width)
    (pj/lay-smooth {:stat :linear-model}))
[Figure: sepal width (y) vs. sepal length (x); scatter points with one regression line.]
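To make concrete what a straight-line fit computes, here is a minimal sketch of ordinary least squares in plain Clojure. It is illustrative only: `fit-line` is a hypothetical helper, not part of Plotje, and the chapter does not say how the :linear-model stat is implemented internally.

```clojure
;; Ordinary least squares for y = intercept + slope * x.
;; `fit-line` is a hypothetical name used only for this illustration.
(defn fit-line
  [xs ys]
  (let [n     (count xs)
        mean  (fn [vs] (/ (reduce + vs) (double n)))
        x-bar (mean xs)
        y-bar (mean ys)
        ;; sum of cross-deviations and sum of squared x-deviations
        sxy   (reduce + (map (fn [x y] (* (- x x-bar) (- y y-bar))) xs ys))
        sxx   (reduce + (map (fn [x] (let [d (- x x-bar)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:slope     slope
     :intercept (- y-bar (* slope x-bar))}))

(fit-line [1 2 3 4] [3 5 7 9])
;; => {:slope 2.0, :intercept 1.0}
```

The example data lie exactly on y = 1 + 2x, so the fit recovers that line.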

Per-Group Regression

Mapping :color to :species in pj/pose groups the data, so pj/lay-smooth fits one regression line per species.

(-> (rdatasets/datasets-iris)
    (pj/pose :petal-length :petal-width {:color :species})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))
[Figure: petal width (y) vs. petal length (x), colored by species (setosa, versicolor, virginica); scatter points with one regression line per species.]

Regression with Confidence Ribbon

Pass {:confidence-band true} to show a 95% confidence band around the line.

(-> (rdatasets/datasets-iris)
    (pj/pose :sepal-length :sepal-width {:color :species})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model :confidence-band true}))
[Figure: sepal width (y) vs. sepal length (x), colored by species (setosa, versicolor, virginica); scatter points with one regression line and confidence band per species.]
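For intuition about what such a band represents, the sketch below computes the textbook pointwise confidence interval for the fitted mean in simple linear regression. This is an assumption about the general technique, not a description of Plotje's internals. `confidence-band`, `demo-xs`, and `demo-ys` are hypothetical names, and the Student-t critical value is passed in by the caller rather than computed.

```clojure
;; Textbook pointwise confidence interval for the fitted mean at x0:
;;   fit +/- t-crit * s * sqrt(1/n + (x0 - x-bar)^2 / Sxx)
;; where s^2 = RSS / (n - 2). All names here are hypothetical.
(defn confidence-band
  [xs ys t-crit x0]
  (let [n     (count xs)
        x-bar (/ (reduce + xs) (double n))
        y-bar (/ (reduce + ys) (double n))
        sxx   (reduce + (map #(let [d (- % x-bar)] (* d d)) xs))
        sxy   (reduce + (map #(* (- %1 x-bar) (- %2 y-bar)) xs ys))
        slope (/ sxy sxx)
        icpt  (- y-bar (* slope x-bar))
        ;; residual sum of squares and residual standard error
        rss   (reduce + (map #(let [e (- %2 (+ icpt (* slope %1)))] (* e e))
                             xs ys))
        s     (Math/sqrt (/ rss (- n 2)))
        fit   (+ icpt (* slope x0))
        dx    (- x0 x-bar)
        half  (* t-crit s (Math/sqrt (+ (/ 1.0 n) (/ (* dx dx) sxx))))]
    {:fit fit :lower (- fit half) :upper (+ fit half)}))

(def demo-xs [1 2 3 4 5])
(def demo-ys [2.1 3.9 6.2 7.8 10.1])

;; 3.182 is the 97.5% Student-t quantile for 3 degrees of freedom (n - 2).
(map #(confidence-band demo-xs demo-ys 3.182 %) [1 3 5])
```

The band is narrowest at the mean of x (here 3) and widens toward the ends of the data, which is why confidence ribbons typically flare at the edges of a plot.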

Tips with Regression

Do smokers and non-smokers tip differently?

(-> (rdatasets/reshape2-tips)
    (pj/pose :total-bill :tip {:color :smoker})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))
[Figure: tip (y) vs. total bill (x), colored by smoker (No, Yes); scatter points with one regression line per group.]

LOESS Smoothing

Fit a smooth curve through noisy data: here, a sine wave with uniform noise from a seeded generator.

(-> (let [r (rng/rng :jdk 42)
          xs (vec (range 50))]
      {:x xs
       :y (mapv #(+ (Math/sin (* % 0.2))
                    (* 0.3 (- (rng/drandom r) 0.5)))
                xs)})
    (pj/lay-point :x :y)
    (pj/lay-smooth {:bandwidth 0.2}))
[Figure: y vs. x; noisy sine-wave points with a smoothed curve.]
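As a rough picture of how LOESS-style smoothing works, the sketch below evaluates a local weighted linear fit at a single x value. It assumes the common formulation (tricube weights over the nearest fraction of points) and treats `span` as that fraction; whether Plotje's :bandwidth option has exactly this meaning is an assumption. `tricube` and `loess-at` are hypothetical helpers.

```clojure
;; Tricube weight: 1 at distance 0, falling to 0 at scaled distance 1.
(defn tricube
  [u]
  (let [a (Math/abs (double u))]
    (if (< a 1.0)
      (let [t (- 1.0 (* a a a))] (* t t t))
      0.0)))

;; Local linear fit at x0 using the nearest `span` fraction of the points.
;; Hypothetical helper; degenerate inputs (e.g. all x equal) are not handled.
(defn loess-at
  [xs ys span x0]
  (let [k     (max 3 (long (Math/ceil (double (* span (count xs))))))
        near  (->> (map (fn [x y] {:d (Math/abs (double (- x x0))) :x x :y y})
                        xs ys)
                   (sort-by :d)
                   (take k))
        h     (:d (last near))            ; distance to the k-th neighbour
        pts   (map #(assoc % :w (tricube (/ (:d %) h))) near)
        sw    (reduce + (map :w pts))
        x-w   (/ (reduce + (map #(* (:w %) (:x %)) pts)) sw)
        y-w   (/ (reduce + (map #(* (:w %) (:y %)) pts)) sw)
        ;; weighted least squares around the weighted means
        sxy   (reduce + (map #(* (:w %) (- (:x %) x-w) (- (:y %) y-w)) pts))
        sxx   (reduce + (map #(let [d (- (:x %) x-w)] (* (:w %) d d)) pts))
        slope (/ sxy sxx)]
    (+ y-w (* slope (- x0 x-w)))))

;; On exactly linear data (y = 1 + 2x) a local line reproduces the input,
;; so this evaluates to approximately 51.0.
(loess-at (range 50) (map #(+ 1 (* 2 %)) (range 50)) 0.2 25)
```

Evaluating `loess-at` over a grid of x values and joining the results gives the smooth curve; a smaller `span` follows the data more closely, a larger one smooths more.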

Heatmap (Auto-Binned)

Bin x and y into a grid and count the points in each cell.

(-> (rdatasets/datasets-iris)
    (pj/lay-tile :sepal-length :sepal-width))
[Figure: heatmap of sepal width (y) vs. sepal length (x); tile color shows count, from 0 to 9.]
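The binning step can be sketched in plain Clojure as follows. Plotje picks the grid automatically; here the grid origin and cell widths are explicit, hypothetical parameters, and `bin-counts` is not a Plotje function.

```clojure
;; Count points per [column row] cell of a regular grid.
;; The grid origin and cell widths are passed in explicitly (hypothetical API).
(defn bin-counts
  [xs ys {:keys [x-min y-min x-width y-width]}]
  (let [cell (fn [v lo width] (long (Math/floor (/ (- v lo) width))))]
    (frequencies
     (map (fn [x y] [(cell x x-min x-width) (cell y y-min y-width)])
          xs ys))))

(bin-counts [4.6 4.9 5.1 6.3]
            [3.1 3.0 3.5 2.5]
            {:x-min 4.0 :y-min 2.0 :x-width 0.5 :y-width 0.5})
;; => {[1 2] 2, [2 3] 1, [4 1] 1}
```

Each key is a cell index and each value is the count that a heatmap would map to tile color; cells that never appear in the map hold no data.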

Heatmap (Pre-Computed)

When the cell values are already computed, map a numeric column to tile color with :fill.

(def grid-data
  (let [r (rng/rng :jdk 99)]
    {:x (for [i (range 5) _j (range 5)] i)
     :y (for [_i (range 5) j (range 5)] j)
     :value (repeatedly 25 #(rng/irandom r 100))}))
(-> grid-data
    (pj/lay-tile :x :y {:fill :value}))
[Figure: heatmap of y vs. x on a 5 × 5 grid; tile color shows fill, from 0 to 99.]
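The two `for` expressions in grid-data enumerate every cell of the grid, with x varying slowest and y fastest. A smaller 2 × 3 grid makes the pattern easy to see; `small-xs` and `small-ys` are names introduced only for this illustration.

```clojure
;; Same pattern as grid-data, on a 2 x 3 grid to keep the output short.
(def small-xs (for [i (range 2) _j (range 3)] i))
(def small-ys (for [_i (range 2) j (range 3)] j))

small-xs
;; => (0 0 0 1 1 1)
small-ys
;; => (0 1 2 0 1 2)

;; Paired up, the columns cover each cell exactly once.
(map vector small-xs small-ys)
;; => ([0 0] [0 1] [0 2] [1 0] [1 1] [1 2])
```

Because both sequences are generated in the same order, the n-th entry of the value column lines up with the n-th (x, y) pair.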

Density 2D

A 2D density heatmap smoothed with kernel density estimation (KDE).

(-> (rdatasets/datasets-iris)
    (pj/lay-density-2d :sepal-length :sepal-width))
[Figure: heatmap of sepal width (y) vs. sepal length (x); tile color shows relative density, from 0 to 27.1.]
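For readers unfamiliar with KDE, the sketch below evaluates a textbook 2D kernel density estimate at one location using a product of Gaussian kernels. The kernel, the bandwidths, and the scaling behind Plotje's "relative density" are not specified in this chapter, so treat all of them as assumptions; `gaussian` and `kde-2d` are hypothetical helpers.

```clojure
;; Standard normal density.
(defn gaussian
  [u]
  (/ (Math/exp (* -0.5 u u)) (Math/sqrt (* 2.0 Math/PI))))

;; Density estimate at [x y] from `pts` (a seq of [xi yi] pairs),
;; using a product Gaussian kernel with bandwidths hx and hy.
(defn kde-2d
  [pts hx hy x y]
  (/ (reduce + (map (fn [[xi yi]]
                      (* (gaussian (/ (- x xi) hx))
                         (gaussian (/ (- y yi) hy))))
                    pts))
     (* (count pts) hx hy)))

(def demo-pts [[5.0 3.0] [5.1 3.1] [4.9 3.0] [7.0 2.5]])

;; The estimate is higher near the cluster around (5, 3) than far from it.
(> (kde-2d demo-pts 0.3 0.3 5.0 3.0)
   (kde-2d demo-pts 0.3 0.3 8.0 4.5))
;; => true
```

Evaluating `kde-2d` over a regular grid of (x, y) locations gives the matrix of values that a density heatmap colors.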

Density 2D with Points

Overlay scatter points on the density heatmap.

(-> (rdatasets/datasets-iris)
    (pj/lay-density-2d :sepal-length :sepal-width)
    (pj/lay-point {:alpha 0.5}))
[Figure: heatmap of sepal width (y) vs. sepal length (x) with scatter points overlaid; tile color shows relative density, from 0 to 27.1.]

Contour Lines

Iso-density contour lines computed from a 2D kernel density estimate.

(-> (rdatasets/datasets-iris)
    (pj/lay-contour :sepal-length :sepal-width))
[Figure: contour lines of sepal width (y) vs. sepal length (x); line color shows relative density, from 0 to 27.1.]

Contour with Points

Contour lines overlaid on scatter points.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:alpha 0.3})
    (pj/lay-contour {:levels 8}))
[Figure: sepal width (y) vs. sepal length (x); scatter points with contour lines overlaid, colored by relative density from 0 to 27.1.]

What’s Next

  • Polar Coordinates – radial charts and pie-style visualizations
  • Faceting – split any chart into panels by category
source: notebooks/plotje_book/relationships.clj