12  Scatter Plots

Variations on the point mark: color, size, alpha, shape, jitter, and continuous color scales.

(ns plotje-book.scatter
  (:require
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]))

Basic Scatter

Sepal length against sepal width with no color mapping: the default point mark.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width))
[Plot: sepal length (x) vs. sepal width (y).]

Colored by Species

Adding :color :species groups points by species with distinct colors.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:color :species}))
[Plot: sepal length (x) vs. sepal width (y). Legend: species (setosa, versicolor, virginica).]

Petal Dimensions

Petal length vs width – a strongly correlated pair.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :petal-length :petal-width {:color :species}))
[Plot: petal length (x) vs. petal width (y). Legend: species (setosa, versicolor, virginica).]

Fixed Color

A fixed color string (not a column reference) applies to all points.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:color "#E74C3C"}))
[Plot: sepal length (x) vs. sepal width (y), all points in one fixed color.]

Custom Dimensions

Use pj/options to set the plot's width and height, a title, and axis labels. This example makes the plot wider.

(-> (rdatasets/reshape2-tips)
    (pj/lay-point :total-bill :tip {:color :day})
    (pj/options {:width 700 :height 300
                 :title "Tips by Day"
                 :x-label "Total Bill ($)"
                 :y-label "Tip ($)"}))
[Plot: "Tips by Day". Total Bill ($) (x) vs. Tip ($) (y). Legend: day (Sun, Sat, Thur, Fri).]

Bubble Plot

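Map :size to a numeric column to create a bubble plot. Each point’s radius reflects the column value. In the tips dataset the column is itself named size (the party size), hence :size :size in the options map.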

(-> (rdatasets/reshape2-tips)
    (pj/lay-point :total-bill :tip {:color :day :size :size}))
[Plot: total bill (x) vs. tip (y). Legends: day (Sun, Sat, Thur, Fri); size (1.0 to 6.0).]

Combine :size with :alpha so that overlapping bubbles remain visible in dense regions.

(-> (rdatasets/reshape2-tips)
    (pj/lay-point :total-bill :tip {:color :day :size :size :alpha 0.6}))
[Plot: total bill (x) vs. tip (y), semi-transparent points. Legends: day (Sun, Sat, Thur, Fri); size (1.0 to 6.0).]

Jitter

When plotting a numeric column against a categorical column, points in the same category line up and overlap. Use :jitter true to add random pixel offsets.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :species :sepal-width {:jitter true}))
[Plot: species (x: setosa, versicolor, virginica) vs. sepal width (y), jittered.]

Pass a number instead of true to control the jitter amount in pixels.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :species :sepal-width {:jitter 10 :alpha 0.5}))
[Plot: species (x: setosa, versicolor, virginica) vs. sepal width (y), jittered, semi-transparent points.]
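To make the idea of "random pixel offsets" concrete, here is a minimal sketch in plain Clojure. It is not Plotje's implementation: the function name jitter-sketch, the uniform distribution, and the seed argument are all assumptions made for illustration. It shifts each pixel position by a random amount of at most the given number of pixels.

```clojure
;; Illustration only: jitter-sketch is a hypothetical helper, not part of Plotje.
;; The uniform distribution and the seeded RNG are assumptions.
(defn jitter-sketch
  "Shift each pixel position by a uniform random offset in [-amount, amount)."
  [positions amount seed]
  (let [^java.util.Random rng (java.util.Random. (long seed))]
    ;; mapv is eager, so the stateful RNG is consumed in a fixed order.
    (mapv (fn [p]
            (+ p (* amount (- (* 2.0 (.nextDouble rng)) 1.0))))
          positions)))

;; Four points that would otherwise sit on the same pixel column.
(def stacked [100.0 100.0 100.0 100.0])

;; Spread them by up to 10 pixels; the same seed gives the same layout.
(def spread (jitter-sketch stacked 10 42))
```

A larger amount spreads points further from their category position, which is the trade-off the numeric :jitter value controls.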

Continuous Color

When :color maps to a numeric column, Plotje uses a continuous blue gradient instead of discrete palette colors.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:color :petal-length}))
[Plot: sepal length (x) vs. sepal width (y). Continuous color legend: petal length (1.000 to 6.900).]
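A continuous color scale can be thought of as linear interpolation between two endpoint colors. The sketch below is plain Clojure and does not reflect Plotje's internals: the helper names lerp and gradient-color and the two RGB endpoints are hypothetical, and only the domain 1.0 to 6.9 is taken from the petal length legend above.

```clojure
;; Illustration only: these helpers and colors are assumptions, not Plotje's palette.
(defn lerp
  "Linear interpolation between a and b for t in [0, 1]."
  [a b t]
  (+ a (* t (- b a))))

(defn gradient-color
  "Map v from the domain [lo, hi] onto an RGB triple between start and end.
  Values outside the domain are clamped to the nearest endpoint."
  [[lo hi] start end v]
  (let [t (-> (/ (- v lo) (- hi lo)) double (max 0.0) (min 1.0))]
    (mapv (fn [a b] (Math/round (double (lerp a b t)))) start end)))

;; Hypothetical endpoint colors: a light blue and a dark blue.
(def light-blue [222 235 247])
(def dark-blue  [8 48 107])

;; The shortest petal maps to the light end, the longest to the dark end.
(def shortest-petal-color (gradient-color [1.0 6.9] light-blue dark-blue 1.0))
(def longest-petal-color  (gradient-color [1.0 6.9] light-blue dark-blue 6.9))
```

Values between the two extremes fall proportionally between the endpoint colors, which is why a numeric column produces a smooth gradient rather than a handful of discrete palette entries.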

Continuous color with size – a color-size bubble plot.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:color :petal-length :size :petal-width :alpha 0.7}))
[Plot: sepal length (x) vs. sepal width (y). Continuous color legend: petal length (1.000 to 6.900). Size legend: petal width (0.5 to 2.5).]

Shape by Species

Map :shape to a categorical column to render each group with a different marker shape. Useful for monochrome printing or to reinforce the color encoding.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:shape :species}))
[Plot: sepal length (x) vs. sepal width (y), marker shape by species.]

Scatter Plot Matrix (SPLOM)

pj/cross generates all combinations of two lists. Passing column names produces a grid of scatter plots – one per pair of variables. The diagonal shows histograms (automatic inference for same-column pairs).

(def cols [:sepal-length :sepal-width :petal-length :petal-width])
(-> (rdatasets/datasets-iris)
    (pj/pose (pj/cross cols cols) {:color :species}))
[Plot: 4×4 grid of panels over sepal-length, sepal-width, petal-length, and petal-width. Legend: species (setosa, versicolor, virginica).]

Per-cell inference picks the layer type for each panel: diagonal cells (x = y) get histograms; off-diagonal cells get scatter plots. All panels share the color aesthetic set at the composite root.
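The grid layout follows from the pairs themselves. As a rough sketch in plain Clojure (cross-sketch is a hypothetical stand-in, and the exact shape and ordering of what pj/cross returns are assumptions), crossing four column names with themselves gives 16 pairs, 4 of which have the same column on both axes:

```clojure
;; Illustration only: cross-sketch is a hypothetical stand-in for pj/cross.
;; The pair representation and ordering are assumptions.
(defn cross-sketch
  "All [x y] combinations of the elements of xs and ys."
  [xs ys]
  (vec (for [x xs, y ys] [x y])))

(def splom-cols [:sepal-length :sepal-width :petal-length :petal-width])

;; 4 x 4 = 16 panels in total.
(def pairs (cross-sketch splom-cols splom-cols))

;; Same-column pairs: the panels that would receive histograms.
(def diagonal (filterv (fn [[x y]] (= x y)) pairs))

;; Different-column pairs: the panels that would receive scatter plots.
(def off-diagonal (filterv (fn [[x y]] (not= x y)) pairs))
```

With four columns this gives 4 diagonal panels and 12 off-diagonal panels.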

See the Faceting chapter for more SPLOM variations, and the Customization chapter for brush selection.

What’s Next

source: notebooks/plotje_book/scatter.clj