13 Distributions
Histograms, density plots, boxplots, violins, and ridgelines for exploring the shape and spread of data.
(ns plotje-book.distributions
  (:require
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]))
Histogram
Distribution of sepal length across all species.
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :sepal-length))
Colored Histogram
Split by species: each group gets its own color.
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :sepal-length {:color :species}))
Petal Width Histogram
Petal width is bimodal: the narrow petals of setosa form a cluster well separated from those of versicolor and virginica.
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :petal-width))
Histogram with Custom Title
(-> (rdatasets/reshape2-tips)
    (pj/lay-histogram :total-bill)
    (pj/options {:title "Distribution of Total Bill"
                 :x-label "Amount ($)"}))
Density-Normalized Histogram
Pass {:normalize :density} so the y-axis shows probability density instead of raw counts. This makes the histogram directly comparable with a density curve overlay.
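To make the normalization concrete, here is a minimal plain-Clojure sketch of equal-width binning with density scaling. Each bar's height is its count divided by n times the bin width, so the bar areas sum to 1. The function `histogram-density` is hypothetical. The bin count and the edge handling (the maximum value goes into the last bin) are assumptions, not plotje's internals.

```clojure
(defn histogram-density
  "Hypothetical helper: bins xs into n-bins equal-width bins and returns one
  map per bin with :lo, :hi, :count and :density.
  Assumes xs contains at least two distinct values."
  [xs n-bins]
  (let [lo     (double (apply min xs))
        hi     (double (apply max xs))
        width  (/ (- hi lo) n-bins)
        n      (count xs)
        ;; bin index for x; the maximum lands in the last bin (closed right edge)
        bin-of (fn [x] (min (dec n-bins)
                            (long (Math/floor (/ (- x lo) width)))))
        counts (frequencies (map bin-of xs))]
    (vec
     (for [i (range n-bins)]
       (let [c (get counts i 0)]
         {:lo      (+ lo (* i width))
          :hi      (+ lo (* (inc i) width))
          :count   c
          ;; density = count / (n * bin width), so the bar areas sum to 1
          :density (/ c (* n width))})))))
```

For `[1.0 2.0 2.5 3.0 4.0 5.0]` with 4 bins, the counts are `[1 2 1 2]`. Each density is the corresponding count divided by 6 (n = 6, bin width 1.0).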
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :sepal-length {:normalize :density :alpha 0.5})
    pj/lay-density)
Density Plot
A smooth curve estimating the probability density function. It avoids the arbitrary bin edges of a histogram, although its shape depends on the kernel bandwidth instead.
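A kernel density estimate places one smooth bump on each observation and averages them. Below is a minimal sketch using a Gaussian kernel: f(x) = 1/(n·h) · Σ K((x − xᵢ)/h). The function name `gaussian-kde` is hypothetical. Whether plotje uses a Gaussian kernel is also an assumption.

```clojure
(defn gaussian-kde
  "Hypothetical helper: returns a function x -> estimated density for the
  sample xs, using a Gaussian kernel with bandwidth h."
  [xs h]
  (let [n    (count xs)
        ;; 1 / (n * h * sqrt(2 pi)) makes the averaged bumps integrate to 1
        norm (/ 1.0 (* n h (Math/sqrt (* 2.0 Math/PI))))]
    (fn [x]
      (* norm
         ;; sum of unnormalized Gaussian bumps exp(-u^2 / 2), u = (x - xi) / h
         (reduce + (map (fn [xi]
                          (let [u (/ (- x xi) h)]
                            (Math/exp (* -0.5 u u))))
                        xs))))))
```

Evaluating `((gaussian-kde xs h) x)` over a grid of x values produces a curve of the kind a density layer draws. Each observation contributes one bump of width h.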
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length))
Grouped Density
Per-species density curves with automatic color mapping.
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length {:color :species}))
Density with Custom Bandwidth
A narrow bandwidth reveals more detail but also more noise. A wide bandwidth smooths more and can blur real features such as separate modes.
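A useful reference point when choosing a bandwidth is the normal-reference rule of thumb, h = 1.06 · sd · n^(-1/5). Several other tools use it as a default. This chapter does not say whether plotje does, so treat the rule as an outside baseline. The function name `rule-of-thumb-bandwidth` is hypothetical.

```clojure
(defn rule-of-thumb-bandwidth
  "Hypothetical helper: normal-reference bandwidth h = 1.06 * sd * n^(-1/5)."
  [xs]
  (let [n    (count xs)
        mean (/ (reduce + xs) (double n))
        ;; sample standard deviation, with n - 1 in the denominator
        sd   (Math/sqrt (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs))
                           (dec n)))]
    (* 1.06 sd (Math/pow n -0.2))))
```

For iris sepal length (sample sd about 0.83, n = 150), this rule gives roughly 0.32, so the `{:bandwidth 0.3}` above sits close to that baseline. Halving or doubling it shows the detail-versus-smoothness trade-off.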
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length {:bandwidth 0.3}))
Rug
A rug shows the raw data positions as short tick marks along the axis. Layered with a density curve, it shows the smooth shape and the underlying observations together.
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length)
    pj/lay-rug)
Boxplot
Median, quartiles, whiskers reaching at most 1.5×IQR (interquartile range) beyond the box, and outlier points.
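To see where those numbers come from, here is a minimal plain-Clojure sketch of Tukey boxplot statistics. The helpers `quantile` and `box-stats` are hypothetical. The keys loosely mirror the `:boxes` fields read from the plan later in this section. The quartile method (linear interpolation between order statistics, R's default type 7) is an assumption, and plotje may compute quartiles differently.

```clojure
(defn quantile
  "Hypothetical helper: p-quantile of an already sorted vector, using linear
  interpolation between neighboring order statistics."
  [sorted p]
  (let [h  (* p (dec (count sorted)))
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count sorted)))]
    (+ (nth sorted lo)
       (* (- h lo) (- (nth sorted hi) (nth sorted lo))))))

(defn box-stats
  "Hypothetical helper: quartiles, whiskers and outliers for the sample xs."
  [xs]
  (let [s        (vec (sort xs))
        q1       (quantile s 0.25)
        median   (quantile s 0.5)
        q3       (quantile s 0.75)
        iqr      (- q3 q1)
        lo-fence (- q1 (* 1.5 iqr))
        hi-fence (+ q3 (* 1.5 iqr))
        inside?  #(<= lo-fence % hi-fence)
        inside   (filter inside? s)]
    {:q1 q1 :median median :q3 q3
     ;; whiskers end at the most extreme observations inside the fences
     :whisker-lo (first inside)
     :whisker-hi (last inside)
     ;; everything beyond the fences is drawn as an outlier point
     :outliers   (vec (remove inside? s))}))
```

For `[1 2 3 4 5 6 7 8 9 100]`, this gives Q1 = 3.25, median 5.5, Q3 = 7.75, and whiskers at 1 and 9. The value 100 is the only outlier.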
(-> (rdatasets/datasets-iris)
    (pj/lay-boxplot :species :sepal-width))
The 1.5×IQR rule can be checked against the plan: each whisker stays within the Tukey fences [Q1 - 1.5×IQR, Q3 + 1.5×IQR], and every outlier falls outside them.
(let [plan (-> (rdatasets/datasets-iris)
               (pj/lay-boxplot :species :sepal-width)
               pj/plan)
      box-layer (first (filter #(= :boxplot (:mark %))
                               (:layers (first (:panels plan)))))]
  (mapv (fn [{:keys [q1 q3 whisker-lo whisker-hi outliers]}]
          (let [iqr (- q3 q1)
                lo-fence (- q1 (* 1.5 iqr))
                hi-fence (+ q3 (* 1.5 iqr))]
            {:whisker-lo-in-fence (>= whisker-lo lo-fence)
             :whisker-hi-in-fence (<= whisker-hi hi-fence)
             :outliers-outside-fence
             (every? (fn [o] (or (< o lo-fence) (> o hi-fence)))
                     outliers)}))
        (:boxes box-layer)))
[{:whisker-lo-in-fence true,
  :whisker-hi-in-fence true,
  :outliers-outside-fence true}
 {:whisker-lo-in-fence true,
  :whisker-hi-in-fence true,
  :outliers-outside-fence true}
 {:whisker-lo-in-fence true,
  :whisker-hi-in-fence true,
  :outliers-outside-fence true}]
Grouped Boxplot
Side-by-side boxplots colored by a grouping variable.
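One simple way to place side-by-side groups is to split each category's band into k equal slots and center one box in each. The sketch below uses that layout, but it is hypothetical: the name `dodge-offsets` and the even-slot rule are assumptions. Plotje's actual spacing (padding, gaps between boxes) is not shown in this chapter.

```clojure
(defn dodge-offsets
  "Hypothetical helper: center offsets, relative to the category center, for
  k groups sharing a band of width band."
  [k band]
  (let [slot (/ band k)]
    ;; slot i is centered at (i - (k - 1)/2) * slot, so offsets are symmetric
    (mapv (fn [i] (* slot (- i (/ (dec k) 2.0)))) (range k))))
```

With two `:smoker` levels and a band of 0.8, `(dodge-offsets 2 0.8)` yields offsets of -0.2 and 0.2, one box on each side of the day's center.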
(-> (rdatasets/reshape2-tips)
    (pj/lay-boxplot :day :total-bill {:color :smoker}))
Verify the dodge grouping: the boxplot layer should record one color category for each :smoker level (two in total), and each category is drawn at its own offset.
(let [plan (-> (rdatasets/reshape2-tips)
               (pj/lay-boxplot :day :total-bill {:color :smoker})
               pj/plan)
      panel (first (:panels plan))
      box-layer (first (filter #(= :boxplot (:mark %)) (:layers panel)))
      cats (:color-categories box-layer)]
  (count cats))
2
Horizontal Boxplot
Flipping the coordinate system with (pj/coord :flip) draws the boxes horizontally.
(-> (rdatasets/datasets-iris)
    (pj/lay-boxplot :species :sepal-width)
    (pj/coord :flip))
Violin Plot
A violin shows the full density shape per category, which makes it more informative than a boxplot for multimodal distributions.
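Geometrically, a violin is a density curve mirrored around the category's center line. The hypothetical sketch below takes density samples `[[y d] ...]` for one category (for example from a kernel density estimate) and builds the closed outline. The name `violin-outline` is an assumption, as is the scaling rule (the densest point gets half-width `max-half`).

```clojure
(defn violin-outline
  "Hypothetical helper: mirrors density samples [[y d] ...] around x = center,
  scaled so the densest sample has half-width max-half. Returns the vertices
  of a closed polygon as [x y] pairs."
  [center max-half samples]
  (let [d-max (apply max (map second samples))
        half  (fn [d] (* max-half (/ d d-max)))
        right (map (fn [[y d]] [(+ center (half d)) y]) samples)
        left  (map (fn [[y d]] [(- center (half d)) y]) (reverse samples))]
    ;; right edge bottom-to-top, then left edge top-to-bottom
    (vec (concat right left))))
```

Because both edges use the same half-width at each y, the outline is symmetric. Its widest point sits where the density peaks, which is how a second mode shows up as a second bulge.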
(-> (rdatasets/reshape2-tips)
    (pj/lay-violin :day :total-bill))
Grouped Violin
Color splits each category into side-by-side violins.
(-> (rdatasets/reshape2-tips)
    (pj/lay-violin :day :total-bill {:color :smoker}))
Verify the dodge grouping: the violin layer should record one color category for each :smoker level (two in total), and each category is drawn at its own offset.
(let [plan (-> (rdatasets/reshape2-tips)
               (pj/lay-violin :day :total-bill {:color :smoker})
               pj/plan)
      panel (first (:panels plan))
      viol-layer (first (filter #(= :violin (:mark %)) (:layers panel)))
      cats (:color-categories viol-layer)]
  (count cats))
2
Horizontal Violin
(-> (rdatasets/datasets-iris)
    (pj/lay-violin :species :petal-length)
    (pj/coord :flip))
Ridgeline Plot
Overlapping density curves stacked vertically by category, which works well for comparing distribution shapes across many groups.
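The stacking can be pictured as giving category i a baseline at y = i and lifting its density curve above that baseline. When the tallest peak is more than one unit high, neighboring ridges overlap. The sketch below is hypothetical. The name `ridgeline-paths`, the shared scaling, and the `height` parameter are assumptions, not plotje's documented behavior.

```clojure
(defn ridgeline-paths
  "Hypothetical helper: curves is a sequence of density samples [[x d] ...],
  one per category. Category i gets baseline y = i, and all curves share one
  scale so the tallest peak rises height units above its baseline."
  [height curves]
  (let [d-max (apply max (mapcat #(map second %) curves))]
    (vec
     (map-indexed
      (fn [i samples]
        ;; lift each density value onto category i's baseline
        (mapv (fn [[x d]] [x (+ i (* height (/ d d-max)))]) samples))
      curves))))
```

Scaling all curves by one global maximum keeps their relative heights comparable. With `height` above 1, a tall peak crosses into the ridge above it, which produces the overlapping look.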
(-> (rdatasets/datasets-iris)
    (pj/lay-ridgeline :species :sepal-length))
Colored Ridgeline
Map color to the same categorical column for distinct curves.
(-> (rdatasets/datasets-iris)
    (pj/lay-ridgeline :species :sepal-length {:color :species}))
Comparing Multiple Columns
Pass a vector of column names to pj/lay-histogram (or any lay-* function) to create one panel per column. This is useful for comparing the shape of different variables side by side.
(pj/lay-histogram (rdatasets/datasets-iris)
                  [:sepal-length :sepal-width :petal-length])
Combine with :color to see group differences within each column.
(pj/lay-density (rdatasets/datasets-iris)
                [:sepal-length :sepal-width :petal-length]
                {:color :species})
The multi-column vector works with any lay-* function: histograms, density curves, boxplots, violin plots, and more.