12 Distributions
Histograms, density plots, boxplots, violins, and ridgelines for exploring the shape and spread of data.
(ns plotje-book.distributions
  (:require
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]))
Histogram
Distribution of sepal length across all species.
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :sepal-length))
Colored Histogram
Split by species: each group gets its own color.
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :sepal-length {:color :species}))
Petal Width Histogram
Petal width has a bimodal distribution.
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :petal-width))
Histogram with Custom Title
(-> (rdatasets/reshape2-tips)
    (pj/lay-histogram :total-bill)
    (pj/options {:title "Distribution of Total Bill"
                 :x-label "Amount ($)"}))
Density-Normalized Histogram
Pass {:normalize :density} so the y-axis shows probability density instead of raw counts. This makes the histogram directly comparable with a density curve overlay.
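To make the idea of density normalization concrete, here is a minimal sketch in plain Clojure. It assumes the conventional definition, height = count / (n × bin width), and is not taken from plotje's internals; `bin-counts`, `density-heights`, and the sample values are hypothetical names and data chosen for illustration.

```clojure
;; Sketch only: conventional density normalization, not plotje's own code.
(defn bin-counts
  "Counts values into n-bins equal-width bins over [lo, hi]."
  [values lo hi n-bins]
  (let [width (/ (- hi lo) (double n-bins))
        ;; clamp so a value equal to hi lands in the last bin
        index (fn [v] (min (dec n-bins) (int (/ (- v lo) width))))]
    (reduce (fn [acc v] (update acc (index v) inc))
            (vec (repeat n-bins 0))
            values)))

(defn density-heights
  "Divides each count by (n * bin-width) so that bar areas sum to 1."
  [counts bin-width]
  (let [n (reduce + counts)]
    (mapv #(/ % (* n bin-width)) counts)))

;; Hypothetical sample, binned into 4 bins of width 1.0 over [4, 8].
(def sample-values [4.3 4.9 5.0 5.1 5.8 6.3 6.4 7.0 7.7 7.9])
(def sample-counts (bin-counts sample-values 4.0 8.0 4))
(def sample-heights (density-heights sample-counts 1.0))

;; Total bar area: sum of height * width over all bins.
(def total-area (reduce + (map #(* % 1.0) sample-heights)))
```

Because the bar areas sum to 1, the bars share a vertical scale with a density curve, which also integrates to 1. That shared scale is what makes the overlay meaningful.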
(-> (rdatasets/datasets-iris)
    (pj/lay-histogram :sepal-length {:normalize :density :alpha 0.5})
    pj/lay-density)
Log-Scale Histogram
When bin counts span orders of magnitude, the smallest bars disappear next to the largest on a linear y-axis. A log y-scale lets every bar register. The data below doubles the count of each successive bin (1, 2, 4, …, 512); on a log axis the doubling shows as a uniform staircase, with each bar the same fixed step above the previous one.
The lower bound under log comes from the smallest positive bin count, not from the visual zero baseline, because log scales have no zero. Empty bins emit no bar and do not pull the axis down.
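The staircase claim can be checked with a little arithmetic. The sketch below is plain Clojure and independent of plotje; it assumes a base-10 log axis, and `doubling-counts`, `log-heights`, `steps`, and `axis-floor` are names introduced here for illustration.

```clojure
;; Bin counts described in the text: 1, 2, 4, ..., 512.
(def doubling-counts (mapv #(long (Math/pow 2 %)) (range 10)))

;; Bar heights as they would sit on a base-10 log axis (an assumption;
;; any log base gives equal steps, only the step size changes).
(def log-heights (mapv #(Math/log10 (double %)) doubling-counts))

;; Difference between each bar and the one before it.
(def steps (mapv - (rest log-heights) log-heights))

;; The smallest positive count is what anchors the bottom of the axis.
(def axis-floor (apply min (filter pos? doubling-counts)))
```

Every entry in `steps` equals log10(2), roughly 0.301, so doubling turns into a constant vertical increment. `axis-floor` is 1 here, whose logarithm is 0.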
(-> {:x (mapcat (fn [i] (repeat (long (Math/pow 2 i)) i)) (range 10))}
    (pj/lay-histogram {:bins 10})
    (pj/scale :y :log)
    (pj/options {:title "Log Y on Histogram"}))
Density Plot
A smooth curve estimating the probability density function. It avoids a histogram's dependence on bin width and bin edges, though the result still depends on the bandwidth.
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length))
Grouped Density
Per-species density curves with automatic color mapping.
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length {:color :species}))
Density with Custom Bandwidth
A narrow bandwidth reveals more detail; a wide bandwidth smooths more.
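The effect of bandwidth is easiest to see in a hand-rolled estimator. The following is a minimal Gaussian kernel density sketch in plain Clojure. It assumes the bandwidth is the kernel's standard deviation, which may or may not match how plotje interprets `:bandwidth`; `gaussian-kernel`, `kde`, and `kde-sample` are hypothetical names.

```clojure
(defn gaussian-kernel
  "Standard normal density at u."
  [u]
  (/ (Math/exp (* -0.5 u u)) (Math/sqrt (* 2.0 Math/PI))))

(defn kde
  "Gaussian kernel density estimate at x, for data with bandwidth h.
  Assumes h is the kernel's standard deviation."
  [data h x]
  (/ (reduce + (map #(gaussian-kernel (/ (- x %) h)) data))
     (* (count data) h)))

;; Three well-separated observations.
(def kde-sample [1.0 2.0 3.0])

;; Narrow bandwidth: tall peak on a data point, deep dip between points.
(def narrow-at-point (kde kde-sample 0.3 2.0))
(def narrow-in-gap   (kde kde-sample 0.3 1.5))

;; Wide bandwidth: the peaks and dips are smoothed together.
(def wide-at-point (kde kde-sample 1.0 2.0))
(def wide-in-gap   (kde kde-sample 1.0 1.5))
```

With the narrow bandwidth the estimate is higher on the observation and lower in the gap than with the wide one. This is the "more detail" versus "more smoothing" trade-off in miniature.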
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length {:bandwidth 0.3}))
Rug
A rug shows the raw data positions as short tick marks along the axis. Layered with a density curve, it shows the smooth shape and the underlying observations together.
(-> (rdatasets/datasets-iris)
    (pj/lay-density :sepal-length)
    pj/lay-rug)
Strip Plot (Jitter)
When plotting a numeric column against a categorical column, points stack on the same band positions. :jitter true spreads them with small random offsets along the categorical axis.
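As a rough picture of what jittering does, the sketch below offsets points that share a band position by a bounded random amount. It is plain Clojure and only an assumption about the general technique: the uniform distribution, the seed argument, and the name `jitter-positions` are illustrative and not drawn from plotje.

```clojure
(defn jitter-positions
  "Adds a uniform random offset in [-amount, amount) to each position.
  A fixed seed keeps the result reproducible."
  [positions amount seed]
  (let [rng (java.util.Random. (long seed))]
    (mapv (fn [p]
            ;; nextDouble is in [0, 1); rescale it to [-1, 1)
            (+ p (* amount (- (* 2.0 (.nextDouble rng)) 1.0))))
          positions)))

;; Five points that would otherwise overlap at the same band centre.
(def band-centre 100.0)
(def jittered (jitter-positions (repeat 5 band-centre) 10 42))
```

The offsets apply only along the categorical axis, so the numeric values being compared stay exactly where the data puts them.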
(-> (rdatasets/datasets-iris)
    (pj/lay-point :species :sepal-width {:jitter true}))
Pass a number to control the jitter amount in drawing units.
(-> (rdatasets/datasets-iris)
    (pj/lay-point :species :sepal-width {:jitter 10 :alpha 0.5}))
Boxplot
Shows the median, the quartiles, whiskers reaching to the most extreme points within 1.5×IQR (interquartile range) of the quartiles, and outlier points beyond them.
(-> (rdatasets/datasets-iris)
    (pj/lay-boxplot :species :sepal-width))
The 1.5×IQR claim is structural: each whisker stays within the Tukey fence [Q1 - 1.5*IQR, Q3 + 1.5*IQR], and every outlier falls outside it.
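The fence rule can be worked through by hand. This sketch is plain Clojure, not plotje's implementation. It assumes linear-interpolation quantiles (the default "type 7" in R), and plotje may compute quartiles differently; `quantile` and `tukey-summary` are hypothetical helper names.

```clojure
(defn quantile
  "Linear-interpolation quantile of an already sorted vector (assumed method)."
  [sorted-xs p]
  (let [h  (* p (dec (count sorted-xs)))
        lo (int (Math/floor h))
        hi (int (Math/ceil h))]
    (+ (nth sorted-xs lo)
       (* (- h lo) (- (nth sorted-xs hi) (nth sorted-xs lo))))))

(defn tukey-summary
  "Quartiles, Tukey fence, whisker ends, and outliers for a collection."
  [xs]
  (let [s       (vec (sort xs))
        q1      (quantile s 0.25)
        q3      (quantile s 0.75)
        iqr     (- q3 q1)
        lower   (- q1 (* 1.5 iqr))
        upper   (+ q3 (* 1.5 iqr))
        inside? #(<= lower % upper)
        inside  (filter inside? s)]
    {:q1 q1
     :q3 q3
     :fence    [lower upper]
     ;; whiskers stop at the most extreme observations inside the fence
     :whiskers [(first inside) (last inside)]
     :outliers (vec (remove inside? s))}))

;; Eight ordinary values and one extreme one.
(def box-summary (tukey-summary [1 2 3 4 5 6 7 8 100]))
```

For this input the quartiles are 3 and 7, so the fence is [-3, 13]. The whiskers end at 1 and 8, the most extreme observations inside the fence, and 100 is reported as the single outlier.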
Grouped Boxplot
Side-by-side boxplots colored by a grouping variable.
(-> (rdatasets/reshape2-tips)
    (pj/lay-boxplot :day :total-bill {:color :smoker}))
Each color group gets a distinct dodge offset, visible as side-by-side boxes within each day.
Horizontal Boxplot
Flip the coordinates to draw the boxes horizontally.
(-> (rdatasets/datasets-iris)
    (pj/lay-boxplot :species :sepal-width)
    (pj/coord :flip))
Violin Plot
A violin shows the full density shape per category, which makes it more informative than a boxplot for multimodal distributions.
(-> (rdatasets/reshape2-tips)
    (pj/lay-violin :day :total-bill))
Grouped Violin
Color splits each category into side-by-side violins.
(-> (rdatasets/reshape2-tips)
    (pj/lay-violin :day :total-bill {:color :smoker}))
Each color group gets a distinct dodge offset, visible as side-by-side violins within each day.
Horizontal Violin
(-> (rdatasets/datasets-iris)
    (pj/lay-violin :species :petal-length)
    (pj/coord :flip))
Ridgeline Plot
Overlapping density curves stacked vertically by category, which is good for comparing distribution shapes across many groups.
(-> (rdatasets/datasets-iris)
    (pj/lay-ridgeline :species :sepal-length))
Colored Ridgeline
Map color to the same categorical column for distinct curves.
(-> (rdatasets/datasets-iris)
    (pj/lay-ridgeline :species :sepal-length {:color :species}))
Comparing Multiple Columns
Pass a vector of column names to pj/lay-histogram (or any lay-* function) to create one panel per column. This is useful for comparing the shape of different variables side by side.
(pj/lay-histogram (rdatasets/datasets-iris) [:sepal-length :sepal-width :petal-length])
Combine with :color to see group differences within each column.
(pj/lay-density (rdatasets/datasets-iris) [:sepal-length :sepal-width :petal-length] {:color :species})
The multi-column vector works with any lay-* function: histograms, density curves, boxplots, violin plots, and more.
See Also
- Core Concepts: mappings and aesthetics referenced throughout
- Relationships: two-distribution comparisons via heatmap, contour, and SPLOM