10  Distributions

Histograms, density plots, boxplots, violins, and ridgelines for exploring the shape and spread of data.

(ns napkinsketch-book.distributions
  (:require
   ;; Shared datasets for these docs
   [napkinsketch-book.datasets :as data]
   ;; Kindly: notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Napkinsketch: composable plotting
   [scicloj.napkinsketch.api :as sk]))

Histogram

Distribution of sepal length across all species.

(-> data/iris
    (sk/lay-histogram :sepal_length))
[Plot: histogram of sepal length (x: sepal length, y: count)]
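For intuition, here is a minimal sketch of equal-width binning in plain Clojure. The helper `bin-counts` is hypothetical and not part of napkinsketch; napkinsketch's own choice of bin count and bin edges may differ. The sketch assumes `xs` contains at least two distinct values.

```clojure
;; Hypothetical helper for illustration; not napkinsketch's implementation.
(defn bin-counts
  "Counts how many values of xs fall into each of n equal-width bins
  spanning [min, max]. Assumes xs has at least two distinct values."
  [xs n]
  (let [lo (double (apply min xs))
        hi (double (apply max xs))
        width (/ (- hi lo) n)
        ;; bin index of x, clamped so the maximum lands in the last bin
        bin-of (fn [x] (min (dec n) (long (Math/floor (/ (- x lo) width)))))
        counts (frequencies (map bin-of xs))]
    ;; bins that received no values get an explicit 0
    (mapv #(get counts % 0) (range n))))

(bin-counts [1 2 2 3 3 3 4 8] 4)
;; => [3 4 0 1]
```

Each bar's height in a count histogram is one of these numbers; the bin width here is (8 - 1) / 4 = 1.75.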

Colored Histogram

Split by species: each group gets its own color.

(-> data/iris
    (sk/lay-histogram :sepal_length {:color :species}))
[Plot: histogram of sepal length colored by species (setosa, versicolor, virginica)]
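One plausible way to count a colored histogram is to compute the bin edges once from all rows, so every group's bars line up on the same bins, and then count each group separately. The sketch below assumes that approach; `grouped-bin-counts` and `example-rows` are hypothetical and not napkinsketch's code.

```clojure
;; Hypothetical helper and data for illustration only.
(defn grouped-bin-counts
  "Returns a sorted map of group -> per-bin counts, using n equal-width bins
  computed from ALL rows so that every group shares the same bin edges."
  [rows x-key group-key n]
  (let [xs (map x-key rows)
        lo (double (apply min xs))
        hi (double (apply max xs))
        width (/ (- hi lo) n)
        bin-of (fn [x] (min (dec n) (long (Math/floor (/ (- x lo) width)))))]
    (into (sorted-map)
          (for [[g grp] (group-by group-key rows)]
            (let [counts (frequencies (map (comp bin-of x-key) grp))]
              [g (mapv #(get counts % 0) (range n))])))))

(def example-rows
  [{:x 1 :g :a} {:x 2 :g :a} {:x 5 :g :b} {:x 9 :g :b} {:x 9 :g :b}])

(grouped-bin-counts example-rows :x :g 2)
;; => {:a [2 0], :b [0 3]}
```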

Petal Width Histogram

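Petal width has a bimodal distribution: the small petal widths of setosa form a cluster that is clearly separated from the other two species.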

(-> data/iris
    (sk/lay-histogram :petal_width))
[Plot: histogram of petal width]

Histogram with Custom Title

(-> data/tips
    (sk/lay-histogram :total_bill)
    (sk/options {:title "Distribution of Total Bill"
                 :x-label "Amount ($)"}))
[Plot: "Distribution of Total Bill", histogram with x axis labeled "Amount ($)"]

Density-Normalized Histogram

Pass {:normalize :density} so the y-axis shows probability density instead of raw counts. This makes the histogram directly comparable with a density curve overlay.

(-> data/iris
    (sk/lay-histogram :sepal_length {:normalize :density :alpha 0.5})
    sk/lay-density)
[Plot: density-normalized histogram of sepal length with a density curve overlaid]
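Density normalization conventionally rescales each bar to count / (total count × bin width), so the bar areas add up to 1, just like the area under a density curve. A minimal sketch of that arithmetic, assuming equal-width bins (`density-heights` is a hypothetical helper, not napkinsketch's API):

```clojure
;; Hypothetical helper: converts equal-width bin counts into density heights.
(defn density-heights [counts bin-width]
  (let [total (reduce + counts)]
    ;; height = count / (total * bin-width)
    (mapv #(/ (double %) (* total bin-width)) counts)))

(density-heights [2 6 2] 0.5)
;; => [0.4 1.2 0.4]
;; bar areas: 0.4*0.5 + 1.2*0.5 + 0.4*0.5 add up to 1
```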

Density Plot

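A smooth curve estimating the probability density function. It has no bins, so it does not depend on bin width or bin placement the way a histogram does; its smoothness is controlled by the bandwidth instead.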

(-> data/iris
    (sk/lay-density :sepal_length))
[Plot: density curve of sepal length]
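Density curves like this one are commonly kernel density estimates (KDE): f(x) = (1 / (n*h)) * sum over i of K((x - x_i) / h), i.e. a sum of small bumps of width h centered on the n data points. The sketch below assumes a Gaussian kernel K, which may not be the kernel napkinsketch uses; `gaussian-kernel` and `kde` are hypothetical helpers.

```clojure
;; Hypothetical sketch of a Gaussian kernel density estimate.
(defn gaussian-kernel
  "Standard normal density at u."
  [u]
  (/ (Math/exp (* -0.5 u u)) (Math/sqrt (* 2.0 Math/PI))))

(defn kde
  "Estimated density of the sample xs at point x, with bandwidth h."
  [xs h x]
  (/ (reduce + (map #(gaussian-kernel (/ (- x %) h)) xs))
     (* (count xs) h)))

;; A single observation at 0 with h = 1 gives the standard normal curve:
(kde [0.0] 1.0 0.0)
;; => about 0.3989, i.e. 1/sqrt(2*pi)
```

Dividing by n*h is what makes the area under the estimate equal to 1.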

Grouped Density

Per-species density curves with automatic color mapping.

(-> data/iris
    (sk/lay-density :sepal_length {:color :species}))
[Plot: density curves of sepal length, one per species (setosa, versicolor, virginica)]

Density with Custom Bandwidth

A narrow bandwidth reveals more detail; a wide bandwidth smooths more.

(-> data/iris
    (sk/lay-density :sepal_length {:bandwidth 0.3}))
[Plot: density curve of sepal length with bandwidth 0.3]
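To see why bandwidth matters, the sketch below evaluates a Gaussian kernel density estimate (the Gaussian kernel is an assumption) on hypothetical data with two clusters, at 0 and at 4. With a narrow bandwidth the estimate dips between the clusters, so both modes show; with a wide bandwidth the dip disappears and the two modes merge into one hump. `kde-at`, `two-clusters`, and `dip-between-clusters?` are hypothetical.

```clojure
;; Hypothetical sketch: how bandwidth h changes a Gaussian KDE.
(defn kde-at [xs h x]
  (let [k (fn [u] (/ (Math/exp (* -0.5 u u)) (Math/sqrt (* 2.0 Math/PI))))]
    (/ (reduce + (map #(k (/ (- x %) h)) xs))
       (* (count xs) h))))

;; Hypothetical sample with two clusters, at 0 and at 4.
(def two-clusters [0.0 0.0 4.0 4.0])

(defn dip-between-clusters?
  "True when the estimate at the midpoint (2.0) is lower than at a cluster (0.0)."
  [h]
  (< (kde-at two-clusters h 2.0) (kde-at two-clusters h 0.0)))

(dip-between-clusters? 0.5) ;; => true  (narrow: two separate peaks)
(dip-between-clusters? 3.0) ;; => false (wide: one merged hump)
```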

Boxplot

The box spans the quartiles with a line at the median; whiskers extend up to 1.5×IQR (interquartile range) beyond the box, and points outside the whiskers are drawn as outliers.

(-> data/iris
    (sk/lay-boxplot :species :sepal_width))
[Plot: boxplots of sepal width for setosa, versicolor, and virginica]
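A minimal sketch of the summary behind a boxplot, assuming the common Tukey convention (whiskers stop at the most extreme values within 1.5×IQR of the box) and linearly interpolated quartiles. Both are assumptions: quartile definitions vary between tools, and `quantile` and `box-stats` are hypothetical helpers, not napkinsketch's code.

```clojure
;; Hypothetical sketch of boxplot statistics (Tukey-style whiskers).
(defn quantile
  "Quantile p of an already sorted vector, interpolating linearly
  between neighboring values."
  [sorted-xs p]
  (let [last-i (dec (count sorted-xs))
        pos (* p last-i)
        i (long (Math/floor pos))
        frac (- pos i)
        lo (nth sorted-xs i)
        hi (nth sorted-xs (min (inc i) last-i))]
    (+ lo (* frac (- hi lo)))))

(defn box-stats [xs]
  (let [s (vec (sort xs))
        q1 (quantile s 0.25)
        q3 (quantile s 0.75)
        iqr (- q3 q1)
        ;; values beyond these fences are drawn as outliers
        lo-fence (- q1 (* 1.5 iqr))
        hi-fence (+ q3 (* 1.5 iqr))
        inside? #(<= lo-fence % hi-fence)
        inside (filter inside? s)]
    {:q1 q1
     :median (quantile s 0.5)
     :q3 q3
     ;; whiskers stop at the most extreme non-outlier values
     :whisker-lo (first inside)
     :whisker-hi (last inside)
     :outliers (vec (remove inside? s))}))

(box-stats [1 2 3 4 5 6 7 8 9 30])
;; => {:q1 3.25, :median 5.5, :q3 7.75, :whisker-lo 1, :whisker-hi 9, :outliers [30]}
```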

Grouped Boxplot

Side-by-side boxplots colored by a grouping variable.

(-> data/tips
    (sk/lay-boxplot :day :total_bill {:color :smoker}))
[Plot: boxplots of total bill by day (Sun, Sat, Thur, Fri), colored by smoker (No, Yes)]

Check the dodge setup by inspecting the plan: the boxplot layer carries one color category per smoker group, and each category gets its own offset.

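(let [pl (-> data/tips
             (sk/lay-boxplot :day :total_bill {:color :smoker})
             sk/plan)
      ;; this plot has a single panel
      panel (first (:panels pl))
      ;; pick the boxplot layer out of the panel's layers
      box-layer (first (filter #(= :boxplot (:mark %)) (:layers panel)))
      ;; one entry per smoker group
      cats (:color-categories box-layer)]
  (count cats))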
2
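The check above only counts the color categories. How the offsets themselves might be computed is sketched below under a common assumption: each category's band is split evenly among the k groups, and each group is centered in its slice. `dodge-offsets` and the band width of 1.0 are hypothetical, not napkinsketch's layout code.

```clojure
;; Hypothetical sketch: center offsets for k dodged groups in a band of width w.
(defn dodge-offsets [k w]
  (let [slice (/ w k)]
    ;; group i is centered in slice i, measured from the band's center
    (mapv #(- (* (+ % 0.5) slice) (/ w 2.0)) (range k))))

(dodge-offsets 2 1.0) ;; => [-0.25 0.25]
(dodge-offsets 4 1.0) ;; => [-0.375 -0.125 0.125 0.375]
```

With two smoker groups, each box would sit a quarter band to the left or right of the day's center.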

Horizontal Boxplot

Flip the coordinates with (sk/coord :flip) to draw the boxes horizontally.

(-> data/iris
    (sk/lay-boxplot :species :sepal_width)
    (sk/coord :flip))
[Plot: horizontal boxplots of sepal width, one row per species (setosa, versicolor, virginica)]

Violin Plot

A violin shows the full density shape per category, which makes it more informative than a boxplot for multimodal distributions.

(-> data/tips
    (sk/lay-violin :day :total_bill))
[Plot: violins of total bill by day (Sun, Sat, Thur, Fri)]
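Geometrically, a violin is a density curve mirrored around the category's center line. A minimal sketch, assuming the density has already been evaluated on a grid of values and that the widest point is scaled to a fixed `max-width` (both assumptions; `violin-outline` is a hypothetical helper):

```clojure
;; Hypothetical sketch: turn density values into a symmetric violin outline.
(defn violin-outline
  "ys: grid along the value axis; densities: density at each y.
  Returns left and right edges as [x-offset y] points around a center of 0."
  [ys densities max-width]
  (let [peak (apply max densities)
        ;; half-width at each y, so the widest point spans max-width
        half (mapv #(* 0.5 max-width (/ % peak)) densities)]
    {:left  (mapv (fn [y h] [(- h) y]) ys half)
     :right (mapv (fn [y h] [h y]) ys half)}))

(violin-outline [0 1 2] [0.1 0.4 0.2] 0.8)
;; => {:left [[-0.1 0] [-0.4 1] [-0.2 2]], :right [[0.1 0] [0.4 1] [0.2 2]]}
```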

Grouped Violin

Color splits each category into side-by-side violins.

(-> data/tips
    (sk/lay-violin :day :total_bill {:color :smoker}))
[Plot: violins of total bill by day (Sun, Sat, Thur, Fri), split by smoker (No, Yes)]

Check the dodge setup by inspecting the plan: the violin layer carries one color category per smoker group, and each category gets its own offset.

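(let [pl (-> data/tips
             (sk/lay-violin :day :total_bill {:color :smoker})
             sk/plan)
      ;; this plot has a single panel
      panel (first (:panels pl))
      ;; pick the violin layer out of the panel's layers
      viol-layer (first (filter #(= :violin (:mark %)) (:layers panel)))
      ;; one entry per smoker group
      cats (:color-categories viol-layer)]
  (count cats))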
2

Horizontal Violin

(-> data/iris
    (sk/lay-violin :species :petal_length)
    (sk/coord :flip))
[Plot: horizontal violins of petal length, one row per species (setosa, versicolor, virginica)]

Ridgeline Plot

Overlapping density curves stacked vertically by category: a good way to compare distribution shapes across many groups.

(-> data/iris
    (sk/lay-ridgeline :species :sepal_length))
[Plot: ridgelines of sepal length for setosa, versicolor, and virginica]
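The stacking in a ridgeline can be sketched as a vertical shift: category i gets baseline i, and its density curve is drawn at i + scale × (density / peak). With `scale` above 1, a curve can rise past the next category's baseline, which produces the overlap. This layout is an assumption, not napkinsketch's code; `ridge-points` and its inputs are hypothetical.

```clojure
;; Hypothetical sketch of ridgeline stacking.
(defn ridge-points
  "cats: categories in stacking order; density-by-cat: category -> [[x d] ...].
  Returns category -> [[x y] ...] with each curve lifted onto its own baseline."
  [cats density-by-cat scale]
  (let [;; one shared peak keeps curve heights comparable across categories
        peak (apply max (for [c cats, [_ d] (density-by-cat c)] d))]
    (into {}
          (map-indexed
           (fn [i c]
             [c (mapv (fn [[x d]] [x (+ i (* scale (/ d peak)))])
                      (density-by-cat c))])
           cats))))

(ridge-points [:a :b] {:a [[0 0.5] [1 1.0]], :b [[0 1.0] [1 0.5]]} 1.5)
;; => {:a [[0 0.75] [1 1.5]], :b [[0 2.5] [1 1.75]]}
```

Here the curve for :a peaks at 1.5, above the baseline of :b at 1, so the two ridges overlap.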

Colored Ridgeline

Map :color to the same column as the categories so that each ridge gets its own color.

(-> data/iris
    (sk/lay-ridgeline :species :sepal_length {:color :species}))
[Plot: ridgelines of sepal length, colored by species (setosa, versicolor, virginica)]

Comparing Multiple Columns

sk/distribution lays out the given columns side by side, with one facet panel per column, which is useful for comparing the shapes of different variables. The layer you add decides what each panel shows; here sk/lay-histogram draws a histogram in every panel.

(-> (sk/distribution data/iris :sepal_length :sepal_width :petal_length)
    sk/lay-histogram)
[Plot: histograms in three panels: sepal length, sepal width, petal length]

Combine with :color to see group differences within each column.

(-> (sk/distribution data/iris :sepal_length :sepal_width :petal_length)
    (sk/lay-density {:color :species}))
[Plot: density curves in three panels (sepal length, sepal width, petal length), colored by species (setosa, versicolor, virginica)]
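One way to get one panel per column is to reshape the data from wide to long: each selected column becomes a value of a new `:variable` column, which can then drive the facets, while other columns (such as a grouping column used for `:color`) stay attached to every row. Whether sk/distribution works this way internally is an assumption; `melt` is a hypothetical helper.

```clojure
;; Hypothetical sketch: wide-to-long reshape, one output row per (column, row) pair.
(defn melt [rows cols]
  (vec (for [col cols
             row rows]
         ;; drop the measured columns, keep everything else (e.g. a grouping column)
         (-> (apply dissoc row cols)
             (assoc :variable col :value (get row col))))))

(melt [{:a 1 :b 10 :g :x} {:a 2 :b 20 :g :y}] [:a :b])
;; => [{:g :x, :variable :a, :value 1} {:g :y, :variable :a, :value 2}
;;     {:g :x, :variable :b, :value 10} {:g :y, :variable :b, :value 20}]
```

Grouping the result by `:variable` yields one dataset per panel, and `:g` remains available for coloring within each panel.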

All distribution methods support :color for group comparisons and compose freely with other layers and facets.

What’s Next

  • Ranking: bar charts and lollipop plots for categorical comparisons
  • Faceting: split distributions by groups into separate panels
source: notebooks/napkinsketch_book/distributions.clj