10 Distributions
Histograms, density plots, boxplots, violins, and ridgelines for exploring the shape and spread of data.
(ns napkinsketch-book.distributions
  (:require
   ;; Shared datasets for these docs
   [napkinsketch-book.datasets :as data]
   ;; Kindly: notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Napkinsketch: composable plotting
   [scicloj.napkinsketch.api :as sk]))
Histogram
Distribution of sepal length across all species.
(-> data/iris
    (sk/lay-histogram :sepal_length))
Colored Histogram
Split by species: each group gets its own color.
(-> data/iris
    (sk/lay-histogram :sepal_length {:color :species}))
Petal Width Histogram
Petal width has a bimodal distribution.
(-> data/iris
    (sk/lay-histogram :petal_width))
Histogram with Custom Title
(-> data/tips
    (sk/lay-histogram :total_bill)
    (sk/options {:title "Distribution of Total Bill"
                 :x-label "Amount ($)"}))
Density-Normalized Histogram
Pass {:normalize :density} so the y-axis shows probability density instead of raw counts. This makes the histogram directly comparable with a density curve overlay.
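To make the normalization concrete, here is a minimal plain-Clojure sketch of the usual definition of a density histogram: each bar height is the bin count divided by (total count × bin width), so the bar areas sum to 1. The helper `density-heights` and the sample counts are hypothetical illustrations. This is not taken from Napkinsketch's implementation, which may bin and normalize differently.

```clojure
;; Hypothetical helper, not part of Napkinsketch.
(defn density-heights
  "Given bin counts and a common bin width, return bar heights
  scaled so that the total bar area is 1."
  [counts bin-width]
  (let [n (reduce + counts)]
    (mapv #(/ % (* n bin-width)) counts)))

;; Made-up counts for three bins of width 0.5.
(def heights (density-heights [2 5 3] 0.5))

;; Total area = sum of (height * width) over all bins.
(def total-area (reduce + (map #(* % 0.5) heights)))
```

Because a density curve also integrates to 1, bars scaled this way share a y-axis with the curve, which is why the overlay lines up.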
(-> data/iris
    (sk/lay-histogram :sepal_length {:normalize :density :alpha 0.5})
    sk/lay-density)
Density Plot
A smooth curve estimating the probability density function. Unlike a histogram, it does not depend on bin width, although its shape does depend on the kernel bandwidth.
(-> data/iris
    (sk/lay-density :sepal_length))
Grouped Density
Per-species density curves with automatic color mapping.
(-> data/iris
    (sk/lay-density :sepal_length {:color :species}))
Density with Custom Bandwidth
A narrow bandwidth reveals more detail; a wide bandwidth smooths more.
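The effect of bandwidth can be sketched with a small Gaussian kernel density estimator in plain Clojure. This assumes that :bandwidth plays the role of the kernel width h below; the kernel Napkinsketch actually uses is not stated here, so treat `gaussian-kde` and the sample values as hypothetical.

```clojure
;; Hypothetical illustration, not Napkinsketch's implementation.
(defn gaussian-kde
  "Return a function that estimates the density at x from samples xs,
  using a Gaussian kernel with bandwidth h."
  [xs h]
  (let [n    (count xs)
        norm (/ 1.0 (* n h (Math/sqrt (* 2.0 Math/PI))))]
    (fn [x]
      (* norm
         (reduce + (map (fn [xi]
                          (let [u (/ (- x xi) h)]
                            (Math/exp (* -0.5 u u))))
                        xs))))))

;; Made-up samples forming two clusters, around 5.0 and 6.4.
(def samples [4.9 5.0 5.1 6.3 6.4 6.5])

(def narrow (gaussian-kde samples 0.1))
(def wide   (gaussian-kde samples 1.0))

;; Estimates in the gap between clusters (5.7) and at a cluster center (5.0).
[(narrow 5.7) (narrow 5.0) (wide 5.7) (wide 5.0)]
```

With the narrow bandwidth the estimate falls to almost zero in the gap, so both clusters stay visible. With the wide bandwidth the two clusters merge into a single bump that peaks in the gap.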
(-> data/iris
    (sk/lay-density :sepal_length {:bandwidth 0.3}))
Boxplot
Median, quartiles, whiskers at 1.5×IQR (interquartile range), and outlier points.
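The statistics behind a boxplot can be sketched in plain Clojure. This is an illustration under stated assumptions: it uses linear-interpolation quantiles (one common convention) and draws each whisker to the most extreme data point within 1.5×IQR of the box. The helpers `quantile` and `box-stats` are hypothetical, and Napkinsketch may use a different quantile convention.

```clojure
;; Hypothetical helpers, not part of Napkinsketch.
(defn quantile
  "Linear-interpolation quantile of a sorted vector, for p in [0, 1]."
  [sorted-xs p]
  (let [last-idx (dec (count sorted-xs))
        pos      (* p last-idx)
        lo       (long (Math/floor pos))
        hi       (min (inc lo) last-idx)
        frac     (- pos lo)]
    (+ (* (- 1.0 frac) (nth sorted-xs lo))
       (* frac (nth sorted-xs hi)))))

(defn box-stats
  "Quartiles, whisker ends, and outliers using the 1.5×IQR rule."
  [xs]
  (let [s        (vec (sort xs))
        q1       (quantile s 0.25)
        q3       (quantile s 0.75)
        iqr      (- q3 q1)
        lo-fence (- q1 (* 1.5 iqr))
        hi-fence (+ q3 (* 1.5 iqr))
        inside?  #(<= lo-fence % hi-fence)
        inside   (filter inside? s)]
    {:q1           q1
     :median       (quantile s 0.5)
     :q3           q3
     :whisker-low  (first inside)
     :whisker-high (last inside)
     :outliers     (vec (remove inside? s))}))

;; Made-up data with one extreme value.
(box-stats [1 2 3 4 5 6 7 8 9 30])
```

For this input the quartiles are 3.25 and 7.75, so the upper fence is 7.75 + 1.5 × 4.5 = 14.5. The value 30 lies beyond it and is reported as an outlier, while the upper whisker stops at 9.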
(-> data/iris
    (sk/lay-boxplot :species :sepal_width))
Grouped Boxplot
Side-by-side boxplots colored by a grouping variable.
(-> data/tips
    (sk/lay-boxplot :day :total_bill {:color :smoker}))
Verify dodge positioning: each color group gets a distinct offset.
(let [pl (-> data/tips
             (sk/lay-boxplot :day :total_bill {:color :smoker})
             sk/plan)
      panel (first (:panels pl))
      box-layer (first (filter #(= :boxplot (:mark %)) (:layers panel)))
      cats (:color-categories box-layer)]
  (count cats))
2
Horizontal Boxplot
Flip the coordinates for a horizontal orientation.
(-> data/iris
    (sk/lay-boxplot :species :sepal_width)
    (sk/coord :flip))
Violin Plot
A violin shows the full density shape per category, which is more informative than a boxplot for multimodal distributions.
(-> data/tips
    (sk/lay-violin :day :total_bill))
Grouped Violin
Color splits each category into side-by-side violins.
(-> data/tips
    (sk/lay-violin :day :total_bill {:color :smoker}))
Verify dodge positioning: each color group gets a distinct offset.
(let [pl (-> data/tips
             (sk/lay-violin :day :total_bill {:color :smoker})
             sk/plan)
      panel (first (:panels pl))
      viol-layer (first (filter #(= :violin (:mark %)) (:layers panel)))
      cats (:color-categories viol-layer)]
  (count cats))
2
Horizontal Violin
(-> data/iris
    (sk/lay-violin :species :petal_length)
    (sk/coord :flip))
Ridgeline Plot
Overlapping density curves stacked vertically by category, good for comparing distribution shapes across many groups.
(-> data/iris
    (sk/lay-ridgeline :species :sepal_length))
Colored Ridgeline
Map color to the same categorical column for distinct curves.
(-> data/iris
    (sk/lay-ridgeline :species :sepal_length {:color :species}))
Comparing Multiple Columns
sk/distribution creates side-by-side histograms for multiple columns, which is useful when you want to compare the shape of different variables. Each column gets its own facet panel.
(-> (sk/distribution data/iris :sepal_length :sepal_width :petal_length)
    sk/lay-histogram)
Combine with :color to see group differences within each column.
(-> (sk/distribution data/iris :sepal_length :sepal_width :petal_length)
    (sk/lay-density {:color :species}))
All distribution methods support :color for group comparisons and compose freely with other layers and facets.