5  Core Concepts

This chapter is a reference for every concept in Napkinsketch. If you have not read the Composable Plotting chapter, start there: it introduces sketches, views, and methods through progressive examples.

(ns napkinsketch-book.core-concepts
  (:require
   ;; Tablecloth: dataset manipulation
   [tablecloth.api :as tc]
   ;; Shared datasets for these docs
   [napkinsketch-book.datasets :as data]
   ;; Kindly: notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Napkinsketch: composable plotting
   [scicloj.napkinsketch.api :as sk]))

Data

A dataset is a table of rows and columns, like a spreadsheet. Each column has a name (a keyword like :sepal_length) and holds values of one type. Napkinsketch uses tech.ml.dataset as its columnar data representation, typically through the Tablecloth API.

We use the classic iris flower dataset throughout these examples. It is loaded in the Datasets chapter and available as data/iris.

data/iris

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv [150 5]:

:sepal_length :sepal_width :petal_length :petal_width :species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
… … … … …
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica

The dataset has 150 rows and 5 columns. Four columns are numerical (measurements in centimeters) and one is categorical (the species name, one of three strings).

This distinction matters: Napkinsketch treats numerical and categorical columns differently when choosing axes, colors, and statistical transforms.

Here is a scatter plot of sepal dimensions, colored by species:

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:color :species}))
[Plot: scatter of sepal width against sepal length, colored by species (setosa, versicolor, virginica)]

Input formats

You do not need to construct a Tablecloth dataset explicitly. Napkinsketch accepts several common Clojure data shapes and coerces them into a dataset internally.

Map of columns. Keys are column names and values are sequences:

(-> {:x [1 2 3 4 5]
     :y [2 4 3 5 4]}
    (sk/lay-point :x :y))
[Plot: scatter of y against x]

Sequence of row maps. Each map is one row, and missing keys become nil:

(-> [{:city "Paris" :temperature 22}
     {:city "London" :temperature 18}
     {:city "Berlin" :temperature 20}
     {:city "Rome" :temperature 28}]
    (sk/lay-value-bar :city :temperature))
[Plot: bar chart of temperature by city (Paris, London, Berlin, Rome)]
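To make the "missing keys become nil" behavior concrete, here is a minimal sketch in plain Clojure of how row maps can be turned into columns. The function name rows->columns is hypothetical, and this is not how Napkinsketch or Tablecloth perform the coercion internally; it only illustrates the shape of the result.

```clojure
;; Illustrative only: a hypothetical helper, not part of any library.
(defn rows->columns
  "Turns a sequence of row maps into a map of column vectors.
  A key missing from a row contributes nil to that column."
  [rows]
  (let [ks (distinct (mapcat keys rows))]
    (into {}
          (for [k ks]
            [k (mapv #(get % k) rows)]))))

(rows->columns [{:city "Paris" :temperature 22}
                {:city "London"}
                {:city "Berlin" :temperature 20}])
;; => {:city ["Paris" "London" "Berlin"], :temperature [22 nil 20]}
```

The second row has no :temperature key, so the :temperature column holds nil at that position.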

When the dataset has 1, 2, or 3 columns, you can omit the column names entirely. They are inferred by position (first → x, second → y, third → color):

(-> {:x [1 2 3 4 5] :y [2 4 3 5 4]}
    sk/lay-point)
[Plot: scatter of y against x]

With three columns, the third becomes the color grouping:

(-> {:x [1 2 3 4] :y [4 5 6 7] :group ["a" "a" "b" "b"]}
    sk/lay-point)
[Plot: scatter of y against x, colored by group (a, b)]

Datasets with four or more columns require explicit column names.
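The positional rule can be sketched in a few lines of plain Clojure. The name infer-roles is hypothetical and this is not the library's implementation; it only mirrors the rule as stated here, including the requirement that wider datasets name their columns explicitly.

```clojure
;; Illustrative only: mirrors the positional rule described in the text.
(defn infer-roles
  "Maps 1 to 3 column names, in order, to the roles :x, :y, :color.
  Throws for any other number of columns."
  [column-names]
  (if (<= 1 (count column-names) 3)
    (zipmap [:x :y :color] column-names)
    (throw (ex-info "Column names must be given explicitly"
                    {:columns column-names}))))

(infer-roles [:x :y])
;; => {:x :x, :y :y}
(infer-roles [:x :y :group])
;; => {:x :x, :y :y, :color :group}
```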

Tablecloth dataset. tc/dataset loads data from CSV files, URLs, and other file formats. Passing {:key-fn keyword} converts string column headers to keywords. See the Tablecloth documentation for all supported formats (CSV, TSV, JSON, Parquet, and more).

Sequence of sequences. Each inner sequence is a row. Pass column names explicitly, since there are no keys:

(-> (tc/dataset [[1 10] [2 20] [3 15] [4 25]]
                {:column-names [:x :y]})
    (sk/lay-line :x :y))
[Plot: line of y against x]

Sketches, Views, and Methods

The Composable Plotting chapter introduces these concepts through progressive examples. Read it first if you have not already.

In short: sk/view describes your views of the data, sk/lay-* functions add drawing methods, and the result is a sketch, a composable, auto-rendering value. Everything is plain data you can inspect with sk/views-of.

Options

There are three kinds of options in Napkinsketch:

  • Layer options: per-layer settings like :color, :size, :position, and method-specific parameters (:bandwidth, :se, etc.). Passed in the options map of layer functions. See the Methods chapter.

  • Plot options: per-plot text content such as :title, :subtitle, :caption, and axis labels. Passed via sk/options.

  • Configuration: global rendering defaults such as dimensions, theme, palette, color scale, and more. These follow a layered precedence chain. See the Configuration chapter.

Here is one option from each scope in a single pipeline:

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:color :species :alpha 0.5}) ;; layer options
    (sk/options {:title "Iris Measurements"                                ;; plot option
                 :width 500 :palette :dark2}))                             ;; config options
[Plot: "Iris Measurements", scatter of sepal width against sepal length, colored by species (setosa, versicolor, virginica)]

Mark

The mark is the visual shape drawn on the plot. The scatter plot above used :mark :point, so each data point became a dot.

A method’s name describes its intent while the mark describes the shape. The :histogram method uses :mark :bar because a histogram is drawn with bar shapes:

(sk/method-lookup :histogram)
{:mark :bar,
 :stat :bin,
 :x-only true,
 :accepts [:normalize],
 :doc "Histogram - bins numerical data into bars."}

A histogram draws bar shapes filled to show binned counts:

(-> data/iris
    (sk/lay-histogram :sepal_length))
[Plot: histogram of sepal length]

Stat

The stat is the computation applied to data before drawing. The scatter plot used :stat :identity, meaning every row became one point, unchanged.

The :histogram method uses :stat :bin, which groups values into ranges and counts how many fall in each range. The stat transforms the data; the mark renders the result. Together, :stat :bin and :mark :bar produce the familiar histogram shape.
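As a rough illustration of what a binning stat computes, the following plain-Clojure sketch counts values in equal-width bins. The function bin-counts and its choice of equal-width bins are assumptions made for this example; the library's actual binning rule (number of bins, edge handling) may differ.

```clojure
;; Illustrative only: equal-width binning, assuming the values are not all equal.
(defn bin-counts
  "Counts values per equal-width bin. Returns a vector of
  {:lo .. :hi .. :count ..} maps, one per bin."
  [values n-bins]
  (let [lo    (apply min values)
        hi    (apply max values)
        width (/ (- hi lo) (double n-bins))
        ;; the maximum value is placed in the last bin
        index (fn [v] (min (dec n-bins) (int (/ (- v lo) width))))
        freqs (frequencies (map index values))]
    (mapv (fn [i]
            {:lo    (+ lo (* i width))
             :hi    (+ lo (* (inc i) width))
             :count (get freqs i 0)})
          (range n-bins))))

(map :count (bin-counts [1 2 2 3 3 3 4 4 5] 4))
;; => (1 2 3 3)
```

Each resulting map is what a bar mark would need: the bin's horizontal extent and its height.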

The :lm (linear model) method uses :stat :lm, which fits a straight line to the data and returns a polyline of predicted values:

(sk/method-lookup :lm)
{:mark :line,
 :stat :lm,
 :accepts [:se :size :nudge-x :nudge-y],
 :doc
 "Linear model (lm) - ordinary least squares (OLS) regression line."}

A regression line fitted through the scatter data:

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width)
    sk/lay-lm)
[Plot: scatter of sepal width against sepal length with a fitted regression line]
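For readers who want to see the arithmetic behind an ordinary least squares line, here is a minimal sketch in plain Clojure. The function fit-line is hypothetical and is not the library's :lm stat; it uses the textbook closed-form solution for a single predictor.

```clojure
;; Illustrative only: closed-form OLS for y = intercept + slope * x.
(defn fit-line
  "Returns {:slope .. :intercept ..} for the least squares line
  through the points given by xs and ys."
  [xs ys]
  (let [n      (count xs)
        mean-x (/ (reduce + xs) (double n))
        mean-y (/ (reduce + ys) (double n))
        ;; sum of products of deviations, and sum of squared x deviations
        sxy    (reduce + (map (fn [x y] (* (- x mean-x) (- y mean-y))) xs ys))
        sxx    (reduce + (map (fn [x] (* (- x mean-x) (- x mean-x))) xs))
        slope  (/ sxy sxx)]
    {:slope     slope
     :intercept (- mean-y (* slope mean-x))}))

(fit-line [1 2 3 4] [3 5 7 9])
;; => {:slope 2.0, :intercept 1.0}
```

Evaluating the fitted line at the smallest and largest x gives the two endpoints that a line mark would draw.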

Position

The position controls how overlapping groups share space. Most methods leave it unset, so groups are drawn independently (:position :identity). The :stacked-bar method includes :position :stack, which places groups on top of each other:

(sk/method-lookup :stacked-bar)
{:mark :rect,
 :stat :count,
 :position :stack,
 :x-only true,
 :accepts [],
 :doc "Stacked bar - counts categorical values, stacked."}

A stacked bar chart, where each meal’s count stacks on the previous one:

(-> {:day ["Mon" "Mon" "Tue" "Tue"]
     :count [30 20 45 15]
     :meal ["lunch" "dinner" "lunch" "dinner"]}
    (sk/lay-value-bar :day :count {:color :meal :position :stack}))
[Plot: stacked bar chart of count by day (Mon, Tue), colored by meal (lunch, dinner)]
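Stacking amounts to giving each bar a baseline equal to the running total of the bars already placed at the same x value. The sketch below shows that idea in plain Clojure using the same data; stack-rows, :y0, and :y1 are hypothetical names chosen for this example, not the library's internals.

```clojure
;; Illustrative only: running totals per x value, in input order.
(defn stack-rows
  "Adds :y0 (baseline) and :y1 (top) to each row so that rows
  sharing the same x value sit on top of each other."
  [rows x-key y-key]
  (first
   (reduce (fn [[out tops] row]
             (let [x  (get row x-key)
                   y0 (get tops x 0)
                   y1 (+ y0 (get row y-key))]
               [(conj out (assoc row :y0 y0 :y1 y1))
                (assoc tops x y1)]))
           [[] {}]
           rows)))

(stack-rows [{:day "Mon" :count 30 :meal "lunch"}
             {:day "Mon" :count 20 :meal "dinner"}
             {:day "Tue" :count 45 :meal "lunch"}
             {:day "Tue" :count 15 :meal "dinner"}]
            :day :count)
;; Mon lunch spans 0 to 30 and Mon dinner 30 to 50;
;; Tue lunch spans 0 to 45 and Tue dinner 45 to 60.
```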

Inference

Napkinsketch infers two things automatically:

  • Columns: when omitted, inferred from the dataset shape (1 column → x, 2 → x y, 3 → x y color)

  • Method: when using sk/view instead of an explicit sk/lay-*, the chart type is chosen from the column types

Two numerical columns produce a scatter plot:

(-> data/iris
    (sk/view :sepal_length :sepal_width))
[Plot: scatter of sepal width against sepal length]

A single numerical column produces a histogram:

(-> data/iris
    (sk/view :sepal_length))
[Plot: histogram of sepal length]
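The two cases shown here can be written as a small decision function. This is only a sketch of the rules stated in this section, with hypothetical names (column-type, infer-method) and a deliberately naive type check; every other combination is left to the Inference Rules chapter rather than guessed at.

```clojure
;; Illustrative only: covers just the two cases described in this section.
(defn column-type
  "Classifies a column as :numerical if every value is a number,
  otherwise :categorical. A simplifying assumption for this sketch."
  [values]
  (if (every? number? values) :numerical :categorical))

(defn infer-method
  "Picks a method keyword from the types of the mapped columns."
  [column-types]
  (cond
    (= column-types [:numerical])            :histogram
    (= column-types [:numerical :numerical]) :point
    :else                                    :see-inference-rules))

(infer-method [(column-type [5.1 4.9 4.7])])
;; => :histogram
(infer-method [(column-type [5.1 4.9 4.7]) (column-type [3.5 3.0 3.2])])
;; => :point
```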

Use sk/lay-point, sk/lay-histogram, etc. when you want to choose a specific method, pass options like :color, or add multiple layers.

Layers

A plot can have multiple layers, that is, different methods drawn on the same axes. Each sk/lay-X call adds one layer; thread them and they are drawn together, each contributing its own visual element.

Here we add a linear model regression line (sk/lay-lm) on top of the scatter points. A regression line is a straight line fitted to the data; it shows the overall trend.

(-> data/iris
    (sk/view :sepal_length :sepal_width)
    sk/lay-point
    sk/lay-lm)
[Plot: scatter of sepal width against sepal length with a regression line]

Or with a LOESS (local regression) smoother, a flexible curve that follows local trends instead of fitting a straight line:

(-> data/iris
    (sk/view :sepal_length :sepal_width)
    sk/lay-point
    sk/lay-loess)
[Plot: scatter of sepal width against sepal length with a LOESS curve]

The same plot without sk/view. The first sk/lay-X call sets the column mappings, and subsequent layers inherit them:

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width)
    sk/lay-lm)
[Plot: scatter of sepal width against sepal length with a regression line]

sk/lay also accepts annotation maps (sk/rule-h, sk/band-v, etc.); see the Customization chapter.

When to use sk/view

There are four common patterns:

  • Minimal: (sk/lay-point data), with columns inferred from dataset shape
  • Explicit columns: (sk/lay-point data :x :y), with no sk/view needed
  • Inferred method: (sk/view data :x :y), where the library picks the chart type
  • Shared aesthetics: (-> data (sk/view :x :y {:color :g}) sk/lay-point sk/lay-lm), where all layers inherit

Incremental Building

Because views are plain data, you can save a partial plot and extend it later. Each sk/lay-X call adds a layer without changing the original.

(def scatter-base
  (-> data/iris
      (sk/lay-point :sepal_length :sepal_width)))

Add a regression line:

(-> scatter-base
    sk/lay-lm)
[Plot: scatter of sepal width against sepal length with a regression line]

Or a LOESS smoother instead, a flexible curve that follows local patterns in the data:

(-> scatter-base
    sk/lay-loess)
[Plot: scatter of sepal width against sepal length with a LOESS curve]
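The reason a saved base can be extended in two different directions is that Clojure data is immutable: adding to a value returns a new value. The sketch below shows this with a made-up map standing in for a sketch. The keys :columns and :layers and the function add-layer are hypothetical; the real structure is whatever sk/views-of shows.

```clojure
;; Illustrative only: a hypothetical stand-in for a sketch value.
(def base
  {:columns [:sepal_length :sepal_width]
   :layers  [:point]})

(defn add-layer
  "Returns a new map with method appended to :layers.
  The input map is left untouched."
  [sketch method]
  (update sketch :layers conj method))

(def with-lm    (add-layer base :lm))
(def with-loess (add-layer base :loess))

(:layers base)       ;; => [:point]
(:layers with-lm)    ;; => [:point :lm]
(:layers with-loess) ;; => [:point :loess]
```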

You can also add a layer with different columns by passing them explicitly. This creates a multi-panel layout, one panel per column pair:

(-> scatter-base
    (sk/lay-point :petal_length :petal_width))
[Plot: two scatter panels, sepal width against sepal length and petal width against petal length]

Color

The :color option controls point and line colors. Its behavior depends on what you pass.

Categorical column. When :color refers to a column with text values (like :species), each unique value gets a distinct color from the palette (an ordered set of colors). A legend appears alongside the plot, mapping labels to colors.

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:color :species}))
[Plot: scatter of sepal width against sepal length, colored by species (setosa, versicolor, virginica)]

Numeric column. When :color refers to a numerical column (like :petal_length), values map to a continuous gradient, a smooth color ramp from low to high. The legend shows a color bar instead of discrete entries.

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:color :petal_length}))
[Plot: scatter of sepal width against sepal length, colored by petal length on a continuous color bar from 1.0 to 6.9]
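A continuous gradient can be understood as linear interpolation between two colors. The following plain-Clojure sketch maps a value to an RGB triple; the function names and the blue-to-red endpoints are assumptions for illustration, and the library's actual color scale is configurable and may interpolate differently.

```clojure
;; Illustrative only: two-color linear interpolation in RGB.
(defn lerp
  "Linear interpolation between a and b, for t in [0, 1]."
  [a b t]
  (+ a (* t (- b a))))

(defn gradient-color
  "Maps v in [lo, hi] to an [r g b] vector between low-rgb and high-rgb."
  [v lo hi low-rgb high-rgb]
  (let [t (/ (- v lo) (double (- hi lo)))]
    (mapv (fn [a b] (Math/round (double (lerp a b t))))
          low-rgb high-rgb)))

;; Petal length ranges from 1.0 to 6.9 in the iris data.
(gradient-color 1.0 1.0 6.9 [0 0 255] [255 0 0]) ;; => [0 0 255]
(gradient-color 6.9 1.0 6.9 [0 0 255] [255 0 0]) ;; => [255 0 0]
```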

Fixed color string. A literal color name like "steelblue" colors all points uniformly. No legend appears because there is nothing to distinguish.

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:color "steelblue"}))
[Plot: scatter of sepal width against sepal length, all points in steelblue]

Grouping

Categorical color does more than set colors: it creates groups. Each group is processed independently: it gets its own regression line, density curve, or bar.

Compare: without :color, sk/lay-lm (linear model) fits one line to all the data:

(-> data/iris
    (sk/view :sepal_length :sepal_width)
    sk/lay-point
    sk/lay-lm)
[Plot: scatter of sepal width against sepal length with a single regression line]

Passing :color :species in sk/view makes it a shared aesthetic that all layers inherit. Each species becomes a separate group, so the regression fits three lines instead of one:

(-> data/iris
    (sk/view :sepal_length :sepal_width {:color :species})
    sk/lay-point
    sk/lay-lm)
[Plot: scatter of sepal width against sepal length, colored by species, with one regression line per species]

Grouping reveals patterns within each species that the overall trend line hides.
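The idea that each group is processed independently can be sketched as "split the rows, then apply the same stat to each part". The example below uses a simple mean as the stat to keep it short; stat-per-group and mean-y are hypothetical names, and the toy rows are made up for illustration.

```clojure
;; Illustrative only: apply one stat function separately to each group.
(defn stat-per-group
  "Splits rows by group-key and applies stat-fn to each group's rows.
  Returns a map from group value to the stat's result."
  [rows group-key stat-fn]
  (into {}
        (for [[g group-rows] (group-by group-key rows)]
          [g (stat-fn group-rows)])))

(defn mean-y
  "Mean of the :y values of rows."
  [rows]
  (/ (reduce + (map :y rows)) (double (count rows))))

(def rows
  [{:x 1 :y 2  :g "a"} {:x 2 :y 4  :g "a"}
   {:x 1 :y 10 :g "b"} {:x 2 :y 20 :g "b"}])

(mean-y rows)                   ;; => 9.0, one result for all the data
(stat-per-group rows :g mean-y) ;; => {"a" 3.0, "b" 15.0}, one per group
```

Swapping mean-y for a line-fitting function gives one fitted line per group, which is the situation shown in the plot above.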

Faceting

Faceting splits a plot into multiple panels: separate plotting areas, one per value of a column. All panels share the same axes, making it easy to compare subsets side by side.

sk/facet specifies which column to split on:

(-> data/iris
    (sk/view :sepal_length :sepal_width)
    (sk/facet :species)
    sk/lay-point
    sk/lay-lm)
[Plot: three scatter panels (setosa, versicolor, virginica) of sepal width against sepal length, each with a regression line]

Three panels, one per species. The shared axes let you compare sepal dimensions across species at a glance.
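What makes panels comparable is that the axis range is computed once from all rows and reused in every panel. Here is a minimal plain-Clojure sketch of that idea; facet-panels and the :x-domain key are hypothetical names for this example and do not describe the library's internal representation.

```clojure
;; Illustrative only: one panel per facet value, all with the same x domain.
(defn facet-panels
  "Splits rows into one panel per value of facet-key. Every panel
  carries the same x domain, computed from all rows."
  [rows facet-key x-key]
  (let [xs     (map x-key rows)
        domain [(apply min xs) (apply max xs)]]
    (into {}
          (for [[v panel-rows] (group-by facet-key rows)]
            [v {:rows panel-rows :x-domain domain}]))))

(def flowers
  [{:species "setosa"    :sepal_length 5.1}
   {:species "setosa"    :sepal_length 4.9}
   {:species "virginica" :sepal_length 6.3}])

(facet-panels flowers :species :sepal_length)
;; Both panels get :x-domain [4.9 6.3], although the virginica
;; panel on its own contains only the value 6.3.
```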

Column Combinations

sk/cross generates all combinations of two lists. Passing column names to sk/cross and the result to sk/view creates one panel per combination, a quick way to explore relationships across many variables at once.

(def cols [:sepal_length :sepal_width :petal_length])
(sk/cross cols cols)
([:sepal_length :sepal_length]
 [:sepal_length :sepal_width]
 [:sepal_length :petal_length]
 [:sepal_width :sepal_length]
 [:sepal_width :sepal_width]
 [:sepal_width :petal_length]
 [:petal_length :sepal_length]
 [:petal_length :sepal_width]
 [:petal_length :petal_length])
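For intuition, the same pairs can be produced with a plain Clojure for comprehension. The function cross-pairs is a hypothetical stand-in written for this example, not the source of sk/cross; it yields pairs in the same order as the output shown above.

```clojure
;; Illustrative only: a Cartesian product of two sequences.
(defn cross-pairs
  "Returns every [x y] pair with x from xs and y from ys,
  varying y fastest."
  [xs ys]
  (for [x xs, y ys]
    [x y]))

(cross-pairs [:a :b] [:a :b])
;; => ([:a :a] [:a :b] [:b :a] [:b :b])
```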

Three columns crossed with themselves produce nine panels, a full grid where each row and column corresponds to a variable:

(-> data/iris
    (sk/view (sk/cross cols cols)))
[Plot: 3×3 grid of panels over sepal length, sepal width, and petal length]

Notice the diagonal: when x and y are the same column, Napkinsketch infers a histogram instead of a scatter plot. This is inference at work: each panel gets the method that fits its column types.

Coordinates and Scales

Coordinates and scales are composable modifiers. They change how data maps to visual space without changing the data itself.

sk/coord sets the coordinate system. :flip swaps the x and y axes, which is useful for horizontal layouts or when axis labels are long.

Here we flip a scatter plot so sepal length runs vertically:

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:color :species})
    (sk/coord :flip))
[Plot: scatter with sepal length on the vertical axis and sepal width on the horizontal axis, colored by species (setosa, versicolor, virginica)]

sk/scale changes how a numeric axis is drawn. :log applies a logarithmic transformation, which is useful when values span a wide range, so that small and large values are both visible.

Here we use a dataset where values vary by orders of magnitude:

(-> {:population [1000 5000 50000 200000 1000000 5000000]
     :area [2 8 30 120 500 2100]}
    (sk/lay-point :population :area)
    (sk/scale :x :log)
    (sk/scale :y :log))
[Plot: scatter of area against population, both axes on log scales]

Without log scales, the small values would be crushed together near the origin. Log scales spread them out proportionally.
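To see why a log scale spreads the values out, consider how a value is mapped to a position on the axis. The sketch below positions values linearly in log10 space; log-scale and the 0 to 300 pixel range are assumptions made for this example, not the library's rendering code.

```clojure
;; Illustrative only: map positive values to positions, linear in log10.
(defn log-scale
  "Returns a function mapping a positive value in [lo, hi] to a
  position in [px-lo, px-hi], linear in log10 of the value."
  [lo hi px-lo px-hi]
  (let [log-lo (Math/log10 lo)
        log-hi (Math/log10 hi)]
    (fn [v]
      (+ px-lo (* (- px-hi px-lo)
                  (/ (- (Math/log10 v) log-lo)
                     (- log-hi log-lo)))))))

(def x->px (log-scale 1000 1000000 0 300))

(map x->px [1000 10000 100000 1000000])
;; Each tenfold step moves about 100 units: roughly 0, 100, 200, 300.
```

On a linear scale with the same range, 1000 and 10000 would sit within the first 1% of the axis; here they are a third of the axis apart.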

What’s Next

This chapter covered the core building blocks. The rest of the book builds on them:

  • Inference Rules: how Napkinsketch chooses defaults for marks, stats, and domains
  • Methods: complete tables of every mark, stat, and position
  • Scatter Plots: the most common chart type, a good place to start exploring
source: notebooks/napkinsketch_book/core_concepts.clj