6  Inference Rules

Napkinsketch infers many parameters automatically so you can write less and get reasonable defaults. This notebook shows those rules in action by examining the plan, the resolved data structure that captures every inference decision.

The examples use small inline datasets so the full plan is readable.

(ns napkinsketch-book.inference-rules
  (:require
   ;; Tablecloth: dataset manipulation
   [tablecloth.api :as tc]
   ;; Kindly: notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Napkinsketch: composable plotting
   [scicloj.napkinsketch.api :as sk]
   ;; Shared datasets
   [napkinsketch-book.datasets :as data]))

What Gets Inferred

When you write (-> data (sk/lay-point :x :y)), or even just (sk/lay-point data), the library fills in everything needed to render a plot. Here is the full list of inference steps, in the order they happen:

  1. Column selection: which columns map to x, y, and color (inferred from dataset shape when omitted)
  2. Column types: numerical, categorical, or temporal
  3. Aesthetic resolution: is :color a column reference, a hex string, or a CSS name?
  4. Grouping: which column(s) split data into subsets
  5. Method: which mark and stat to use (scatter, histogram, bar, …)
  6. Domains: data extent for each axis, with padding
  7. Ticks: nice round values and formatted labels
  8. Axis labels: derived from column names
  9. Legend: type, entries, and layout space
  10. Layout: single panel, facet grid, or multi-variable
  11. Coordinate transform: cartesian, flip, or polar

Each rule has a sensible default and an explicit override. The sections below demonstrate each rule with live examples.

Inspecting the Plan

Every call to sk/plan returns a plain Clojure map: the plan. It contains everything needed to render a plot: domains, ticks, scales, layers with positioned data, legend, and layout dimensions.

To understand what Napkinsketch inferred, look at the plan.

(def five-points
  {:x [1.0 2.0 3.0 4.0 5.0]
   :y [2.1 4.3 3.0 5.2 4.8]})
(def scatter-views
  (-> five-points
      (sk/lay-point :x :y)))

Here is the full plan:

(sk/plan scatter-views)
{:panels
 [{:coord :cartesian,
   :y-domain [1.945 5.355],
   :x-scale {:type :linear},
   :x-domain [0.8 5.2],
   :x-ticks
   {:values [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["1.0" "1.5" "2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :col 0,
   :layers
   [{:mark :point,
     :style {:opacity 0.75, :radius 3.0},
     :groups
     [{:color [0.2 0.2 0.2 1.0], :xs #tech.v3.dataset.column<float64>[5]
:x
[1.000, 2.000, 3.000, 4.000, 5.000],
       :ys #tech.v3.dataset.column<float64>[5]
:y
[2.100, 4.300, 3.000, 5.200, 4.800],
       :row-indices #tech.v3.dataset.column<int64>[5]
:__row-idx
[0, 1, 2, 3, 4]}],
     :y-domain [2.1 5.2],
     :x-domain [1.0 5.0]}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 622.5,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 0,
  :caption-pad 0,
  :y-label-pad 22.5,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend nil,
 :panel-height 400.0,
 :title nil,
 :y-label "y",
 :alpha-legend nil,
 :x-label "x",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}

And the resulting plot:

scatter-views
[Plot: scatter of y versus x]

Notice in the plan above:

  • :x-domain is [0.8 5.2], wider than the data range [1.0, 5.0] because of 5% padding

  • :x-scale is {:type :linear}, inferred from numeric data

  • :x-ticks has nice round values: 1.0, 1.5, 2.0, ...

  • :x-label is "x", derived from the column keyword

  • :legend is nil, because there is no color mapping

  • :layout has :legend-w 0, so no space is reserved for a legend

  • The single layer has :mark :point and a single :groups entry with all 5 data points, colored in the default color (dark gray, [0.2 0.2 0.2 1.0])
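
Because the plan is plain data, the fields listed above can be read with ordinary map functions. The sketch below works on a hand-trimmed literal copied from the plan output above rather than calling sk/plan, so it runs without the library. The name plan-summary is a hypothetical helper, not part of the Napkinsketch API.

```clojure
;; A trimmed copy of the plan printed above (only the keys used here).
(def trimmed-plan
  {:panels [{:x-domain [0.8 5.2]
             :x-scale  {:type :linear}
             :layers   [{:mark   :point
                         :groups [{:color [0.2 0.2 0.2 1.0]}]}]}]
   :x-label "x"
   :legend  nil
   :layout  {:legend-w 0}})

(defn plan-summary
  "Hypothetical helper: collects the inference results discussed above."
  [plan]
  (let [panel (first (:panels plan))
        layer (first (:layers panel))]
    {:x-domain    (:x-domain panel)
     :x-scale     (get-in panel [:x-scale :type])
     :x-label     (:x-label plan)
     :mark        (:mark layer)
     :group-count (count (:groups layer))
     :has-legend? (some? (:legend plan))
     :legend-w    (get-in plan [:layout :legend-w])}))

(plan-summary trimmed-plan)
```

The same function should work on a real plan, since it only relies on keys visible in the printed output.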

Column Selection

When column names are omitted, napkinsketch infers them from the dataset shape:

Number of columns   Inferred mapping
1                   first → x
2                   first → x, second → y
3                   first → x, second → y, third → color
4+                  error: specify columns explicitly
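
The positional rule in the table can be sketched in a few lines. This is an illustration of the rule only; infer-columns is a hypothetical name and the error message is made up, so do not read it as the library's implementation.

```clojure
(defn infer-columns
  "Hypothetical sketch of the table above: assign roles by column position.
  `col-names` is the dataset's column names, in order."
  [col-names]
  (let [n (count col-names)]
    (if (<= 1 n 3)
      ;; zipmap stops at the shorter sequence, so one column yields only :x.
      (zipmap [:x :y :color] col-names)
      (throw (ex-info "Cannot infer columns; specify them explicitly"
                      {:column-count n})))))

(infer-columns [:values])   ;; => {:x :values}
(infer-columns [:x :y :g])  ;; => {:x :x, :y :y, :color :g}
```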

One column:

(-> {:values [1 2 3 4 5 6]}
    sk/lay-histogram)
[Plot: histogram of values]

Two columns:

(-> {:x [1 2 3 4 5] :y [2 4 3 5 4]}
    sk/lay-point)
[Plot: scatter of y versus x]

Three columns (the third becomes :color):

(-> {:x [1 2 3 4] :y [4 5 6 7] :g ["a" "a" "b" "b"]}
    sk/lay-point)
[Plot: scatter of y versus x, colored by g (a, b)]

When you provide explicit columns, inference is skipped and you are in full control:

(-> data/iris
    (sk/lay-point :petal_length :petal_width {:color :species}))
[Plot: scatter of petal width versus petal length, colored by species (setosa, versicolor, virginica)]

Column Type Detection

Once columns are selected, the next step is determining the type of each column: numerical, categorical, or temporal? This determines the scale type, domain, tick style, and the default mark.

Column dtype                                        Inferred type
float, int                                          :numerical
string, keyword, boolean, symbol, text              :categorical
LocalDate, LocalDateTime, Instant, java.util.Date   :temporal → numerical with calendar-aware ticks

Internally, infer-column-types in view.clj handles this step.
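
As a rough illustration of the table, the sketch below classifies a column by looking at its first non-nil value. This is an assumption made to keep the example self-contained: the library works from the column dtype, and value-type is a hypothetical name rather than anything in view.clj.

```clojure
(defn value-type
  "Hypothetical sketch: classify a column by its first non-nil value.
  It only mirrors the dtype table above."
  [values]
  (let [v (first (remove nil? values))]
    (cond
      (number? v) :numerical
      (or (string? v) (keyword? v) (boolean? v) (symbol? v)) :categorical
      (or (instance? java.util.Date v)
          (instance? java.time.LocalDate v)
          (instance? java.time.LocalDateTime v)
          (instance? java.time.Instant v)) :temporal
      :else (throw (ex-info "Unrecognized column type" {:value v})))))

(value-type [12 8 15 5])                    ;; => :numerical
(value-type ["cat" "dog" "bird" "fish"])    ;; => :categorical
(value-type [#inst "2024-01-01"])           ;; => :temporal
```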

A categorical column produces a band scale with string domain values. Compare:

(def animals
  {:animal ["cat" "dog" "bird" "fish"]
   :count [12 8 15 5]})
(def bar-views
  (-> animals
      (sk/lay-value-bar :animal :count)))
(sk/plan bar-views)
{:panels
 [{:coord :cartesian,
   :y-domain [-0.75 15.75],
   :x-scale {:type :linear},
   :x-domain ["cat" "dog" "bird" "fish"],
   :x-ticks
   {:values ["cat" "dog" "bird" "fish"],
    :labels ["cat" "dog" "bird" "fish"],
    :categorical? true},
   :col 0,
   :layers
   [{:mark :rect,
     :style {:opacity 0.85},
     :position :dodge,
     :groups
     [{:color [0.2 0.2 0.2 1.0],
       :label "",
       :xs #tech.v3.dataset.column<string>[4]
:animal
[cat, dog, bird, fish],
       :ys #tech.v3.dataset.column<int64>[4]
:count
[12, 8, 15, 5],
       :dodge-idx 0}],
     :y-domain [0 15],
     :x-domain ("cat" "dog" "bird" "fish"),
     :dodge-ctx {:n-groups 1}}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [0.0 2.0 4.0 6.0 8.0 10.0 12.0 14.0],
    :labels ["0" "2" "4" "6" "8" "10" "12" "14"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 618.0,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 0,
  :caption-pad 0,
  :y-label-pad 18.0,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend nil,
 :panel-height 400.0,
 :title nil,
 :y-label "count",
 :alpha-legend nil,
 :x-label "animal",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}
bar-views
[Plot: bar chart of count by animal (cat, dog, bird, fish)]

The x-domain is ["cat" "dog" "bird" "fish"]: strings in order of appearance. The ticks have :categorical? true. The y-domain includes zero because this is a bar chart: the layer's own y-domain is [0 15], which padding widens to [-0.75 15.75] at the panel level.

Temporal columns

Dates are detected and converted to epoch-milliseconds internally, with calendar-aware tick labels. Clojure’s #inst reader literal is a convenient way to write dates:

(let [pl (-> {:date [#inst "2024-01-01" #inst "2024-06-01" #inst "2024-12-01"]
              :val [10 25 18]}
             (sk/lay-point :date :val)
             sk/plan)
      p (first (:panels pl))]
  {:x-domain-numeric? (number? (first (:x-domain p)))
   :tick-count (count (:values (:x-ticks p)))
   :first-tick-label (first (:labels (:x-ticks p)))})
{:x-domain-numeric? true, :tick-count 10, :first-tick-label "Feb-01"}

The x-domain contains epoch-millisecond numbers, but the 10 tick labels show human-readable dates like "Feb-01". Napkinsketch accepts java.util.Date (from #inst), LocalDate, LocalDateTime, and Instant. All are converted to epoch-milliseconds for plotting, with calendar-aware tick formatting.
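
The conversion to epoch-milliseconds can be sketched with plain java.time interop. The helper name ->epoch-ms is hypothetical, and interpreting LocalDate and LocalDateTime as UTC is an assumption made for this example; the source does not say which time zone the library uses.

```clojure
(import '(java.time Instant LocalDate LocalDateTime ZoneOffset))

(defn ->epoch-ms
  "Hypothetical sketch: convert a supported temporal value to epoch-milliseconds.
  Zone-less values are read as UTC (an assumption)."
  [v]
  (cond
    (instance? java.util.Date v) (.getTime ^java.util.Date v)
    (instance? Instant v)        (.toEpochMilli ^Instant v)
    (instance? LocalDateTime v)  (.toEpochMilli
                                  (.toInstant ^LocalDateTime v ZoneOffset/UTC))
    (instance? LocalDate v)      (.toEpochMilli
                                  (.toInstant
                                   (.atStartOfDay ^LocalDate v ZoneOffset/UTC)))
    :else (throw (ex-info "Not a supported temporal value" {:value v}))))

;; A java.util.Date and the matching LocalDate map to the same number.
(= (->epoch-ms #inst "2024-01-01")
   (->epoch-ms (LocalDate/of 2024 1 1)))
```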

Aesthetic Resolution

The :color parameter triggers different behaviors depending on what you pass. Internally, resolve-aesthetics in view.clj classifies each aesthetic channel (:color, :size, :alpha, :text) as either a column reference or a fixed literal.

Column reference: colored by palette

(def colored-views
  (-> {:x [1 2 3 4 5 6]
       :y [3 5 4 7 6 8]
       :g ["a" "a" "a" "b" "b" "b"]}
      (sk/lay-point :x :y {:color :g})))
(sk/plan colored-views)
{:panels
 [{:coord :cartesian,
   :y-domain [2.75 8.25],
   :x-scale {:type :linear},
   :x-domain [0.75 6.25],
   :x-ticks
   {:values [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0],
    :labels
    ["1.0"
     "1.5"
     "2.0"
     "2.5"
     "3.0"
     "3.5"
     "4.0"
     "4.5"
     "5.0"
     "5.5"
     "6.0"],
    :categorical? false},
   :col 0,
   :layers
   [{:mark :point,
     :style {:opacity 0.75, :radius 3.0},
     :groups
     [{:color
       [0.8941176470588236
        0.10196078431372549
        0.10980392156862745
        1.0],
       :xs #tech.v3.dataset.column<int64>[3]
:x
[1, 2, 3],
       :ys #tech.v3.dataset.column<int64>[3]
:y
[3, 5, 4],
       :label "a",
       :row-indices #tech.v3.dataset.column<int64>[3]
:__row-idx
[0, 1, 2]}
      {:color
       [0.21568627450980393
        0.49411764705882355
        0.7215686274509804
        1.0],
       :xs #tech.v3.dataset.column<int64>[3]
:x
[4, 5, 6],
       :ys #tech.v3.dataset.column<int64>[3]
:y
[7, 6, 8],
       :label "b",
       :row-indices #tech.v3.dataset.column<int64>[3]
:__row-idx
[3, 4, 5]}],
     :y-domain [3 8],
     :x-domain [1 6]}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0],
    :labels
    ["3.0"
     "3.5"
     "4.0"
     "4.5"
     "5.0"
     "5.5"
     "6.0"
     "6.5"
     "7.0"
     "7.5"
     "8.0"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 722.5,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 100,
  :caption-pad 0,
  :y-label-pad 22.5,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend
 {:title :g,
  :entries
  [{:label "a",
    :color
    [0.8941176470588236 0.10196078431372549 0.10980392156862745 1.0]}
   {:label "b",
    :color
    [0.21568627450980393
     0.49411764705882355
     0.7215686274509804
     1.0]}]},
 :panel-height 400.0,
 :title nil,
 :y-label "y",
 :alpha-legend nil,
 :x-label "x",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}
colored-views
[Plot: scatter of y versus x, colored by g (a, b)]

Two entries in :groups, each with its own :color (RGBA), :xs, :ys, and :label. A :legend appeared with 2 entries. The :layout now has :legend-w 100 β€” space reserved on the right.

Why two entries? Because :g is a categorical column. The Grouping section below explores this mechanism in detail.

Fixed color string: single color, no legend

(def fixed-color-views
  (-> five-points
      (sk/lay-point :x :y {:color "#E74C3C"})))
(sk/plan fixed-color-views)
{:panels
 [{:coord :cartesian,
   :y-domain [1.945 5.355],
   :x-scale {:type :linear},
   :x-domain [0.8 5.2],
   :x-ticks
   {:values [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["1.0" "1.5" "2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :col 0,
   :layers
   [{:mark :point,
     :style {:opacity 0.75, :radius 3.0},
     :groups
     [{:color
       [0.9058823529411765 0.2980392156862745 0.23529411764705882 1.0],
       :xs #tech.v3.dataset.column<float64>[5]
:x
[1.000, 2.000, 3.000, 4.000, 5.000],
       :ys #tech.v3.dataset.column<float64>[5]
:y
[2.100, 4.300, 3.000, 5.200, 4.800],
       :row-indices #tech.v3.dataset.column<int64>[5]
:__row-idx
[0, 1, 2, 3, 4]}],
     :y-domain [2.1 5.2],
     :x-domain [1.0 5.0]}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 622.5,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 0,
  :caption-pad 0,
  :y-label-pad 22.5,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend nil,
 :panel-height 400.0,
 :title nil,
 :y-label "y",
 :alpha-legend nil,
 :x-label "x",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}
fixed-color-views
[Plot: scatter of y versus x]

A single :groups entry with red RGBA values. No :legend, :legend-w is 0. The hex string was converted to [0.906 0.298 0.235 1.0].

Named colors and string disambiguation

CSS color names like "red" and "steelblue" also work as fixed colors:

(-> five-points
    (sk/lay-point :x :y {:color "steelblue"}))
[Plot: scatter of y versus x]

This raises a question: since :color also accepts column names as strings (like "species"), how does the system decide whether "red" means the column :red or the color red?

The rule is: check the dataset first. If the string matches a column name in the dataset, it is treated as a column reference. Otherwise, it is treated as a color value, first trying hex parsing, then CSS color name lookup.

Here is the full resolution order for a string :color value:

  1. If the string matches a dataset column → column reference (grouping)
  2. If it starts with # → hex color ("#E74C3C", "#F00")
  3. If it parses as hex without # → hex color ("00FF00")
  4. If it matches a CSS color name → named color ("red", "steelblue")
  5. Otherwise → error with a helpful message
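
The resolution order above can be sketched as a single cond. Everything here is illustrative: resolve-color-string, hex->rgba, and the two-entry css-colors table are hypothetical stand-ins, and steps 2 and 3 are folded into one hex parser that accepts an optional leading #.

```clojure
(require '[clojure.string :as str])

;; Tiny stand-in for a CSS color table (only two names, for illustration).
(def css-colors
  {"red"       [1.0 0.0 0.0 1.0]
   "steelblue" [(/ 70 255.0) (/ 130 255.0) (/ 180 255.0) 1.0]})

(defn hex->rgba
  "Parses \"#RRGGBB\", \"RRGGBB\", \"#RGB\" or \"RGB\" into RGBA in [0, 1].
  Returns nil when the string is not a hex color."
  [s]
  (let [h (if (str/starts-with? s "#") (subs s 1) s)
        ;; Expand the short form: \"F00\" becomes \"FF0000\".
        h (if (= 3 (count h)) (apply str (mapcat (fn [c] [c c]) h)) h)]
    (when (re-matches #"[0-9a-fA-F]{6}" h)
      (conj (mapv (fn [pair] (/ (Integer/parseInt (apply str pair) 16) 255.0))
                  (partition 2 h))
            1.0))))

(defn resolve-color-string
  "Hypothetical sketch of the resolution order for a string :color value."
  [column-names s]
  (cond
    (some #(= s (name %)) column-names) {:kind :column :column s}
    (hex->rgba s)                       {:kind :fixed :rgba (hex->rgba s)}
    (css-colors s)                      {:kind :fixed :rgba (css-colors s)}
    :else (throw (ex-info (str "Not a column, hex color, or CSS color name: " s)
                          {:color s}))))

(resolve-color-string [:x :y] "red")     ;; => {:kind :fixed, :rgba [1.0 0.0 0.0 1.0]}
(resolve-color-string [:x :red] "red")   ;; => {:kind :column, :column "red"}
```

Checking the dataset first is what makes the second call return a column reference even though "red" is also a valid color name.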

In practice, ambiguity is rare. Column names like "species" or "temperature" are not valid CSS colors, and color names like "red" are unlikely column names. When true ambiguity exists, use a keyword for the column (:red) or a hex string for the color ("#FF0000").

Verify: "red" is a fixed color when the dataset has no red column:

(let [pl (-> five-points
             (sk/lay-point :x :y {:color "red"})
             sk/plan)]
  {:legend (:legend pl)
   :color (:color (first (:groups (first (:layers (first (:panels pl)))))))})
{:legend nil, :color [1.0 0.0 0.0 1.0]}

No legend, red RGBA: treated as a fixed color, not a column.

No color: default gray

Look back at the first scatter plan above: its single :groups entry has the default color, dark gray ([0.2 0.2 0.2 1.0]). No legend.

Grouping

The :groups entries you saw above reflect a key concept: grouping controls how data is split into independent subsets. Each group gets its own visual elements: its own set of points, its own regression line, its own density curve, its own bar in a dodged layout.

Internally, infer-grouping in view.clj builds the grouping vector from explicit :group and categorical color.

Grouping can be derived (from a categorical :color mapping) or explicit (via the :group aesthetic).

Categorical color implies grouping

When :color maps to a categorical column (as with colored-views above), the data is split into one group per category. Each group gets a distinct palette color and a legend entry:

(let [pl (sk/plan colored-views)
      layer (first (:layers (first (:panels pl))))]
  {:group-count (count (:groups layer))
   :group-labels (mapv :label (:groups layer))
   :has-legend? (some? (:legend pl))})
{:group-count 2, :group-labels ["a" "b"], :has-legend? true}

Two groups, two legend entries. Each group has its own :xs, :ys, and :color.

Numeric color does not create groups

When :color maps to a numerical column, data is NOT split. Instead, each point gets an individual color from a continuous gradient. There is one group, and the legend is continuous with 20 pre-computed color stops.

(let [pl (-> {:x [1 2 3 4 5]
              :y [2 4 3 5 4]
              :val [10 20 30 40 50]}
             (sk/lay-point :x :y {:color :val})
             sk/plan)
      layer (first (:layers (first (:panels pl))))]
  {:group-count (count (:groups layer))
   :legend-type (:type (:legend pl))
   :color-stops (count (:stops (:legend pl)))})
{:group-count 1, :legend-type :continuous, :color-stops 20}

One group, continuous legend with 20 stops. No splitting occurred: the color is a visual encoding, not a grouping variable.

Explicit grouping with :group

The :group aesthetic splits data into groups without assigning distinct colors or creating a legend. This is useful when you want per-group statistics but uniform appearance.

(def grouped-data
  {:x [1 2 3 4 5 6]
   :y [3 5 4 7 6 8]
   :g ["a" "a" "a" "b" "b" "b"]})
(let [pl (-> grouped-data
             (sk/lay-point :x :y {:group :g})
             sk/plan)
      layer (first (:layers (first (:panels pl))))]
  {:group-count (count (:groups layer))
   :has-legend? (some? (:legend pl))})
{:group-count 2, :has-legend? false}

Two groups, but no legend and no color differentiation. Use :group when you need separate statistical fits but want a uniform visual style.
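
Conceptually, grouping partitions the rows by the value of a column while keeping the order in which values first appear. The sketch below shows that idea on row maps; split-groups is a hypothetical helper, and the library's own group entries carry more fields (colors, row indices) than shown here.

```clojure
(defn split-groups
  "Hypothetical sketch: split rows into one group per distinct value of
  `group-col`, in order of first appearance. With no grouping column,
  all rows form a single group."
  [rows group-col]
  (if (nil? group-col)
    [{:rows (vec rows)}]
    (let [by-key (group-by group-col rows)]
      (mapv (fn [k] {:label (str k) :rows (by-key k)})
            (distinct (map group-col rows))))))

(def rows
  (map (fn [x y g] {:x x :y y :g g})
       [1 2 3 4 5 6]
       [3 5 4 7 6 8]
       ["a" "a" "a" "b" "b" "b"]))

(mapv :label (split-groups rows :g))   ;; => ["a" "b"]
(count (split-groups rows nil))        ;; => 1
```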

What grouping affects

Grouping determines how statistical transformations operate. Without grouping, sk/lay-lm (linear model) fits one regression line through all the data. With grouping, it fits one line per group.

One regression line (no grouping):

(-> grouped-data
    (sk/view :x :y)
    sk/lay-point
    sk/lay-lm)
[Plot: scatter of y versus x]

Two regression lines (grouped by color):

(-> grouped-data
    (sk/view :x :y {:color :g})
    sk/lay-point
    sk/lay-lm)
[Plot: scatter of y versus x, colored by g (a, b)]

The same applies to other statistics: density curves, LOESS smoothers, boxplots, and dodge/stack positioning all operate per group.
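
To make the effect on statistics concrete, here is a small ordinary least squares fit applied once to all rows and once per group. It assumes the linear model is a plain least squares line, which the source does not spell out; fit-line and points are hypothetical names for this sketch.

```clojure
(defn fit-line
  "Ordinary least squares for y = intercept + slope * x over row maps."
  [rows]
  (let [n     (double (count rows))
        mx    (/ (reduce + (map :x rows)) n)
        my    (/ (reduce + (map :y rows)) n)
        sxy   (reduce + (map (fn [{:keys [x y]}] (* (- x mx) (- y my))) rows))
        sxx   (reduce + (map (fn [{:keys [x]}] (* (- x mx) (- x mx))) rows))
        slope (/ sxy sxx)]
    {:slope slope :intercept (- my (* slope mx))}))

(def points
  (map (fn [x y g] {:x x :y y :g g})
       [1 2 3 4 5 6]
       [3 5 4 7 6 8]
       ["a" "a" "a" "b" "b" "b"]))

;; No grouping: one fit through all six points.
(fit-line points)

;; Grouping by :g: one independent fit per group.
(into {} (map (fn [[k rs]] [k (fit-line rs)])) (group-by :g points))
;; => {"a" {:slope 0.5, :intercept 3.0}, "b" {:slope 0.5, :intercept 4.5}}
```

The two per-group lines share a slope but differ in intercept, which a single pooled fit cannot express.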

Method Inference

When you use sk/view without an explicit sk/lay-* call, Napkinsketch infers the method (a mark + stat bundle) from the column types. Internally, infer-method in view.clj implements these rules:

Columns                            Inferred mark   Inferred stat
one numerical                      :bar            :bin (histogram)
one categorical                    :rect           :count (bar chart)
two numerical                      :point          :identity (scatter)
mixed (categorical + numerical)    :point          :identity (scatter)

When you use sk/lay-point, sk/lay-histogram, etc., the method’s stat takes precedence and column-type inference is bypassed.
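
The table reduces to a small decision function over the inferred column types. The sketch below is a reading of the table only; default-method is a hypothetical name and the real infer-method in view.clj may take different arguments.

```clojure
(defn default-method
  "Hypothetical sketch of the table above. `y-type` is nil when only
  one column was given."
  [x-type y-type]
  (cond
    (and (nil? y-type) (= x-type :numerical))   {:mark :bar  :stat :bin}
    (and (nil? y-type) (= x-type :categorical)) {:mark :rect :stat :count}
    ;; Two columns of any type combination fall back to a scatter.
    :else                                       {:mark :point :stat :identity}))

(default-method :numerical nil)            ;; => {:mark :bar, :stat :bin}
(default-method :categorical :numerical)   ;; => {:mark :point, :stat :identity}
```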

A single numerical column:

(def hist-views
  (-> five-points
      (sk/view :x)))
(sk/plan hist-views)
{:panels
 [{:coord :cartesian,
   :y-domain [-0.1 2.1],
   :x-scale {:type :linear},
   :x-domain [0.8 5.2],
   :x-ticks
   {:values [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["1.0" "1.5" "2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :col 0,
   :layers
   [{:mark :bar,
     :style {:opacity 0.85},
     :groups
     [{:color [0.2 0.2 0.2 1.0],
       :bars
       [{:lo 1.0, :hi 2.0, :count 1}
        {:lo 2.0, :hi 3.0, :count 1}
        {:lo 3.0, :hi 4.0, :count 1}
        {:lo 4.0, :hi 5.0, :count 2}]}],
     :y-domain [0 2],
     :x-domain [1.0 5.0]}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [-0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0],
    :labels
    ["0.0"
     "0.2"
     "0.4"
     "0.6"
     "0.8"
     "1.0"
     "1.2"
     "1.4"
     "1.6"
     "1.8"
     "2.0"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 600.0,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 0,
  :caption-pad 0,
  :y-label-pad 0,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend nil,
 :panel-height 400.0,
 :title nil,
 :y-label nil,
 :alpha-legend nil,
 :x-label "x",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}
hist-views
[Plot: histogram of x]

The layer mark is :bar, inferred because a single numerical column means histogram. Each group carries :bars, a vector of bins with :lo, :hi, and :count: the result of the :bin stat.
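
Equal-width binning is enough to reproduce the bins shown in the plan. In this sketch the number of bins is passed in explicitly, because the source does not say how the library chooses it; bin-values is a hypothetical name, and the function assumes the values are not all identical.

```clojure
(defn bin-values
  "Hypothetical sketch: equal-width bins over [min, max] with counts.
  Assumes at least two distinct values."
  [values n-bins]
  (let [lo     (apply min values)
        hi     (apply max values)
        width  (/ (- hi lo) (double n-bins))
        ;; The maximum would land one past the last bin, so clamp it.
        index  (fn [v]
                 (min (dec n-bins)
                      (long (Math/floor (double (/ (- v lo) width))))))
        counts (frequencies (map index values))]
    (mapv (fn [i] {:lo    (+ lo (* i width))
                   :hi    (+ lo (* (inc i) width))
                   :count (get counts i 0)})
          (range n-bins))))

(bin-values [1.0 2.0 3.0 4.0 5.0] 4)
;; => [{:lo 1.0, :hi 2.0, :count 1} {:lo 2.0, :hi 3.0, :count 1}
;;     {:lo 3.0, :hi 4.0, :count 1} {:lo 4.0, :hi 5.0, :count 2}]
```

The last bin counts 2 because both 4.0 and the maximum 5.0 fall into it.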

A single categorical column:

(def count-views
  (-> animals
      (sk/view :animal)))
(sk/plan count-views)
{:panels
 [{:coord :cartesian,
   :y-domain [-0.05 1.05],
   :x-scale {:type :linear},
   :x-domain ["cat" "dog" "bird" "fish"],
   :x-ticks
   {:values ["cat" "dog" "bird" "fish"],
    :labels ["cat" "dog" "bird" "fish"],
    :categorical? true},
   :col 0,
   :layers
   [{:mark :rect,
     :style {:opacity 0.85},
     :position :dodge,
     :categories ["cat" "dog" "bird" "fish"],
     :groups
     [{:color [0.2 0.2 0.2 1.0],
       :label "",
       :counts
       [{:category "cat", :count 1}
        {:category "dog", :count 1}
        {:category "bird", :count 1}
        {:category "fish", :count 1}],
       :dodge-idx 0}],
     :y-domain [0 1],
     :x-domain ("cat" "dog" "bird" "fish"),
     :dodge-ctx {:n-groups 1}}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [-0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0],
    :labels
    ["0.0"
     "0.1"
     "0.2"
     "0.3"
     "0.4"
     "0.5"
     "0.6"
     "0.7"
     "0.8"
     "0.9"
     "1.0"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 600.0,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 0,
  :caption-pad 0,
  :y-label-pad 0,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend nil,
 :panel-height 400.0,
 :title nil,
 :y-label nil,
 :alpha-legend nil,
 :x-label "animal",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}
count-views
[Plot: bar chart of counts by animal (cat, dog, bird, fish)]

Mark is :rect with :counts: the :count stat tallied each of the 4 categories.

Mixed column types (categorical x, numerical y) default to :point:

(let [pl (-> {:species ["a" "b" "c"] :val [10 20 15]}
             (sk/view :species :val)
             sk/plan)
      layer (first (:layers (first (:panels pl))))]
  (:mark layer))
:point

Domain Inference

Numerical domains extend 5% beyond the data range so points aren’t clipped at the edges. Internally, pad-domain in scale.clj computes this padding.

(let [pl (sk/plan scatter-views)
      p (first (:panels pl))]
  {:x-domain (:x-domain p)
   :data-range [1.0 5.0]
   :padding-each-side (* 0.05 (- 5.0 1.0))})
{:x-domain [0.8 5.2], :data-range [1.0 5.0], :padding-each-side 0.2}

The domain [0.8, 5.2] = data range [1.0, 5.0] ± 0.2 (5% of 4.0).
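
The padding arithmetic is a one-liner. The sketch below is a guess at the shape of such a function, not the body of pad-domain in scale.clj; pad-range and its frac parameter are hypothetical names.

```clojure
(defn pad-range
  "Hypothetical sketch: widen [lo hi] by `frac` of its span on each side."
  [[lo hi] frac]
  (let [pad (* frac (- hi lo))]
    [(- lo pad) (+ hi pad)]))

;; Matches the x-domain [0.8 5.2] up to floating-point rounding.
(pad-range [1.0 5.0] 0.05)

;; A bar layer's y-domain [0 15] widens by 0.75 on each side.
(pad-range [0 15] 0.05)
```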

Special domain rules apply in certain contexts:

Bar chart y-domains always include zero:

(let [pl (sk/plan bar-views)
      p (first (:panels pl))]
  {:y-domain (:y-domain p)})
{:y-domain [-0.75 15.75]}

Percentage-filled layers normalize the y-domain to [0.0, 1.0]:

(let [fill-pl (-> {:x ["a" "a" "b" "b"]
                   :g ["m" "n" "m" "n"]}
                  (sk/lay-stacked-bar-fill :x {:color :g})
                  sk/plan)
      p (first (:panels fill-pl))]
  (:y-domain p))
[0.0 1.0]

The y-domain is exactly [0.0, 1.0]: each category sums to 100%.

Multi-layer plots merge domains across layers; see “Multi-Layer Plans” below.

Tick Inference

Once domains are computed, Napkinsketch selects “nice” round tick values. The logic depends on the scale type:

  • Linear: wadogo selects ticks at round intervals (1, 2, 2.5, 5, …)

  • Log: ggplot2-style 1-2-5 nice numbers: powers of 10 when they give at least 3 ticks, otherwise intermediates at 1-2-5 or 1-2-3-5 multiples per decade

  • Categorical: tick at each category, in order of appearance

  • Temporal: calendar-aware snapping (year, month, day, hour) with adaptive formatting

Linear ticks for the scatter example:

(let [pl (sk/plan scatter-views)
      p (first (:panels pl))]
  {:x-tick-values (:values (:x-ticks p))
   :x-tick-labels (:labels (:x-ticks p))})
{:x-tick-values [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0],
 :x-tick-labels ["1.0" "1.5" "2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"]}

Nine ticks from 1.0 to 5.0 at 0.5 intervals: round and readable.

Log ticks for a multi-decade range:

(let [pl (-> {:x [0.1 1.0 10.0 100.0 1000.0]
              :y [5 10 15 20 25]}
             (sk/lay-point :x :y)
             (sk/scale :x :log)
             sk/plan)
      p (first (:panels pl))]
  {:tick-values (:values (:x-ticks p))
   :tick-labels (:labels (:x-ticks p))})
{:tick-values [0.1 1.0 10.0 100.0 1000.0],
 :tick-labels ["0.1" "1" "10" "100" "1000"]}

Five ticks at exact powers of 10, with no irrational intermediates. Whole numbers display without decimals; sub-1 values use minimal decimal places.
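
The powers-of-10 case can be sketched by taking every integer exponent that falls inside the range. This covers only that case, not the 1-2-5 intermediates, and log-ticks is a hypothetical helper rather than the library's tick function. A small tolerance guards against log10 landing just off an integer.

```clojure
(defn log-ticks
  "Hypothetical sketch: powers of 10 between lo and hi (both positive)."
  [lo hi]
  (let [eps  1e-9
        e-lo (long (Math/ceil  (- (Math/log10 (double lo)) eps)))
        e-hi (long (Math/floor (+ (Math/log10 (double hi)) eps)))]
    (mapv (fn [e] (Math/pow 10.0 (double e)))
          (range e-lo (inc e-hi)))))

;; Five ticks, one per decade from 0.1 to 1000.
(log-ticks 0.1 1000.0)
```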

Categorical ticks match domain order:

(let [pl (sk/plan bar-views)
      p (first (:panels pl))]
  (:values (:x-ticks p)))
["cat" "dog" "bird" "fish"]

Axis Label Inference

Labels come from column names. Underscores and hyphens become spaces. Internally, resolve-labels in plan.clj handles this.

(def iris data/iris)
(let [pl (-> iris
             (sk/lay-point :sepal_length :sepal_width)
             sk/plan)]
  {:x-label (:x-label pl)
   :y-label (:y-label pl)})
{:x-label "sepal length", :y-label "sepal width"}
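
The label rule itself is a string replacement on the column name. The sketch below shows that rule in isolation; column->label is a hypothetical name, not the resolve-labels function mentioned above.

```clojure
(require '[clojure.string :as str])

(defn column->label
  "Hypothetical sketch: turn a column keyword or string into an axis label
  by replacing underscores and hyphens with spaces."
  [col]
  (str/replace (name col) #"[_-]" " "))

(column->label :sepal_length)   ;; => "sepal length"
(column->label :petal-width)    ;; => "petal width"
```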

When only one column is specified, the y-axis shows computed counts. The system omits the y-label since it would repeat the column name:

(let [pl (-> five-points (sk/view :x) sk/plan)]
  {:x-label (:x-label pl)
   :y-label (:y-label pl)})
{:x-label "x", :y-label nil}

Explicit labels override inference:

(let [pl (-> five-points
             (sk/lay-point :x :y)
             (sk/options {:x-label "Length (cm)" :y-label "Width (cm)"})
             sk/plan)]
  {:x-label (:x-label pl)
   :y-label (:y-label pl)})
{:x-label "Length (cm)", :y-label "Width (cm)"}

Legend Inference

A legend appears when a column is mapped to color. Internally, build-legend in plan.clj constructs the legend from the collected color information. Four cases:

Categorical color → discrete legend with one entry per category:

(:legend (sk/plan colored-views))
{:title :g,
 :entries
 [{:label "a",
   :color
   [0.8941176470588236 0.10196078431372549 0.10980392156862745 1.0]}
  {:label "b",
   :color
   [0.21568627450980393 0.49411764705882355 0.7215686274509804 1.0]}]}

Title is the column name. Each entry has a :label and :color (RGBA).

No color mapping → no legend:

(:legend (sk/plan scatter-views))
nil

Fixed color string → no legend:

(:legend (sk/plan fixed-color-views))
nil

Numeric color → continuous legend (gradient bar):

(:legend (-> {:x [1 2 3] :y [4 5 6] :val [10 20 30]}
             (sk/lay-point :x :y {:color :val})
             sk/plan))
{:title :val,
 :type :continuous,
 :min 10,
 :max 30,
 :color-scale nil,
 :stops
 [{:t 0.0,
   :color
   [0.07450980392156863 0.16862745098039217 0.2627450980392157 1.0]}
  {:t 0.05263157894736842,
   :color
   [0.08833849329205366 0.19628482972136224 0.2998968008255934 1.0]}
  {:t 0.10526315789473684,
   :color
   [0.1021671826625387 0.22394220846233232 0.3370485036119711 1.0]}
  {:t 0.15789473684210525,
   :color
   [0.11599587203302374 0.2515995872033024 0.3742002063983488 1.0]}
  {:t 0.21052631578947367,
   :color
   [0.1298245614035088 0.27925696594427246 0.4113519091847265 1.0]}
  {:t 0.2631578947368421,
   :color
   [0.14365325077399382 0.30691434468524253 0.4485036119711042 1.0]}
  {:t 0.3157894736842105,
   :color
   [0.15748194014447883 0.33457172342621255 0.4856553147574819 1.0]}
  {:t 0.3684210526315789,
   :color
   [0.17131062951496387 0.3622291021671826 0.5228070175438597 1.0]}
  {:t 0.42105263157894735,
   :color
   [0.1851393188854489 0.3898864809081527 0.5599587203302373 1.0]}
  {:t 0.47368421052631576,
   :color
   [0.19896800825593394 0.41754385964912283 0.597110423116615 1.0]}
  {:t 0.5263157894736842,
   :color
   [0.21279669762641898 0.4452012383900929 0.6342621259029928 1.0]}
  {:t 0.5789473684210527,
   :color
   [0.22662538699690402 0.472858617131063 0.6714138286893705 1.0]}
  {:t 0.631578947368421,
   :color
   [0.24045407636738905 0.500515995872033 0.7085655314757482 1.0]}
  {:t 0.6842105263157895,
   :color
   [0.25428276573787406 0.5281733746130031 0.7457172342621259 1.0]}
  {:t 0.7368421052631579,
   :color
   [0.2681114551083591 0.5558307533539731 0.7828689370485036 1.0]}
  {:t 0.7894736842105263,
   :color
   [0.28194014447884413 0.5834881320949432 0.8200206398348814 1.0]}
  {:t 0.8421052631578947,
   :color
   [0.29576883384932917 0.6111455108359133 0.857172342621259 1.0]}
  {:t 0.8947368421052632,
   :color
   [0.3095975232198142 0.6388028895768834 0.8943240454076368 1.0]}
  {:t 0.9473684210526315,
   :color
   [0.3234262125902993 0.6664602683178534 0.9314757481940144 1.0]}
  {:t 1.0,
   :color
   [0.33725490196078434 0.6941176470588235 0.9686274509803922 1.0]}]}

Size Legend

When :size maps to a numerical column, a size legend shows graduated circles spanning the data range. Internally, build-size-legend in plan.clj generates five entries with proportional radii.

(:size-legend (-> {:x [1 2 3 4 5] :y [1 2 3 4 5] :s [10 20 30 40 50]}
                  (sk/lay-point :x :y {:size :s})
                  sk/plan))
{:title :s,
 :type :size,
 :min 10,
 :max 50,
 :entries
 [{:value 10.0, :radius 2.0}
  {:value 20.0, :radius 3.5}
  {:value 30.0, :radius 5.0}
  {:value 40.0, :radius 6.5}
  {:value 50.0, :radius 8.0}]}

Each entry has a :value and :radius. No size mapping → no size legend:

(:size-legend (sk/plan scatter-views))
nil
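
The five size-legend entries shown above are consistent with linear interpolation of both value and radius. The sketch below reproduces them under that assumption; the radius range of 2.0 to 8.0 is read off the printed output rather than taken from the library, and size-legend-entries is a hypothetical name.

```clojure
(defn size-legend-entries
  "Hypothetical sketch: five evenly spaced values between lo and hi, each
  with a radius interpolated linearly between 2.0 and 8.0 (assumed range)."
  [lo hi]
  (let [n     5
        r-min 2.0
        r-max 8.0]
    (mapv (fn [i]
            (let [t (/ i (double (dec n)))]
              {:value  (+ lo (* t (- hi lo)))
               :radius (+ r-min (* t (- r-max r-min)))}))
          (range n))))

(size-legend-entries 10 50)
;; => [{:value 10.0, :radius 2.0} {:value 20.0, :radius 3.5}
;;     {:value 30.0, :radius 5.0} {:value 40.0, :radius 6.5}
;;     {:value 50.0, :radius 8.0}]
```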

Alpha Legend

When :alpha maps to a numerical column, an alpha legend shows graduated opacity squares. Internally, build-alpha-legend in plan.clj generates five entries with proportional opacity.

(:alpha-legend (-> {:x [1 2 3 4 5] :y [1 2 3 4 5] :a [0.1 0.3 0.5 0.7 0.9]}
                   (sk/lay-point :x :y {:alpha :a})
                   sk/plan))
{:title :a,
 :type :alpha,
 :min 0.1,
 :max 0.9,
 :entries
 [{:value 0.1, :alpha 0.2}
  {:value 0.3, :alpha 0.4}
  {:value 0.5, :alpha 0.6000000000000001}
  {:value 0.7, :alpha 0.8}
  {:value 0.9, :alpha 1.0}]}

No alpha mapping → no alpha legend:

(:alpha-legend (sk/plan scatter-views))
nil

Layout Inference

The :layout map adjusts padding based on what elements are present. Internally, compute-layout-dims in plan.clj calculates the space needed for titles, labels, and legends.

Compare a bare plot to one with a title and a legend:

(let [bare (sk/plan scatter-views)
      full (-> {:x [1 2 3 4 5 6]
                :y [3 5 4 7 6 8]
                :g ["a" "a" "a" "b" "b" "b"]}
               (sk/lay-point :x :y {:color :g})
               (sk/options {:title "My Plot"})
               sk/plan)]
  {:bare-title-pad (get-in bare [:layout :title-pad])
   :full-title-pad (get-in full [:layout :title-pad])
   :bare-legend-w (get-in bare [:layout :legend-w])
   :full-legend-w (get-in full [:layout :legend-w])})
{:bare-title-pad 0,
 :full-title-pad 18,
 :bare-legend-w 0,
 :full-legend-w 100}

The bare plot has zero title padding and zero legend width. The full plot adds padding for the title and 100 pixels for the legend.
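
The plans printed earlier suggest how the paddings add up to the total size: 600 + 22.5 + 0 = 622.5 wide without a legend, 722.5 with one, and 400 + 18 = 418 high. The sketch below encodes that reading for a single-panel plot. Which paddings contribute to which dimension is an assumption inferred from those numbers, and total-size is a hypothetical helper, not compute-layout-dims.

```clojure
(defn total-size
  "Hypothetical sketch for a single panel: add the layout paddings to the
  panel size. The choice of terms is inferred from the printed plans."
  [{:keys [panel-width panel-height layout]}]
  (let [{:keys [y-label-pad legend-w x-label-pad
                title-pad subtitle-pad caption-pad legend-h]} layout]
    {:total-width  (+ panel-width y-label-pad legend-w)
     :total-height (+ panel-height x-label-pad title-pad
                      subtitle-pad caption-pad legend-h)}))

;; Values copied from the bare scatter plan above.
(total-size {:panel-width 600.0
             :panel-height 400.0
             :layout {:subtitle-pad 0 :legend-w 0 :caption-pad 0
                      :y-label-pad 22.5 :legend-h 0 :title-pad 0
                      :x-label-pad 18}})
;; => {:total-width 622.5, :total-height 418.0}
```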

Layout type is also inferred from the view structure:

  • Single panel → :single
  • Facet grid (:facet-row or :facet-col) → :facet-grid
  • Multiple x-y pairs (scatter plot matrix) → :multi-variable
(let [pl (sk/plan scatter-views)]
  (:layout-type pl))
:single

Coordinate Flipping

Setting :coord :flip swaps axes in the plan. The layer data stays the same; only the panel-level domains and ticks are swapped. Internally, make-coord in coord.clj handles the transformation.

(def normal-pl
  (-> animals
      (sk/lay-value-bar :animal :count)
      sk/plan))
(def flip-pl
  (-> animals
      (sk/lay-value-bar :animal :count)
      (sk/coord :flip)
      sk/plan))
(let [np (first (:panels normal-pl))
      fp (first (:panels flip-pl))]
  {:normal {:x-categorical? (:categorical? (:x-ticks np))
            :y-categorical? (:categorical? (:y-ticks np))}
   :flipped {:x-categorical? (:categorical? (:x-ticks fp))
             :y-categorical? (:categorical? (:y-ticks fp))}})
{:normal {:x-categorical? true, :y-categorical? false},
 :flipped {:x-categorical? false, :y-categorical? true}}
(-> animals
    (sk/lay-value-bar :animal :count)
    (sk/coord :flip))
[Plot: flipped bar chart of count by animal, with the categories cat, dog, bird, and fish on the vertical axis.]

The categorical axis moved from x to y.

Labels are also swapped: the x-label and y-label follow their visual axis, not the data axis:

(let [pl (-> five-points
             (sk/lay-point :x :y)
             (sk/coord :flip)
             sk/plan)]
  {:x-label (:x-label pl)
   :y-label (:y-label pl)})
{:x-label "y", :y-label "x"}

After flipping, the visual x-axis shows "y" and the visual y-axis shows "x", so labels track the visual axes.
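
Conceptually, the flip exchanges every x- entry with its y- counterpart and leaves the layers alone. The sketch below shows that exchange on one flat map for brevity; in a real plan the domains and ticks live on each panel and the labels at the top level. The function flip-axes is hypothetical and is not what make-coord does internally.

```clojure
(defn flip-axes
  "Hypothetical helper: exchange the x- and y- entries of a map,
  leaving every other key (such as :layers) untouched."
  [m]
  (reduce (fn [acc [xk yk]]
            (assoc acc xk (get m yk) yk (get m xk)))
          m
          [[:x-domain :y-domain]
           [:x-ticks :y-ticks]
           [:x-label :y-label]]))

(flip-axes {:x-label  "x"
            :y-label  "y"
            :x-domain [0.8 5.2]
            :y-domain [1.945 5.355]
            :x-ticks  {:categorical? false}
            :y-ticks  {:categorical? false}
            :layers   [{:mark :point}]})
```

The result has :x-label "y", :y-label "x", and swapped domains, while :layers is unchanged.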

Multi-Layer Plans

When multiple layers share a panel, their domains are merged:

(def multi-views
  (-> five-points
      (sk/view :x :y)
      sk/lay-point
      sk/lay-lm))
(sk/plan multi-views)
{:panels
 [{:coord :cartesian,
   :y-domain [1.945 5.355],
   :x-scale {:type :linear},
   :x-domain [0.8 5.2],
   :x-ticks
   {:values [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["1.0" "1.5" "2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :col 0,
   :layers
   [{:mark :point,
     :style {:opacity 0.75, :radius 3.0},
     :groups
     [{:color [0.2 0.2 0.2 1.0], :xs #tech.v3.dataset.column<float64>[5]
:x
[1.000, 2.000, 3.000, 4.000, 5.000],
       :ys #tech.v3.dataset.column<float64>[5]
:y
[2.100, 4.300, 3.000, 5.200, 4.800],
       :row-indices #tech.v3.dataset.column<int64>[5]
:__row-idx
[0, 1, 2, 3, 4]}],
     :y-domain [2.1 5.2],
     :x-domain [1.0 5.0]}
    {:mark :line,
     :style {:stroke-width 2.5},
     :groups
     [{:color [0.2 0.2 0.2 1.0],
       :label "",
       :x1 1.0,
       :y1 2.6200000000000014,
       :x2 5.0,
       :y2 5.139999999999999}],
     :y-domain [2.1 5.2],
     :x-domain [1.0 5.0]}],
   :y-scale {:type :linear},
   :y-ticks
   {:values [2.0 2.5 3.0 3.5 4.0 4.5 5.0],
    :labels ["2.0" "2.5" "3.0" "3.5" "4.0" "4.5" "5.0"],
    :categorical? false},
   :row 0}],
 :width 600,
 :height 400,
 :caption nil,
 :total-width 622.5,
 :legend-position :right,
 :layout-type :single,
 :layout
 {:subtitle-pad 0,
  :legend-w 0,
  :caption-pad 0,
  :y-label-pad 22.5,
  :legend-h 0,
  :title-pad 0,
  :strip-h 0,
  :x-label-pad 18,
  :strip-w 0},
 :grid {:rows 1, :cols 1},
 :legend nil,
 :panel-height 400.0,
 :title nil,
 :y-label "y",
 :alpha-legend nil,
 :x-label "x",
 :subtitle nil,
 :panel-width 600.0,
 :size-legend nil,
 :total-height 418.0,
 :margin 30}
multi-views
[Plot: scatter of y against x with a fitted regression line.]

Two layers, one :point and one :line, share the same panel domain. The :line layer has :mark :line, and its single group holds :x1, :y1, :x2, and :y2: the endpoints of the fitted regression line.
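
The panel domain in the plan above can be reproduced by taking the union of the layer domains and widening it by 5% of its span on each side. The following is a sketch under that assumption; union-domain and pad-by are hypothetical names, not the library's collect-domain and pad-domain.

```clojure
(defn union-domain
  "Smallest [lo hi] interval covering every numeric domain in `domains`."
  [domains]
  [(apply min (map first domains))
   (apply max (map second domains))])

(defn pad-by
  "Hypothetical padding rule: widen [lo hi] by `frac` of its span per side."
  [[lo hi] frac]
  (let [pad (* frac (- hi lo))]
    [(- lo pad) (+ hi pad)]))

;; x: both layers report [1.0 5.0]; the panel shows roughly [0.8 5.2].
(-> [[1.0 5.0] [1.0 5.0]] union-domain (pad-by 0.05))

;; y: both layers report [2.1 5.2]; the panel shows roughly [1.945 5.355].
(-> [[2.1 5.2] [2.1 5.2]] union-domain (pad-by 0.05))
```

Here the two layers happen to report identical domains, so the union changes nothing; with layers covering different ranges, the panel domain would widen to cover both before padding.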

Resolution Overview

All of the inference rules above feed into views->plan, which orchestrates a resolution pipeline. The diagram below shows the key steps and their data dependencies:

graph TD
  VIEWS["views + options"]
  VIEWS --> CT["Column Types<br/>(infer-column-types)"]
  VIEWS --> AE["Aesthetics<br/>(resolve-aesthetics)"]
  CT --> GR["Grouping<br/>(infer-grouping)"]
  AE --> GR
  CT --> ME["Method<br/>(infer-method)"]
  GR --> STATS["Statistics<br/>(compute-stat)"]
  ME --> STATS
  STATS --> DOM["Domains<br/>(collect-domain + pad-domain)"]
  DOM --> TK["Ticks<br/>(compute-ticks)"]
  VIEWS --> LBL["Labels<br/>(resolve-labels)"]
  AE --> LEG["Color Legend<br/>(build-legend)"]
  AE --> SLEG["Size Legend<br/>(build-size-legend)"]
  AE --> ALEG["Alpha Legend<br/>(build-alpha-legend)"]
  DOM --> LAYOUT["Layout<br/>(compute-layout-dims)"]
  LBL --> LAYOUT
  LEG --> LAYOUT
  SLEG --> LAYOUT
  ALEG --> LAYOUT
  DOM --> PLAN["Plan"]
  TK --> PLAN
  LBL --> PLAN
  LEG --> PLAN
  SLEG --> PLAN
  ALEG --> PLAN
  LAYOUT --> PLAN
  STATS --> PLAN
  style VIEWS fill:#e8f5e9
  style PLAN fill:#fff3e0
  style STATS fill:#e3f2fd
  style DOM fill:#e3f2fd

Each box corresponds to a named function in the codebase. The top four boxes (Column Types, Aesthetics, Grouping, and Method) are the per-view inference steps (in view.clj). The remaining boxes are the plan-level orchestration steps (in plan.clj and scale.clj).

Summary

Every inference can be overridden. Here is the complete list:

What is inferred | Default | Override
Column selection | 1→x, 2→x y, 3→x y color | explicit column args in sk/view or sk/lay-*
Column type | dtype inspection | :x-type, :y-type, :color-type in view options
Aesthetic classification | keyword = column, string = color/column | explicit :color keyword vs hex string
Grouping | categorical color column | :group aesthetic
Method (mark + stat) | column types (see table above) | sk/lay-point, sk/lay-histogram, etc.
Domain extent | data range + 5% padding | (sk/scale views :x {:domain [0 10]})
Domain zero-anchor | bar/stacked charts include zero | (sk/scale views :y {:domain [5 20]})
Fill domain | [0.0, 1.0] for fill position | (sk/scale views :y {:domain [0 2]})
Tick values | round intervals (linear), powers of 10 (log) | wadogo scale configuration
Tick labels | number formatting, calendar formatting | wadogo label formatting
Axis labels | column name, underscores → spaces | (sk/options {:x-label "Custom"})
Color legend | categorical = discrete, numerical = continuous, none = no legend | :color mapping controls presence
Size legend | 5 graduated circles when :size maps to numerical column | :size mapping controls presence
Alpha legend | 5 graduated opacity squares when :alpha maps to numerical column | :alpha mapping controls presence
Layout padding | adjusts for title, labels, legend | :width, :height in options
Layout type | single, facet-grid, multi-variable | sk/facet, multiple x-y pairs
Coordinate system | :cartesian | (sk/coord :flip), (sk/coord :polar)

The plan captures the result of all inference. When in doubt, look at the plan.
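
Because the plan is a plain map, a few lines of ordinary Clojure are enough to pull out the decisions that matter most. The helper below is a hypothetical convenience, not part of the Napkinsketch API; it relies only on keys that appear in the plan output printed earlier in this notebook.

```clojure
(defn plan-summary
  "Hypothetical helper: extract the most telling inference results
  from a plan map."
  [plan]
  {:layout-type (:layout-type plan)
   :labels      [(:x-label plan) (:y-label plan)]
   :legend?     (some? (:legend plan))
   :panels      (mapv (fn [p]
                        {:marks    (mapv :mark (:layers p))
                         :x-domain (:x-domain p)
                         :y-domain (:y-domain p)})
                      (:panels plan))})

;; A trimmed copy of the multi-layer plan shown earlier.
(def example-plan
  {:panels      [{:x-domain [0.8 5.2]
                  :y-domain [1.945 5.355]
                  :layers   [{:mark :point} {:mark :line}]}]
   :layout-type :single
   :legend      nil
   :x-label     "x"
   :y-label     "y"})

(plan-summary example-plan)
;; => {:layout-type :single,
;;     :labels ["x" "y"],
;;     :legend? false,
;;     :panels [{:marks [:point :line],
;;               :x-domain [0.8 5.2],
;;               :y-domain [1.945 5.355]}]}
```

With Napkinsketch loaded, the same helper should accept the result of (sk/plan multi-views) directly, since it reads only those keys.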

What's Next

  • Methods: the full registry of marks, stats, and positions that inference selects from
  • Scatter Plots: see inference in action with the most common chart type
source: notebooks/napkinsketch_book/inference_rules.clj