9 Inference Rules

Plotje infers many parameters automatically so you can write less and get reasonable defaults. This notebook walks through each rule with a worked example: a small pose, the rendered plot, and a description of what was inferred. Every rule is also checked against the resolved plot on every run, so the claims here stay honest as the library evolves.

This chapter is a reference: each rule with its default and its override. For the conceptual overview, read Poses and Core Concepts first. The examples use small inline datasets so the relationships are easy to read at a glance.

(ns plotje-book.inference-rules
  (:require
   ;; Tablecloth -- dataset manipulation
   [tablecloth.api :as tc]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]))

A Worked Example

Before the rule-by-rule tour, here is what “inference” looks like in practice: a five-point scatter where Plotje filled in almost everything for us.

(def five-points
  {:x [1.0 2.0 3.0 4.0 5.0]
   :y [2.1 4.3 3.0 5.2 4.8]})

(def scatter-pose
  (-> five-points
      (pj/lay-point :x :y)))

scatter-pose

Notice what was inferred:

The x-axis label "x" and y-axis label "y", taken from the column keywords
A linear scale on each axis, since both columns are numeric
The data range [1.0, 5.0] widened to [0.8, 5.2] – a 5% padding so the extreme points do not sit on the panel edge
Round tick values: 1.0, 1.5, 2.0, ...
No legend, since no color mapping was given
A single point group rendered in the default color (dark gray, #333)

Each of those decisions is its own inference rule, with a default and an explicit override.

Overrides at a Glance

Every inference rule has an explicit override. The table below lists them all – scan it to find what you need, then jump to the matching section for the details and worked examples.

What is inferred	Default	Override
Column selection	one column fills x; two fill x, y; three fill x, y, color	explicit column args in `pj/pose` or `pj/lay-*`
Column type	dtype inspection	`:x-type`, `:y-type`, `:color-type` in pose or layer options
Aesthetic classification	keyword = column, string = color/column	explicit `:color` keyword vs hex string
Grouping	categorical color column	`:group` aesthetic
Layer type (mark + stat)	column types (see Layer Type section)	`pj/lay-point`, `pj/lay-histogram`, etc.
Domain extent	data range + 5% padding	`(pj/scale pose :x {:domain [0 10]})`
Domain zero-anchor	bar/stacked charts include zero	`(pj/scale pose :y {:domain [5 20]})`
Fill domain	`[0.0, 1.0]` for fill position	`(pj/scale pose :y {:domain [0 2]})`
Tick values	round intervals (linear), powers of 10 (log)	wadogo scale configuration
Tick labels	number formatting, calendar formatting	wadogo label formatting
Axis labels	column name, with underscores and hyphens replaced by spaces	`(pj/options {:x-label "Custom"})`
Color legend	categorical = discrete, numerical = continuous, none = no legend	`:color` mapping controls presence
Size legend	5 graduated circles when `:size` maps to numerical column	`:size` mapping controls presence
Alpha legend	about 5 graduated opacity squares (exact count depends on the range) when `:alpha` maps to numerical column	`:alpha` mapping controls presence
Layout padding	adjusts for title, labels, legend	`:width`, `:height` in options
Layout type	single, facet-grid, multi-variable	`pj/facet`, multiple x-y pairs
Coordinate system	`:cartesian`	`(pj/coord :flip)`, `(pj/coord :polar)`

The sections below walk each rule in detail. The order roughly follows how a pose is resolved into a plot – column selection, column types, aesthetics, grouping, layer type, domains, ticks, labels, legends, layout, coord flip – with two cross-cutting closing sections on how the rules combine in multi-layer plots and a diagram of the full resolution flow.

Column Selection

When column names are omitted, Plotje infers them from the dataset shape:

Number of columns	Inferred mapping
1	first column becomes x
2	first becomes x, second becomes y
3	first becomes x, second becomes y, third becomes color
4+	no inference – see the note below

The same rule applies whether you start with pj/lay-* on raw data or pj/pose on raw data. Both read the first 1-3 columns of the dataset in the order they appear and build the mapping from there.

One column:

(-> {:values [1 2 3 4 5 6]}
    pj/lay-histogram)

Two columns:

(-> {:x [1 2 3 4 5] :y [2 4 3 5 4]}
    pj/lay-point)

Three columns – the third becomes :color:

(-> {:x [1 2 3 4] :y [4 5 6 7] :g ["a" "a" "b" "b"]}
    pj/lay-point)
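The column-selection table at the start of this section can be read as a small function. The sketch below is plain Clojure, not Plotje's implementation: `infer-mapping` is a hypothetical name, and it takes the column names as an ordered vector rather than a dataset.

```clojure
;; Illustrative sketch only -- not Plotje's implementation.
;; `infer-mapping` is a hypothetical helper name.
(defn infer-mapping
  "Maps the first 1-3 column names, in order, to :x, :y, :color.
  Returns nil for zero or 4+ columns, where no default is unambiguous."
  [column-names]
  (when (<= 1 (count column-names) 3)
    (zipmap [:x :y :color] column-names)))

(infer-mapping [:values])      ;; => {:x :values}
(infer-mapping [:x :y])        ;; => {:x :x, :y :y}
(infer-mapping [:x :y :g])     ;; => {:x :x, :y :y, :color :g}
(infer-mapping [:a :b :c :d])  ;; => nil
```

Returning nil for four or more columns mirrors the "inference stops" behavior described in this section; what the caller does next (throw, or keep an unmapped pose) is a separate decision.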

Pose construction infers the same mapping

Calling pj/pose on raw data without explicit column arguments runs the same column-selection rule. A 1-3 column dataset gets its mapping filled in; the resulting pose carries the mapping but has no layer attached yet, so layer type inference (covered below) supplies the mark at render time.

(def two-col-pose
  (pj/pose {:x [1.0 2.0 3.0 4.0 5.0]
            :y [1.0 4.0 9.0 16.0 25.0]}))

two-col-pose

The inferred mapping is visible on the pose itself:

(-> two-col-pose (select-keys [:mapping :layers]) kind/pprint)

{:mapping {:x :x, :y :y}, :layers []}

4+ columns

With four or more columns there is no unambiguous default, so inference stops:

(pj/lay-* data) throws with a message listing the available columns, asking you to pass explicit :x and :y.
(pj/pose data) is gentler – it builds a pose with the data attached but no mapping, so you can add one downstream with (pj/pose pose :col-a :col-b) or (pj/lay-point pose :col-a :col-b).

When you provide explicit columns, inference is skipped – you are in full control:

(-> (rdatasets/datasets-iris)
    (pj/lay-point :petal-length :petal-width {:color :species}))

Column Types

Once columns are selected, the next step is determining the type of each column – numerical, categorical, or temporal. This determines the scale type, domain, tick style, and the default mark.

Column dtype	Inferred type
float, int	`:numerical`
string, keyword, boolean, symbol, text	`:categorical`
LocalDate, LocalDateTime, Instant, java.util.Date	`:temporal` (numerical, with calendar-aware ticks)
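As a rough illustration of the table above, the sketch below classifies a single sample value. It is an assumption-laden stand-in: Plotje inspects column dtypes, whereas this hypothetical `infer-value-type` helper only looks at one value's JVM class.

```clojure
;; Illustrative sketch only -- Plotje inspects column dtypes;
;; this hypothetical helper classifies a single sample value instead.
(defn infer-value-type
  [v]
  (cond
    (number? v) :numerical
    (or (string? v) (keyword? v) (boolean? v) (symbol? v)) :categorical
    (or (instance? java.util.Date v)
        (instance? java.time.LocalDate v)
        (instance? java.time.LocalDateTime v)
        (instance? java.time.Instant v)) :temporal
    :else nil))

(infer-value-type 2.1)                 ;; => :numerical
(infer-value-type "cat")               ;; => :categorical
(infer-value-type #inst "2024-01-01")  ;; => :temporal
```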

A categorical column produces a band scale with string domain values. Compare:

(def animals
  {:animal ["cat" "dog" "bird" "fish"]
   :count [12 8 15 5]})

(def bar-pose
  (-> animals
      (pj/lay-value-bar :animal :count)))

bar-pose

The x-axis lays out the four animal names in order of appearance – strings, treated as a categorical band scale. The y-axis starts at zero because this is a bar chart.

Temporal columns

Dates are detected and converted to epoch-milliseconds internally, with calendar-aware tick labels. Clojure’s #inst reader literal is a convenient way to write dates:

(def temporal-pose
  (-> {:date [#inst "2024-01-01" #inst "2024-06-01" #inst "2024-12-01"]
       :val [10 25 18]}
      (pj/lay-point :date :val)))

temporal-pose

The x-axis carries epoch-millisecond numbers internally, but the 10 tick labels show human-readable dates like "Feb-01". Plotje accepts java.util.Date (from #inst), LocalDate, LocalDateTime, and Instant – all are converted to epoch-milliseconds for plotting, with calendar-aware tick formatting.
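To make the epoch-millisecond conversion concrete, here is a minimal sketch using only JDK classes. The helper name `->epoch-ms` is hypothetical, and reading date-only and local date-time values as UTC is an assumption made for this example, not a documented Plotje behavior.

```clojure
;; Illustrative sketch only -- `->epoch-ms` is a hypothetical helper.
;; Assumption: LocalDate and LocalDateTime values are read as UTC.
(defn ->epoch-ms
  [t]
  (cond
    (instance? java.util.Date t) (.getTime ^java.util.Date t)
    (instance? java.time.Instant t) (.toEpochMilli ^java.time.Instant t)
    (instance? java.time.LocalDate t)
    (-> ^java.time.LocalDate t
        (.atStartOfDay java.time.ZoneOffset/UTC)
        .toInstant
        .toEpochMilli)
    (instance? java.time.LocalDateTime t)
    (-> ^java.time.LocalDateTime t
        (.toInstant java.time.ZoneOffset/UTC)
        .toEpochMilli)))

(->epoch-ms #inst "2024-01-01")               ;; => 1704067200000
(->epoch-ms (java.time.LocalDate/of 2024 1 1)) ;; => 1704067200000
```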

Overriding inferred types with `:x-type` / `:y-type`

Sometimes a numeric column is really categorical – for example, hours of the day, years, or subject IDs. The inference system sees numbers and treats them as numerical, but you may want discrete categorical bands. Pass :x-type :categorical (or :y-type) to the pose or layer options to override:

(def hour-bar-pose
  (-> {:hour [9 10 11 12] :count [5 8 12 7]}
      (pj/lay-value-bar :hour :count {:x-type :categorical})))

hour-bar-pose

Four bars at discrete hour bands. Without the override, lay-value-bar would reject the numeric :hour column; with it, the column is treated as categorical (values cast to strings for display). The same override exists for :y-type and for :color-type (see the Grouping section below for a :color-type example).

Aesthetic Resolution

The :color parameter triggers different behaviors depending on what you pass. Each aesthetic channel (:color, :size, :alpha, :text) is classified as either a column reference or a fixed literal.

Column reference – colored by palette

(def colored-pose
  (-> {:x [1 2 3 4 5 6]
       :y [3 5 4 7 6 8]
       :g ["a" "a" "a" "b" "b" "b"]}
      (pj/lay-point :x :y {:color :g})))

colored-pose

The categorical column :g splits the data into two groups, each with its own color drawn from the palette. A legend appears on the right (100 pixels wide) and the panel shrinks to make room.

The next section explores why a categorical color column triggers grouping while a numeric color column does not.

Fixed color string – single color, no legend

(def fixed-color-pose
  (-> five-points
      (pj/lay-point :x :y {:color "#E74C3C"})))

fixed-color-pose

A literal hex string maps every point to that single color: no grouping, no legend, no legend strip. The hex was parsed into the RGBA tuple [0.906 0.298 0.235 1.0].
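The RGBA tuple quoted above can be reproduced by hand. The sketch below is a hypothetical helper, not the library's parser: it assumes six-digit "#RRGGBB" input, full opacity, and rounding to three decimals for display.

```clojure
;; Illustrative sketch only -- `hex->rgba` is a hypothetical helper.
;; Assumptions: six-digit "#RRGGBB" input, alpha fixed at 1.0,
;; channels rounded to three decimals.
(defn hex->rgba
  [hex]
  (let [digits  (subs hex 1)                 ; drop the leading "#"
        channel (fn [i]
                  (-> (subs digits i (+ i 2))
                      (Integer/parseInt 16)  ; two hex digits -> 0..255
                      (/ 255.0)))
        round3  (fn [v] (/ (Math/round (* 1000.0 v)) 1000.0))]
    (conj (mapv (comp round3 channel) [0 2 4]) 1.0)))

(hex->rgba "#E74C3C") ;; => [0.906 0.298 0.235 1.0]
```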

Named colors and string disambiguation

CSS color names like "red" and "steelblue" also work as fixed colors:

(-> five-points
    (pj/lay-point :x :y {:color "steelblue"}))

This raises a question: since :color also accepts column names as strings (like "species"), how does the system decide whether "red" means the column :red or the color red?

The rule is: check the dataset first. If the string matches a column name in the dataset, it is treated as a column reference. Otherwise, it is treated as a color value – first trying hex parsing, then CSS color name lookup.

Here is the full resolution order for a string :color value:

If the string matches a dataset column, it is a column reference (grouping)
If it starts with #, it is a hex color ("#E74C3C", "#F00")
If it parses as hex without #, it is a hex color ("00FF00")
If it matches a CSS color name, it is a named color ("red", "steelblue")
Otherwise, error with a helpful message
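The resolution order above can be sketched as a single `cond`. This is an illustration, not Plotje's code: `classify-color-string` and `css-colors` are hypothetical names, the color set is a tiny stand-in for the full CSS table, and accepting only 3- or 6-digit hex is an assumption.

```clojure
;; Illustrative sketch only -- not Plotje's implementation.
;; `css-colors` is a tiny stand-in for the full CSS named-color table,
;; and the accepted hex lengths (3 or 6 digits) are an assumption.
(def css-colors #{"red" "steelblue" "green" "blue" "black"})

(defn classify-color-string
  "column-names is a set of column names as strings."
  [column-names s]
  (cond
    (contains? column-names s)                             :column
    (re-matches #"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})" s)   :hex
    (re-matches #"(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})" s)    :hex
    (contains? css-colors s)                               :named-color
    :else (throw (ex-info (str "Cannot resolve :color value: " s)
                          {:value s :columns column-names}))))

(classify-color-string #{"x" "y"} "red")        ;; => :named-color
(classify-color-string #{"x" "y" "red"} "red")  ;; => :column
(classify-color-string #{"x" "y"} "#F00")       ;; => :hex
(classify-color-string #{"x" "y"} "00FF00")     ;; => :hex
```

Because the column check comes first, the same string can change meaning when a dataset gains a column with that name, which is why a keyword or hex string is the safer choice when ambiguity is possible.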

In practice, ambiguity is rare. Column names like "species" or "temperature" are not valid CSS colors, and color names like "red" are unlikely column names. When true ambiguity exists, use a keyword for the column (:red) or a hex string for the color ("#FF0000").

Verify: "red" is a fixed color when the dataset has no red column:

(def red-color-pose
  (-> five-points
      (pj/lay-point :x :y {:color "red"})))

red-color-pose

No legend, points drawn red – treated as a fixed color, not a column.

No color – default gray

The Worked Example at the top of the chapter shows this case: with no :color mapping, all points render in the default dark gray (#333) and no legend appears.

Grouping

The colored examples above all rest on the same concept: grouping controls how data is split into independent subsets. Each group gets its own visual elements – its own set of points, its own regression line, its own density curve, its own bar in a dodged layout.

Grouping can be derived (from a categorical :color mapping) or explicit (via the :group aesthetic).

Categorical color implies grouping

When :color maps to a categorical column (as with colored-pose above), the data is split into one group per category. Each group gets a distinct palette color and a legend entry:

colored-pose

Two groups, two legend entries – one per category in :g.

Numeric color does not create groups

When :color maps to a numerical column, data is NOT split. Instead, each point gets an individual color from a continuous gradient. There is one group, and the legend is continuous with 20 pre-computed color stops.

(def numeric-color-pose
  (-> {:x [1 2 3 4 5]
       :y [2 4 3 5 4]
       :val [10 20 30 40 50]}
      (pj/lay-point :x :y {:color :val})))

numeric-color-pose

A single group with a continuous legend of 20 color stops – the color is a visual encoding, not a grouping variable.

Overriding color type with `:color-type`

Sometimes a numeric column is really a categorical identifier – for example, subject IDs in a repeated-measures study. Inference treats numeric columns as continuous, but you want discrete groups. Setting :color-type :categorical overrides this so the column is treated as categorical despite its numeric dtype.

This is a core principle of the library: inference provides good defaults, but the user can always override.

(def study-data
  {:subject [1 1 1 2 2 2 3 3 3]
   :day     [1 2 3 1 2 3 1 2 3]
   :score   [5 7 6 3 4 5 8 9 7]})

Without override – one group, continuous gradient:

(def study-continuous-pose
  (-> study-data
      (pj/lay-line :day :score {:color :subject})))

study-continuous-pose

With :color-type :categorical – three groups, one per subject:

(def study-categorical-pose
  (-> study-data
      (pj/lay-line :day :score {:color :subject
                                :color-type :categorical})))

study-categorical-pose

The same data, the same columns – but :color-type :categorical changes inference from “one gradient” to “three distinct groups.” This affects grouping, line splitting, legend style, and palette assignment. The rendered plots look completely different:

(-> study-data
    (pj/lay-line :day :score {:color :subject
                              :color-type :categorical})
    pj/lay-point
    (pj/options {:title "Scores by Subject (categorical override)"}))

Explicit grouping with `:group`

The :group aesthetic splits data into groups without assigning distinct colors or creating a legend. This is useful when you want per-group statistics but uniform appearance.

(def grouped-data
  {:x [1 2 3 4 5 6]
   :y [3 5 4 7 6 8]
   :g ["a" "a" "a" "b" "b" "b"]})

(def explicit-group-pose
  (-> grouped-data
      (pj/lay-point :x :y {:group :g})))

explicit-group-pose

Two groups, but no legend and no color differentiation. Use :group when you need separate statistical fits but want a uniform visual style.

What grouping affects

Grouping determines how statistical transformations operate. Without grouping, a linear-model smoother, (pj/lay-smooth {:stat :linear-model}), fits one regression line through all the data. With grouping, it fits one line per group.

One regression line – no grouping:

(-> grouped-data
    (pj/pose :x :y)
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))

Two regression lines – grouped by color:

(-> grouped-data
    (pj/pose :x :y {:color :g})
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))

The same applies to other statistics: density curves, LOESS smoothers, boxplots, and dodge/stack positioning all operate per group.
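To see what "per group" means numerically, the sketch below fits an ordinary least-squares line to the grouped-data values, first over all rows and then once per group. It is plain Clojure standing in for whatever fitting routine the library uses; `fit-line` and `rows` are hypothetical names.

```clojure
;; Illustrative sketch only -- ordinary least squares in plain Clojure,
;; standing in for whatever fitting routine the library actually uses.
(defn fit-line
  "Least-squares slope and intercept for a seq of {:x .. :y ..} rows."
  [rows]
  (let [n     (count rows)
        mx    (/ (reduce + (map :x rows)) (double n))
        my    (/ (reduce + (map :y rows)) (double n))
        sxy   (reduce + (map #(* (- (:x %) mx) (- (:y %) my)) rows))
        sxx   (reduce + (map #(let [d (- (:x %) mx)] (* d d)) rows))
        slope (/ sxy sxx)]
    {:slope slope :intercept (- my (* slope mx))}))

;; The same values as grouped-data, reshaped into row maps.
(def rows
  (map (fn [x y g] {:x x :y y :g g})
       [1 2 3 4 5 6] [3 5 4 7 6 8] ["a" "a" "a" "b" "b" "b"]))

;; No grouping: one fit through all six rows
;; (slope is about 0.886, intercept about 2.4).
(fit-line rows)

;; Grouped by :g: one fit per group.
(into {} (map (fn [[g rs]] [g (fit-line rs)])) (group-by :g rows))
;; => {"a" {:slope 0.5, :intercept 3.0}, "b" {:slope 0.5, :intercept 4.5}}
```

The pooled line is steeper than either per-group line because it also has to span the offset between the two groups.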

Layer Type

When you use pj/pose without an explicit pj/lay-* call, Plotje infers the layer type – a mark + stat bundle – from the column types of the referenced columns.

Single-column cases

Column type	Inferred	Mark + stat
numerical	histogram	`:bar` + `:bin`
temporal	histogram (over epoch-ms, with calendar-aware ticks)	`:bar` + `:bin`
categorical	bar chart of category counts	`:rect` + `:count`

Two-column cases

x type	y type	Inferred	Mark + stat
numerical	numerical	scatter	`:point` + `:identity`
temporal	numerical	time-series line	`:line` + `:identity`
categorical	numerical	boxplot (vertical)	`:boxplot` + `:boxplot`
numerical	categorical	boxplot (horizontal)	`:boxplot` + `:boxplot`
any other pair		scatter (fallback)	`:point` + `:identity`

Fallback pairs include temporal x + categorical y, categorical x + categorical y, and temporal x + temporal y. These are rarer in practice, and giving them a dedicated inference is deferred. You can always override with an explicit pj/lay-* call; the inferred layer type is only a default.

When you use pj/lay-point, pj/lay-histogram, etc., the layer type’s stat takes precedence – column-type inference is bypassed.
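The two tables above amount to a lookup with a scatter fallback. The sketch below encodes them as a plain map; `layer-table` and `infer-layer` are hypothetical names, and the real resolution logic may be organized quite differently.

```clojure
;; Illustrative sketch only -- a lookup mirroring the two tables above.
;; `layer-table` and `infer-layer` are hypothetical names;
;; a nil y-type stands for the single-column case.
(def layer-table
  {[:numerical nil]           {:mark :bar     :stat :bin}
   [:temporal nil]            {:mark :bar     :stat :bin}
   [:categorical nil]         {:mark :rect    :stat :count}
   [:numerical :numerical]    {:mark :point   :stat :identity}
   [:temporal :numerical]     {:mark :line    :stat :identity}
   [:categorical :numerical]  {:mark :boxplot :stat :boxplot}
   [:numerical :categorical]  {:mark :boxplot :stat :boxplot}})

(defn infer-layer
  ([x-type] (infer-layer x-type nil))
  ([x-type y-type]
   ;; Any pair not listed falls back to a scatter.
   (get layer-table [x-type y-type] {:mark :point :stat :identity})))

(infer-layer :numerical)                 ;; => {:mark :bar, :stat :bin}
(infer-layer :temporal :numerical)       ;; => {:mark :line, :stat :identity}
(infer-layer :categorical :categorical)  ;; => {:mark :point, :stat :identity}
```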

A single numerical column produces a histogram:

(def hist-pose
  (-> five-points
      (pj/pose :x)))

hist-pose

The inferred layer is a histogram – a :bar mark fed by the :bin stat, so the data is binned into rectangles before rendering.

A single temporal column also becomes a histogram, binned over epoch-milliseconds with calendar-aware tick labels:

(def temporal-hist-pose
  (-> {:date [#inst "2024-01-01" #inst "2024-02-01" #inst "2024-03-01"
              #inst "2024-04-01" #inst "2024-05-01"]}
      (pj/pose :date)))

temporal-hist-pose

A single categorical column produces a bar chart of counts:

(def count-pose
  (-> animals
      (pj/pose :animal)))

count-pose

The inferred layer uses a :rect mark fed by the :count stat, which tallied each of the 4 categories.

Two numerical columns produce a scatter (the chapter’s opening scatter-pose is such a pose):

(def num-num-pose
  (-> five-points (pj/pose :x :y)))

num-num-pose

A temporal x with a numerical y infers a time-series line. Row order is preserved, so pre-sort temporal data to avoid a zigzagging line:

(def ts-line-pose
  (-> {:date [#inst "2024-01-01" #inst "2024-02-01" #inst "2024-03-01"]
       :val  [10 25 18]}
      (pj/pose :date :val)))

ts-line-pose

A categorical x with a numerical y infers a boxplot – the default for summarizing a distribution across groups:

(def boxplot-pose
  (-> {:species ["a" "a" "a" "b" "b" "b" "c" "c" "c"]
       :val     [8  10  12  18  20  22  14  15  17]}
      (pj/pose :species :val)))

boxplot-pose

A numerical x with a categorical y infers a horizontal boxplot – the same summary laid out with the category axis on y:

(def horizontal-boxplot-pose
  (-> {:val     [8  10  12  18  20  22  14  15  17]
       :species ["a" "a" "a" "b" "b" "b" "c" "c" "c"]}
      (pj/pose :val :species)))

horizontal-boxplot-pose

Domains

Numerical domains extend 5% beyond the data range so points aren’t clipped at the edges.

scatter-pose

The x-domain is [0.8, 5.2] – the data range [1.0, 5.0] plus 0.2 padding on each side (5% of the data range, 4.0).
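The padding arithmetic is small enough to write out. The sketch below is a hypothetical `pad-domain` helper that reproduces the numbers above; it is not taken from the library.

```clojure
;; Illustrative sketch only -- `pad-domain` is a hypothetical helper.
(defn pad-domain
  "Widens [lo hi] by `fraction` of its span on each side."
  [[lo hi] fraction]
  (let [pad (* fraction (- hi lo))]   ; 0.05 * 4.0 = 0.2 for the scatter
    [(- lo pad) (+ hi pad)]))

(pad-domain [1.0 5.0] 0.05) ;; approximately [0.8 5.2]
```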

Special domain rules apply in certain contexts:

Bar chart y-domains always include zero:

bar-pose

Percentage-filled layers normalize the y-domain to [0.0, 1.0]:

(def fill-pose
  (-> {:x ["a" "a" "b" "b"]
       :g ["m" "n" "m" "n"]}
      (pj/lay-bar :x {:position :fill :color :g})))

fill-pose

The y-domain is exactly [0.0, 1.0] – each category sums to 100%.
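To see why the stacked shares top out at exactly 1.0, the sketch below counts rows per (x, group) pair and divides by the total for each x category. `fill-shares` is a hypothetical helper written for this example, using the same values as fill-pose.

```clojure
;; Illustrative sketch only -- counts rows per [x group] pair and rescales
;; each x category so its shares sum to 1.0.
(defn fill-shares
  [xs gs]
  (let [counts (frequencies (map vector xs gs))   ; rows per [x group]
        totals (frequencies xs)]                  ; rows per x category
    (into {}
          (map (fn [[[x g] n]] [[x g] (/ n (double (totals x)))]))
          counts)))

(fill-shares ["a" "a" "b" "b"] ["m" "n" "m" "n"])
;; => {["a" "m"] 0.5, ["a" "n"] 0.5, ["b" "m"] 0.5, ["b" "n"] 0.5}
```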

Multi-layer plots merge domains across layers – see “Multi-Layer Plots” below.

Ticks

Once domains are computed, Plotje selects “nice” round tick values. The logic depends on the scale type:

Linear – wadogo selects ticks at round intervals (1, 2, 2.5, 5, …)
Log – 1-2-5 nice numbers: powers of 10 when they give at least 3 ticks, otherwise intermediates at 1-2-5 or 1-2-3-5 multiples per decade
Categorical – tick at each category, in order of appearance
Temporal – calendar-aware snapping (year, month, day, hour) with adaptive formatting

Linear ticks for the scatter example:

scatter-pose

Nine ticks from 1.0 to 5.0 at 0.5 intervals – round and readable.
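The nine ticks follow from the padded domain once a step is chosen. The sketch below takes the 0.5 step as given and lists its multiples inside [0.8, 5.2]; choosing the step itself is the scale library's job and is not modeled here. `ticks-in` is a hypothetical name.

```clojure
;; Illustrative sketch only -- the step (0.5) is supplied by hand here,
;; whereas the real scale library also chooses the step.
(defn ticks-in
  "Multiples of `step` that fall inside [lo hi]."
  [[lo hi] step]
  (let [first-k (long (Math/ceil (/ lo step)))
        last-k  (long (Math/floor (/ hi step)))]
    (mapv #(* step %) (range first-k (inc last-k)))))

(ticks-in [0.8 5.2] 0.5)
;; => [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0]
```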

Log ticks for a multi-decade range:

(def log-scale-pose
  (-> {:x [0.1 1.0 10.0 100.0 1000.0]
       :y [5 10 15 20 25]}
      (pj/lay-point :x :y)
      (pj/scale :x :log)))

log-scale-pose

Five ticks at exact powers of 10 – no irrational intermediates. Whole numbers display without decimals, sub-1 values use minimal decimal places.

Categorical ticks match domain order:

bar-pose

Axis Labels

Labels come from column names. Underscores and hyphens become spaces.

(def iris-label-pose
  (-> (rdatasets/datasets-iris)
      (pj/lay-point :sepal-length :sepal-width)))

iris-label-pose
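The naming rule is easy to mimic with a regex replace. The sketch below uses a hypothetical `column->label` helper to illustrate the rule as stated; it is not the library's code.

```clojure
(require '[clojure.string :as str])

;; Illustrative sketch only -- `column->label` is a hypothetical helper.
(defn column->label
  "Turns a column name (keyword or string) into a display label."
  [column]
  (str/replace (name column) #"[_-]" " "))

(column->label :sepal-length) ;; => "sepal length"
(column->label "petal_width") ;; => "petal width"
```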

When only one column is specified, the y-axis shows computed counts. The system omits the y-label since it would repeat the column name:

(def x-only-pose
  (-> five-points (pj/pose :x)))

x-only-pose

Explicit labels override inference:

(def explicit-label-pose
  (-> five-points
      (pj/lay-point :x :y)
      (pj/options {:x-label "Length (cm)" :y-label "Width (cm)"})))

explicit-label-pose

Legends

A legend appears when a column is mapped to color. Four cases:

A categorical color mapping produces a discrete legend with one entry per category:

colored-pose

The legend’s title is the column name; each entry has a :label and a palette color.

No color mapping means no legend:

scatter-pose

A fixed color string also suppresses the legend:

fixed-color-pose

A numeric color mapping produces a continuous legend (gradient bar):

(def continuous-color-pose
  (-> {:x [1 2 3] :y [4 5 6] :val [10 20 30]}
      (pj/lay-point :x :y {:color :val})))

continuous-color-pose

Size Legend

When :size maps to a numerical column, a size legend shows five graduated circles spanning the data range, with radii proportional to the values they represent.

(def size-legend-pose
  (-> {:x [1 2 3 4 5] :y [1 2 3 4 5] :s [10 20 30 40 50]}
      (pj/lay-point :x :y {:size :s})))

size-legend-pose

The legend has 5 entries, each pairing a value with a circle of the corresponding radius. No size mapping means no size legend:

scatter-pose

Alpha Legend

When :alpha maps to a numerical column, an alpha legend shows graduated opacity squares – about five nice 1/2/5 breaks; the exact count depends on the range (here [0.1, 0.9] yields four).

(def alpha-legend-pose
  (-> {:x [1 2 3 4 5] :y [1 2 3 4 5] :a [0.1 0.3 0.5 0.7 0.9]}
      (pj/lay-point :x :y {:alpha :a})))

alpha-legend-pose

No alpha mapping means no alpha legend:

scatter-pose

Layout

Layout padding adjusts based on what elements are present – titles, axis labels, and legends each reserve their own space.

Compare a bare plot to one with title, labels, and legend:

scatter-pose

(def full-layout-pose
  (-> {:x [1 2 3 4 5 6]
       :y [3 5 4 7 6 8]
       :g ["a" "a" "a" "b" "b" "b"]}
      (pj/lay-point :x :y {:color :g})
      (pj/options {:title "My Plot"})))

full-layout-pose

The bare plot reserves no space for a title and no legend strip. The full plot adds padding above for the title and 100 pixels on the right for the legend.

Layout type is also inferred from the pose structure:

A single panel is :single
A facet grid (:facet-row or :facet-col) is :facet-grid
Multiple x-y pairs (scatter plot matrix) are :multi-variable

scatter-pose

Coordinate Flipping

Setting :coord :flip swaps the visual axes. The data stays the same – the categorical band that was on x ends up on y, with ticks and labels following along.

(def normal-pose
  (-> animals
      (pj/lay-value-bar :animal :count)))

normal-pose

(def flip-pose
  (-> animals
      (pj/lay-value-bar :animal :count)
      (pj/coord :flip)))

flip-pose

The categorical axis moved from x to y.

Labels are also swapped – the x-label and y-label follow their visual axis, not the data axis:

(def flipped-labels-pose
  (-> five-points
      (pj/lay-point :x :y)
      (pj/coord :flip)))

flipped-labels-pose

After flipping, the visual x-axis shows “y” and the visual y-axis shows “x” – labels track the visual axes.

Polar coordinates (:coord :polar) are covered separately – see the Polar Coordinates chapter for rose charts, radial bars, and related plots.

Multi-Layer Plots

When multiple layers share a panel, their domains are merged:

(def multi-pose
  (-> five-points
      (pj/pose :x :y)
      pj/lay-point
      (pj/lay-smooth {:stat :linear-model})))

multi-pose

Two layers – one :point, one :line – sharing the same domain. The line carries the regression curve as a polyline.
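Merging domains can be pictured as taking the smallest interval that covers every layer's extent. The sketch below uses a hypothetical `merge-domains` helper, and the two input domains are made-up numbers chosen for illustration, not values read from multi-pose.

```clojure
;; Illustrative sketch only -- `merge-domains` is a hypothetical helper,
;; and the example domains below are invented for illustration.
(defn merge-domains
  "Smallest [lo hi] interval covering every input domain."
  [domains]
  [(apply min (map first domains))
   (apply max (map second domains))])

(merge-domains [[2.1 5.2] [2.5 5.3]]) ;; => [2.1 5.3]
```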

Resolution Overview

The diagram below sketches how the rules above combine – which inferences feed which others on the way from a pose to a rendered plot:

graph TD
  POSE["pose + options"]
  POSE --> CT["Column types"]
  POSE --> AE["Aesthetics"]
  CT --> GR["Grouping"]
  AE --> GR
  CT --> ME["Layer type"]
  GR --> STATS["Statistics"]
  ME --> STATS
  STATS --> DOM["Domains"]
  DOM --> TK["Ticks"]
  POSE --> LBL["Axis labels"]
  AE --> LEG["Color legend"]
  AE --> SLEG["Size legend"]
  AE --> ALEG["Alpha legend"]
  DOM --> LAYOUT["Layout"]
  LBL --> LAYOUT
  LEG --> LAYOUT
  SLEG --> LAYOUT
  ALEG --> LAYOUT
  DOM --> PLOT["Rendered plot"]
  TK --> PLOT
  LBL --> PLOT
  LEG --> PLOT
  SLEG --> PLOT
  ALEG --> PLOT
  LAYOUT --> PLOT
  STATS --> PLOT
  style POSE fill:#e8f5e9
  style PLOT fill:#fff3e0
  style STATS fill:#e3f2fd
  style DOM fill:#e3f2fd

Column types and aesthetic classification are the starting points; everything else flows from them. Statistics and domains together set the geometry; labels, legends, and layout round out the surrounding plot.

What’s Next

Layer Types – the full registry of marks, stats, and positions that inference selects from
Relationships – see inference in action on scatter, regression, and SPLOM

source: notebooks/plotje_book/inference_rules.clj