9 Inference Rules
Plotje infers many parameters automatically so you can write less and get reasonable defaults. This notebook walks each rule with a worked example: a small pose, the rendered plot, and a description of what was inferred. Every rule is also checked against the resolved plot on every run, so the claims here stay honest as the library evolves.
This chapter is a reference: each rule with its default and its override. For the conceptual overview, read Poses and Core Concepts first. The examples use small inline datasets so the relationships are easy to read at a glance.
(ns plotje-book.inference-rules
(:require
;; Tablecloth -- dataset manipulation
[tablecloth.api :as tc]
;; Kindly -- notebook rendering protocol
[scicloj.kindly.v4.kind :as kind]
;; Plotje -- composable plotting
[scicloj.plotje.api :as pj]
;; Rdatasets -- standard datasets
[scicloj.metamorph.ml.rdatasets :as rdatasets]))

A Worked Example
Before the rule-by-rule tour, here is what "inference" looks like in practice: a five-point scatter where Plotje filled in almost everything for us.
(def five-points
{:x [1.0 2.0 3.0 4.0 5.0]
:y [2.1 4.3 3.0 5.2 4.8]})

(def scatter-pose
  (-> five-points
      (pj/lay-point :x :y)))

scatter-pose

Notice what was inferred:
- The x-axis label "x" and y-axis label "y", taken from the column keywords
- A linear scale on each axis, since both columns are numeric
- The data range [1.0, 5.0] widened to [0.8, 5.2]: a 5% padding so the extreme points do not sit on the panel edge
- Round tick values: 1.0, 1.5, 2.0, ...
- No legend, since no color mapping was given
- A single point group rendered in the default color (dark gray, #333)
Each of those decisions is its own inference rule, with a default and an explicit override.
Overrides at a Glance
Every inference rule has an explicit override. The table below lists them all: scan it to find what you need, then jump to the matching section for the details and worked examples.
| What is inferred | Default | Override |
|---|---|---|
| Column selection | one column fills x; two fill x, y; three fill x, y, color | explicit column args in pj/pose or pj/lay-* |
| Column type | dtype inspection | :x-type, :y-type, :color-type in pose or layer options |
| Aesthetic classification | keyword = column, string = color/column | explicit :color keyword vs hex string |
| Grouping | categorical color column | :group aesthetic |
| Layer type (mark + stat) | column types (see Layer Type section) | pj/lay-point, pj/lay-histogram, etc. |
| Domain extent | data range + 5% padding | (pj/scale pose :x {:domain [0 10]}) |
| Domain zero-anchor | bar/stacked charts include zero | (pj/scale pose :y {:domain [5 20]}) |
| Fill domain | [0.0, 1.0] for fill position | (pj/scale pose :y {:domain [0 2]}) |
| Tick values | round intervals (linear), powers of 10 (log) | wadogo scale configuration |
| Tick labels | number formatting, calendar formatting | wadogo label formatting |
| Axis labels | column name, with underscores replaced by spaces | (pj/options {:x-label "Custom"}) |
| Color legend | categorical = discrete, numerical = continuous, none = no legend | :color mapping controls presence |
| Size legend | 5 graduated circles when :size maps to numerical column | :size mapping controls presence |
| Alpha legend | about 5 graduated opacity squares when :alpha maps to numerical column (exact count depends on the range) | :alpha mapping controls presence |
| Layout padding | adjusts for title, labels, legend | :width, :height in options |
| Layout type | single, facet-grid, multi-variable | pj/facet, multiple x-y pairs |
| Coordinate system | :cartesian | (pj/coord :flip), (pj/coord :polar) |
The sections below walk each rule in detail. The order roughly follows how a pose is resolved into a plot (column selection, column types, aesthetics, grouping, layer type, domains, ticks, labels, legends, layout, coord flip), with two cross-cutting closing sections on how the rules combine in multi-layer plots and a diagram of the full resolution flow.
Column Selection
When column names are omitted, Plotje infers them from the dataset shape:
| Number of columns | Inferred mapping |
|---|---|
| 1 | first column becomes x |
| 2 | first becomes x, second becomes y |
| 3 | first becomes x, second becomes y, third becomes color |
| 4+ | no inference; see the "4+ columns" note below |
The same rule applies whether you start with pj/lay-* on raw data or pj/pose on raw data. Both read the first 1-3 columns of the dataset in the order they appear and build the mapping from there.
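The table above can be read as a tiny pure function. The following is a minimal sketch of the rule, not Plotje's implementation; `infer-mapping` is a hypothetical name, and it takes the column names in dataset order as a plain vector.

```clojure
;; Hypothetical helper illustrating the column-selection table.
;; Not part of Plotje; it only mirrors the rule described above.
(defn infer-mapping
  "Maps the first one to three column names to :x, :y, :color.
  Returns nil when there are zero columns or four or more."
  [column-names]
  (let [n (count column-names)]
    (when (<= 1 n 3)
      ;; zipmap stops at the shorter sequence, so one column
      ;; yields only :x and two columns yield :x and :y.
      (zipmap [:x :y :color] column-names))))

(infer-mapping [:values])      ;; one column
(infer-mapping [:x :y])        ;; two columns
(infer-mapping [:x :y :g])     ;; three columns
(infer-mapping [:a :b :c :d])  ;; four columns: no inference
```

Returning nil for four or more columns models "inference stops"; what happens next (an error or an unmapped pose) depends on the entry point, as described under "4+ columns" below.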
One column:
(-> {:values [1 2 3 4 5 6]}
    pj/lay-histogram)

Two columns:
(-> {:x [1 2 3 4 5] :y [2 4 3 5 4]}
    pj/lay-point)

Three columns (the third becomes :color):
(-> {:x [1 2 3 4] :y [4 5 6 7] :g ["a" "a" "b" "b"]}
    pj/lay-point)

Pose construction infers the same mapping
Calling pj/pose on raw data without explicit column arguments runs the same column-selection rule. A 1-3 column dataset gets its mapping filled in; the resulting pose carries the mapping but has no layer attached yet, so layer type inference (covered below) supplies the mark at render time.
(def two-col-pose
(pj/pose {:x [1.0 2.0 3.0 4.0 5.0]
:y [1.0 4.0 9.0 16.0 25.0]}))

two-col-pose

The inferred mapping is visible on the pose itself:
(-> two-col-pose (select-keys [:mapping :layers]) kind/pprint)

{:mapping {:x :x, :y :y}, :layers []}

4+ columns
With four or more columns there is no unambiguous default, so inference stops:
- (pj/lay-* data) throws with a message listing the available columns, asking you to pass explicit :x and :y.
- (pj/pose data) is gentler: it builds a pose with the data attached but no mapping, so you can add one downstream with (pj/pose pose :col-a :col-b) or (pj/lay-point pose :col-a :col-b).
When you provide explicit columns, inference is skipped β you are in full control:
(-> (rdatasets/datasets-iris)
    (pj/lay-point :petal-length :petal-width {:color :species}))

Column Types
Once columns are selected, the next step is determining the type of each column: numerical, categorical, or temporal. This determines the scale type, domain, tick style, and the default mark.
| Column dtype | Inferred type |
|---|---|
| float, int | :numerical |
| string, keyword, boolean, symbol, text | :categorical |
| LocalDate, LocalDateTime, Instant, java.util.Date | :temporal (numerical, with calendar-aware ticks) |
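As a rough illustration of the table above, the classification can be sketched by inspecting a sample value. This is an assumption-laden sketch: Plotje inspects the column dtype, whereas the hypothetical `infer-column-type` below looks at the first non-nil value, and the :unknown fallback is invented for the example.

```clojure
(import '(java.time LocalDate LocalDateTime Instant)
        '(java.util Date))

;; Hypothetical helpers; they only mirror the dtype table above.
(defn value-type
  "Classifies a single value as :numerical, :categorical, or :temporal."
  [v]
  (cond
    (number? v) :numerical
    (or (string? v) (keyword? v) (boolean? v) (symbol? v)) :categorical
    (or (instance? Date v) (instance? LocalDate v)
        (instance? LocalDateTime v) (instance? Instant v)) :temporal
    :else :unknown)) ; fallback invented for this sketch

(defn infer-column-type
  "Classifies a column by its first non-nil value."
  [values]
  (value-type (first (remove nil? values))))

(infer-column-type [12 8 15 5])                 ;; numbers
(infer-column-type ["cat" "dog" "bird"])        ;; strings
(infer-column-type [#inst "2024-01-01"])        ;; java.util.Date
(infer-column-type [(LocalDate/of 2024 6 1)])   ;; java.time value
```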
A categorical column produces a band scale with string domain values. Compare:
(def animals
{:animal ["cat" "dog" "bird" "fish"]
:count [12 8 15 5]})

(def bar-pose
  (-> animals
      (pj/lay-value-bar :animal :count)))

bar-pose

The x-axis lays out the four animal names in order of appearance: strings, treated as a categorical band scale. The y-axis starts at zero because this is a bar chart.
Temporal columns
Dates are detected and converted to epoch-milliseconds internally, with calendar-aware tick labels. Clojure's #inst reader literal is a convenient way to write dates:
(def temporal-pose
(-> {:date [#inst "2024-01-01" #inst "2024-06-01" #inst "2024-12-01"]
:val [10 25 18]}
(pj/lay-point :date :val)))

temporal-pose

The x-axis carries epoch-millisecond numbers internally, but the 10 tick labels show human-readable dates like "Feb-01". Plotje accepts java.util.Date (from #inst), LocalDate, LocalDateTime, and Instant; all are converted to epoch-milliseconds for plotting, with calendar-aware tick formatting.
Overriding inferred types with :x-type / :y-type
Sometimes a numeric column is really categorical: for example, hours of the day, years, or subject IDs. The inference system sees numbers and treats them as numerical, but you may want discrete categorical bands. Pass :x-type :categorical (or :y-type) to the pose or layer options to override:
(def hour-bar-pose
(-> {:hour [9 10 11 12] :count [5 8 12 7]}
(pj/lay-value-bar :hour :count {:x-type :categorical})))

hour-bar-pose

Four bars at discrete hour bands. Without the override, lay-value-bar would reject the numeric :hour column; with it, the column is treated as categorical (values cast to strings for display). The same override exists for :y-type and for :color-type (see the Grouping section below for a :color-type example).
Aesthetic Resolution
The :color parameter triggers different behaviors depending on what you pass. Each aesthetic channel (:color, :size, :alpha, :text) is classified as either a column reference or a fixed literal.
Column reference: colored by palette
(def colored-pose
(-> {:x [1 2 3 4 5 6]
:y [3 5 4 7 6 8]
:g ["a" "a" "a" "b" "b" "b"]}
(pj/lay-point :x :y {:color :g})))

colored-pose

The categorical column :g splits the data into two groups, each with its own color drawn from the palette. A legend appears on the right (100 pixels wide) and the panel shrinks to make room.
The next section explores why a categorical color column triggers grouping while a numeric color column does not.
Fixed color string: single color, no legend
(def fixed-color-pose
(-> five-points
(pj/lay-point :x :y {:color "#E74C3C"})))

fixed-color-pose

A literal hex string maps every point to that single color: no grouping, no legend, no legend strip. The hex was parsed into the RGBA tuple [0.906 0.298 0.235 1.0].
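The RGBA tuple quoted above follows from dividing each hex byte by 255. A minimal sketch of that arithmetic, assuming a six-digit "#RRGGBB" string and full opacity; `hex->rgba` is a hypothetical helper, not Plotje's parser:

```clojure
;; Hypothetical helper: converts "#RRGGBB" to [r g b a] with
;; components in [0, 1]. Alpha is assumed to be 1.0 (opaque).
;; Three-digit forms such as "#F00" are not handled here.
(defn hex->rgba [hex]
  (let [digits   (subs hex 1)                       ; drop the leading #
        channels (map #(Integer/parseInt (apply str %) 16)
                      (partition 2 digits))]        ; E7, 4C, 3C -> ints
    (conj (mapv #(/ % 255.0) channels) 1.0)))

;; 0xE7 = 231, 0x4C = 76, 0x3C = 60, so the components are
;; 231/255, 76/255, and 60/255, which round to 0.906, 0.298, 0.235.
(hex->rgba "#E74C3C")
```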
Named colors and string disambiguation
CSS color names like "red" and "steelblue" also work as fixed colors:
(-> five-points
    (pj/lay-point :x :y {:color "steelblue"}))

This raises a question: since :color also accepts column names as strings (like "species"), how does the system decide whether "red" means the column :red or the color red?
The rule is: check the dataset first. If the string matches a column name in the dataset, it is treated as a column reference. Otherwise, it is treated as a color value, first trying hex parsing, then CSS color name lookup.
Here is the full resolution order for a string :color value:
- If the string matches a dataset column, it is a column reference (grouping)
- If it starts with #, it is a hex color ("#E74C3C", "#F00")
- If it parses as hex without #, it is a hex color ("00FF00")
- If it matches a CSS color name, it is a named color ("red", "steelblue")
- Otherwise, error with a helpful message
In practice, ambiguity is rare. Column names like "species" or "temperature" are not valid CSS colors, and color names like "red" are unlikely column names. When true ambiguity exists, use a keyword for the column (:red) or a hex string for the color ("#FF0000").
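The resolution order for string :color values can be sketched as a single cond. Everything here is illustrative: `classify-color` and `css-color-names` are hypothetical names, the color set is a tiny subset, and the hex check is stricter than "starts with #" (it requires three or six hex digits).

```clojure
;; Tiny illustrative subset; a real lookup would cover all CSS names.
(def css-color-names #{"red" "green" "blue" "steelblue"})

;; Hypothetical classifier mirroring the order described above.
;; column-names is a set of the dataset's column names as strings.
(defn classify-color [column-names s]
  (cond
    (contains? column-names s)                          :column
    (re-matches #"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})" s) :hex
    (re-matches #"[0-9a-fA-F]{6}" s)                    :hex
    (contains? css-color-names s)                       :named
    :else (throw (ex-info (str "Unrecognized color: " s) {:value s}))))

(classify-color #{"species"} "species")  ;; dataset column wins
(classify-color #{"species"} "#F00")     ;; hex with #
(classify-color #{"species"} "00FF00")   ;; hex without #
(classify-color #{"species"} "red")      ;; CSS name
(classify-color #{"red"} "red")          ;; a column named "red" wins
```

Because the dataset check comes first, the same string can classify differently against different datasets, which is why a keyword or a hex string is the unambiguous choice.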
Verify: "red" is a fixed color when the dataset has no red column:
(def red-color-pose
(-> five-points
(pj/lay-point :x :y {:color "red"})))

red-color-pose

No legend, points drawn red: treated as a fixed color, not a column.
No color: default gray
The Worked Example at the top of the chapter shows this case: with no :color mapping, all points render in the default dark gray (#333) and no legend appears.
Grouping
The colored examples above all rest on the same concept: grouping controls how data is split into independent subsets. Each group gets its own visual elements: its own set of points, its own regression line, its own density curve, its own bar in a dodged layout.
Grouping can be derived (from a categorical :color mapping) or explicit (via the :group aesthetic).
Categorical color implies grouping
When :color maps to a categorical column (as with colored-pose above), the data is split into one group per category. Each group gets a distinct palette color and a legend entry:
colored-pose

Two groups, two legend entries: one per category in :g.
Numeric color does not create groups
When :color maps to a numerical column, data is NOT split. Instead, each point gets an individual color from a continuous gradient. There is one group, and the legend is continuous with 20 pre-computed color stops.
(def numeric-color-pose
(-> {:x [1 2 3 4 5]
:y [2 4 3 5 4]
:val [10 20 30 40 50]}
(pj/lay-point :x :y {:color :val})))

numeric-color-pose

A single group with a continuous legend of 20 color stops: the color is a visual encoding, not a grouping variable.
Overriding color type with :color-type
Sometimes a numeric column is really a categorical identifier, for example subject IDs in a repeated-measures study. Inference treats numeric columns as continuous, but you want discrete groups. Setting :color-type :categorical overrides this so the column is treated as categorical despite its numeric dtype.
This is a core principle of the library: inference provides good defaults, but the user can always override.
(def study-data
{:subject [1 1 1 2 2 2 3 3 3]
:day [1 2 3 1 2 3 1 2 3]
:score [5 7 6 3 4 5 8 9 7]})

Without override (one group, continuous gradient):
(def study-continuous-pose
  (-> study-data
      (pj/lay-line :day :score {:color :subject})))

study-continuous-pose

With :color-type :categorical (three groups, one per subject):
(def study-categorical-pose
  (-> study-data
      (pj/lay-line :day :score {:color :subject
                                :color-type :categorical})))

study-categorical-pose

The same data, the same columns, but :color-type :categorical changes inference from "one gradient" to "three distinct groups." This affects grouping, line splitting, legend style, and palette assignment. The rendered plots look completely different:
(-> {:subject [1 1 1 2 2 2 3 3 3]
:day [1 2 3 1 2 3 1 2 3]
:score [5 7 6 3 4 5 8 9 7]}
(pj/lay-line :day :score {:color :subject
:color-type :categorical})
pj/lay-point
(pj/options {:title "Scores by Subject (categorical override)"}))

Explicit grouping with :group
The :group aesthetic splits data into groups without assigning distinct colors or creating a legend. This is useful when you want per-group statistics but uniform appearance.
(def grouped-data
{:x [1 2 3 4 5 6]
:y [3 5 4 7 6 8]
:g ["a" "a" "a" "b" "b" "b"]})

(def explicit-group-pose
  (-> grouped-data
      (pj/lay-point :x :y {:group :g})))

explicit-group-pose

Two groups, but no legend and no color differentiation. Use :group when you need separate statistical fits but want a uniform visual style.
What grouping affects
Grouping determines how statistical transformations operate. Without grouping, (pj/lay-smooth {:stat :linear-model}) (linear model) fits one regression line through all the data. With grouping, it fits one line per group.
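To see why grouping changes the result of a statistic, here is a minimal sketch in plain Clojure that fits an ordinary least-squares slope once over all rows and once per group. It uses the same values as grouped-data; the `slope` function is a hypothetical stand-in for whatever the :linear-model stat computes internally.

```clojure
;; Ordinary least-squares slope: sum((x - mx)(y - my)) / sum((x - mx)^2).
(defn slope [points]
  (let [n      (double (count points))
        mean-x (/ (reduce + (map :x points)) n)
        mean-y (/ (reduce + (map :y points)) n)
        sxy    (reduce + (map #(* (- (:x %) mean-x) (- (:y %) mean-y)) points))
        sxx    (reduce + (map #(let [d (- (:x %) mean-x)] (* d d)) points))]
    (/ sxy sxx)))

;; Row-oriented copy of the grouped-data values.
(def rows
  (map (fn [x y g] {:x x :y y :g g})
       [1 2 3 4 5 6]
       [3 5 4 7 6 8]
       ["a" "a" "a" "b" "b" "b"]))

;; No grouping: one fit through all six points.
(def overall-slope (slope rows))

;; Grouped: split by :g first, then fit each subset independently.
(def per-group-slopes
  (into {} (map (fn [[g pts]] [g (slope pts)]) (group-by :g rows))))
```

With these values the single fit is noticeably steeper than either per-group fit, because it also captures the jump between the two groups.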
One regression line (no grouping):
(-> grouped-data
    (pj/pose :x :y)
    pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))

Two regression lines (grouped by color):
(-> grouped-data
(pj/pose :x :y {:color :g})
pj/lay-point
    (pj/lay-smooth {:stat :linear-model}))

The same applies to other statistics: density curves, LOESS smoothers, boxplots, and dodge/stack positioning all operate per group.
Layer Type
When you use pj/pose without an explicit pj/lay-* call, Plotje infers the layer type (a mark + stat bundle) from the column types of the referenced columns.
Single-column cases
| Column type | Inferred | Mark + stat |
|---|---|---|
| numerical | histogram | :bar + :bin |
| temporal | histogram (over epoch-ms, with calendar-aware ticks) | :bar + :bin |
| categorical | bar chart of category counts | :rect + :count |
Two-column cases
| x type | y type | Inferred | Mark + stat |
|---|---|---|---|
| numerical | numerical | scatter | :point + :identity |
| temporal | numerical | time-series line | :line + :identity |
| categorical | numerical | boxplot (vertical) | :boxplot + :boxplot |
| numerical | categorical | boxplot (horizontal) | :boxplot + :boxplot |
| any other pair | | scatter (fallback) | :point + :identity |
Fallback pairs include temporal x + categorical y, categorical x + categorical y, and temporal x + temporal y. These are rarer in practice, and giving them a dedicated inference is deferred. You can always override with an explicit pj/lay-* call; the inferred layer type is only a default.
When you use pj/lay-point, pj/lay-histogram, etc., the layer type's stat takes precedence and column-type inference is bypassed.
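The two tables above amount to a lookup keyed by column types, with scatter as the fallback for unlisted pairs. The sketch below encodes them as plain maps; `infer-layer` and the map names are hypothetical and only restate the tables, they are not Plotje internals.

```clojure
;; Single-column table: column type -> mark + stat.
(def single-column-layer
  {:numerical   {:mark :bar  :stat :bin}
   :temporal    {:mark :bar  :stat :bin}
   :categorical {:mark :rect :stat :count}})

;; Two-column table: [x-type y-type] -> mark + stat.
(def two-column-layer
  {[:numerical :numerical]   {:mark :point   :stat :identity}
   [:temporal :numerical]    {:mark :line    :stat :identity}
   [:categorical :numerical] {:mark :boxplot :stat :boxplot}
   [:numerical :categorical] {:mark :boxplot :stat :boxplot}})

;; Hypothetical lookup; unlisted pairs fall back to a scatter.
(defn infer-layer
  ([x-type] (single-column-layer x-type))
  ([x-type y-type]
   (get two-column-layer [x-type y-type]
        {:mark :point :stat :identity})))

(infer-layer :numerical)                 ;; histogram
(infer-layer :temporal :numerical)       ;; time-series line
(infer-layer :categorical :categorical)  ;; fallback scatter
```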
A single numerical column produces a histogram:
(def hist-pose
(-> five-points
(pj/pose :x)))

hist-pose

The inferred layer is a histogram: a :bar mark fed by the :bin stat, so the data is binned into rectangles before rendering.
A single temporal column also becomes a histogram, binned over epoch-milliseconds with calendar-aware tick labels:
(def temporal-hist-pose
(-> {:date [#inst "2024-01-01" #inst "2024-02-01" #inst "2024-03-01"
#inst "2024-04-01" #inst "2024-05-01"]}
(pj/pose :date)))

temporal-hist-pose

A single categorical column produces a bar chart of counts:
(def count-pose
  (-> animals
      (pj/pose :animal)))

count-pose

The inferred layer uses a :rect mark fed by the :count stat, which tallied each of the 4 categories.
Two numerical columns produce a scatter (the chapter's opening scatter-pose is such a pose):
(def num-num-pose
  (-> five-points (pj/pose :x :y)))

num-num-pose

A temporal x with a numerical y infers a time-series line. Row order is preserved, so pre-sort temporal data to avoid zigzag:
(def ts-line-pose
(-> {:date [#inst "2024-01-01" #inst "2024-02-01" #inst "2024-03-01"]
:val [10 25 18]}
(pj/pose :date :val)))

ts-line-pose

A categorical x with a numerical y infers a boxplot, the default for summarizing a distribution across groups:
(def boxplot-pose
  (-> {:species ["a" "a" "a" "b" "b" "b" "c" "c" "c"]
       :val [8 10 12 18 20 22 14 15 17]}
      (pj/pose :species :val)))

boxplot-pose

A numerical x with a categorical y infers a horizontal boxplot, the same summary laid out with the category axis on y:
(def horizontal-boxplot-pose
  (-> {:val [8 10 12 18 20 22 14 15 17]
       :species ["a" "a" "a" "b" "b" "b" "c" "c" "c"]}
      (pj/pose :val :species)))

horizontal-boxplot-pose

Domains
Numerical domains extend 5% beyond the data range so points aren't clipped at the edges.
scatter-pose

The x-domain is [0.8, 5.2]: the data range [1.0, 5.0] plus 0.2 padding on each side (5% of the data range, 4.0).
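The padding arithmetic is small enough to write out. A minimal sketch, assuming the padding is a fixed fraction of the data span applied symmetrically; `pad-domain` is a hypothetical helper, not a Plotje function:

```clojure
;; Hypothetical helper: widen [lo hi] by `fraction` of its span
;; on each side.
(defn pad-domain [[lo hi] fraction]
  (let [pad (* fraction (- hi lo))]   ; 0.05 * (5.0 - 1.0) = 0.2
    [(- lo pad) (+ hi pad)]))

;; For the scatter example: span 4.0, pad 0.2,
;; so the result is approximately [0.8 5.2].
(pad-domain [1.0 5.0] 0.05)
```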
Special domain rules apply in certain contexts:
Bar chart y-domains always include zero:
bar-pose

Percentage-filled layers normalize the y-domain to [0.0, 1.0]:
(def fill-pose
(-> {:x ["a" "a" "b" "b"]
:g ["m" "n" "m" "n"]}
(pj/lay-bar :x {:position :fill :color :g})))

fill-pose

The y-domain is exactly [0.0, 1.0]: each category sums to 100%.
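The reason the fill domain is exactly [0.0, 1.0] is that each group's count is divided by its category total. The following sketch shows that normalization on the same four rows; `fill-proportions` is a hypothetical helper and says nothing about how Plotje's position adjustment is implemented.

```clojure
;; Hypothetical helper: share of each [x g] cell within its x category.
(defn fill-proportions [rows]
  (let [counts (frequencies (map (juxt :x :g) rows)) ; count per [x g]
        totals (frequencies (map :x rows))]          ; count per x
    (into {}
          (map (fn [[[x g] n]] [[x g] (/ n (double (totals x)))])
               counts))))

(def fill-rows
  [{:x "a" :g "m"} {:x "a" :g "n"}
   {:x "b" :g "m"} {:x "b" :g "n"}])

;; Each cell is 1 of 2 rows in its category, so every share is 0.5
;; and the shares within each category add up to 1.0.
(fill-proportions fill-rows)
```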
Multi-layer plots merge domains across layers; see "Multi-Layer Plots" below.
Ticks
Once domains are computed, Plotje selects "nice" round tick values. The logic depends on the scale type:
- Linear: wadogo selects ticks at round intervals (1, 2, 2.5, 5, ...)
- Log: 1-2-5 nice numbers, that is, powers of 10 when they give at least 3 ticks, otherwise intermediates at 1-2-5 or 1-2-3-5 multiples per decade
- Categorical: tick at each category, in order of appearance
- Temporal: calendar-aware snapping (year, month, day, hour) with adaptive formatting
Linear ticks for the scatter example:
scatter-pose

Nine ticks from 1.0 to 5.0 at 0.5 intervals: round and readable.
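Given a step, the tick positions are just the multiples of that step that fall inside the padded domain. The sketch below takes the 0.5 step as given (choosing the step is wadogo's job and is not reproduced here); `ticks-in` is a hypothetical helper.

```clojure
;; Hypothetical helper: all multiples of `step` inside [lo hi].
;; Working with integer multipliers avoids accumulating
;; floating-point error from repeated addition.
(defn ticks-in [[lo hi] step]
  (let [first-k (long (Math/ceil (/ lo step)))   ; 0.8 / 0.5 = 1.6  -> 2
        last-k  (long (Math/floor (/ hi step)))] ; 5.2 / 0.5 = 10.4 -> 10
    (mapv #(* step %) (range first-k (inc last-k)))))

;; Multipliers 2 through 10 give nine ticks: 1.0, 1.5, ..., 5.0.
(ticks-in [0.8 5.2] 0.5)
```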
Log ticks for a multi-decade range:
(def log-scale-pose
(-> {:x [0.1 1.0 10.0 100.0 1000.0]
:y [5 10 15 20 25]}
(pj/lay-point :x :y)
(pj/scale :x :log)))

log-scale-pose

Five ticks at exact powers of 10, with no irrational intermediates. Whole numbers display without decimals; sub-1 values use minimal decimal places.
Categorical ticks match domain order:
bar-pose

Axis Labels
Labels come from column names. Underscores and hyphens become spaces.
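As a rough sketch of that renaming rule, assuming it is a plain character substitution on the column name (`column->label` is a hypothetical helper, not a Plotje function):

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: keyword or string column name -> axis label.
;; Underscores and hyphens are replaced by spaces; nothing else changes.
(defn column->label [column]
  (str/replace (name column) #"[_-]" " "))

(column->label :sepal-length)  ;; => "sepal length"
(column->label "petal_width")  ;; => "petal width"
```

The iris example that follows shows the same rule applied by Plotje itself: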
(def iris-label-pose
(-> (rdatasets/datasets-iris)
(pj/lay-point :sepal-length :sepal-width)))

iris-label-pose

When only one column is specified, the y-axis shows computed counts. The system omits the y-label since it would repeat the column name:
(def x-only-pose
  (-> five-points (pj/pose :x)))

x-only-pose

Explicit labels override inference:
(def explicit-label-pose
(-> five-points
(pj/lay-point :x :y)
(pj/options {:x-label "Length (cm)" :y-label "Width (cm)"})))

explicit-label-pose

Legends
A legend appears when a column is mapped to color. Three cases:
A categorical color mapping produces a discrete legend with one entry per category:
colored-pose

The legend's title is the column name; each entry has a :label and a palette color.
No color mapping means no legend:
scatter-pose

A fixed color string also suppresses the legend:
fixed-color-pose

A numeric color mapping produces a continuous legend (gradient bar):
(def continuous-color-pose
(-> {:x [1 2 3] :y [4 5 6] :val [10 20 30]}
(pj/lay-point :x :y {:color :val})))

continuous-color-pose

Size Legend
When :size maps to a numerical column, a size legend shows five graduated circles spanning the data range, with radii proportional to the values they represent.
(def size-legend-pose
(-> {:x [1 2 3 4 5] :y [1 2 3 4 5] :s [10 20 30 40 50]}
(pj/lay-point :x :y {:size :s})))

size-legend-pose

The legend has 5 entries, each pairing a value with a circle of the corresponding radius. No size mapping means no size legend:
scatter-pose

Alpha Legend
When :alpha maps to a numerical column, an alpha legend shows graduated opacity squares at about five nice 1/2/5 breaks; the exact count depends on the range (here [0.1, 0.9] yields four).
(def alpha-legend-pose
(-> {:x [1 2 3 4 5] :y [1 2 3 4 5] :a [0.1 0.3 0.5 0.7 0.9]}
(pj/lay-point :x :y {:alpha :a})))

alpha-legend-pose

No alpha mapping means no alpha legend:
scatter-pose

Layout
Layout padding adjusts based on what elements are present: titles, axis labels, and legends each reserve their own space.
Compare a bare plot to one with title, labels, and legend:
scatter-pose

(def full-layout-pose
(-> {:x [1 2 3 4 5 6]
:y [3 5 4 7 6 8]
:g ["a" "a" "a" "b" "b" "b"]}
(pj/lay-point :x :y {:color :g})
(pj/options {:title "My Plot"})))

full-layout-pose

The bare plot reserves no space for a title and no legend strip. The full plot adds padding above for the title and 100 pixels on the right for the legend.
Layout type is also inferred from the pose structure:
- A single panel is :single
- A facet grid (:facet-row or :facet-col) is :facet-grid
- Multiple x-y pairs (scatter plot matrix) are :multi-variable

scatter-pose

Coordinate Flipping
Setting :coord :flip swaps the visual axes. The data stays the same; the categorical band that was on x ends up on y, with ticks and labels following along.
(def normal-pose
(-> animals
(pj/lay-value-bar :animal :count)))

normal-pose

(def flip-pose
(-> animals
(pj/lay-value-bar :animal :count)
(pj/coord :flip)))

flip-pose

The categorical axis moved from x to y.
Labels are also swapped: the x-label and y-label follow their visual axis, not the data axis:
(def flipped-labels-pose
(-> five-points
(pj/lay-point :x :y)
(pj/coord :flip)))

flipped-labels-pose

After flipping, the visual x-axis shows "y" and the visual y-axis shows "x": labels track the visual axes.
Polar coordinates (:coord :polar) are covered separately; see the Polar Coordinates chapter for rose charts, radial bars, and related plots.
Multi-Layer Plots
When multiple layers share a panel, their domains are merged:
(def multi-pose
(-> five-points
(pj/pose :x :y)
pj/lay-point
(pj/lay-smooth {:stat :linear-model})))

multi-pose

Two layers (one :point, one :line) sharing the same domain. The line carries the regression curve as a polyline.
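Merging domains across layers can be pictured as taking the smallest lower bound and the largest upper bound. This is a minimal sketch under that assumption; `merge-domains` is a hypothetical helper and the second domain below is made up for illustration.

```clojure
;; Hypothetical helper: the union extent of several [lo hi] domains.
(defn merge-domains [domains]
  [(apply min (map first domains))
   (apply max (map second domains))])

;; A point layer spanning [0.8 5.2] and a made-up second layer
;; spanning [0.5 4.9] share the panel domain [0.5 5.2].
(merge-domains [[0.8 5.2] [0.5 4.9]])
```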
Resolution Overview
The diagram below sketches how the rules above combine, showing which inferences feed which others on the way from a pose to a rendered plot:
Column types and aesthetic classification are the starting points; everything else flows from them. Statistics and domains together set the geometry; labels, legends, and layout round out the surrounding plot.
What's Next
- Layer Types: the full registry of marks, stats, and positions that inference selects from
- Relationships: see inference in action on scatter, regression, and SPLOM