31  Edge Cases

This chapter tests how Plotje handles unusual or boundary inputs – missing values, extreme numbers, degenerate datasets, and uncommon configurations.

(ns plotje-book.edge-cases
  (:require
   ;; Rdatasets -- standard datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]
   ;; Tablecloth -- dataset manipulation
   [tablecloth.api :as tc]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Plotje -- composable plotting
   [scicloj.plotje.api :as pj]
   ;; Fastmath -- random number generation
   [fastmath.random :as rng]
   ;; Java-time -- idiomatic date/time construction
   [java-time.api :as jt]
   ;; dtype-next datetime -- vectorized temporal arithmetic
   [tech.v3.datatype.datetime :as dt-dt]
   ;; dtype-next core -- const-reader for temporal sequences
   [tech.v3.datatype :as dtype]))

Data Shape

Missing Data

Rows with nil values are dropped before rendering.

(def with-missing
  {:x [1 2 nil 4 5 nil 7]
   :y [3 nil 5 6 nil 8 9]})
(-> with-missing
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks 1 to 7, y ticks 3 to 9]
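The row-dropping rule can be sketched in plain Clojure. This is only an illustration of the idea, not Plotje's implementation; complete-rows is a hypothetical helper name.

```clojure
;; Hypothetical helper, not part of Plotje: keep only the rows in which
;; every listed column has a non-nil value.
(defn complete-rows
  [columns ks]
  (->> (apply map vector (map columns ks)) ; column vectors -> row tuples
       (filter #(every? some? %))          ; drop tuples containing nil
       (mapv #(zipmap ks %))))             ; back to row maps

(complete-rows {:x [1 2 nil 4 5 nil 7]
                :y [3 nil 5 6 nil 8 9]}
               [:x :y])
;; => [{:x 1, :y 3} {:x 4, :y 6} {:x 7, :y 9}]
```

Of the seven input rows, only three have both x and y present, so only three points are drawn.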

Infinite Values

Rows with Double/POSITIVE_INFINITY or Double/NEGATIVE_INFINITY are filtered automatically with a warning – similar to log-scale filtering.

(def with-infinity
  {:x [1 2 3 4 5]
   :y [10.0 Double/POSITIVE_INFINITY 30.0 Double/NEGATIVE_INFINITY 50.0]})
(-> with-infinity
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks 1.0 to 5.0, y ticks 10 to 50]
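A finiteness check of this kind might look like the sketch below. It uses only the JDK's Double/isFinite; finite-number? is a hypothetical name, and how Plotje performs the check internally is not shown in this chapter.

```clojure
;; Hypothetical predicate: true for ordinary numbers, false for
;; infinities and NaN.
(defn finite-number?
  [v]
  (and (number? v)
       (Double/isFinite (double v))))

(filterv finite-number?
         [10.0 Double/POSITIVE_INFINITY 30.0 Double/NEGATIVE_INFINITY 50.0])
;; => [10.0 30.0 50.0]
```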

Single Point

A lone data point should render without errors.

(-> {:x [3] :y [7]}
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x with a single point; x ticks 2.0 to 4.0, y ticks 6.0 to 8.0]

Two Points with Regression

Regression requires at least 3 points. With only 2, the line is omitted and the points still render.

(-> {:x [1 10] :y [5 50]}
    (pj/lay-point :x :y)
    (pj/lay-smooth {:stat :linear-model}))
[Figure: scatter plot of y against x, no regression line; x ticks 1 to 10, y ticks 5 to 50]

Three Points with Regression

With 3 points, the regression line appears.

(-> {:x [1 5 10] :y [5 25 50]}
    (pj/lay-point :x :y)
    (pj/lay-smooth {:stat :linear-model}))
[Figure: scatter plot of y against x with a regression line; x ticks 1 to 10, y ticks 5 to 50]
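The three-point threshold can be illustrated with a small least-squares fit in plain Clojure. This is a sketch, not Plotje's code: fit-line is a hypothetical function, and the rationale in the comment (two points leave no residual degrees of freedom) is a general statistical fact rather than something this chapter states.

```clojure
;; Hypothetical sketch: ordinary least squares for y = intercept + slope * x.
;; Returns nil for fewer than 3 points, where a line through the data leaves
;; no residual degrees of freedom (n - 2 = 0).
(defn fit-line
  [xs ys]
  (let [n (count xs)]
    (when (>= n 3)
      (let [mx    (/ (reduce + xs) n)
            my    (/ (reduce + ys) n)
            sxy   (reduce + (map #(* (- %1 mx) (- %2 my)) xs ys))
            sxx   (reduce + (map #(let [d (- % mx)] (* d d)) xs))
            slope (/ sxy sxx)]
        {:slope slope
         :intercept (- my (* slope mx))}))))

(fit-line [1 10] [5 50])
;; => nil

(fit-line [1 5 10] [5 25 50])
;; slope 5 and intercept 0, since y = 5x fits these points exactly
```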

Constant X

All x values are the same – the plot should still render.

(-> {:x [5 5 5 5 5] :y [1 2 3 4 5]}
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks 4.0 to 6.0, y ticks 1.0 to 5.0]

Constant Y

All y values are the same.

(-> {:x [1 2 3 4 5] :y [3 3 3 3 3]}
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks 1.0 to 5.0, y ticks 2.0 to 4.0]
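In the single-point and constant-column plots, the tick labels on the degenerate axis run from one below the value to one above it. A domain rule consistent with that output could be sketched as follows; the one-unit padding is inferred from the tick labels only, and padded-domain is a hypothetical name, not a Plotje function.

```clojure
;; Hypothetical sketch: widen a zero-width domain by one unit on each side
;; so the axis has a non-empty range to draw.
(defn padded-domain
  [values]
  (let [lo (apply min values)
        hi (apply max values)]
    (if (== lo hi)
      [(- lo 1) (+ hi 1)]
      [lo hi])))

(padded-domain [5 5 5 5 5])
;; => [4 6]

(padded-domain [1 2 3 4 5])
;; => [1 5]
```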

Numeric Range

Negative Values

Data spanning positive and negative ranges.

(-> {:x [-5 -3 0 3 5] :y [-2 4 0 -4 2]}
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks -5 to 5, y ticks -4 to 4]

Very Large Values

(-> {:x [1e6 2e6 3e6] :y [1e9 2e9 3e9]}
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks 1000000 to 3000000, y ticks 1000000000 to 3000000000]

Very Small Values

(-> {:x [0.001 0.002 0.003] :y [0.0001 0.0002 0.0003]}
    (pj/lay-point :x :y))
[Figure: scatter plot of y against x; x ticks 0.001 to 0.003, y ticks 0.0001 to 0.0003]

Large Dataset

1000 random points, colored by group.

(def large-data
  (let [r (rng/rng :jdk 42)]
    {:x (repeatedly 1000 #(rng/drandom r))
     :y (repeatedly 1000 #(rng/drandom r))
     :group (repeatedly 1000 #([:a :b :c] (rng/irandom r 3)))}))
(-> large-data
    (pj/lay-point :x :y {:color :group}))
[Figure: scatter plot of y against x; x ticks 0.0 to 1.0, y ticks 0.0 to 1.0; legend group: a, c, b]

Many Categories

A bar chart with 12 categories.

(-> (let [r (rng/rng :jdk 99)]
      {:category (map #(keyword (str "cat-" %)) (range 12))
       :value (repeatedly 12 #(+ 10 (rng/irandom r 90)))})
    (pj/lay-value-bar :category :value))
[Figure: bar chart of value by category; categories cat-0 to cat-11, value ticks 0 to 100]

Computed Columns

Derive a new column and plot it.

(-> (rdatasets/datasets-iris)
    (tc/map-columns :sepal-ratio [:sepal-length :sepal-width] /)
    (pj/lay-point :sepal-length :sepal-ratio {:color :species})
    (pj/options {:title "Sepal Length/Width Ratio"}))
[Figure: "Sepal Length/Width Ratio"; scatter plot of sepal ratio against sepal length; x ticks 4.5 to 8.0, y ticks 1.2 to 3.0; legend species: setosa, versicolor, virginica]

Filtered Subset

Plot only one species.

(-> (rdatasets/datasets-iris)
    (tc/select-rows #(= "setosa" (% :species)))
    (pj/lay-point :sepal-length :sepal-width)
    (pj/lay-smooth {:stat :linear-model})
    (pj/options {:title "Setosa Only"}))
[Figure: "Setosa Only"; scatter plot of sepal width against sepal length with a regression line; x ticks 4.4 to 5.8, y ticks 2.5 to 4.5]

Position and Layout

Stacked bar – single group

Stack with only one color value – no actual stacking needed.

(-> {:category ["a" "b" "c"]
     :count [10 20 15]}
    (pj/lay-value-bar :category :count {:position :stack}))
[Figure: bar chart of count by category; categories a, b, c; count ticks 0 to 20]

Dodge – missing category in one group

Group “g1” has data for “a” and “b”, but “g2” only has “a”. Dodge should still align correctly.

(-> {:x ["a" "b" "a"]
     :g ["g1" "g1" "g2"]}
    (pj/lay-bar :x {:color :g}))
[Figure: bar chart of counts by x; categories a, b; count ticks 0.0 to 1.0; legend g: g1, g2]

Fill – zero count category

Group “g2” has zero count for category “b”. Fill should handle the zero gracefully.

(-> {:x ["a" "a" "b" "b" "b"]
     :g ["g1" "g2" "g1" "g1" "g1"]}
    (pj/lay-bar :x {:position :fill :color :g}))
[Figure: filled bar chart of proportions by x; categories a, b; proportion ticks 0.0 to 1.0; legend g: g1, g2]
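The arithmetic behind a fill position can be sketched in plain Clojure: each group's count is divided by the category total, and a group that is absent from a category contributes 0. This illustrates the computation only; fill-proportions is a hypothetical name, not Plotje's implementation.

```clojure
;; Hypothetical sketch: per-category proportions for each group.
;; A missing [category group] pair counts as 0 rather than being skipped.
(defn fill-proportions
  [xs gs]
  (let [counts (frequencies (map vector xs gs))
        groups (distinct gs)]
    (into {}
          (for [x (distinct xs)]
            (let [total (reduce + (map #(get counts [x %] 0) groups))]
              [x (into {}
                       (for [g groups]
                         [g (/ (get counts [x g] 0) total)]))])))))

(fill-proportions ["a" "a" "b" "b" "b"]
                  ["g1" "g2" "g1" "g1" "g1"])
;; => {"a" {"g1" 1/2, "g2" 1/2}, "b" {"g1" 1, "g2" 0}}
```

The totals are never zero here because every category that appears in the data has at least one row.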

Nudge on scatter

Applying :nudge-x and :nudge-y to continuous data shifts the points without error.

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width {:nudge-x 0.1 :nudge-y -0.05}))
[Figure: scatter plot of sepal width against sepal length; x ticks 4.5 to 8.0, y ticks 2.0 to 4.5]

Confidence ribbon – small n

Linear regression with a confidence band (:confidence-band true) on exactly 3 points, the minimum for a linear model.

(-> {:x [1 2 3] :y [2 4 5]}
    (pj/lay-point :x :y)
    (pj/lay-smooth {:stat :linear-model :confidence-band true}))
[Figure: scatter plot of y against x with a regression line and confidence band; x ticks 1.0 to 3.0, y ticks -2 to 10]

Stacked area – single series

Stack with a single color group – should render as a plain area.

(-> (let [r (rng/rng :jdk 55)]
      {:x (range 10)
       :y (repeatedly 10 #(rng/irandom r 20))})
    (pj/lay-area :x :y {:position :stack}))
[Figure: area chart of y against x; x ticks 0 to 9, y ticks 0 to 18]

Scale and Coordinate

Log scale with clean powers of 10

(-> {:x [1 10 100 1000 10000]
     :y [2 20 200 2000 20000]}
    (pj/lay-point :x :y)
    (pj/scale :x :log)
    (pj/scale :y :log))
[Figure: scatter plot of y against x, both axes log-scaled; x ticks 1 to 10000, y ticks 1 to 100000]

Log scale spanning decimals to large values

(-> {:x [0.001 0.01 0.1 1 10 100]
     :y [1 2 3 4 5 6]}
    (pj/lay-point :x :y)
    (pj/scale :x :log))
[Figure: scatter plot of y against x, log-scaled x axis; x ticks 0.0001 to 1000, y ticks 1.0 to 6.0]

Log scale with non-positive values

Non-positive values are filtered on log-scaled axes, since log requires positive inputs. Here x includes 0 and -1:

(-> {:x [0 -1 1 10 100] :y [1 2 3 4 5]}
    (pj/lay-point :x :y)
    (pj/scale :x :log))
[Figure: scatter plot of y against x, log-scaled x axis; x ticks 1 to 100, y ticks 3.0 to 5.0]
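The effect on this dataset can be traced by hand. The sketch below is a plain-Clojure illustration, not Plotje's code, and log-x-rows is a hypothetical name: it keeps the rows whose x is positive and maps x to its base-10 logarithm.

```clojure
;; Hypothetical sketch: drop rows with non-positive x, then take log10 of x.
(defn log-x-rows
  [xs ys]
  (->> (map vector xs ys)
       (filter (fn [[x _]] (pos? x)))
       (mapv (fn [[x y]] [(Math/log10 (double x)) y]))))

(log-x-rows [0 -1 1 10 100] [1 2 3 4 5])
;; => [[0.0 3] [1.0 4] [2.0 5]]
```

Only the rows with y values 3, 4, and 5 survive, which matches the y tick range of 3.0 to 5.0 in the plot.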

Continuous color – constant value

All points have the same numeric color value. The gradient should still render and not divide by zero.

(-> {:x [1 2 3] :y [4 5 6] :c [5 5 5]}
    (pj/lay-point :x :y {:color :c}))
[Figure: scatter plot of y against x; x ticks 1.0 to 3.0, y ticks 4.0 to 6.0; color legend c from 5.000 to 5.000]
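The divide-by-zero risk comes from the usual min-max normalization, (v - lo) / (hi - lo), when lo equals hi. One way to guard against it is sketched below. Mapping a constant column to the middle of the gradient (0.5) is an assumption made for this example; the chapter does not say which color Plotje picks.

```clojure
;; Hypothetical sketch: min-max normalization to [0, 1] with a guard for
;; a zero-width domain. The 0.5 fallback is an assumption, not Plotje's rule.
(defn normalize
  [v lo hi]
  (if (== lo hi)
    0.5
    (/ (- v lo) (double (- hi lo)))))

(map #(normalize % 5 5) [5 5 5])
;; => (0.5 0.5 0.5)

(normalize 6 5 9)
;; => 0.25
```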

Diverging color with midpoint at zero

(-> {:x (range 20)
     :y (map #(- % 10) (range 20))
     :val (map #(- % 10.0) (range 20))}
    (pj/lay-point :x :y {:color :val})
    (pj/options {:color-scale :diverging :color-midpoint 0}))
[Figure: scatter plot of y against x with a diverging color scale; x ticks 0 to 18, y ticks -10 to 8; color legend val from -10.00 to 9.000]

Dates with very narrow range (one day apart)

(-> {:date [(jt/local-date 2025 1 1)
            (jt/local-date 2025 1 2)]
     :val [10 20]}
    (pj/lay-point :date :val))
[Figure: scatter plot of val against date; x ticks 01:00 to 23:00, y ticks 10 to 20]

Sub-day precision (LocalDateTime spanning hours)

LocalDateTime values preserve sub-day precision. Tick labels show HH:MM format when the range is less than a day.

(-> {:time (dt-dt/plus-temporal-amount
            (dtype/const-reader (jt/local-date-time 2025 3 15 8 0) 24)
            (map #(* (long %) 15) (range 24)) :minutes)
     :value (map #(+ 18.0 (* 4.0 (Math/sin (* % 0.3)))) (range 24))}
    (pj/lay-line :time :value)
    pj/lay-point)
[Figure: line plot with points of value against time; x ticks 08:01 to 13:43, y ticks 14 to 22]

Instant with sub-day precision

java.time.Instant values are converted to LocalDateTime (UTC) for calendar-aware tick formatting. Tick labels show hours when the range spans less than a day.

(-> {:time (dt-dt/plus-temporal-amount
            (dtype/const-reader (jt/instant 1750003200000) 12)
            (range 12) :hours)
     :temp (map #(+ 20.0 (* 5.0 (Math/sin (* % 0.5)))) (range 12))}
    (pj/lay-line :time :temp)
    pj/lay-point)
[Figure: line plot with points of temp against time; x ticks 17:00 to 02:00, y ticks 15 to 25]
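The Instant-to-LocalDateTime conversion can be reproduced with the JDK alone. The sketch below uses java.time directly through interop rather than any Plotje function, and shows where the first data point of this example falls in UTC.

```clojure
(import '(java.time Instant LocalDateTime ZoneOffset))

;; The starting instant used above, given as epoch milliseconds.
(def start (Instant/ofEpochMilli 1750003200000))

;; Interpret the instant as a calendar date-time in UTC.
(str (LocalDateTime/ofInstant start ZoneOffset/UTC))
;; => "2025-06-15T16:00"
```

The series therefore starts at 16:00 UTC and runs for eleven hours, which is why the tick labels wrap past midnight.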

Multi-year date range

With a date range spanning several years, tick labels show the year and month (for example 2020-02).

(-> {:date (dt-dt/plus-temporal-amount
            (dtype/const-reader (jt/local-date 2020 1 1) 20)
            (map #(* (long %) 120) (range 20)) :days)
     :value (map #(+ 100 (* 50 (Math/sin (* % 0.4)))) (range 20))}
    (pj/lay-line :date :value)
    pj/lay-point)
[Figure: line plot with points of value against date; x ticks 2020-02 to 2026-02, y ticks 50 to 150]

Polar with many categories

(-> {:cat (map #(str "cat-" %) (range 12))
     :val (repeatedly 12 #(rand-int 100))}
    (pj/lay-value-bar :cat :val)
    (pj/coord :polar))

Log scale + coord flip combined

When log scale and coord flip are both applied, the panel should have log ticks on the (now vertical) axis and the domain should reflect the flipped layout.

(-> {:x [1 10 100 1000] :y [2 4 8 16]}
    (pj/lay-point :x :y)
    (pj/scale :x :log)
    (pj/coord :flip))
[Figure: flipped scatter plot; vertical axis x with log ticks 1 to 1000, horizontal axis y with ticks 2 to 16]

Scale with explicit domain

(-> (rdatasets/datasets-iris)
    (pj/lay-point :sepal-length :sepal-width)
    (pj/scale :y {:domain [0 6]}))
[Figure: scatter plot of sepal width against sepal length; x ticks 4.5 to 8.0, y ticks 0 to 6]

Fixed aspect ratio with extreme domain ratio

(-> {:x (range 100) :y (range 0 10 0.1)}
    (pj/lay-point :x :y)
    (pj/coord :fixed))
[Figure: scatter plot of y against x with a fixed aspect ratio; x ticks 0 to 100, y ticks 0 to 10]

Full grid – cross plot

pj/cross produces a full NxN grid of panels. Column names appear as axis labels on each cell.

(-> (rdatasets/datasets-iris)
    (pj/pose (pj/cross [:sepal-length :sepal-width :petal-length]
                       [:sepal-length :sepal-width :petal-length])
             {:color :species}))
[Figure: 3x3 grid of panels; rows and columns labeled sepal-length, sepal-width, petal-length; legend species: setosa, versicolor, virginica]

Error Messages

Plotje produces clear error messages for common mistakes.

Non-existent column

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (pj/lay-point :nonexistent :y)
      pj/plot)
  (catch Exception e
    (ex-message e)))
"Column :nonexistent (from :x) not found in dataset. Available: (:x :y)"

Non-existent color column

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (pj/lay-point :x :y {:color :bogus})
      pj/plot)
  (catch Exception e
    (ex-message e)))
"Column :bogus (from :color) not found in dataset. Available: (:x :y)"

Unsupported polar mark

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (pj/lay-line :x :y)
      (pj/coord :polar)
      pj/plot)
  (catch Exception e
    (ex-message e)))
"Mark :line is not supported with polar coordinates. Supported polar marks: (:bar :point :rect :rug :text)"

Mismatched mark and stat

(try
  (-> {:x [1 2 3]}
      (pj/pose :x)
      (pj/lay {:mark :boxplot :stat :bin})
      pj/plot)
  (catch Exception e
    (ex-message e)))
"Stat result for :boxplot mark must contain :boxes, got keys: (:bins :max-count :x-domain :y-domain)"

x-only layer type with y column

Layer types that use only the x column (histogram, bar, density, rug) reject a y column with a clear message. For example, passing a y column to a histogram is an error:

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (pj/lay-histogram :x :y))
  (catch clojure.lang.ExceptionInfo e
    (ex-message e)))
"lay-histogram uses only the x column; do not pass a y column"
source: notebooks/plotje_book/edge_cases.clj