25  Edge Cases

This chapter tests how napkinsketch handles unusual or boundary inputs: missing and infinite values, extreme numbers, very small and degenerate datasets, many categories, computed columns, and uncommon configurations.

(ns napkinsketch-book.edge-cases
  (:require
   ;; Shared datasets for these docs
   [napkinsketch-book.datasets :as data]
   ;; Tablecloth — dataset manipulation
   [tablecloth.api :as tc]
   ;; Kindly — notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Napkinsketch — composable plotting
   [scicloj.napkinsketch.api :as sk]
   ;; Fastmath — random number generation
   [fastmath.random :as rng]
   ;; Java-time — idiomatic date/time construction
   [java-time.api :as jt]
   ;; dtype-next datetime — vectorized temporal arithmetic
   [tech.v3.datatype.datetime :as dt-dt]
   ;; dtype-next core — const-reader for temporal sequences
   [tech.v3.datatype :as dtype]))

Missing Data

Rows with a nil in either plotted column are dropped before rendering. Here only three of the seven rows have both x and y.

(def with-missing
  {:x [1 2 nil 4 5 nil 7]
   :y [3 nil 5 6 nil 8 9]})
(-> with-missing
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 1 to 7, y ticks 3 to 9]
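
The dropping rule can be illustrated without the library. The sketch below is plain Clojure, not napkinsketch's implementation; complete-rows is a hypothetical helper that keeps a row only when every selected column has a value.

```clojure
;; Hypothetical helper (not part of napkinsketch): pair up the chosen
;; columns row by row and keep only rows with no nil in them.
(defn complete-rows
  [columns ks]
  (->> (map columns ks)             ; one sequence per selected column
       (apply map vector)           ; transpose into row vectors
       (filter #(every? some? %)))) ; drop rows containing nil

(complete-rows {:x [1 2 nil 4 5 nil 7]
                :y [3 nil 5 6 nil 8 9]}
               [:x :y])
;; => ([1 3] [4 6] [7 9])
```

Only the three complete rows survive, which is what the scatter plot shows.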

Infinite Values

Rows with Double/POSITIVE_INFINITY or Double/NEGATIVE_INFINITY are filtered automatically with a warning — similar to log-scale filtering.

(def with-infinity
  {:x [1 2 3 4 5]
   :y [10.0 Double/POSITIVE_INFINITY 30.0 Double/NEGATIVE_INFINITY 50.0]})
(-> with-infinity
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 1.0 to 5.0, y ticks 10 to 50]
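
As a rough illustration of the filtering step, the following plain-Clojure sketch keeps only rows whose y value is finite. finite-rows is a hypothetical helper, not the library's code, and it does not model the warning that napkinsketch emits.

```clojure
;; Hypothetical helper (not part of napkinsketch): keep rows whose y is finite.
;; Double/isFinite is false for both infinities and for NaN.
(defn finite-rows
  [xs ys]
  (filter (fn [[_ y]] (Double/isFinite (double y)))
          (map vector xs ys)))

(finite-rows [1 2 3 4 5]
             [10.0 Double/POSITIVE_INFINITY 30.0 Double/NEGATIVE_INFINITY 50.0])
;; => ([1 10.0] [3 30.0] [5 50.0])
```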

Single Point

A lone data point should render without errors.

(-> {:x [3] :y [7]}
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 2.0 to 4.0, y ticks 6.0 to 8.0]

Two Points with Regression

Regression requires at least 3 points. With only 2, the line is gracefully omitted.

(-> {:x [1 10] :y [5 50]}
    (sk/lay-point :x :y)
    sk/lay-lm)
[Plot: y against x; x ticks 1 to 10, y ticks 5 to 50]

Three Points with Regression

With 3 points, the regression line appears.

(-> {:x [1 5 10] :y [5 25 50]}
    (sk/lay-point :x :y)
    sk/lay-lm)
[Plot: y against x; x ticks 1 to 10, y ticks 5 to 50]
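
The three-point threshold can be pictured with a generic least-squares fit that returns nil for smaller inputs. This is a sketch under assumptions: fit-line is a hypothetical helper, the threshold of 3 is taken from the text above, and the arithmetic is ordinary least squares rather than napkinsketch's own code.

```clojure
;; Hypothetical helper (not part of napkinsketch): ordinary least squares
;; for y = slope * x + intercept, or nil when there are fewer than 3 points.
(defn fit-line
  [xs ys]
  (let [n (count xs)]
    (when (>= n 3)
      (let [mx    (/ (reduce + xs) n)                       ; mean of x
            my    (/ (reduce + ys) n)                       ; mean of y
            sxy   (reduce + (map #(* (- %1 mx) (- %2 my)) xs ys))
            sxx   (reduce + (map #(let [d (- % mx)] (* d d)) xs))
            slope (/ sxy sxx)]
        {:slope slope :intercept (- my (* slope mx))}))))

(fit-line [1 10] [5 50])        ; two points: returns nil, so no line is drawn
(fit-line [1 5 10] [5 25 50])   ; collinear points: slope 5, intercept 0
```

Returning nil rather than throwing lets the caller skip the regression layer while still drawing the points.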

Constant X

All x values are the same — the plot should still render.

(-> {:x [5 5 5 5 5] :y [1 2 3 4 5]}
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 4.0 to 6.0, y ticks 1.0 to 5.0]

Constant Y

All y values are the same.

(-> {:x [1 2 3 4 5] :y [3 3 3 3 3]}
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 1.0 to 5.0, y ticks 2.0 to 4.0]
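
In the single-point and constant-column plots the rendered axis runs one unit either side of the value (for example 4.0 to 6.0 for a constant 5). One way to get that behavior is to widen a zero-width domain before building the scale. The sketch below is an assumption inferred from those tick ranges, not the library's documented rule; padded-domain is a hypothetical name.

```clojure
;; Hypothetical helper (not part of napkinsketch): return [lo hi] for the
;; values, widening by one unit on each side when all values are equal.
(defn padded-domain
  [values]
  (let [lo (apply min values)
        hi (apply max values)]
    (if (== lo hi)
      [(- lo 1.0) (+ hi 1.0)]   ; zero-width domain: pad so the scale is usable
      [lo hi])))

(padded-domain [5 5 5 5 5]) ;; => [4.0 6.0]
(padded-domain [1 2 3 4 5]) ;; => [1 5]
```

Without some padding of this kind, mapping a value to a pixel position would divide by a domain width of zero.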

Negative Values

Data spanning positive and negative ranges.

(-> {:x [-5 -3 0 3 5] :y [-2 4 0 -4 2]}
    (sk/lay-point :x :y))
[Plot: y against x; x ticks -5 to 5, y ticks -4 to 4]

Very Large Values

(-> {:x [1e6 2e6 3e6] :y [1e9 2e9 3e9]}
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 1000000 to 3000000, y ticks 1000000000 to 3000000000]

Very Small Values

(-> {:x [0.001 0.002 0.003] :y [0.0001 0.0002 0.0003]}
    (sk/lay-point :x :y))
[Plot: y against x; x ticks 0.001 to 0.003, y ticks 0.0001 to 0.0003]

Large Dataset

1000 random points, colored by group.

(def large-data
  (let [r (rng/rng :jdk 42)]
    {:x (repeatedly 1000 #(rng/drandom r))
     :y (repeatedly 1000 #(rng/drandom r))
     :group (repeatedly 1000 #([:a :b :c] (rng/irandom r 3)))}))
(-> large-data
    (sk/lay-point :x :y {:color :group}))
[Plot: y against x; legend group: :a, :c, :b; x ticks 0.0 to 1.0, y ticks 0.0 to 1.0]

Many Categories

A bar chart with 12 categories.

(-> (let [r (rng/rng :jdk 99)]
      {:category (map #(keyword (str "cat-" %)) (range 12))
       :value (repeatedly 12 #(+ 10 (rng/irandom r 90)))})
    (sk/lay-value-bar :category :value))
[Plot: value by category; categories :cat-0 to :cat-11; y ticks 0 to 100]

Computed Columns

Derive a new column and plot it.

(-> data/iris
    (tc/map-columns :sepal_ratio [:sepal_length :sepal_width] /)
    (sk/lay-point :sepal_length :sepal_ratio {:color :species})
    (sk/options {:title "Sepal Length/Width Ratio"}))
[Plot: "Sepal Length/Width Ratio"; sepal ratio against sepal length; legend species: setosa, versicolor, virginica; x ticks 4.5 to 8.0, y ticks 1.2 to 3.0]

Filtered Subset

Plot only one species.

(-> data/iris
    (tc/select-rows #(= "setosa" (% :species)))
    (sk/lay-point :sepal_length :sepal_width)
    sk/lay-lm
    (sk/options {:title "Setosa Only"}))
[Plot: "Setosa Only"; sepal width against sepal length; x ticks 4.4 to 5.8, y ticks 2.2 to 4.4]

Position Edge Cases

Stacked bar — single group

Stack with only one color value — no actual stacking needed.

(-> {:category ["a" "b" "c"]
     :count [10 20 15]}
    (sk/lay-value-bar :category :count {:position :stack}))
[Plot: count by category; categories a, b, c; y ticks 0 to 20]

Dodge — missing category in one group

Group “g1” has data for “a” and “b”, but “g2” only has “a”. Dodge should still align correctly.

(-> {:x ["a" "b" "a"]
     :g ["g1" "g1" "g2"]}
    (sk/lay-bar :x {:color :g}))
[Plot: bars over x categories a and b; legend g: g1, g2; y ticks 0.0 to 1.0]

Fill — zero count category

Group g2 has no rows in category b, so its count there is zero. Fill should handle the zero gracefully.

(-> {:x ["a" "a" "b" "b" "b"]
     :g ["g1" "g2" "g1" "g1" "g1"]}
    (sk/lay-stacked-bar-fill :x {:color :g}))
[Plot: filled bars over x categories a and b; legend g: g1, g2; y ticks 0.0 to 1.0]
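
To make the zero-count case concrete, the proportions behind a fill layout can be computed by hand. The sketch below is plain Clojure with a hypothetical helper name, fill-proportions; it shows the arithmetic only and is not how napkinsketch computes its stat.

```clojure
;; Hypothetical helper (not part of napkinsketch): for every (category, group)
;; pair, the share of that category's rows belonging to the group.
(defn fill-proportions
  [xs gs]
  (let [counts (frequencies (map vector xs gs)) ; rows per [category group]
        totals (frequencies xs)]                ; rows per category
    (into {}
          (for [x (distinct xs)
                g (distinct gs)]
            ;; a missing pair counts as 0 rather than being skipped
            [[x g] (/ (get counts [x g] 0) (totals x))]))))

(fill-proportions ["a" "a" "b" "b" "b"]
                  ["g1" "g2" "g1" "g1" "g1"])
;; => {["a" "g1"] 1/2, ["a" "g2"] 1/2, ["b" "g1"] 1, ["b" "g2"] 0}
```

Category b is entirely g1, so its bar fills the full height with a single color while g2 contributes a zero-height segment.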

Nudge on scatter

:nudge-x and :nudge-y on continuous data shift the points without error.

(-> data/iris
    (sk/lay-point :sepal_length :sepal_width {:nudge-x 0.1 :nudge-y -0.05}))
[Plot: sepal width against sepal length; x ticks 4.5 to 8.0, y ticks 2.0 to 4.5]

Confidence ribbon — small n

Linear regression with :se true on exactly 3 points, the minimum for lm (linear model).

(-> {:x [1 2 3] :y [2 4 5]}
    (sk/lay-point :x :y)
    (sk/lay-lm {:se true}))
[Plot: y against x; x ticks 1.0 to 3.0, y ticks -2 to 10]

Stacked area — single series

Stack with a single color group — should render as a plain area.

(-> (let [r (rng/rng :jdk 55)]
      {:x (range 10)
       :y (repeatedly 10 #(rng/irandom r 20))})
    (sk/lay-stacked-area :x :y))
[Plot: y against x; x ticks 0 to 9, y ticks 0 to 18]

Log Scale Edge Cases

Log scale with clean powers of 10

(-> {:x [1 10 100 1000 10000]
     :y [2 20 200 2000 20000]}
    (sk/lay-point :x :y)
    (sk/scale :x :log)
    (sk/scale :y :log))
[Plot: y against x, both log-scaled; x ticks 1 to 10000, y ticks 1 to 100000]

Log scale spanning decimals to large values

(-> {:x [0.001 0.01 0.1 1 10 100]
     :y [1 2 3 4 5 6]}
    (sk/lay-point :x :y)
    (sk/scale :x :log))
[Plot: y against x, x log-scaled; x ticks 0.0001 to 1000, y ticks 1.0 to 6.0]

Log scale with non-positive values

Non-positive values are filtered on log-scaled axes (following ggplot2 behavior). Here x includes 0 and -1:

(-> {:x [0 -1 1 10 100] :y [1 2 3 4 5]}
    (sk/lay-point :x :y)
    (sk/scale :x :log))
[Plot: y against x, x log-scaled; x ticks 1, 10, 100; y ticks 3.0 to 5.0]
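
The effect of the filter on this dataset can be reproduced in plain Clojure. log-safe-rows is a hypothetical helper used only for illustration; it is not the library's implementation.

```clojure
;; Hypothetical helper (not part of napkinsketch): keep rows whose x is
;; strictly positive, since log(x) is undefined for x <= 0.
(defn log-safe-rows
  [xs ys]
  (filter (fn [[x _]] (pos? x))
          (map vector xs ys)))

(log-safe-rows [0 -1 1 10 100] [1 2 3 4 5])
;; => ([1 3] [10 4] [100 5])
```

The two rows with x of 0 and -1 are gone, which is why the y axis of the plot starts at 3 rather than 1.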

Continuous Color Edge Cases

Continuous color — constant value

All points have the same numeric color value. The gradient should still render and not divide by zero.

(-> {:x [1 2 3] :y [4 5 6] :c [5 5 5]}
    (sk/lay-point :x :y {:color :c}))
[Plot: y against x; color legend c: 5.000 to 5.000; x ticks 1.0 to 3.0, y ticks 4.0 to 6.0]

Diverging color with midpoint at zero

(-> {:x (range 20)
     :y (map #(- % 10) (range 20))
     :val (map #(- % 10.0) (range 20))}
    (sk/lay-point :x :y {:color :val})
    (sk/options {:color-scale :diverging :color-midpoint 0}))
[Plot: y against x; color legend val: -10.00 to 9.000; x ticks 0 to 18, y ticks -10 to 8]

Temporal Scale Edge Cases

Dates with very narrow range (two consecutive days)

(-> {:date [(jt/local-date 2025 1 1)
            (jt/local-date 2025 1 2)]
     :val [10 20]}
    (sk/lay-point :date :val))
[Plot: val against date; x ticks 01:00 to 23:00, y ticks 10 to 20]

Sub-day precision (LocalDateTime spanning hours)

LocalDateTime values preserve sub-day precision. Tick labels show HH:MM format when the range is less than a day.

(-> {:time (dt-dt/plus-temporal-amount
            (dtype/const-reader (jt/local-date-time 2025 3 15 8 0) 24)
            (map #(* (long %) 15) (range 24)) :minutes)
     :value (map #(+ 18.0 (* 4.0 (Math/sin (* % 0.3)))) (range 24))}
    (sk/lay-line :time :value)
    sk/lay-point)
[Plot: value against time; x ticks 08:01 to 13:43, y ticks 14 to 22]
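
For readers who want to inspect the timestamps themselves, the same 24 values can be built with plain java.time interop instead of dtype-next. This is an alternative sketch for checking the data, not the construction the chapter uses; start and times are names introduced here.

```clojure
(import 'java.time.LocalDateTime)

;; Starting point: 2025-03-15 at 08:00
(def start (LocalDateTime/of 2025 3 15 8 0))

;; 24 timestamps, 15 minutes apart
(def times
  (map #(.plusMinutes ^LocalDateTime start (* 15 (long %)))
       (range 24)))

(str (first times)) ;; => "2025-03-15T08:00"
(str (last times))  ;; => "2025-03-15T13:45"
```

The series spans 5 hours 45 minutes, well under a day, which is the condition for HH:MM tick labels.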

Instant with sub-day precision

java.time.Instant values are converted to LocalDateTime (UTC) for calendar-aware tick formatting. Tick labels show hours when the range spans less than a day.

(-> {:time (dt-dt/plus-temporal-amount
            (dtype/const-reader (jt/instant 1750003200000) 12)
            (range 12) :hours)
     :temp (map #(+ 20.0 (* 5.0 (Math/sin (* % 0.5)))) (range 12))}
    (sk/lay-line :time :temp)
    sk/lay-point)
[Plot: temp against time; x ticks 17:00 to 02:00, y ticks 15 to 25]

Multi-year date range

With a date range spanning several years, tick labels show the year and month (for example 2020-02).

(-> {:date (dt-dt/plus-temporal-amount
            (dtype/const-reader (jt/local-date 2020 1 1) 20)
            (map #(* (long %) 120) (range 20)) :days)
     :value (map #(+ 100 (* 50 (Math/sin (* % 0.4)))) (range 20))}
    (sk/lay-line :date :value)
    sk/lay-point)
[Plot: value against date; x ticks 2020-02 to 2026-02, y ticks 50 to 150]

Coordinate Edge Cases

Polar with many categories

(-> {:cat (map #(str "cat-" %) (range 12))
     :val (repeatedly 12 #(rand-int 100))}
    (sk/lay-value-bar :cat :val)
    (sk/coord :polar))

Fixed aspect ratio with extreme domain ratio

;; 100 evenly spaced y values; multiplying avoids floating-point drift from a 0.1 step
(-> {:x (range 100) :y (map #(* 0.1 %) (range 100))}
    (sk/lay-point :x :y)
    (sk/coord :fixed))
[Plot: y against x with fixed aspect ratio; x ticks 0 to 100, y ticks 0 to 10]

Multi-Panel Edge Cases

Full grid — cross plot

sk/cross produces a full N×N grid of panels. Strip labels must appear for every column and row.

(-> data/iris
    (sk/view (sk/cross [:sepal_length :sepal_width :petal_length] [:sepal_length :sepal_width :petal_length]))
    (sk/lay-point {:color :species}))
[Plot: 3×3 grid of scatter panels; row and column strip labels sepal length, sepal width, petal length; legend species: setosa, versicolor, virginica]

Error Messages

Napkinsketch produces clear error messages for common mistakes.
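
A message of this kind names the offending column, the role it was used in, and the columns that are available. The sketch below shows one way such a check could be written with ex-info. check-column is a hypothetical helper whose message mirrors the wording of the column errors in this section; it is not napkinsketch's own validation code.

```clojure
;; Hypothetical validation helper (not part of napkinsketch): throw an
;; informative exception when a requested column is absent.
(defn check-column
  [columns role col]
  (when-not (contains? columns col)
    (throw (ex-info (str "Column " col " (from " role ") not found in dataset. "
                         "Available: " (pr-str (keys columns)))
                    {:column col :role role}))))

(try
  (check-column {:x [1 2 3] :y [4 5 6]} :color :bogus)
  (catch clojure.lang.ExceptionInfo e
    (ex-message e)))
;; => "Column :bogus (from :color) not found in dataset. Available: (:x :y)"
```

Attaching the column and role as ex-data lets calling code react to the failure without parsing the message string.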

Non-existent column

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (sk/lay-point :nonexistent :y)
      sk/plot)
  (catch Exception e
    (ex-message e)))
"Column :nonexistent (from :x) not found in dataset. Available: (:x :y)"

Non-existent color column

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (sk/lay-point :x :y {:color :bogus})
      sk/plot)
  (catch Exception e
    (ex-message e)))
"Column :bogus (from :color) not found in dataset. Available: (:x :y)"

Unsupported polar mark

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (sk/lay-line :x :y)
      (sk/coord :polar)
      sk/plot)
  (catch Exception e
    (ex-message e)))
"Mark :line is not supported with polar coordinates. Supported polar marks: (:bar :point :rect :rug :text)"

Mismatched mark and stat

(try
  (-> {:x [1 2 3]}
      (sk/view :x)
      (sk/lay {:mark :boxplot :stat :bin})
      sk/plot)
  (catch Exception e
    (ex-message e)))
"Stat result for :boxplot mark must contain :boxes, got keys: (:bins :max-count :x-domain :y-domain)"

x-only method with y column

Methods that use only the x column (histogram, bar, density, rug) reject a y column with a clear message.

(try
  (-> {:x [1 2 3] :y [4 5 6]}
      (sk/lay-histogram :x :y))
  (catch Exception e
    (ex-message e)))
"lay-histogram uses only the x column; :y column :y is not supported. Use (lay-histogram data :x) instead."
source: notebooks/napkinsketch_book/edge_cases.clj