3  Datasets

This namespace loads the standard datasets used throughout the book. Other notebooks require it rather than fetching the files themselves, so each dataset is defined once and loading is not repeated.

All datasets come from the seaborn-data collection.

(ns napkinsketch-book.datasets
  (:require
   ;; Tablecloth — dataset manipulation
   [tablecloth.api :as tc]
   ;; Kindly — notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]))
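
As a rough sketch of how another notebook might consume these datasets: the namespace name `napkinsketch-book.example-chapter` and the alias `data` below are hypothetical, and the snippet assumes this file is on the classpath alongside Tablecloth.

```clojure
;; Hypothetical consumer notebook. Only napkinsketch-book.datasets
;; comes from this chapter; the ns name and the `data` alias are made up.
(ns napkinsketch-book.example-chapter
  (:require
   [napkinsketch-book.datasets :as data]
   [tablecloth.api :as tc]))

;; The CSV files are fetched when napkinsketch-book.datasets is first loaded.
;; Requiring it again from other notebooks in the same session reuses the vars.
(tc/head data/tips 3)
```

Keeping the `def`s in one namespace means a changed URL or parsing option only has to be edited in one place.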

Iris

150 iris flower measurements (sepal and petal length/width) across three species: setosa, versicolor, and virginica.

(def iris
  (tc/dataset "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
              {:key-fn keyword}))
iris

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv [150 5]:

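| :sepal_length | :sepal_width | :petal_length | :petal_width | :species  |
|--------------:|-------------:|--------------:|-------------:|-----------|
|           5.1 |          3.5 |           1.4 |          0.2 | setosa    |
|           4.9 |          3.0 |           1.4 |          0.2 | setosa    |
|           4.7 |          3.2 |           1.3 |          0.2 | setosa    |
|           4.6 |          3.1 |           1.5 |          0.2 | setosa    |
|           5.0 |          3.6 |           1.4 |          0.2 | setosa    |
|           5.4 |          3.9 |           1.7 |          0.4 | setosa    |
|           4.6 |          3.4 |           1.4 |          0.3 | setosa    |
|           5.0 |          3.4 |           1.5 |          0.2 | setosa    |
|           4.4 |          2.9 |           1.4 |          0.2 | setosa    |
|           4.9 |          3.1 |           1.5 |          0.1 | setosa    |
|           ... |          ... |           ... |          ... | ...       |
|           6.9 |          3.1 |           5.4 |          2.1 | virginica |
|           6.7 |          3.1 |           5.6 |          2.4 | virginica |
|           6.9 |          3.1 |           5.1 |          2.3 | virginica |
|           5.8 |          2.7 |           5.1 |          1.9 | virginica |
|           6.8 |          3.2 |           5.9 |          2.3 | virginica |
|           6.7 |          3.3 |           5.7 |          2.5 | virginica |
|           6.7 |          3.0 |           5.2 |          2.3 | virginica |
|           6.3 |          2.5 |           5.0 |          1.9 | virginica |
|           6.5 |          3.0 |           5.2 |          2.0 | virginica |
|           6.2 |          3.4 |           5.4 |          2.3 | virginica |
|           5.9 |          3.0 |           5.1 |          1.8 | virginica |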
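
The printout shows only the first and last rows, so the description is easier to confirm with a quick query. The following is a minimal sketch, assuming Tablecloth is on the classpath and the URL is reachable; it repeats the `iris` definition so that it runs on its own, and the column name `:n` is chosen here for illustration.

```clojure
(require '[tablecloth.api :as tc])

;; Same URL and options as the `iris` definition above.
(def iris
  (tc/dataset "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
              {:key-fn keyword}))

;; Shape as [rows columns]; matches the [150 5] in the printed header.
(tc/shape iris)

;; One row per species, with the number of measurements in each group.
(def species-counts
  (-> iris
      (tc/group-by :species)
      (tc/aggregate {:n tc/row-count})))

species-counts
```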

Tips

244 restaurant bills with total bill, tip amount, sex, smoker status, day, time, and party size.

(def tips
  (tc/dataset "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
              {:key-fn keyword}))
tips

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv [244 7]:

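| :total_bill | :tip | :sex   | :smoker | :day | :time  | :size |
|------------:|-----:|--------|---------|------|--------|------:|
|       16.99 | 1.01 | Female | No      | Sun  | Dinner |     2 |
|       10.34 | 1.66 | Male   | No      | Sun  | Dinner |     3 |
|       21.01 | 3.50 | Male   | No      | Sun  | Dinner |     3 |
|       23.68 | 3.31 | Male   | No      | Sun  | Dinner |     2 |
|       24.59 | 3.61 | Female | No      | Sun  | Dinner |     4 |
|       25.29 | 4.71 | Male   | No      | Sun  | Dinner |     4 |
|        8.77 | 2.00 | Male   | No      | Sun  | Dinner |     2 |
|       26.88 | 3.12 | Male   | No      | Sun  | Dinner |     4 |
|       15.04 | 1.96 | Male   | No      | Sun  | Dinner |     2 |
|       14.78 | 3.23 | Male   | No      | Sun  | Dinner |     2 |
|         ... |  ... | ...    | ...     | ...  | ...    |   ... |
|       10.77 | 1.47 | Male   | No      | Sat  | Dinner |     2 |
|       15.53 | 3.00 | Male   | Yes     | Sat  | Dinner |     2 |
|       10.07 | 1.25 | Male   | No      | Sat  | Dinner |     2 |
|       12.60 | 1.00 | Male   | Yes     | Sat  | Dinner |     2 |
|       32.83 | 1.17 | Male   | Yes     | Sat  | Dinner |     2 |
|       35.83 | 4.67 | Female | No      | Sat  | Dinner |     3 |
|       29.03 | 5.92 | Male   | No      | Sat  | Dinner |     3 |
|       27.18 | 2.00 | Female | Yes     | Sat  | Dinner |     2 |
|       22.67 | 2.00 | Male   | Yes     | Sat  | Dinner |     2 |
|       17.82 | 1.75 | Male   | No      | Sat  | Dinner |     2 |
|       18.78 | 3.00 | Female | No      | Thur | Dinner |     2 |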

Penguins

344 penguin measurements (bill, flipper, body mass) across three species on three islands in the Palmer Archipelago.

(def penguins
  (tc/dataset "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
              {:key-fn keyword}))
penguins

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv [344 7]:

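| :species | :island   | :bill_length_mm | :bill_depth_mm | :flipper_length_mm | :body_mass_g | :sex   |
|----------|-----------|----------------:|---------------:|-------------------:|-------------:|--------|
| Adelie   | Torgersen |            39.1 |           18.7 |                181 |         3750 | MALE   |
| Adelie   | Torgersen |            39.5 |           17.4 |                186 |         3800 | FEMALE |
| Adelie   | Torgersen |            40.3 |           18.0 |                195 |         3250 | FEMALE |
| Adelie   | Torgersen |                 |                |                    |              |        |
| Adelie   | Torgersen |            36.7 |           19.3 |                193 |         3450 | FEMALE |
| Adelie   | Torgersen |            39.3 |           20.6 |                190 |         3650 | MALE   |
| Adelie   | Torgersen |            38.9 |           17.8 |                181 |         3625 | FEMALE |
| Adelie   | Torgersen |            39.2 |           19.6 |                195 |         4675 | MALE   |
| Adelie   | Torgersen |            34.1 |           18.1 |                193 |         3475 |        |
| Adelie   | Torgersen |            42.0 |           20.2 |                190 |         4250 |        |
| ...      | ...       |             ... |            ... |                ... |          ... | ...    |
| Gentoo   | Biscoe    |            51.5 |           16.3 |                230 |         5500 | MALE   |
| Gentoo   | Biscoe    |            46.2 |           14.1 |                217 |         4375 | FEMALE |
| Gentoo   | Biscoe    |            55.1 |           16.0 |                230 |         5850 | MALE   |
| Gentoo   | Biscoe    |            44.5 |           15.7 |                217 |         4875 |        |
| Gentoo   | Biscoe    |            48.8 |           16.2 |                222 |         6000 | MALE   |
| Gentoo   | Biscoe    |            47.2 |           13.7 |                214 |         4925 | FEMALE |
| Gentoo   | Biscoe    |                 |                |                    |              |        |
| Gentoo   | Biscoe    |            46.8 |           14.3 |                215 |         4850 | FEMALE |
| Gentoo   | Biscoe    |            50.4 |           15.7 |                222 |         5750 | MALE   |
| Gentoo   | Biscoe    |            45.2 |           14.8 |                212 |         5200 | FEMALE |
| Gentoo   | Biscoe    |            49.9 |           16.1 |                213 |         5400 | MALE   |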
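
Several rows in the printout have empty cells, which are missing values in the source file. One possible way to separate complete from incomplete rows is sketched below, assuming Tablecloth is on the classpath and the URL is reachable. The names `incomplete` and `complete` are introduced here for illustration, and the `penguins` definition is repeated so the snippet runs on its own.

```clojure
(require '[tablecloth.api :as tc])

;; Same URL and options as the `penguins` definition above.
(def penguins
  (tc/dataset "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
              {:key-fn keyword}))

;; Rows with at least one missing value in any column.
(def incomplete (tc/select-missing penguins))

;; Rows with no missing values at all.
(def complete (tc/drop-missing penguins))

;; The two subsets together account for every original row.
(= (tc/row-count penguins)
   (+ (tc/row-count incomplete)
      (tc/row-count complete)))
```

Whether to drop incomplete rows or keep them depends on the plot. Leaving the shared `penguins` var untouched and filtering in the consuming notebook keeps that choice local.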

MPG

398 automobile records from the StatLib library: fuel efficiency, engine displacement, horsepower, weight, and model year.

(def mpg
  (tc/dataset "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv"
              {:key-fn keyword}))
mpg

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv [398 9]:

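| :mpg | :cylinders | :displacement | :horsepower | :weight | :acceleration | :model_year | :origin | :name                             |
|-----:|-----------:|--------------:|------------:|--------:|--------------:|------------:|---------|-----------------------------------|
| 18.0 |          8 |         307.0 |       130.0 |    3504 |          12.0 |          70 | usa     | chevrolet chevelle malibu         |
| 15.0 |          8 |         350.0 |       165.0 |    3693 |          11.5 |          70 | usa     | buick skylark 320                 |
| 18.0 |          8 |         318.0 |       150.0 |    3436 |          11.0 |          70 | usa     | plymouth satellite                |
| 16.0 |          8 |         304.0 |       150.0 |    3433 |          12.0 |          70 | usa     | amc rebel sst                     |
| 17.0 |          8 |         302.0 |       140.0 |    3449 |          10.5 |          70 | usa     | ford torino                       |
| 15.0 |          8 |         429.0 |       198.0 |    4341 |          10.0 |          70 | usa     | ford galaxie 500                  |
| 14.0 |          8 |         454.0 |       220.0 |    4354 |           9.0 |          70 | usa     | chevrolet impala                  |
| 14.0 |          8 |         440.0 |       215.0 |    4312 |           8.5 |          70 | usa     | plymouth fury iii                 |
| 14.0 |          8 |         455.0 |       225.0 |    4425 |          10.0 |          70 | usa     | pontiac catalina                  |
| 15.0 |          8 |         390.0 |       190.0 |    3850 |           8.5 |          70 | usa     | amc ambassador dpl                |
|  ... |        ... |           ... |         ... |     ... |           ... |         ... | ...     | ...                               |
| 38.0 |          6 |         262.0 |        85.0 |    3015 |          17.0 |          82 | usa     | oldsmobile cutlass ciera (diesel) |
| 26.0 |          4 |         156.0 |        92.0 |    2585 |          14.5 |          82 | usa     | chrysler lebaron medallion        |
| 22.0 |          6 |         232.0 |       112.0 |    2835 |          14.7 |          82 | usa     | ford granada l                    |
| 32.0 |          4 |         144.0 |        96.0 |    2665 |          13.9 |          82 | japan   | toyota celica gt                  |
| 36.0 |          4 |         135.0 |        84.0 |    2370 |          13.0 |          82 | usa     | dodge charger 2.2                 |
| 27.0 |          4 |         151.0 |        90.0 |    2950 |          17.3 |          82 | usa     | chevrolet camaro                  |
| 27.0 |          4 |         140.0 |        86.0 |    2790 |          15.6 |          82 | usa     | ford mustang gl                   |
| 44.0 |          4 |          97.0 |        52.0 |    2130 |          24.6 |          82 | europe  | vw pickup                         |
| 32.0 |          4 |         135.0 |        84.0 |    2295 |          11.6 |          82 | usa     | dodge rampage                     |
| 28.0 |          4 |         120.0 |        79.0 |    2625 |          18.6 |          82 | usa     | ford ranger                       |
| 31.0 |          4 |         119.0 |        82.0 |    2720 |          19.4 |          82 | usa     | chevy s-10                        |

source: notebooks/napkinsketch_book/datasets.clj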