10  Statistical Visualization (experimental 🛠)

author: Daniel Slutsky

(ns noj-book.statistical-visualization
  (:require [aerial.hanami.templates :as ht]
            [noj-book.datasets :as datasets]
            [scicloj.kindly.v4.kind :as kind]
            [scicloj.noj.v1.stats :as stats]
            [scicloj.noj.v1.vis.hanami :as vis.hanami]
            [scicloj.noj.v1.vis.stats :as vis.stats]
            [tablecloth.api :as tc]))

10.1 Linear regression

(-> datasets/mtcars
    (stats/add-predictions :mpg [:wt]
                           {:model-type :smile.regression/ordinary-least-square})
    (vis.hanami/combined-plot
     ht/layer-chart
     {:X :wt
      :MSIZE 200
      :HEIGHT 200}
     :LAYER [[ht/point-chart
              {:Y :mpg
               :WIDTH 200}]
             [ht/line-chart
              {:Y :mpg-prediction
               :MSIZE 5
               :MCOLOR "purple"
               :YTITLE :mpg}]]))

Alternatively, using vis.stats/linear-regression-plot:

(-> datasets/mtcars
    (vis.stats/linear-regression-plot
     :mpg :wt
     {:HEIGHT 200
      :WIDTH 200
      :point-options {:MSIZE 200}
      :line-options {:MSIZE 5
                     :MCOLOR "purple"}}))

With a grouped dataset, one plot is created per group:

(-> datasets/mtcars
    (tc/group-by [:gear])
    (vis.stats/linear-regression-plot
     :mpg :wt
     {:HEIGHT 200
      :WIDTH 200
      :point-options {:MSIZE 200}
      :line-options {:MSIZE 5
                     :MCOLOR "purple"}}))

[Output: a table with columns gear and plot, showing one regression plot for each gear value: 4, 3, and 5.]
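
For intuition about the line drawn in the plots above, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Clojure. The names `mean`, `simple-ols`, and `predict` are hypothetical helpers introduced for illustration; this is not how Noj or Smile implement the fit, only the textbook closed-form solution on toy data.

```clojure
(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn simple-ols
  "Fits y ≈ intercept + slope * x by least squares (one predictor).
  Assumes xs and ys have equal length and xs is not constant."
  [xs ys]
  (let [mx    (mean xs)
        my    (mean ys)
        ;; sum of cross-deviations and sum of squared x-deviations
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (let [d (- x mx)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:slope     slope
     :intercept (- my (* slope mx))}))

(defn predict
  "Predicted y for a given x under a fitted model."
  [{:keys [slope intercept]} x]
  (+ intercept (* slope x)))

;; Toy data lying exactly on y = 1 + 2x (not taken from mtcars).
(def toy-model
  (simple-ols [1.0 2.0 3.0 4.0] [3.0 5.0 7.0 9.0]))

toy-model
;; => {:slope 2.0, :intercept 1.0}
```

Evaluating `predict` over the observed predictor values gives the points that a line layer such as the `:mpg-prediction` one would connect.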

10.2 Histogram

A histogram groups values into bins, counts the values in each bin, and draws the counts as a bar chart.
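
The binning and counting step can be sketched in plain Clojure as follows. This is an illustration under stated assumptions, not the library's implementation: `histogram-bins` is a hypothetical helper, the bins are equal-width over the range of the data, and each bin is treated as closed on the left with the maximum value placed in the last bin.

```clojure
(defn histogram-bins
  "Counts `values` in `nbins` equal-width bins spanning [min, max].
  Assumes `values` is non-empty and not constant."
  [values nbins]
  (let [lo        (apply min values)
        hi        (apply max values)
        step      (/ (- hi lo) (double nbins))
        ;; index of the bin a value falls into;
        ;; the maximum value is clamped into the last bin
        bin-index (fn [v]
                    (min (dec nbins)
                         (int (Math/floor (/ (- v lo) step)))))
        counts    (frequencies (map bin-index values))]
    (mapv (fn [i]
            {:left  (+ lo (* i step))
             :right (+ lo (* (inc i) step))
             :count (get counts i 0)})
          (range nbins))))

;; Toy data (not the iris measurements), four bins of width 1.0.
(histogram-bins [1.0 1.5 2.0 2.5 3.0 4.0 5.0] 4)
;; => [{:left 1.0, :right 2.0, :count 2}
;;     {:left 2.0, :right 3.0, :count 2}
;;     {:left 3.0, :right 4.0, :count 1}
;;     {:left 4.0, :right 5.0, :count 2}]
```

Each resulting row has a left edge, a right edge, and a count, which is the shape a bar chart needs.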

The vis.stats/histogram function does that behind the scenes, and generates a Vega-Lite spec using Hanami.

(-> datasets/iris
    (vis.stats/histogram :sepal-width
                         {:nbins 10}))

Pretty-printing the result shows the generated spec:

(-> datasets/iris
    (vis.stats/histogram :sepal-width
                         {:nbins 10})
    kind/pprint)
{:encoding
 {:y {:field :count, :type "quantitative"},
  :x
  {:field :left,
   :type "quantitative",
   :title :sepal-width,
   :bin {:binned true, :step 0.24000000000000005}},
  :x2 {:field :right, :type "quantitative"}},
 :usermeta {:embedOptions {:renderer :svg}},
 :mark {:type "bar", :tooltip true},
 :width 400,
 :background "floralwhite",
 :height 300,
 :data
 {:values
  "left,right,count\n2.0,2.24,4\n2.24,2.48,7\n2.48,2.72,22\n2.72,2.96,24\n2.96,3.2,37\n3.2,3.4400000000000004,31\n3.4400000000000004,3.68,10\n3.68,3.9200000000000004,11\n3.9200000000000004,4.16,2\n4.16,4.4,2\n",
  :format {:type "csv"}}}

The resulting spec can be customized further:

(-> datasets/iris
    (vis.stats/histogram :sepal-width
                         {:nbins 10})
    ;; varying the resulting vega-lite spec:
    (assoc :height 125
           :width 175))
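
Since the spec is printed above as a nested map, nested keys can presumably be changed with `assoc-in` in the same way. The sketch below works on a hand-written map, `example-spec`, that mimics part of the printed output; it is a stand-in for illustration and is not produced by vis.stats/histogram. The `color` mark property and the axis `title` are standard Vega-Lite fields.

```clojure
;; A hand-written stand-in resembling part of the spec printed earlier.
(def example-spec
  {:encoding {:x {:field :left
                  :type  "quantitative"
                  :title :sepal-width}
              :y {:field :count
                  :type  "quantitative"}}
   :mark     {:type "bar" :tooltip true}
   :width    400
   :height   300})

(def customized-spec
  (-> example-spec
      ;; top-level keys, as in the example above
      (assoc :height 125
             :width 175)
      ;; nested keys: bar color and x-axis title
      (assoc-in [:mark :color] "purple")
      (assoc-in [:encoding :x :title] "Sepal width")))

(:mark customized-spec)
;; => {:type "bar", :tooltip true, :color "purple"}
```

Keys that are not touched, such as the `:y` encoding, are carried over unchanged.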
source: notebooks/noj_book/statistical_visualization.clj