15 Visualizing correlation matrices (experimental) - DRAFT 🛠
This tutorial explores various ways to visualize a correlation matrix as a heatmap. It is inspired by the discussion at the Clojurians Zulip chat: #data-science > correlation matrix plot ?
author: Daniel Slutsky
(ns noj-book.visualizing-correlation-matrices
  (:require [fastmath.stats]
            [fastmath.core :as fastmath]
            [tablecloth.api :as tc]
            [noj-book.datasets]
            [scicloj.kindly.v4.kind :as kind]
            [clojure.math :as math]
            [clojure.string :as str]))
15.1 Auxiliary functions
Rounding numbers:
(defn round
  [n scale rm]
  (.setScale ^java.math.BigDecimal (bigdec n)
             (int scale)
             ^RoundingMode (if (instance? java.math.RoundingMode rm)
                             rm
                             (java.math.RoundingMode/valueOf
                              (str (if (ident? rm) (symbol rm) rm))))))
For example (see RoundingMode)
(round (/ 2.0 3) 2 :DOWN)
0.66M
(round (/ 2.0 3) 2 :UP)
0.67M
(round (/ 2.0 3) 2 :HALF_EVEN)
0.67M
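The three calls above do not distinguish `:HALF_EVEN` from `:UP`, since 2/3 is not a tie. A minimal sketch of how the tie-breaking modes differ, using only `java.math.BigDecimal`; the helper `round-2` is hypothetical and not part of this tutorial's namespace:

```clojure
(import '(java.math BigDecimal RoundingMode))

;; Hypothetical helper for illustration: round a number to 2 decimal
;; places with the given java.math.RoundingMode.
(defn round-2 [x ^RoundingMode mode]
  (.setScale ^BigDecimal (bigdec x) (int 2) mode))

;; 0.125 is exactly representable as a double, so it is a true tie
;; between 0.12 and 0.13.
(def half-even (round-2 0.125 RoundingMode/HALF_EVEN)) ; tie goes to the even digit
(def half-up   (round-2 0.125 RoundingMode/HALF_UP))   ; tie goes away from zero
```

Here `half-even` is `0.12M` and `half-up` is `0.13M`. `:HALF_EVEN` avoids a systematic upward bias when many values are rounded, which is presumably why it is the mode used for the rounded correlations later in this tutorial.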
15.2 Computing a correlation matrix and representing it as a dataset:
(defn correlations-dataset [data columns-to-use]
  (let [matrix (->> columns-to-use
                    (mapv #(get data %))
                    fastmath.stats/correlation-matrix)]
    (->> matrix
         (map-indexed
          (fn [i row]
            (let [coli (columns-to-use i)]
              (->> row
                   (map-indexed
                    (fn [j corr]
                      (let [colj (columns-to-use j)]
                        {:i i
                         :j j
                         :coli coli
                         :colj colj
                         :corr corr
                         :corr-round (round corr 2 :HALF_EVEN)})))))))
         (apply concat)
         tc/dataset)))
For example:
(-> noj-book.datasets/iris
    (correlations-dataset [:sepal-length :sepal-width :petal-length :petal-width]))
_unnamed [16 6]:
:i | :j | :coli | :colj | :corr | :corr-round |
---|---|---|---|---|---|
0 | 0 | :sepal-length | :sepal-length | 1.00000000 | 1.000 |
0 | 1 | :sepal-length | :sepal-width | -0.11756978 | -0.1200 |
0 | 2 | :sepal-length | :petal-length | 0.87175378 | 0.8700 |
0 | 3 | :sepal-length | :petal-width | 0.81794113 | 0.8200 |
1 | 0 | :sepal-width | :sepal-length | -0.11756978 | -0.1200 |
1 | 1 | :sepal-width | :sepal-width | 1.00000000 | 1.000 |
1 | 2 | :sepal-width | :petal-length | -0.42844010 | -0.4300 |
1 | 3 | :sepal-width | :petal-width | -0.36612593 | -0.3700 |
2 | 0 | :petal-length | :sepal-length | 0.87175378 | 0.8700 |
2 | 1 | :petal-length | :sepal-width | -0.42844010 | -0.4300 |
2 | 2 | :petal-length | :petal-length | 1.00000000 | 1.000 |
2 | 3 | :petal-length | :petal-width | 0.96286543 | 0.9600 |
3 | 0 | :petal-width | :sepal-length | 0.81794113 | 0.8200 |
3 | 1 | :petal-width | :sepal-width | -0.36612593 | -0.3700 |
3 | 2 | :petal-width | :petal-length | 0.96286543 | 0.9600 |
3 | 3 | :petal-width | :petal-width | 1.00000000 | 1.000 |
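To make the long format above more concrete, here is a dependency-free sketch of the same idea in plain Clojure. It assumes the correlation measure is Pearson's (which, as far as we know, is the default of `fastmath.stats/correlation-matrix`, though that is an assumption here). The helpers `pearson` and `long-correlations` and the toy data are hypothetical, for illustration only:

```clojure
;; Hypothetical helper: Pearson correlation of two equal-length numeric sequences.
(defn pearson [xs ys]
  (let [n      (count xs)
        mean-x (/ (reduce + 0.0 xs) n)
        mean-y (/ (reduce + 0.0 ys) n)
        dxs    (map #(- % mean-x) xs)
        dys    (map #(- % mean-y) ys)
        sxy    (reduce + (map * dxs dys))
        sxx    (reduce + (map * dxs dxs))
        syy    (reduce + (map * dys dys))]
    (/ sxy (Math/sqrt (* sxx syy)))))

;; Hypothetical helper: one row map per (i, j) pair of columns,
;; mirroring the :i, :j, :coli, :colj, :corr keys used above.
(defn long-correlations [data columns]
  (for [[i coli] (map-indexed vector columns)
        [j colj] (map-indexed vector columns)]
    {:i i :j j :coli coli :colj colj
     :corr (pearson (get data coli) (get data colj))}))

;; Toy data: :b is a multiple of :a, and :c is :a reversed.
(def toy-data {:a [1 2 3 4] :b [2 4 6 8] :c [4 3 2 1]})
(def toy-rows (long-correlations toy-data [:a :b :c]))
```

With three columns, `toy-rows` has nine rows, one per ordered pair of columns, just as the iris example has sixteen rows for four columns.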
15.3 Drawing a heatmap using Echarts
The following function is inspired by an Apache Echarts heatmap tutorial.
(defn echarts-heatmap [{:keys [xyz-data xs ys
                               min max
                               series-name]
                        :or {series-name ""}}]
  (kind/echarts
   {:tooltip {}
    :xAxis {:type :category
            :data xs}
    :yAxis {:type :category
            :data ys}
    :visualMap {:min min
                :max max
                :calculable true
                :splitNumber 8
                :inRange {:color
                          ["#313695" "#4575b4" "#74add1"
                           "#abd9e9" "#e0f3f8" "#ffffbf"
                           "#fee090" "#fdae61" "#f46d43"
                           "#d73027" "#a50026"]}}
    :series [{:name series-name
              :type :heatmap
              :data xyz-data
              :itemStyle {:emphasis {:borderColor "#333"
                                     :borderWidth 2}}
              :progressive 1000
              :animation false}]}))
Here is an example using synthetic data:
(let [n 30]
  (echarts-heatmap
   {:xyz-data (for [i (range n)
                    j (range n)]
                [i j (fastmath/logistic (* (+ (- i j)
                                              (rand))
                                           (/ 2 (double n))))])
    :xs (range n)
    :ys (range n)
    :min 0
    :max 1}))
Note the slider control and the tooltips.
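In both examples in this section, `:xyz-data` is a sequence of `[x y value]` triples, one per heatmap cell. As a minimal sketch, a square matrix given as a vector of row vectors could be flattened into such triples as follows; `matrix->xyz` is a hypothetical helper, not something defined elsewhere in this tutorial:

```clojure
;; Hypothetical helper: turn a matrix (vector of row vectors) into
;; [x y value] triples, one per cell, as used for :xyz-data above.
(defn matrix->xyz [matrix]
  (for [[i row] (map-indexed vector matrix)
        [j v]   (map-indexed vector row)]
    [i j v]))

(def toy-matrix [[1.0 0.5]
                 [0.5 1.0]])

;; Four triples: [0 0 1.0] [0 1 0.5] [1 0 0.5] [1 1 1.0]
(def toy-xyz (matrix->xyz toy-matrix))
```

The first two elements of each triple here are row and column indices; they can equally be category names, as in the correlation example that follows.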
Here is an example with an actual correlation matrix.
(let [columns-for-correlations [:sepal-length :sepal-width
                                :petal-length :petal-width]
      correlations (-> noj-book.datasets/iris
                       (correlations-dataset columns-for-correlations)
                       (tc/select-columns [:coli :colj :corr-round])
                       tc/rows)]
  (echarts-heatmap {:xyz-data correlations
                    :xs columns-for-correlations
                    :ys columns-for-correlations
                    :min -1
                    :max 1
                    :series-name "correlation"}))
TODO: Improve the layout so that the slider control does not overlap the labels.
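One possible direction for this TODO, offered as an untested sketch rather than a verified fix: the ECharts options `grid` and `visualMap` accept positioning keys such as `height`, `top`, `orient`, `left`, and `bottom`, so the slider could be laid out horizontally below a shortened plotting area. The helper `with-bottom-slider` and the specific percentages are assumptions for illustration; it works on the plain option map, which would need to be transformed inside `echarts-heatmap` before being passed to `kind/echarts`:

```clojure
;; Hypothetical helper: merge layout settings into an ECharts option map
;; (the plain map that would be passed to kind/echarts).
(defn with-bottom-slider [option]
  (-> option
      ;; Shrink the plotting grid to leave room under the chart.
      (update :grid merge {:height "50%" :top "10%"})
      ;; Lay the visualMap slider out horizontally below the grid.
      (update :visualMap merge {:orient "horizontal"
                                :left "center"
                                :bottom "15%"})))

;; Existing visualMap keys are preserved; layout keys are added.
(def example-option
  (with-bottom-slider {:visualMap {:min -1 :max 1 :calculable true}}))
```

Whether these particular values avoid the overlap with long column names such as `:sepal-length` would need to be checked against the rendered chart.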
15.4 Drawing a heatmap using cljplot
coming soon
15.5 Drawing a heatmap using Vega
coming soon