15  Visualizing correlation matrices (experimental) - DRAFT 🛠

This tutorial explores various ways to visualize a correlation matrix as a heatmap. It is inspired by the discussion at the Clojurians Zulip chat: #data-science > correlation matrix plot ?

author: Daniel Slutsky

(ns noj-book.visualizing-correlation-matrices
  (:require [fastmath.stats]
            [fastmath.core :as fastmath]
            [tablecloth.api :as tc]
            [noj-book.datasets]
            [scicloj.kindly.v4.kind :as kind]
            [clojure.math :as math]
            [clojure.string :as str]))

15.1 Auxiliary functions

Rounding numbers:

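(defn round
  "Rounds `n` to `scale` decimal places, returning a BigDecimal.
  `rm` may be a java.math.RoundingMode, or a keyword, symbol, or string
  naming one (e.g. :HALF_EVEN)."
  [n scale rm]
  (let [^java.math.RoundingMode mode (if (instance? java.math.RoundingMode rm)
                                       rm
                                       (java.math.RoundingMode/valueOf (name rm)))]
    (.setScale ^java.math.BigDecimal (bigdec n)
               (int scale)
               mode)))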

For example (see the java.math.RoundingMode documentation for the available modes):

(round (/ 2.0 3) 2 :DOWN)
0.66M
(round (/ 2.0 3) 2 :UP)
0.67M
(round (/ 2.0 3) 2 :HALF_EVEN)
0.67M
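
`HALF_EVEN`, the mode used later in this tutorial, differs from the more familiar `HALF_UP` only on exact ties. The following is a small sketch that calls `BigDecimal` directly (rather than the `round` helper above, so that it runs on its own); `round-at` is a hypothetical name used only here.

```clojure
(import 'java.math.RoundingMode)

(defn round-at
  "Hypothetical helper for this sketch: rounds BigDecimal `x` to 2 decimal places."
  [^BigDecimal x ^RoundingMode mode]
  (.setScale x 2 mode))

;; 0.125 lies exactly halfway between 0.12 and 0.13.
(round-at 0.125M RoundingMode/HALF_EVEN) ; tie goes to the even last digit: 0.12M
(round-at 0.125M RoundingMode/HALF_UP)   ; tie goes away from zero:         0.13M

;; 0.135 lies exactly halfway between 0.13 and 0.14.
(round-at 0.135M RoundingMode/HALF_EVEN) ; 0.14M
(round-at 0.135M RoundingMode/HALF_UP)   ; 0.14M
```

Exact ties are rare for correlations computed from real data, so the choice between the two modes seldom changes the displayed value.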

15.2 Computing a correlation matrix and representing it as a dataset

(defn correlations-dataset [data columns-to-use]
  (let [matrix (->> columns-to-use
                    (mapv #(get data %))
                    fastmath.stats/correlation-matrix)]
    (->> matrix
         (map-indexed
          (fn [i row]
            (let [coli (columns-to-use i)]
              (->> row
                   (map-indexed
                    (fn [j corr]
                      (let [colj (columns-to-use j)]
                        {:i i
                         :j j
                         :coli coli
                         :colj colj
                         :corr corr
                         :corr-round (round corr 2 :HALF_EVEN)})))))))
         (apply concat)
         tc/dataset)))

For example:

(-> noj-book.datasets/iris
    (correlations-dataset [:sepal-length :sepal-width :petal-length :petal-width]))

_unnamed [16 6]:

:i  :j  :coli          :colj                :corr  :corr-round
 0   0  :sepal-length  :sepal-length   1.00000000        1.000
 0   1  :sepal-length  :sepal-width   -0.11756978      -0.1200
 0   2  :sepal-length  :petal-length   0.87175378       0.8700
 0   3  :sepal-length  :petal-width    0.81794113       0.8200
 1   0  :sepal-width   :sepal-length  -0.11756978      -0.1200
 1   1  :sepal-width   :sepal-width    1.00000000        1.000
 1   2  :sepal-width   :petal-length  -0.42844010      -0.4300
 1   3  :sepal-width   :petal-width   -0.36612593      -0.3700
 2   0  :petal-length  :sepal-length   0.87175378       0.8700
 2   1  :petal-length  :sepal-width   -0.42844010      -0.4300
 2   2  :petal-length  :petal-length   1.00000000        1.000
 2   3  :petal-length  :petal-width    0.96286543       0.9600
 3   0  :petal-width   :sepal-length   0.81794113       0.8200
 3   1  :petal-width   :sepal-width   -0.36612593      -0.3700
 3   2  :petal-width   :petal-length   0.96286543       0.9600
 3   3  :petal-width   :petal-width    1.00000000        1.000
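
Each `:corr` cell is the correlation between one pair of columns, and the dataset holds one row per pair. Assuming that `fastmath.stats/correlation-matrix` uses the Pearson coefficient when no measure is specified, the following plain-Clojure sketch shows what is being computed and how the matrix is unrolled into long format. The names `pearson` and `long-format-correlations` are hypothetical and not part of any library; this is an illustration, not a replacement for the fastmath implementation.

```clojure
(defn pearson
  "Pearson correlation of two equal-length numeric sequences (illustrative only)."
  [xs ys]
  (let [n    (count xs)
        mean (fn [vs] (/ (reduce + vs) (double n)))
        mx   (mean xs)
        my   (mean ys)
        dx   (map #(- % mx) xs)                    ; deviations from the mean
        dy   (map #(- % my) ys)
        dot  (fn [as bs] (reduce + (map * as bs)))]
    (/ (dot dx dy)
       (Math/sqrt (* (dot dx dx) (dot dy dy))))))

(defn long-format-correlations
  "One map per (row, column) pair, mirroring the shape of the dataset above.
  `columns-by-name` is a plain map from column name to a vector of numbers."
  [columns-by-name column-names]
  (for [[i coli] (map-indexed vector column-names)
        [j colj] (map-indexed vector column-names)]
    {:i i :j j :coli coli :colj colj
     :corr (pearson (columns-by-name coli) (columns-by-name colj))}))

;; Toy data: :b rises with :a, :c falls as :a rises.
(def toy-columns
  {:a [1 2 3 4]
   :b [2 4 6 8]
   :c [4 3 2 1]})

(long-format-correlations toy-columns [:a :b :c])
```

With three columns this yields nine rows. The diagonal pairs correlate perfectly with themselves, and the matrix is symmetric, so the `[:a :c]` and `[:c :a]` rows carry the same value.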

15.3 Drawing a heatmap using ECharts

The following function is inspired by an Apache ECharts heatmap tutorial.

(defn echarts-heatmap [{:keys [xyz-data xs ys
                               min max
                               series-name]
                        :or {series-name ""}}]
  (kind/echarts
   {:tooltip {}
    :xAxis {:type :category
            :data xs}
    :yAxis {:type :category
            :data ys}
    :visualMap {:min min
                :max max
                :calculable true
                :splitNumber 8
                :inRange {:color
                          ["#313695" "#4575b4" "#74add1"
                           "#abd9e9" "#e0f3f8" "#ffffbf"
                           "#fee090" "#fdae61" "#f46d43"
                           "#d73027" "#a50026"]}}
    :series [{:name series-name
              :type :heatmap
              :data xyz-data
              :itemStyle {:emphasis {:borderColor "#333"
                                     :borderWidth 2}}
              :progressive 1000
              :animation false}]}))
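
`echarts-heatmap` passes `:xyz-data` straight through as the series data, so each item needs to be an `[x y value]` triple, where `x` and `y` are either positions on the axes or category labels (both forms appear in the examples in this section). As a sketch, a hypothetical helper `matrix->xyz` (not part of this tutorial's code) that unrolls a nested matrix into such triples could look like this:

```clojure
(defn matrix->xyz
  "Unrolls a row-major matrix (a vector of row vectors) into [x y value]
  triples, labelling both axes with `labels`."
  [labels matrix]
  (for [[i row] (map-indexed vector matrix)
        [j v]   (map-indexed vector row)]
    [(nth labels i) (nth labels j) v]))

(matrix->xyz [:a :b]
             [[1.0 0.3]
              [0.3 1.0]])
;; => ([:a :a 1.0] [:a :b 0.3] [:b :a 0.3] [:b :b 1.0])
```

The sketch puts the row label on the x axis. For a symmetric matrix such as a correlation matrix, swapping the two axes gives the same picture.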

Here is an example using synthetic data:

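(let [n 30]
  (echarts-heatmap
   ;; Each item is an [x y value] triple; the value is a noisy logistic
   ;; function of the signed distance from the diagonal.
   {:xyz-data (for [i (range n)
                    j (range n)]
                [i j (fastmath/logistic (* (- i j)
                                           (rand)
                                           (/ 2 (double n))))])
    ;; The keys must be :xs and :ys to match the destructuring in echarts-heatmap.
    :xs (range n)
    :ys (range n)
    :min 0
    :max 1}))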

Note the slider control and the tooltips.

Here is an example with an actual correlation matrix.

(let [columns-for-correlations [:sepal-length :sepal-width
                                :petal-length :petal-width]
      correlations (-> noj-book.datasets/iris
                       (correlations-dataset columns-for-correlations)
                       (tc/select-columns [:coli :colj :corr-round])
                       tc/rows)]
  (echarts-heatmap {:xyz-data correlations
                    :xs columns-for-correlations
                    :ys columns-for-correlations
                    :min -1
                    :max 1
                    :series-name "correlation"}))

TODO: Improve the layout so that the slider control does not overlap the labels.
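
One possible direction for this TODO, offered only as a sketch that has not been checked against this notebook: the official ECharts cartesian heatmap example places the slider horizontally below the chart and shrinks the plotting grid to make room for it. A hypothetical helper, `with-slider-below`, could apply the same settings to a plain options map. The percentages are taken from that example and would likely need tuning here.

```clojure
(defn with-slider-below
  "Hypothetical helper: moves the visualMap slider below the chart and
  shrinks the grid so that the two do not overlap."
  [options]
  (-> options
      (assoc :grid {:height "50%" :top "10%"})
      (update :visualMap merge {:orient :horizontal
                                :left   :center
                                :bottom "15%"})))

(with-slider-below {:visualMap {:min -1 :max 1 :calculable true}
                    :series    []})
```

Since `echarts-heatmap` builds the options map and wraps it with `kind/echarts` in one step, such a helper would need to be applied to the map inside that function, before the `kind/echarts` call.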

15.4 Drawing a heatmap using cljplot

coming soon

15.5 Drawing a heatmap using Vega

coming soon

source: notebooks/noj_book/visualizing_correlation_matrices.clj