3 Plotly walkthrough
Tableplot offers a Clojure API for creating Plotly.js plots through layered pipelines.
The API uses Hanami templates but is completely separate from the classical Hanami templates and parameters.
Here, we provide a walkthrough of that API.
See also the more detailed reference.
3.1 Setup
For this tutorial, we require:
The Tableplot plotly API namespace
Tablecloth for dataset processing
the datetime namespace of dtype-next
the print namespace of tech.ml.dataset for customized dataset printing
Kindly (to specify how certain values should be visualized)
the datasets defined in the Datasets chapter
(ns tableplot-book.plotly-walkthrough
  (:require [scicloj.tableplot.v1.plotly :as plotly]
            [tablecloth.api :as tc]
            [tablecloth.column.api :as tcc]
            [tech.v3.datatype.datetime :as datetime]
            [tech.v3.dataset.print :as print]
            [scicloj.kindly.v4.kind :as kind]
            [clojure.string :as str]
            [scicloj.kindly.v4.api :as kindly]
            [tableplot-book.datasets :as datasets]
            [aerial.hanami.templates :as ht]))
3.2 Basic usage
Plotly plots are created by passing datasets to a pipeline of layer functions.
Additional parameters to the functions are passed as maps. Map keys begin with = (e.g., :=color).
For example, let us plot a scatterplot (a layer of points) of 10 random items from the Iris dataset.
(-> datasets/iris
    (tc/random 10 {:seed 1})
    (plotly/layer-point
     {:=x :sepal-width
      :=y :sepal-length
      :=color :species
      :=mark-size 20
      :=mark-opacity 0.6}))
3.3 Templates and parameters
(You do not need to understand these details for basic usage.)
Technically, the parameter maps contain Hanami substitution keys, which means they are processed by a simple set of rules, but you do not need to understand what this means yet.
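As a rough illustration of what substitution means, here is a minimal sketch in plain Clojure. It is not Hanami's actual implementation: resolve-key and substitute are hypothetical helpers written for this example, and they only handle keys that map to plain values or to other keys (real templates also use functions as defaults).

```clojure
(require '[clojure.walk :as walk])

(defn resolve-key
  "Hypothetical helper: follow a chain of substitution keys until reaching
  a value that is not itself a key of params. The seen set guards against cycles."
  [params k]
  (loop [v k
         seen #{}]
    (if (and (keyword? v) (contains? params v) (not (seen v)))
      (recur (get params v) (conj seen v))
      v)))

(defn substitute
  "Hypothetical helper: replace every substitution key found in template."
  [template params]
  (walk/postwalk #(resolve-key params %) template))

;; A toy template and toy parameters, loosely modelled on the keys used in this chapter.
(def toy-template
  {:x :=x-after-stat
   :marker {:size :=mark-size}})

(def toy-params
  {:=x-after-stat :=x
   :=x :sepal-width
   :=mark-size 20})

(substitute toy-template toy-params)
;; => {:x :sepal-width, :marker {:size 20}}
```

Note how :=x-after-stat resolves through :=x to the column name, which mirrors the way defaults refer to one another in the printed template.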
The layer functions return a Hanami template. Let us print the resulting structure of the previous plot.
(def example1
  (-> datasets/iris
      (tc/random 10 {:seed 1})
      (plotly/layer-point
       {:=x :sepal-width
        :=y :sepal-length
        :=color :species
        :=mark-size 20
        :=mark-opacity 0.6})))
(kind/pprint example1)
{:data :=traces,
 :layout :=layout,
 :aerial.hanami.templates/defaults
 {:=textfont :com.rpl.specter.impl/NONE,
  :=x0 :com.rpl.specter.impl/NONE,
  :=y-type #function[clojure.lang.AFunction/1],
  :=coordinates :2d,
  ...
  :=layers
  [{:y :=y-after-stat,
    :trace-base
    {:mode :=mode,
     :type :=type,
     :opacity :=mark-opacity,
     :textfont :=textfont},
    :color :=color,
    :x :=x-after-stat,
    :dataset :=stat,
    ...
    :aerial.hanami.templates/defaults
    {...
     :=mark-opacity 0.6,
     :=mark-size 20,
     :=color :species,
     :=x :sepal-width,
     :=mark :point,
     :=y :sepal-length,
     :=dataset datasets/iris [10 6],
     ...}}],
  ...
  :=height 400,
  :=model-options {:model-type :fastmath/ols},
  :=group :=inferred-group,
  :=x :x,
  :=histogram-nbins 10,
  :=automargin false,
  :=stat :=dataset,
  :=z :z,
  :=width 500,
  :=margin {:t 25},
  :=mark :point,
  :=y :y,
  :=dataset datasets/iris [10 6],
  :=background "rgb(235,235,235)",
  :=predictors [:=x],
  ...},
 :kindly/f #'scicloj.tableplot.v1.plotly/plotly-xform}
This template has all the necessary knowledge, including the substitution keys, to turn into a plot. This happens when your visual tool (e.g., Clay) displays the plot. The tool knows what to do thanks to the Kindly metadata and a special function attached to the plot.
(meta example1)
#:kindly{:kind :kind/fn, :options nil}
(:kindly/f example1)
#'scicloj.tableplot.v1.plotly/plotly-xform
3.4 Realizing the plot
If you wish to see the resulting plot specification before displaying it as a plot, you can use the plot function. In this case, it generates a Plotly.js plot:
(-> example1
    plotly/plot
    kind/pprint)
{:data
 [{:y [5.0 4.6 4.5],
   :r nil,
   :name "setosa",
   :marker {:color "#1B9E77", :size 20},
   :fill nil,
   :mode :markers,
   :width nil,
   :type "scatter",
   :theta nil,
   :z nil,
   :opacity 0.6,
   :lon nil,
   :lat nil,
   :x [3.4 3.4 2.3],
   :text nil}
  {:y [5.7 6.1 5.6 5.0 6.3],
   :r nil,
   :name "versicolor",
   :marker {:color "#D95F02", :size 20},
   :fill nil,
   :mode :markers,
   :width nil,
   :type "scatter",
   :theta nil,
   :z nil,
   :opacity 0.6,
   :lon nil,
   :lat nil,
   :x [2.9 3.0 2.7 2.0 2.5],
   :text nil}
  {:y [6.2 6.7],
   :r nil,
   :name "virginica",
   :marker {:color "#7570B3", :size 20},
   :fill nil,
   :mode :markers,
   :width nil,
   :type "scatter",
   :theta nil,
   :z nil,
   :opacity 0.6,
   :lon nil,
   :lat nil,
   :x [2.8 3.3],
   :text nil}],
 :layout
 {:width 500,
  :height 400,
  :margin {:t 25},
  :automargin false,
  :plot_bgcolor "rgb(235,235,235)",
  :xaxis
  {:gridcolor "rgb(255,255,255)", :title :sepal-width, :showgrid true},
  :yaxis
  {:gridcolor "rgb(255,255,255)",
   :title :sepal-length,
   :showgrid true},
  :title nil}}
It is annotated as kind/plotly, so that visual tools know how to render it.
(-> example1
    plotly/plot
    meta)
#:kindly{:kind :kind/plotly, :options {:style {:height :auto}}}
This can be useful if you wish to process the actual Plotly.js spec rather than use the Tableplot Plotly API. Let us change the background color, for example:
(-> example1
    plotly/plot
    (assoc-in [:layout :plot_bgcolor] "#eeeedd"))
For another example, let us use a logarithmic scale for the y axis:
(-> example1
    plotly/plot
    (assoc-in [:layout :yaxis :type] "log"))
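Since the realized spec is an ordinary Clojure map, such edits are plain data manipulation. The sketch below uses a small hand-written map (toy-spec, an assumption for illustration, not something produced by Tableplot) to show that assoc-in creates missing intermediate maps, leaves the original value untouched, and keeps the metadata of the top-level map, which is where the Kindly annotation lives.

```clojure
;; A hand-written stand-in for a realized plot spec (not produced by Tableplot).
(def toy-spec
  (with-meta
    {:data [{:type "scatter" :x [3.4 3.4 2.3] :y [5.0 4.6 4.5]}]
     :layout {:width 500 :height 400 :plot_bgcolor "rgb(235,235,235)"}}
    {:kindly/kind :kind/plotly}))

(def customized-spec
  (-> toy-spec
      (assoc-in [:layout :plot_bgcolor] "#eeeedd")
      ;; :yaxis does not exist yet; assoc-in creates the nested map.
      (assoc-in [:layout :yaxis :type] "log")))

(get-in customized-spec [:layout :yaxis])  ;; => {:type "log"}
(meta customized-spec)                     ;; => {:kindly/kind :kind/plotly}
(get-in toy-spec [:layout :plot_bgcolor])  ;; => "rgb(235,235,235)"
```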
3.5 Field type inference
Tableplot infers the type of relevant fields from the data.
In the example above, the :species column was nominal, so each of its values was assigned a distinct color.
In the following example, the coloring is by a quantitative column, so a color gradient is used:
(-> datasets/mtcars
    (plotly/layer-point
     {:=x :mpg
      :=y :disp
      :=color :cyl
      :=mark-size 20}))
We can override the inferred types and thus affect the generated plot:
(-> datasets/mtcars
    (plotly/layer-point
     {:=x :mpg
      :=y :disp
      :=color :cyl
      :=color-type :nominal
      :=mark-size 20}))
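To give a feel for what inferring a field type from the data can look like, here is a hypothetical heuristic in plain Clojure. It is only a sketch: infer-field-type is not Tableplot's function, the rule may differ from the library's actual logic, and the :quantitative keyword is an assumed name (this chapter itself only shows :nominal and :temporal).

```clojure
(defn infer-field-type
  "Hypothetical heuristic: classify a column by the values it holds."
  [values]
  (cond
    (every? number? values) :quantitative
    (every? #(instance? java.time.temporal.Temporal %) values) :temporal
    :else :nominal))

(infer-field-type [4 6 8])                               ;; => :quantitative
(infer-field-type ["setosa" "virginica"])                ;; => :nominal
(infer-field-type [(java.time.LocalDate/of 1967 7 1)])   ;; => :temporal
```

Under such a rule, a numeric column like :cyl would be treated as quantitative unless the type is overridden, which is the situation the :=color-type example above addresses.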
3.6 More examples
3.6.1 Boxplot
(-> datasets/mtcars
    (plotly/layer-boxplot
     {:=x :cyl
      :=y :disp}))
3.6.2 Area chart
(-> datasets/mtcars
    (tc/group-by [:cyl])
    (tc/aggregate {:total-disp #(-> % :disp tcc/sum)})
    (tc/order-by [:cyl])
    (plotly/layer-line
     {:=x :cyl
      :=mark-fill :tozeroy
      :=y :total-disp}))
3.6.3 Bar chart
(-> datasets/mtcars
    (tc/group-by [:cyl])
    (tc/aggregate {:total-disp #(-> % :disp tcc/sum)})
    (plotly/layer-bar
     {:=x :cyl
      :=y :total-disp}))
(-> datasets/mtcars
    (tc/group-by [:cyl])
    (tc/aggregate {:total-disp #(-> % :disp tcc/sum)})
    (tc/add-column :bar-width 0.5)
    (plotly/layer-bar
     {:=x :cyl
      :=bar-width :bar-width
      :=y :total-disp}))
3.6.4 Text
(-> datasets/mtcars
    (plotly/layer-text
     {:=x :mpg
      :=y :disp
      :=text :cyl
      :=mark-size 20}))
(-> datasets/mtcars
    (plotly/layer-text
     {:=x :mpg
      :=y :disp
      :=text :cyl
      :=textfont {:family "Courier New, monospace"
                  :size 16
                  :color :purple}
      :=mark-size 20}))
3.6.5 Segment plot
(-> datasets/iris
    (plotly/layer-segment
     {:=x0 :sepal-width
      :=y0 :sepal-length
      :=x1 :petal-width
      :=y1 :petal-length
      :=mark-opacity 0.4
      :=mark-size 3
      :=color :species}))
3.7 Varying color and size
(-> {:x (range 10)}
    tc/dataset
    (plotly/layer-point {:=x :x
                         :=y :x
                         :=mark-size (range 15 65 5)
                         :=mark-color ["#bebada", "#fdb462", "#fb8072", "#d9d9d9", "#bc80bd",
                                       "#b3de69", "#8dd3c7", "#80b1d3", "#fccde5", "#ffffb3"]}))
(-> {:ABCD (range 1 11)
     :EFGH [5 2.5 5 7.5 5 2.5 7.5 4.5 5.5 5]
     :IJKL [:A :A :A :A :A :B :B :B :B :B]
     :MNOP [:C :D :C :D :C :D :C :D :C :D]}
    tc/dataset
    (plotly/base {:=title "IJKLMNOP"})
    (plotly/layer-point {:=x :ABCD
                         :=y :EFGH
                         :=color :IJKL
                         :=size :MNOP
                         :=name "QRST1"})
    (plotly/layer-line
     {:=title "IJKL MNOP"
      :=x :ABCD
      :=y :ABCD
      :=name "QRST2"
      :=mark-color "magenta"
      :=mark-size 20
      :=mark-opacity 0.2}))
3.8 Time series
Date and time fields are handled appropriately. Let us, for example, draw the time series of unemployment counts.
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (plotly/layer-line
     {:=x :date
      :=y :value
      :=mark-color "purple"}))
3.9 Multiple layers
We can draw more than one layer:
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (plotly/layer-point {:=x :date
                         :=y :value
                         :=mark-color "green"
                         :=mark-size 20
                         :=mark-opacity 0.5})
    (plotly/layer-line {:=x :date
                        :=y :value
                        :=mark-color "purple"}))
We can also use the base function for the common parameters across layers:
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (plotly/base {:=x :date
                  :=y :value})
    (plotly/layer-point {:=mark-color "green"
                         :=mark-size 20
                         :=mark-opacity 0.5})
    (plotly/layer-line {:=mark-color "purple"}))
Layers can be named:
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (plotly/base {:=x :date
                  :=y :value})
    (plotly/layer-point {:=mark-color "green"
                         :=mark-size 20
                         :=mark-opacity 0.5
                         :=name "points"})
    (plotly/layer-line {:=mark-color "purple"
                        :=name "line"}))
3.10 Updating data
We can use the update-data function to vary the dataset along a plotting pipeline, affecting the layers that follow.
This functionality is inspired by ggbuilder and metamorph.
Here, for example, we draw a line, then sample 5 data rows, and draw them as points:
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (plotly/base {:=x :date
                  :=y :value})
    (plotly/layer-line {:=mark-color "purple"})
    (plotly/update-data tc/random 5)
    (plotly/layer-point {:=mark-color "green"
                         :=mark-size 15
                         :=mark-opacity 0.5}))
3.11 Overriding layer data
(-> (tc/dataset {:x (range 4)
                 :y [1 2 5 9]})
    tc/dataset
    (tc/sq :y :x)
    (plotly/layer-point {:=mark-size 20})
    (plotly/layer-line {:=dataset (tc/dataset {:x [0 3]
                                               :y [1 10]})
                        :=mark-size 5}))
3.12 Smoothing
layer-smooth is a layer that applies statistical regression methods to the dataset to model it as a smooth shape. It is inspired by ggplot's geom_smooth.
(-> datasets/iris
    (plotly/base {:=x :sepal-width
                  :=y :sepal-length})
    (plotly/layer-point {:=mark-color "green"
                         :=name "Actual"})
    (plotly/layer-smooth {:=mark-color "orange"
                          :=name "Predicted"}))
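The template defaults printed earlier show :=model-options {:model-type :fastmath/ols}, that is, ordinary least squares. As a rough sketch of what a one-predictor least-squares fit computes (written by hand in plain Clojure, not taken from Tableplot or Fastmath; ols-line is a hypothetical helper):

```clojure
(defn ols-line
  "Hypothetical helper: fit y = intercept + slope * x by ordinary least squares."
  [xs ys]
  (let [n (count xs)
        mean-x (/ (reduce + xs) n)
        mean-y (/ (reduce + ys) n)
        ;; Sum of cross-deviations and sum of squared x-deviations.
        sxy (reduce + (map (fn [x y] (* (- x mean-x) (- y mean-y))) xs ys))
        sxx (reduce + (map (fn [x] (let [d (- x mean-x)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:slope slope
     :intercept (- mean-y (* slope mean-x))}))

;; Points lying exactly on y = 1 + 2x.
(def fit (ols-line [0.0 1.0 2.0 3.0] [1.0 3.0 5.0 7.0]))

fit
;; => {:slope 2.0, :intercept 1.0}
```

The smoothing layer then draws the fitted values against :=x; the sketch assumes at least two distinct x values, since otherwise the denominator is zero.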
By default, the regression is computed with only one predictor variable, which is :=x. But this can be overridden using the :=predictors key. We may compute a regression with more than one predictor.
(-> datasets/iris
    (plotly/base {:=x :sepal-width
                  :=y :sepal-length})
    (plotly/layer-point {:=mark-color "green"
                         :=name "Actual"})
    (plotly/layer-smooth {:=predictors [:petal-width
                                        :petal-length]
                          :=mark-opacity 0.5
                          :=name "Predicted"}))
We can also specify the predictor columns as expressions through the :=design-matrix key. Here, we use the design matrix functionality of Metamorph.ml.
(-> datasets/iris
    (plotly/base {:=x :sepal-width
                  :=y :sepal-length})
    (plotly/layer-point {:=mark-color "green"
                         :=name "Actual"})
    (plotly/layer-smooth {:=design-matrix [[:sepal-width '(identity :sepal-width)]
                                           [:sepal-width-2 '(* :sepal-width
                                                               :sepal-width)]]
                          :=mark-opacity 0.5
                          :=name "Predicted"}))
Inspired by Sami Kallinen's Heart of Clojure talk:
(-> datasets/iris
    (plotly/base {:=x :sepal-width
                  :=y :sepal-length})
    (plotly/layer-point {:=mark-color "green"
                         :=name "Actual"})
    (plotly/layer-smooth {:=design-matrix [[:sepal-width '(identity :sepal-width)]
                                           [:sepal-width-2 '(* :sepal-width
                                                               :sepal-width)]
                                           [:sepal-width-3 '(* :sepal-width
                                                               :sepal-width
                                                               :sepal-width)]]
                          :=mark-opacity 0.5
                          :=name "Predicted"}))
One can also provide the regression model details through :=model-options and use any regression model and parameters registered by Metamorph.ml.
(require 'scicloj.ml.tribuo)
(def regression-tree-options
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "cart"
                        :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
                        :properties {:maxDepth "8"
                                     :fractionFeaturesInSplit "1.0"
                                     :seed "12345"
                                     :impurity "mse"}}
                       {:name "mse"
                        :type "org.tribuo.regression.rtree.impurity.MeanSquaredError"}]
   :tribuo-trainer-name "cart"})
(-> datasets/iris
    (plotly/base {:=x :sepal-width
                  :=y :sepal-length})
    (plotly/layer-point {:=mark-color "green"
                         :=name "Actual"})
    (plotly/layer-smooth {:=model-options regression-tree-options
                          :=mark-opacity 0.5
                          :=name "Predicted"}))
An example inspired by Plotly's ML Regression in Python example.
(-> datasets/tips
    (tc/split :holdout {:seed 1})
    (plotly/base {:=x :total_bill
                  :=y :tip})
    (plotly/layer-point {:=color :$split-name})
    (plotly/update-data (fn [ds]
                          (-> ds
                              (tc/select-rows #(-> % :$split-name (= :train))))))
    (plotly/layer-smooth {:=model-options regression-tree-options
                          :=name "prediction"
                          :=mark-color "purple"}))
3.13 Grouping
The regression computed by layer-smooth is affected by the inferred grouping of the data.
For example, here we receive three regression lines, one for each species.
(-> datasets/iris
    (plotly/base {:=title "dummy"
                  :=color :species
                  :=x :sepal-width
                  :=y :sepal-length})
    plotly/layer-point
    plotly/layer-smooth)
This happened because the :=color field was :species, which is of :nominal type.
But we may override this using the :=group key. For example, let us avoid grouping:
(-> datasets/iris
    (plotly/base {:=title "dummy"
                  :=color :species
                  :=group []
                  :=x :sepal-width
                  :=y :sepal-length})
    plotly/layer-point
    plotly/layer-smooth)
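Conceptually, grouping partitions the rows before a statistic such as a regression is computed, so each group gets its own fit. The following plain-Clojure sketch illustrates the partitioning step only; split-by is a hypothetical helper, not part of Tableplot, and the rows are a few entries copied from the sampled Iris dataset printed earlier.

```clojure
;; A few rows taken from the sampled Iris dataset shown earlier in this chapter.
(def rows
  [{:species "setosa" :sepal-length 5.0}
   {:species "versicolor" :sepal-length 5.7}
   {:species "virginica" :sepal-length 6.2}
   {:species "setosa" :sepal-length 4.6}
   {:species "virginica" :sepal-length 6.7}])

(defn split-by
  "Hypothetical helper: partition rows by the given grouping columns.
  An empty grouping keeps all rows together in a single group."
  [rows group-cols]
  (if (empty? group-cols)
    {[] rows}
    (group-by (apply juxt group-cols) rows)))

(count (split-by rows [:species])) ;; => 3
(count (split-by rows []))         ;; => 1
```

With three groups, a per-group statistic yields three results; with the empty grouping, it yields one, which corresponds to the single regression line obtained with :=group [].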
Alternatively, we may assign the :=color only to the points layer without affecting the smoothing layer.
(-> datasets/iris
    (plotly/base {:=title "dummy"
                  :=x :sepal-width
                  :=y :sepal-length})
    (plotly/layer-point {:=color :species})
    (plotly/layer-smooth {:=name "Predicted"
                          :=mark-color "blue"}))
3.14 Example: out-of-sample predictions
Here is a slightly more elaborate example inspired by the London Clojurians talk mentioned in the preface.
Assume we wish to predict the unemployment rate for 96 months. Let us add those months to our dataset, and mark them as Future (considering the original data as Past):
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (tc/add-column :relative-time "Past")
    (tc/concat (tc/dataset {:date (-> datasets/economics-long
                                      :date
                                      last
                                      (datetime/plus-temporal-amount (range 96) :days))
                            :relative-time "Future"}))
    (print/print-range 6))
ggplot2/economics_long [670 6]:
:rownames | :date | :variable | :value | :value01 | :relative-time |
---|---|---|---|---|---|
2297 | 1967-07-01 | unemploy | 2944.0 | 0.02044683 | Past |
2298 | 1967-08-01 | unemploy | 2945.0 | 0.02052578 | Past |
2299 | 1967-09-01 | unemploy | 2958.0 | 0.02155206 | Past |
… | … | … | … | … | … |
 | 2015-07-02 | | | | Future |
 | 2015-07-03 | | | | Future |
 | 2015-07-04 | | | | Future |
 | 2015-07-05 | | | | Future |
Let us represent our dates as numbers, so that we can use them in linear regression:
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (tc/add-column :relative-time "Past")
    (tc/concat (tc/dataset {:date (-> datasets/economics-long
                                      :date
                                      last
                                      (datetime/plus-temporal-amount (range 96) :months))
                            :relative-time "Future"}))
    (tc/add-column :year #(datetime/long-temporal-field :years (:date %)))
    (tc/add-column :month #(datetime/long-temporal-field :months (:date %)))
    (tc/map-columns :yearmonth [:year :month] (fn [y m] (+ m (* 12 y))))
    (print/print-range 6))
ggplot2/economics_long [670 9]:
:rownames | :date | :variable | :value | :value01 | :relative-time | :year | :month | :yearmonth |
---|---|---|---|---|---|---|---|---|
2297 | 1967-07-01 | unemploy | 2944.0 | 0.02044683 | Past | 1967 | 7 | 23611 |
2298 | 1967-08-01 | unemploy | 2945.0 | 0.02052578 | Past | 1967 | 8 | 23612 |
2299 | 1967-09-01 | unemploy | 2958.0 | 0.02155206 | Past | 1967 | 9 | 23613 |
… | … | … | … | … | … | … | … | … |
 | 2022-12-01 | | | | Future | 2022 | 12 | 24276 |
 | 2023-01-01 | | | | Future | 2023 | 1 | 24277 |
 | 2023-02-01 | | | | Future | 2023 | 2 | 24278 |
 | 2023-03-01 | | | | Future | 2023 | 3 | 24279 |
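The :yearmonth encoding, month + 12 × year, turns each date into an integer that grows by exactly one per calendar month, which is what makes it usable as a linear-regression predictor. A small plain-Clojure check of the arithmetic, using java.time directly rather than the dataset pipeline above (the yearmonth function here is a standalone helper written for this illustration):

```clojure
(import '(java.time LocalDate))

(defn yearmonth
  "Month number plus twelve times the year, as in the :yearmonth column above."
  [^LocalDate d]
  (+ (.getMonthValue d) (* 12 (.getYear d))))

(yearmonth (LocalDate/of 1967 7 1))  ;; => 23611
(yearmonth (LocalDate/of 2022 12 1)) ;; => 24276
(yearmonth (LocalDate/of 2023 1 1))  ;; => 24277
```

The values agree with the printed table, and the step from December 2022 to January 2023 is 1, so year boundaries introduce no jump.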
Let us use the same regression line for the Past and Future groups. To do this, we avoid grouping by assigning [] to :=group. The line is affected only by the past, since in the Future, :=y is missing. We use the numerical field :yearmonth as the regression predictor, but for plotting, we still use the :temporal field :date.
(-> datasets/economics-long
    (tc/select-rows #(-> % :variable (= "unemploy")))
    (tc/add-column :relative-time "Past")
    (tc/concat (tc/dataset {:date (-> datasets/economics-long
                                      :date
                                      last
                                      (datetime/plus-temporal-amount (range 96) :months))
                            :relative-time "Future"}))
    (tc/add-column :year #(datetime/long-temporal-field :years (:date %)))
    (tc/add-column :month #(datetime/long-temporal-field :months (:date %)))
    (tc/map-columns :yearmonth [:year :month] (fn [y m] (+ m (* 12 y))))
    (plotly/base {:=x :date
                  :=y :value})
    (plotly/layer-smooth {:=color :relative-time
                          :=mark-size 15
                          :=group []
                          :=predictors [:yearmonth]})
    ;; Keep only the past for the following layer:
    (plotly/update-data (fn [dataset]
                          (-> dataset
                              (tc/select-rows (fn [row]
                                                (-> row :relative-time (= "Past")))))))
    (plotly/layer-line {:=mark-color "purple"
                        :=mark-size 3
                        :=name "Actual"}))
3.15 Histograms
Histograms can also be represented as layers with statistical processing:
(-> datasets/iris
    (plotly/layer-histogram {:=x :sepal-width}))
(-> datasets/iris
    (plotly/layer-histogram {:=x :sepal-width
                             :=histogram-nbins 30}))
(-> datasets/iris
    (plotly/layer-histogram {:=x :sepal-width
                             :=color :species
                             :=mark-opacity 0.5}))
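The statistical processing behind a histogram amounts to binning values and counting. The sketch below shows one simple way to do that with equal-width bins in plain Clojure. It is an illustration only: histogram-counts is a hypothetical helper, and Tableplot's actual binning may choose bin edges differently.

```clojure
(defn histogram-counts
  "Hypothetical helper: count values in nbins equal-width bins spanning
  the range of xs. Assumes xs is non-empty and not all values are equal."
  [xs nbins]
  (let [lo (apply min xs)
        hi (apply max xs)
        width (/ (- hi lo) nbins)
        ;; The maximum value would land in bin nbins; clamp it into the last bin.
        bin-of (fn [x] (min (dec nbins) (int (/ (- x lo) width))))]
    (reduce (fn [counts x] (update counts (bin-of x) inc))
            (vec (repeat nbins 0))
            xs)))

(histogram-counts [0 1 2 3 4 5 6 7 8 9 10] 5)
;; => [2 2 2 2 3]
```

Raising the number of bins, as :=histogram-nbins does above, narrows each bin and spreads the same values over more counts.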
3.16 Density
(experimental)
Density estimates are handled similarly to Histograms:
(-> datasets/iris
    (plotly/layer-density {:=x :sepal-width}))
(-> datasets/iris
    (plotly/layer-density {:=x :sepal-width
                           :=density-bandwidth 0.05}))
(-> datasets/iris
    (plotly/layer-density {:=x :sepal-width
                           :=density-bandwidth 1}))
(-> datasets/iris
    (plotly/layer-density {:=x :sepal-width
                           :=color :species}))
3.17 Coordinates
(WIP)
3.17.1 geo
Inspired by Plotly's tutorial for Scatter Plots on Maps in JavaScript:
(-> {:lat [45.5, 43.4, 49.13, 51.1, 53.34, 45.24,
           44.64, 48.25, 49.89, 50.45]
     :lon [-73.57, -79.24, -123.06, -114.1, -113.28,
           -75.43, -63.57, -123.21, -97.13, -104.6]
     :text ["Montreal", "Toronto", "Vancouver", "Calgary", "Edmonton",
            "Ottawa", "Halifax", "Victoria", "Winnipeg", "Regina"],}
    tc/dataset
    (plotly/base {:=coordinates :geo
                  :=lat :lat
                  :=lon :lon})
    (plotly/layer-point {:=mark-opacity 0.8
                         :=mark-color ["#bebada", "#fdb462", "#fb8072", "#d9d9d9", "#bc80bd",
                                       "#b3de69", "#8dd3c7", "#80b1d3", "#fccde5", "#ffffb3"]
                         :=mark-size 20
                         :=name "Canadian cities"})
    (plotly/layer-text {:=text :text
                        :=textfont {:size 7
                                    :color :purple}})
    plotly/plot
    (assoc-in [:layout :geo]
              {:scope "north america"
               :resolution 10
               :lonaxis {:range [-130 -55]}
               :lataxis {:range [40 60]}
               :countrywidth 1.5
               :showland true
               :showlakes true
               :showrivers true}))
3.17.2 3d
(-> datasets/iris
    (plotly/layer-point {:=x :sepal-width
                         :=y :sepal-length
                         :=z :petal-length
                         :=color :petal-width
                         :=coordinates :3d}))
(-> datasets/iris
    (plotly/layer-point {:=x :sepal-width
                         :=y :sepal-length
                         :=z :petal-length
                         :=color :species
                         :=coordinates :3d}))
3.17.3 polar
Monthly rain amounts - polar bar-chart
(def rain-data
  (tc/dataset
   {:month [:Jan :Feb :Mar :Apr
            :May :Jun :Jul :Aug
            :Sep :Oct :Nov :Dec]
    :rain (repeatedly #(rand-int 200))}))
(-> rain-data
    (plotly/layer-bar
     {:=r :rain
      :=theta :month
      :=coordinates :polar
      :=mark-size 20
      :=mark-opacity 0.6}))
Controlling the polar layout (by manipulating the raw Plotly.js spec):
(-> rain-data
    (plotly/base
     {})
    (plotly/layer-bar
     {:=r :rain
      :=theta :month
      :=coordinates :polar
      :=mark-size 20
      :=mark-opacity 0.6})
    plotly/plot
    (assoc-in [:layout :polar]
              {:angularaxis {:tickfont {:size 16}
                             :rotation 90
                             :direction "counterclockwise"}
               :sector [0 180]}))
A polar random walk - polar line-chart
(let [n 50]
  (-> {:r (->> (repeatedly n #(- (rand) 0.5))
               (reductions +))
       :theta (->> (repeatedly n #(* 10 (rand)))
                   (reductions +)
                   (map #(rem % 360)))
       :color (range n)}
      tc/dataset
      (plotly/layer-point
       {:=r :r
        :=theta :theta
        :=coordinates :polar
        :=mark-size 10
        :=mark-opacity 0.6})
      (plotly/layer-line
       {:=r :r
        :=theta :theta
        :=coordinates :polar
        :=mark-size 3
        :=mark-opacity 0.6})))
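The random walk above is built from running sums: reductions with + turns a sequence of steps into cumulative positions, and rem wraps the angle back into the 0 to 360 range. A deterministic plain-Clojure version of the angle computation, with fixed step values chosen for illustration instead of random ones:

```clojure
;; Fixed angular steps (illustrative values, in degrees).
(def steps [100 150 200 50])

(def angles
  (->> steps
       (reductions +)        ; running sum: 100, 250, 450, 500
       (map #(rem % 360))))  ; wrap around the circle

angles
;; => (100 250 90 140)
```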
3.18 Debugging (WIP)
3.18.1 Viewing the computational dag of substitution keys:
(def example-to-debug
  (-> datasets/iris
      (tc/random 10 {:seed 1})
      (plotly/layer-point {:=x :sepal-width
                           :=y :sepal-length
                           :=color :species})))
(-> example-to-debug
    plotly/dag)
3.18.2 Viewing intermediate values in the computational dag:
Layers (Tableplot's intermediate data representation)
(-> example-to-debug
    (plotly/debug :=layers)
    kind/pprint)
[{:y :sepal-length,
  :trace-base {:mode :markers, :type "scatter"},
  :color-type :nominal,
  :coordinates :2d,
  :group (:species),
  :color :species,
  :mark :point,
  :z :z,
  :inferred-group (:species),
  :x :sepal-width,
  :dataset datasets/iris [10 6]:
| :rownames | :sepal-length | :sepal-width | :petal-length | :petal-width | :species |
|----------:|--------------:|-------------:|--------------:|-------------:|------------|
| 27 | 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 97 | 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 127 | 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 92 | 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 95 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 125 | 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 61 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 73 | 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 42 | 4.5 | 2.3 | 1.3 | 0.3 | setosa |
}]
Traces (part of the Plotly spec)
(-> example-to-debug
    (plotly/debug :=traces)
    kind/pprint)
[{:y [5.0 4.6 4.5],
  :r nil,
  :name "setosa",
  :marker {:color "#1B9E77"},
  :fill nil,
  :mode :markers,
  :width nil,
  :type "scatter",
  :theta nil,
  :z nil,
  :lon nil,
  :lat nil,
  :x [3.4 3.4 2.3],
  :text nil}
 {:y [5.7 6.1 5.6 5.0 6.3],
  :r nil,
  :name "versicolor",
  :marker {:color "#D95F02"},
  :fill nil,
  :mode :markers,
  :width nil,
  :type "scatter",
  :theta nil,
  :z nil,
  :lon nil,
  :lat nil,
  :x [2.9 3.0 2.7 2.0 2.5],
  :text nil}
 {:y [6.2 6.7],
  :r nil,
  :name "virginica",
  :marker {:color "#7570B3"},
  :fill nil,
  :mode :markers,
  :width nil,
  :type "scatter",
  :theta nil,
  :z nil,
  :lon nil,
  :lat nil,
  :x [2.8 3.3],
  :text nil}]
Both
(-> example-to-debug
    (plotly/debug {:layers :=layers
                   :traces :=traces})
    kind/pprint)
{:layers
 [{:y :sepal-length,
   :trace-base {:mode :markers, :type "scatter"},
   :color-type :nominal,
   :coordinates :2d,
   :group (:species),
   :color :species,
   :mark :point,
   :z :z,
   :inferred-group (:species),
   :x :sepal-width,
   :dataset datasets/iris [10 6]:
| :rownames | :sepal-length | :sepal-width | :petal-length | :petal-width | :species |
|----------:|--------------:|-------------:|--------------:|-------------:|------------|
| 27 | 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 97 | 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 127 | 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 92 | 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 95 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 125 | 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 61 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 73 | 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 42 | 4.5 | 2.3 | 1.3 | 0.3 | setosa |
}],
 :traces
 [{:y [5.0 4.6 4.5],
   :r nil,
   :name "setosa",
   :marker {:color "#1B9E77"},
   :fill nil,
   :mode :markers,
   :width nil,
   :type "scatter",
   :theta nil,
   :z nil,
   :lon nil,
   :lat nil,
   :x [3.4 3.4 2.3],
   :text nil}
  {:y [5.7 6.1 5.6 5.0 6.3],
   :r nil,
   :name "versicolor",
   :marker {:color "#D95F02"},
   :fill nil,
   :mode :markers,
   :width nil,
   :type "scatter",
   :theta nil,
   :z nil,
   :lon nil,
   :lat nil,
   :x [2.9 3.0 2.7 2.0 2.5],
   :text nil}
  {:y [6.2 6.7],
   :r nil,
   :name "virginica",
   :marker {:color "#7570B3"},
   :fill nil,
   :mode :markers,
   :width nil,
   :type "scatter",
   :theta nil,
   :z nil,
   :lon nil,
   :lat nil,
   :x [2.8 3.3],
   :text nil}]}
3.19 Coming soon
3.19.1 Facets
(coming soon)
3.19.2 Scales
(coming soon)