## 2 Intro
Here is our introduction to using PyMC from Clojure.
We follow the linear regression example from the Introductory Overview of PyMC.
### 2.1 Setup
Relevant Clojure namespaces:
```clojure
(ns intro
  (:require [libpython-clj2.require :refer [require-python]]
            [libpython-clj2.python :refer [py. py.. py.-] :as py]
            [fastmath.random :as random]
            [tablecloth.api :as tc]
            [tablecloth.column.api :as tcc]
            [tech.v3.datatype :as dtype]
            [scicloj.tableplot.v1.plotly :as plotly]
            [scicloj.kind-pyplot.v1.api :as pyplot]
            [scicloj.kindly.v4.kind :as kind]))
```

Relevant Python modules:
```clojure
(require-python '[builtins :as python]
                'operator
                '[arviz :as az]
                '[arviz.style :as az.style]
                '[pandas :as pd]
                '[matplotlib.pyplot :as plt]
                '[numpy :as np]
                '[numpy.random :as np.random]
                '[pymc :as pm])
```

Some convenience functions to access Python idioms:
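As a quick plain-Python reminder of the idioms these helpers will wrap: square-bracket indexing `obj[entry]` desugars to `obj.__getitem__(entry)`, and the bare `:` in `obj[:]` is `slice(None, None)`.

```python
xs = [10, 20, 30]

# obj[entry] is sugar for obj.__getitem__(entry):
assert xs.__getitem__(1) == xs[1] == 20

# the bare `:` slice is slice(None, None):
assert xs[slice(None, None)] == xs[:] == [10, 20, 30]
```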
```clojure
(defn brackets
  "Python-style indexing: `obj[entry]`."
  [obj entry]
  (py. obj __getitem__ entry))

(def colon
  "The Python `:` slice, i.e. `slice(None)`."
  (python/slice nil nil))
```

Theme for ArviZ visualizations:

```clojure
(az.style/use "arviz-darkgrid")
```

### 2.2 Synthetic data
```clojure
(def random-seed 8927)

(def dataset-size 101)

(def true-parameter-values
  {:alpha 1
   :sigma 1
   :beta [1 2.5]})
```

We will generate a dataset by the following recipe:
```clojure
(defn gen-dataset [{:keys [size random-seed
                           alpha sigma beta]}]
  (let [rng (random/rng :isaac random-seed)]
    (-> {:x1 (take size (random/->seq rng))
         :x2 (-> (take size (random/->seq rng))
                 (tcc/* 0.2))}
        tc/dataset
        (tc/add-column :y
                       #(tcc/+ alpha
                               (tcc/* (beta 0) (:x1 %))
                               (tcc/* (beta 1) (:x2 %))
                               (tcc/* sigma
                                      (dtype/make-reader
                                       :float32 size (rand))))))))

(def dataset
  (gen-dataset (merge {:random-seed random-seed
                       :size dataset-size}
                      true-parameter-values)))

(tc/head dataset)
```

_unnamed [5 3]:
| :x1 | :x2 | :y |
|---|---|---|
| 0.55055707 | 0.12467207 | 2.46114559 |
| 0.33115609 | 0.12809006 | 1.79691978 |
| 0.75138350 | 0.16112505 | 2.33566515 |
| 0.84946778 | 0.04481401 | 2.30623415 |
| 0.48675832 | 0.13809987 | 2.59085654 |
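For readers following along without the Clojure stack, here is the same recipe as a plain-Python sketch. The function name `gen_dataset` is ours; note that, mirroring `(rand)` above, the noise term is uniform rather than Gaussian, and the random streams differ from fastmath's, so the numbers will not match the table exactly.

```python
import random

def gen_dataset(size, seed, alpha, sigma, beta):
    # x1, x2 ~ Uniform(0, 1), with x2 scaled by 0.2, as above.
    rng = random.Random(seed)
    x1 = [rng.random() for _ in range(size)]
    x2 = [0.2 * rng.random() for _ in range(size)]
    # y = alpha + beta[0]*x1 + beta[1]*x2 + sigma*noise
    y = [alpha + beta[0] * a + beta[1] * b + sigma * rng.random()
         for a, b in zip(x1, x2)]
    return {"x1": x1, "x2": x2, "y": y}

ds = gen_dataset(size=101, seed=8927, alpha=1, sigma=1, beta=[1, 2.5])
print(len(ds["y"]))  # 101
```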
Let us visualize our dataset:
```clojure
(->> [:x1 :x2]
     (mapv (fn [x]
             (-> dataset
                 (plotly/layer-point
                  {:=x x
                   :=y :y}))))
     kind/fragment)
```

### 2.3 Using PyMC
```clojure
pm/__version__
```

```
"5.17.0"
```

Let us define a Bayesian model for our data:
```clojure
(def basic-model (pm/Model))

(py/with [_ basic-model]
  (let [{:keys [x1 x2 y]} (-> dataset
                              (update-vals np/array))
        ;; priors
        alpha (pm/Normal "alpha" :mu 0 :sigma 10)
        beta  (pm/Normal "beta" :mu 0 :sigma 10 :shape 2)
        sigma (pm/HalfNormal "sigma" :sigma 1)
        ;; expected value of the outcome
        ;; (operator.add and operator.mul are binary, so we nest the sum)
        mu (operator/add alpha
                         (operator/add (operator/mul (brackets beta 0) x1)
                                       (operator/mul (brackets beta 1) x2)))
        ;; likelihood of the observations
        y_obs (pm/Normal "y_obs" :mu mu :sigma sigma :observed y)]))
```

Now we can sample from the posterior:
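Before we do, it may help to see concretely what density this model assigns to a single observation. Here is a plain-Python sketch (pure stdlib; the parameter and data values are illustrative, and PyMC of course builds this computation symbolically over the whole dataset):

```python
import math

def normal_logpdf(x, mu, sigma):
    # log-density of Normal(mu, sigma) at x
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

# Illustrative parameter values and one data point:
alpha, beta, sigma = 1.0, [1.0, 2.5], 1.0
x1, x2, y = 0.55, 0.12, 2.46

# The linear predictor mu, exactly as in the model above:
mu = alpha + beta[0] * x1 + beta[1] * x2
# This observation's contribution to the log-likelihood:
ll = normal_logpdf(y, mu, sigma)
print(round(mu, 2), round(ll, 3))  # 1.85 -1.105
```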
```clojure
(def idata
  (py/with [_ basic-model]
    (pm/sample)))
```

Here is the resulting structure:
```clojure
(-> idata
    (py.- posterior)
    (py.- alpha)
    (py. sel :draw (python/slice 0 4)))
```

```
<xarray.DataArray 'alpha' (chain: 2, draw: 5)> Size: 80B
array([[ -4.20078972,   5.765901  ,   6.6765565 ,  -6.9900269 ,
         -6.9900269 ],
       [-17.89999621, -12.05483791,  -8.86032589,   5.65595574,
         -5.7629998 ]])
Coordinates:
  * chain    (chain) int64 16B 0 1
  * draw     (draw) int64 40B 0 1 2 3 4
```

(Note that xarray's `sel` slicing is label-based and inclusive of both endpoints, which is why `(python/slice 0 4)` returns five draws.)

Alternatively, we could also use the Slice sampling algorithm instead of the default NUTS.
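As an aside, the core of slice sampling is simple enough to sketch in plain Python. Below is a toy univariate version of our own (a fixed-width, randomly positioned bracket with shrinkage, omitting Neal's stepping-out procedure), targeting a standard normal through its unnormalized log-density:

```python
import math
import random

def slice_sample(logp, x0, n, w=2.0, seed=0):
    # Toy univariate slice sampler: fixed-width bracket with shrinkage.
    rng = random.Random(seed)
    x, draws = x0, []
    for _ in range(n):
        # 1. Draw an auxiliary "height" uniformly under the density at x.
        log_y = logp(x) + math.log(rng.random())
        # 2. Place a width-w bracket around x at a random offset.
        left = x - w * rng.random()
        right = left + w
        # 3. Sample uniformly from the bracket, shrinking it on rejection.
        while True:
            x_new = rng.uniform(left, right)
            if logp(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        draws.append(x)
    return draws

# Target: standard normal, log-density up to an additive constant.
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n=4000)
print(sum(draws) / len(draws))  # close to 0
```

With that intuition, back to PyMC: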
```clojure
(def slice-idata
  (py/with [_ basic-model]
    (let [step (pm/Slice)]
      (pm/sample 5000 :step step))))

slice-idata
```

```
Inference data with groups:
	> posterior
	> sample_stats
```

### 2.4 Posterior analysis
Let us plot our samples using ArviZ:
```clojure
(pyplot/pyplot
 #(az/plot_trace idata :combined true))
```
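Beyond the trace plot, the headline numbers of a posterior analysis (a point estimate and a credible interval) are just statistics over the flattened draws. A plain-Python sketch with a helper of our own on toy draws (ArviZ's `az.summary` reports a 94% HDI by default; for simplicity we compute the equal-tailed interval instead):

```python
def summarize(draws, prob=0.94):
    # Posterior mean and an equal-tailed credible interval
    # covering `prob` of the draws.
    xs = sorted(draws)
    n = len(xs)
    mean = sum(xs) / n
    lo = xs[int(n * (1 - prob) / 2)]
    hi = xs[min(n - 1, int(n * (1 + prob) / 2))]
    return mean, lo, hi

# Toy stand-in for flattened posterior draws of one parameter:
draws = [0.8, 1.0, 1.1, 0.9, 1.2, 1.05, 0.95, 1.0, 1.15, 0.85]
mean, lo, hi = summarize(draws)
print(round(mean, 2), lo, hi)  # 1.0 0.8 1.2
```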