2 Intro
Here is our introduction to using PyMC from Clojure.
We follow the linear regression example from the Introductory Overview of PyMC.
2.1 Setup
Relevant Clojure namespaces:
ns intro
(:require [libpython-clj2.require :refer [require-python]]
(:refer [py. py.. py.-] :as py]
[libpython-clj2.python :as random]
[fastmath.random :as tc]
[tablecloth.api :as tcc]
[tablecloth.column.api :as dtype]
[tech.v3.datatype :as plotly]
[scicloj.tableplot.v1.plotly :as pyplot]
[scicloj.kind-pyplot.v1.api :as kind])) [scicloj.kindly.v4.kind
Relevant Python modules:
:as python]
(require-python '[builtins 'operator
:as az]
'[arviz :as az.style]
'[arviz.style :as pd]
'[pandas :as plt]
'[matplotlib.pyplot :as np]
'[numpy :as np.random]
'[numpy.random :as pm]) '[pymc
:ok
Some convenience functions to access Python idioms:
defn brackets [obj entry]
( (py. obj __getitem__ entry))
def colon
(nil nil)) (python/slice
Theme for ArViZ visualizations:
"arviz-darkgrid") (arviz.style/use
nil
2.2 Synthetic data
def random-seed 8927) (
def dataset-size 101) (
def true-parameter-values
(:alpha 1
{:sigma 1
:beta [1 2.5]})
We will generate a dataset by the following recipe:
defn gen-dataset [{:keys [size random-seed
(
alpha sigma beta]}]let [rng (random/rng :isaac random-seed)]
(-> {:x1 (take size (random/->seq rng))
(:x2 (-> (take size (random/->seq rng))
0.2))}
(tcc/*
tc/dataset:y
(tc/add-column -> (tcc/+ alpha
#(0) (:x1 %))
(tcc/* (beta 1) (:x2 %))
(tcc/* (beta
(tcc/* sigma
(dtype/make-reader:float32 size (rand)))))))))
def dataset
(merge {:random-seed random-seed
(gen-dataset (:size dataset-size}
true-parameter-values)))
(tc/head dataset)
_unnamed [5 3]:
:x1 | :x2 | :y |
---|---|---|
0.55055707 | 0.12467207 | 2.46114559 |
0.33115609 | 0.12809006 | 1.79691978 |
0.75138350 | 0.16112505 | 2.33566515 |
0.84946778 | 0.04481401 | 2.30623415 |
0.48675832 | 0.13809987 | 2.59085654 |
Let us visualize our dataset:
->> [:x1 :x2]
(mapv (fn [x]
(-> dataset
(
(plotly/layer-point:x1}))))
{:=x kind/fragment)
2.3 Using PyMC
pm/__version__
"5.17.0"
Let us define a Bayesian model for our data:
def basic-model (pm/Model)) (
(py/with [_ basic-model]let [{:keys [x1 x2 y]} (-> dataset
(
(update-vals np/array))"alpha"
alpha (pm/Normal :mu 0
:sigma 10)
"beta"
beta (pm/Normal :mu 0
:sigma 10
:shape 2)
"sigma"
sigma (pm/HalfNormal :sigma 1)
mu (operator/add alpha0)
(operator/mul (brackets beta
x1)0)
(operator/mul (brackets beta
x2))"y_obs"
y_obs (pm/Normal :mu mu
:sigma sigma
:observed y)]))
nil
Now we can sample from the posterior:
def idata
(
(py/with [_ basic-model] (pm/sample)))
Here is the resulting structure:
-> idata
(- posterior)
(py.- alpha)
(py.:draw (python/slice 0 4))) (py. sel
'alpha' (chain: 2, draw: 5)> Size: 80B
<xarray.DataArray 4.20078972, 5.765901 , 6.6765565 , -6.9900269 ,
array([[ -6.9900269 ],
-17.89999621, -12.05483791, -8.86032589, 5.65595574,
[-5.7629998 ]])
-
Coordinates:* chain (chain) int64 16B 0 1
* draw (draw) int64 40B 0 1 2 3 4
Alternativelty, we could also use the Slice sampling algorithm instead of the default NUTS.
def slice-idata
(
(py/with [_ basic-model]let [step (pm/Slice)]
(5000 :step step)))) (pm/sample
slice-idata
Inference data with groups:> posterior
> sample_stats
2.4 Posterior analysis
Let us plot our sampling using ArViZ:
(pyplot/pyplot:combined true)) #(az/plot_trace idata