cmdstan-clj
cmdstan-clj is a Clojure wrapper of the Stan probabilistic programming language that uses the CmdStan CLI.
Status: an evolving proof-of-concept.
ns index
(:require
(;; cmdstan-clj API:
:as stan]
[scicloj.cmdstan-clj.v1.api ;; Tablecloth for table processing
:as tc]
[tablecloth.api ;; Tableplot for data visualization
:as plotly]
[scicloj.tableplot.v1.plotly ;; Kindly for specifying how to visualize things:
:as kind])) [scicloj.kindly.v4.kind
Walkthrough
Here we follow the introductory example from Stan’ documentation.
Let us define our model.
In our probabilistic model here, we have we have an observed vector \(y\) of \(N\) samples. We have an unobserved parameter \(\theta \sim Beta(1,1)\), and the elements of \(y\) are conditionally independent given \(\theta\), and distributed \(Bernoulli(\theta)\) each.
def model-code
("
data {
int<lower=0> N;
array[N] int<lower=0,upper=1> y;
}
parameters {
real<lower=0,upper=1> theta;
}
model {
theta ~ beta(1,2); // uniform prior on interval 0,1
y ~ bernoulli(theta);
}")
Now we may compile the model, if this has not been done yet.
def model
( (stan/model model-code))
Here is the output of compiling our model:
-> model
(:out
kind/code)
'/tmp/scicloj-files/session-dir-9036255698493098978/file-6729332977990631866' is up to date. make:
Here are some toy data:
def data
(:N 10
{:y [0 1 0 0 0 0 0 0 0 1]})
Let us sample from the posterior of \(\theta\) given out observed \(y\) in the data. (Soon we will support relevant options to control the sampling process.)
def sampling
(:num-chains 4
(stan/sample model data {:num-warmup 200
:num-samples 100}))
Here is the output of sampling process.
-> sampling
(:out
kind/code)
= sample (Default)
method
sample= 100
num_samples = 200
num_warmup = false (Default)
save_warmup = 1 (Default)
thin
adapt= true (Default)
engaged = 0.05 (Default)
gamma = 0.8 (Default)
delta = 0.75 (Default)
kappa = 10 (Default)
t0 = 75 (Default)
init_buffer = 50 (Default)
term_buffer = 25 (Default)
window = false (Default)
save_metric = hmc (Default)
algorithm
hmc= nuts (Default)
engine
nuts= 10 (Default)
max_depth = diag_e (Default)
metric = (Default)
metric_file = 1 (Default)
stepsize = 0 (Default)
stepsize_jitter = 4
num_chains = 1 (Default)
id
data= /tmp/scicloj-files/session-dir-9036255698493098978/file-1326435404593930035.json
file = 2 (Default)
init
random= 3957778942 (Default)
seed
output= /tmp/scicloj-files/session-dir-9036255698493098978/file-18377587113006538761.csv
file = (Default)
diagnostic_file = 100 (Default)
refresh = -1 (Default)
sig_figs = profile.csv (Default)
profile_file = false (Default)
save_cmdstan_config = 1 (Default)
num_threads
4e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!
1e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!
1e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!
1e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!
1] Iteration: 1 / 300 [ 0%] (Warmup)
Chain [1] Iteration: 100 / 300 [ 33%] (Warmup)
Chain [1] Iteration: 200 / 300 [ 66%] (Warmup)
Chain [1] Iteration: 201 / 300 [ 67%] (Sampling)
Chain [1] Iteration: 300 / 300 [100%] (Sampling)
Chain [
0 seconds (Warm-up)
Elapsed Time: 0.001 seconds (Sampling)
0.001 seconds (Total)
2] Iteration: 1 / 300 [ 0%] (Warmup)
Chain [2] Iteration: 100 / 300 [ 33%] (Warmup)
Chain [2] Iteration: 200 / 300 [ 66%] (Warmup)
Chain [2] Iteration: 201 / 300 [ 67%] (Sampling)
Chain [2] Iteration: 300 / 300 [100%] (Sampling)
Chain [
0.001 seconds (Warm-up)
Elapsed Time: 0.001 seconds (Sampling)
0.002 seconds (Total)
3] Iteration: 1 / 300 [ 0%] (Warmup)
Chain [3] Iteration: 100 / 300 [ 33%] (Warmup)
Chain [3] Iteration: 200 / 300 [ 66%] (Warmup)
Chain [3] Iteration: 201 / 300 [ 67%] (Sampling)
Chain [3] Iteration: 300 / 300 [100%] (Sampling)
Chain [
0.001 seconds (Warm-up)
Elapsed Time: 0.001 seconds (Sampling)
0.002 seconds (Total)
4] Iteration: 1 / 300 [ 0%] (Warmup)
Chain [4] Iteration: 100 / 300 [ 33%] (Warmup)
Chain [4] Iteration: 200 / 300 [ 66%] (Warmup)
Chain [4] Iteration: 201 / 300 [ 67%] (Sampling)
Chain [4] Iteration: 300 / 300 [100%] (Sampling)
Chain [
0.001 seconds (Warm-up)
Elapsed Time: 0.001 seconds (Sampling)
0.002 seconds (Total)
Here are the sampels:
-> sampling
(:samples)
model samples [400 10]:
:lp__ | :accept_stat__ | :stepsize__ | :treedepth__ | :n_leapfrog__ | :divergent__ | :energy__ | :theta | :i | :chain |
---|---|---|---|---|---|---|---|---|---|
-9.51599 | 0.805128 | 1.15844 | 1 | 3 | 0 | 9.69947 | 0.0496807 | 0 | 0 |
-8.85840 | 1.000000 | 1.15844 | 2 | 3 | 0 | 9.52782 | 0.0653906 | 1 | 0 |
-7.02438 | 1.000000 | 1.15844 | 2 | 3 | 0 | 8.50706 | 0.2239590 | 2 | 0 |
-7.04063 | 0.997923 | 1.15844 | 2 | 3 | 0 | 7.04199 | 0.2534120 | 3 | 0 |
-7.03067 | 1.000000 | 1.15844 | 1 | 1 | 0 | 7.03864 | 0.2457840 | 4 | 0 |
-7.02315 | 1.000000 | 1.15844 | 1 | 3 | 0 | 7.02901 | 0.2344560 | 5 | 0 |
-7.19824 | 0.847392 | 1.15844 | 2 | 3 | 0 | 7.99303 | 0.1667100 | 6 | 0 |
-7.02269 | 0.913278 | 1.15844 | 2 | 3 | 0 | 7.86257 | 0.2297840 | 7 | 0 |
-9.64694 | 0.459305 | 1.15844 | 2 | 7 | 0 | 11.37570 | 0.0471368 | 8 | 0 |
-9.96135 | 0.980419 | 1.15844 | 1 | 1 | 0 | 10.16410 | 0.0416407 | 9 | 0 |
… | … | … | … | … | … | … | … | … | … |
-7.39333 | 0.979176 | 0.76039 | 1 | 3 | 0 | 10.07890 | 0.1413660 | 89 | 3 |
-7.03313 | 0.965367 | 0.76039 | 1 | 3 | 0 | 7.86278 | 0.2479700 | 90 | 3 |
-7.02413 | 0.998220 | 0.76039 | 1 | 3 | 0 | 7.05091 | 0.2244690 | 91 | 3 |
-7.13274 | 0.980484 | 0.76039 | 1 | 3 | 0 | 7.20265 | 0.1791500 | 92 | 3 |
-7.09009 | 0.993959 | 0.76039 | 3 | 7 | 0 | 7.22993 | 0.2754480 | 93 | 3 |
-7.09009 | 0.792667 | 0.76039 | 1 | 3 | 0 | 8.67380 | 0.2754480 | 94 | 3 |
-7.11105 | 0.996362 | 0.76039 | 1 | 1 | 0 | 7.11682 | 0.2821960 | 95 | 3 |
-7.18368 | 0.987158 | 0.76039 | 1 | 1 | 0 | 7.18537 | 0.3011620 | 96 | 3 |
-7.22798 | 0.893079 | 0.76039 | 1 | 3 | 0 | 8.10309 | 0.1619810 | 97 | 3 |
-7.22149 | 1.000000 | 0.76039 | 2 | 3 | 0 | 7.25067 | 0.1629750 | 98 | 3 |
-7.08568 | 0.960621 | 0.76039 | 1 | 3 | 0 | 7.73196 | 0.2739090 | 99 | 3 |
The histogram of \(\theta\):
-> sampling
(:samples
:chain] {:result-type :as-map})
(tc/group-by [
(update-valsfn [chain-samples]
(-> chain-samples
(:theta
(plotly/layer-histogram {:=x 100}))))) :=histogram-nbins
{
|
|
|
|
}
The trace plot of \(\theta\):
-> sampling
(:samples
:i
(plotly/layer-line {:=x :theta
:=y :chain
:=color :nominal
:=color-type 0.5})) :=mark-opacity
source: notebooks/index.clj