cmdstan-clj
cmdstan-clj is a Clojure wrapper of Stan probabilistic programming language that uses the CmdStan CLI.
Status: an evolving proof-of-concept.
ns index
(:require [tablecloth.api :as tc]
(:as shell]
[clojure.java.shell :as charred]
[charred.api :as io]
[clojure.java.io print :as print]
[tech.v3.dataset.:as str]
[clojure.string :as hanami]
[scicloj.noj.v1.vis.hanami :as vis.stats ]
[scicloj.noj.v1.vis.stats :as ht]
[aerial.hanami.templates :as kind]
[scicloj.kindly.v4.kind :as stats]
[scicloj.noj.v1.stats :as stan])) [scicloj.cmdstan-clj.v1.api
Walkthrough
Let us define our model.
In our probabilistic model here, we have we have an observed vector \(y\) of \(N\) samples. We have an unobserved parameter \(\theta \sim Beta(1,1)\), and the elements of \(y\) are conditionally independent given \(\theta\), and distributed \(Bernoulli(\theta)\) each.
def model-code
("
data {
int<lower=0> N;
array[N] int<lower=0,upper=1> y;
}
parameters {
real<lower=0,upper=1> theta;
}
model {
theta ~ beta(1,2); // uniform prior on interval 0,1
y ~ bernoulli(theta);
}")
Now we may compile the model, if this has not been done yet.
def model
( (stan/model model-code))
Here is the output of compiling our model:
-> model
(:out
kind/code)
--- Translating Stan model to C++ code ----772334480016722166/file-16025118707950248201.hpp /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.stan
bin/stanc --o=/tmp/scicloj-files/session-dir
--- Compiling C++ code ---1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -I stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.81.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -c -Wno-ignored-attributes -x c++ -o /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.o /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.hpp
g++ -Wno-deprecated-declarations -std=c++
--- Linking model ---1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -I stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.81.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -Wl,-L,"/home/daslu/.cmdstan/cmdstan-2.34.1/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/daslu/.cmdstan/cmdstan-2.34.1/stan/lib/stan_math/lib/tbb" /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.o src/cmdstan/main.o -ltbb stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a stan/lib/stan_math/lib/tbb/libtbb.so.2 -o /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201
g++ -Wno-deprecated-declarations -std=c++-772334480016722166/file-16025118707950248201.hpp /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.o rm /tmp/scicloj-files/session-dir
Here are some toy data:
def data
(:N 10
{:y [0 1 0 0 0 0 0 0 0 1]})
Let us sample from the posterior of \(\theta\) given out observed \(y\) in the data. (Soon we will support relevant options to control the sampling process.)
def sampling
(:num-chains 4})) (stan/sample model data {
Here is the output of sampling process.
-> sampling
(:out
kind/code)
= sample (Default)
method
sample= 1000 (Default)
num_samples = 1000 (Default)
num_warmup = 0 (Default)
save_warmup = 1 (Default)
thin
adapt= 1 (Default)
engaged = 0.05 (Default)
gamma = 0.8 (Default)
delta = 0.75 (Default)
kappa = 10 (Default)
t0 = 75 (Default)
init_buffer = 50 (Default)
term_buffer = 25 (Default)
window = 0 (Default)
save_metric = hmc (Default)
algorithm
hmc= nuts (Default)
engine
nuts= 10 (Default)
max_depth = diag_e (Default)
metric = (Default)
metric_file = 1 (Default)
stepsize = 0 (Default)
stepsize_jitter = 4
num_chains = 1 (Default)
id
data= /tmp/scicloj-files/session-dir-772334480016722166/file-9398899209516897676.json
file = 2 (Default)
init
random= 4275893379 (Default)
seed
output= /tmp/scicloj-files/session-dir-772334480016722166/file-12383328597578324763.csv
file = (Default)
diagnostic_file = 100 (Default)
refresh = -1 (Default)
sig_figs = profile.csv (Default)
profile_file = 0 (Default)
save_cmdstan_config = 1 (Default)
num_threads
4e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!
1e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!
1e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!
1e-06 seconds
Gradient evaluation took 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!
1] Iteration: 1 / 2000 [ 0%] (Warmup)
Chain [1] Iteration: 100 / 2000 [ 5%] (Warmup)
Chain [1] Iteration: 200 / 2000 [ 10%] (Warmup)
Chain [1] Iteration: 300 / 2000 [ 15%] (Warmup)
Chain [1] Iteration: 400 / 2000 [ 20%] (Warmup)
Chain [1] Iteration: 500 / 2000 [ 25%] (Warmup)
Chain [1] Iteration: 600 / 2000 [ 30%] (Warmup)
Chain [1] Iteration: 700 / 2000 [ 35%] (Warmup)
Chain [1] Iteration: 800 / 2000 [ 40%] (Warmup)
Chain [1] Iteration: 900 / 2000 [ 45%] (Warmup)
Chain [1] Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain [1] Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain [1] Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain [1] Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain [1] Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain [1] Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain [1] Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain [1] Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain [1] Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain [1] Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain [1] Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain [1] Iteration: 2000 / 2000 [100%] (Sampling)
Chain [
0.002 seconds (Warm-up)
Elapsed Time: 0.005 seconds (Sampling)
0.007 seconds (Total)
2] Iteration: 1 / 2000 [ 0%] (Warmup)
Chain [2] Iteration: 100 / 2000 [ 5%] (Warmup)
Chain [2] Iteration: 200 / 2000 [ 10%] (Warmup)
Chain [2] Iteration: 300 / 2000 [ 15%] (Warmup)
Chain [2] Iteration: 400 / 2000 [ 20%] (Warmup)
Chain [2] Iteration: 500 / 2000 [ 25%] (Warmup)
Chain [2] Iteration: 600 / 2000 [ 30%] (Warmup)
Chain [2] Iteration: 700 / 2000 [ 35%] (Warmup)
Chain [2] Iteration: 800 / 2000 [ 40%] (Warmup)
Chain [2] Iteration: 900 / 2000 [ 45%] (Warmup)
Chain [2] Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain [2] Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain [2] Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain [2] Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain [2] Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain [2] Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain [2] Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain [2] Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain [2] Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain [2] Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain [2] Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain [2] Iteration: 2000 / 2000 [100%] (Sampling)
Chain [
0.002 seconds (Warm-up)
Elapsed Time: 0.006 seconds (Sampling)
0.008 seconds (Total)
3] Iteration: 1 / 2000 [ 0%] (Warmup)
Chain [3] Iteration: 100 / 2000 [ 5%] (Warmup)
Chain [3] Iteration: 200 / 2000 [ 10%] (Warmup)
Chain [3] Iteration: 300 / 2000 [ 15%] (Warmup)
Chain [3] Iteration: 400 / 2000 [ 20%] (Warmup)
Chain [3] Iteration: 500 / 2000 [ 25%] (Warmup)
Chain [3] Iteration: 600 / 2000 [ 30%] (Warmup)
Chain [3] Iteration: 700 / 2000 [ 35%] (Warmup)
Chain [3] Iteration: 800 / 2000 [ 40%] (Warmup)
Chain [3] Iteration: 900 / 2000 [ 45%] (Warmup)
Chain [3] Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain [3] Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain [3] Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain [3] Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain [3] Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain [3] Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain [3] Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain [3] Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain [3] Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain [3] Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain [3] Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain [3] Iteration: 2000 / 2000 [100%] (Sampling)
Chain [
0.002 seconds (Warm-up)
Elapsed Time: 0.006 seconds (Sampling)
0.008 seconds (Total)
4] Iteration: 1 / 2000 [ 0%] (Warmup)
Chain [4] Iteration: 100 / 2000 [ 5%] (Warmup)
Chain [4] Iteration: 200 / 2000 [ 10%] (Warmup)
Chain [4] Iteration: 300 / 2000 [ 15%] (Warmup)
Chain [4] Iteration: 400 / 2000 [ 20%] (Warmup)
Chain [4] Iteration: 500 / 2000 [ 25%] (Warmup)
Chain [4] Iteration: 600 / 2000 [ 30%] (Warmup)
Chain [4] Iteration: 700 / 2000 [ 35%] (Warmup)
Chain [4] Iteration: 800 / 2000 [ 40%] (Warmup)
Chain [4] Iteration: 900 / 2000 [ 45%] (Warmup)
Chain [4] Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain [4] Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain [4] Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain [4] Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain [4] Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain [4] Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain [4] Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain [4] Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain [4] Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain [4] Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain [4] Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain [4] Iteration: 2000 / 2000 [100%] (Sampling)
Chain [
0.002 seconds (Warm-up)
Elapsed Time: 0.005 seconds (Sampling)
0.007 seconds (Total)
Here are the sampels:
-> sampling
(:samples)
model samples [4000 10]:
:lp__ | :accept_stat__ | :stepsize__ | :treedepth__ | :n_leapfrog__ | :divergent__ | :energy__ | :theta | :i | :chain |
---|---|---|---|---|---|---|---|---|---|
-7.12545 | 0.975350 | 0.961381 | 1 | 3 | 0 | 7.25170 | 0.286407 | 0 | 0 |
-7.12545 | 0.413974 | 0.961381 | 1 | 3 | 0 | 10.85510 | 0.286407 | 1 | 0 |
-7.50087 | 0.891145 | 0.961381 | 1 | 1 | 0 | 7.50260 | 0.356258 | 2 | 0 |
-7.33047 | 1.000000 | 0.961381 | 1 | 1 | 0 | 7.51297 | 0.329950 | 3 | 0 |
-8.20482 | 0.747192 | 0.961381 | 1 | 1 | 0 | 8.22687 | 0.434842 | 4 | 0 |
-7.31109 | 1.000000 | 0.961381 | 2 | 3 | 0 | 8.00672 | 0.150667 | 5 | 0 |
-7.25856 | 0.944143 | 0.961381 | 1 | 3 | 0 | 7.83573 | 0.316883 | 6 | 0 |
-7.16760 | 0.934508 | 0.961381 | 1 | 3 | 0 | 7.62292 | 0.172114 | 7 | 0 |
-7.60978 | 0.920585 | 0.961381 | 2 | 3 | 0 | 8.07348 | 0.370857 | 8 | 0 |
-7.12582 | 1.000000 | 0.961381 | 1 | 1 | 0 | 7.46068 | 0.286512 | 9 | 0 |
… | … | … | … | … | … | … | … | … | … |
-7.05018 | 0.750290 | 1.147290 | 1 | 3 | 0 | 8.30659 | 0.204133 | 989 | 3 |
-7.11741 | 0.936305 | 1.147290 | 2 | 3 | 0 | 7.45500 | 0.182656 | 990 | 3 |
-7.18810 | 0.994007 | 1.147290 | 2 | 3 | 0 | 7.20178 | 0.168430 | 991 | 3 |
-7.23906 | 0.987696 | 1.147290 | 1 | 1 | 0 | 7.27223 | 0.160325 | 992 | 3 |
-7.03316 | 0.807169 | 1.147290 | 2 | 3 | 0 | 8.64586 | 0.214127 | 993 | 3 |
-7.11876 | 0.975819 | 1.147290 | 1 | 3 | 0 | 7.15130 | 0.284487 | 994 | 3 |
-7.02329 | 1.000000 | 1.147290 | 1 | 3 | 0 | 7.10092 | 0.226623 | 995 | 3 |
-7.78634 | 0.863495 | 1.147290 | 2 | 3 | 0 | 7.94797 | 0.110051 | 996 | 3 |
-8.54957 | 0.882045 | 1.147290 | 2 | 3 | 0 | 9.42805 | 0.075029 | 997 | 3 |
-7.18714 | 0.840866 | 1.147290 | 2 | 3 | 0 | 9.04786 | 0.301954 | 998 | 3 |
-7.03367 | 1.000000 | 1.147290 | 1 | 3 | 0 | 7.13470 | 0.248416 | 999 | 3 |
The histogram of \(\theta\):
-> sampling
(:samples
:chain] {:result-type :as-map})
(tc/group-by [
(update-valsfn [chain-samples]
(-> chain-samples
(:theta {:nbins 100}))))) (vis.stats/histogram
{
|
|
|
|
}
The trace plot of \(\theta\):
-> sampling
(:samples
:X :i
(hanami/plot ht/line-chart {:Y :theta
:COLOR "chain"
:OPACITY 0.5}))
source: notebooks/index.clj