cmdstan-clj

cmdstan-clj is a Clojure wrapper of Stan probabilistic programming language that uses the CmdStan CLI.

Source: (GitHub repo)

Artifact: (Clojars coordinates)

Status: an evolving proof-of-concept.

(ns index
  (:require [tablecloth.api :as tc]
            [clojure.java.shell :as shell]
            [charred.api :as charred]
            [clojure.java.io :as io]
            [tech.v3.dataset.print :as print]
            [clojure.string :as str]
            [scicloj.noj.v1.vis.hanami :as hanami]
            [scicloj.noj.v1.vis.stats :as vis.stats ]
            [aerial.hanami.templates :as ht]
            [scicloj.kindly.v4.kind :as kind]
            [scicloj.noj.v1.stats :as stats]
            [scicloj.cmdstan-clj.v1.api :as stan]))

Walkthrough

Let us define our model.

In our probabilistic model here, we have we have an observed vector \(y\) of \(N\) samples. We have an unobserved parameter \(\theta \sim Beta(1,1)\), and the elements of \(y\) are conditionally independent given \(\theta\), and distributed \(Bernoulli(\theta)\) each.

(def model-code
  "
data {
      int<lower=0> N;
      array[N] int<lower=0,upper=1> y;
      }
parameters {
            real<lower=0,upper=1> theta;
            }
model {
       theta ~ beta(1,2);  // uniform prior on interval 0,1
       y ~ bernoulli(theta);
}")

Now we may compile the model, if this has not been done yet.

(def model
  (stan/model model-code))

Here is the output of compiling our model:

(-> model
    :out
    kind/code)

--- Translating Stan model to C++ code ---
bin/stanc  --o=/tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.hpp /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.stan

--- Compiling C++ code ---
g++ -Wno-deprecated-declarations -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.81.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS          -c -Wno-ignored-attributes   -x c++ -o /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.o /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.hpp

--- Linking model ---
g++ -Wno-deprecated-declarations -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.81.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS               -Wl,-L,"/home/daslu/.cmdstan/cmdstan-2.34.1/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/daslu/.cmdstan/cmdstan-2.34.1/stan/lib/stan_math/lib/tbb"        /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.o src/cmdstan/main.o       -ltbb   stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.so.2 -o /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201
rm /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.hpp /tmp/scicloj-files/session-dir-772334480016722166/file-16025118707950248201.o

Here are some toy data:

(def data
  {:N 10
   :y [0 1 0 0 0 0 0 0 0 1]})

Let us sample from the posterior of \(\theta\) given out observed \(y\) in the data. (Soon we will support relevant options to control the sampling process.)

(def sampling
  (stan/sample model data {:num-chains 4}))

Here is the output of sampling process.

(-> sampling
    :out
    kind/code)
method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.05 (Default)
      delta = 0.8 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = 0 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 4
id = 1 (Default)
data
  file = /tmp/scicloj-files/session-dir-772334480016722166/file-9398899209516897676.json
init = 2 (Default)
random
  seed = 4275893379 (Default)
output
  file = /tmp/scicloj-files/session-dir-772334480016722166/file-12383328597578324763.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
  save_cmdstan_config = 0 (Default)
num_threads = 1 (Default)


Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!


Chain [1] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [1] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [1] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [1] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [1] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [1] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [1] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [1] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [1] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [1] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [1] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [1] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [1] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [1] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [1] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [1] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [1] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [1] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [1] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [1] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [1] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [1] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.002 seconds (Warm-up)
               0.005 seconds (Sampling)
               0.007 seconds (Total)

Chain [2] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [2] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [2] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [2] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [2] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [2] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [2] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [2] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [2] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [2] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [2] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [2] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [2] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [2] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [2] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [2] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [2] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [2] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [2] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [2] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [2] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [2] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.002 seconds (Warm-up)
               0.006 seconds (Sampling)
               0.008 seconds (Total)

Chain [3] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [3] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [3] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [3] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [3] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [3] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [3] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [3] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [3] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [3] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [3] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [3] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [3] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [3] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [3] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [3] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [3] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [3] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [3] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [3] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [3] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [3] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.002 seconds (Warm-up)
               0.006 seconds (Sampling)
               0.008 seconds (Total)

Chain [4] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [4] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [4] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [4] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [4] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [4] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [4] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [4] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [4] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [4] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [4] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [4] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [4] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [4] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [4] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [4] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [4] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [4] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [4] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [4] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [4] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [4] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.002 seconds (Warm-up)
               0.005 seconds (Sampling)
               0.007 seconds (Total)

Here are the sampels:

(-> sampling
    :samples)

model samples [4000 10]:

:lp__ :accept_stat__ :stepsize__ :treedepth__ :n_leapfrog__ :divergent__ :energy__ :theta :i :chain
-7.12545 0.975350 0.961381 1 3 0 7.25170 0.286407 0 0
-7.12545 0.413974 0.961381 1 3 0 10.85510 0.286407 1 0
-7.50087 0.891145 0.961381 1 1 0 7.50260 0.356258 2 0
-7.33047 1.000000 0.961381 1 1 0 7.51297 0.329950 3 0
-8.20482 0.747192 0.961381 1 1 0 8.22687 0.434842 4 0
-7.31109 1.000000 0.961381 2 3 0 8.00672 0.150667 5 0
-7.25856 0.944143 0.961381 1 3 0 7.83573 0.316883 6 0
-7.16760 0.934508 0.961381 1 3 0 7.62292 0.172114 7 0
-7.60978 0.920585 0.961381 2 3 0 8.07348 0.370857 8 0
-7.12582 1.000000 0.961381 1 1 0 7.46068 0.286512 9 0
-7.05018 0.750290 1.147290 1 3 0 8.30659 0.204133 989 3
-7.11741 0.936305 1.147290 2 3 0 7.45500 0.182656 990 3
-7.18810 0.994007 1.147290 2 3 0 7.20178 0.168430 991 3
-7.23906 0.987696 1.147290 1 1 0 7.27223 0.160325 992 3
-7.03316 0.807169 1.147290 2 3 0 8.64586 0.214127 993 3
-7.11876 0.975819 1.147290 1 3 0 7.15130 0.284487 994 3
-7.02329 1.000000 1.147290 1 3 0 7.10092 0.226623 995 3
-7.78634 0.863495 1.147290 2 3 0 7.94797 0.110051 996 3
-8.54957 0.882045 1.147290 2 3 0 9.42805 0.075029 997 3
-7.18714 0.840866 1.147290 2 3 0 9.04786 0.301954 998 3
-7.03367 1.000000 1.147290 1 3 0 7.13470 0.248416 999 3

The histogram of \(\theta\):

(-> sampling
    :samples
    (tc/group-by [:chain] {:result-type :as-map})
    (update-vals
     (fn [chain-samples]
       (-> chain-samples
           (vis.stats/histogram :theta {:nbins 100})))))

{

{:chain 0}
{:chain 1}
{:chain 2}
{:chain 3}

}

The trace plot of \(\theta\):

(-> sampling
    :samples
    (hanami/plot ht/line-chart {:X :i
                                :Y :theta
                                :COLOR "chain"
                                :OPACITY 0.5}))
source: notebooks/index.clj