cmdstan-clj

cmdstan-clj is a Clojure wrapper for the Stan probabilistic programming language, built on top of the CmdStan command-line interface.

Source: (GitHub repo)

Artifact: (Clojars coordinates)

Status: an evolving proof-of-concept.

(ns index
  (:require
   ;; cmdstan-clj API:
   [scicloj.cmdstan-clj.v1.api :as stan]
   ;; Tablecloth for table processing
   [tablecloth.api :as tc]
   ;; Tableplot for data visualization
   [scicloj.tableplot.v1.plotly :as plotly]
   ;; Kindly for specifying how to visualize things:
   [scicloj.kindly.v4.kind :as kind]))

Walkthrough

Here we follow the introductory example from Stan's documentation.

Let us define our model.

In this probabilistic model, we have an observed vector \(y\) of \(N\) samples and an unobserved parameter \(\theta \sim Beta(1,1)\). The elements of \(y\) are conditionally independent given \(\theta\), each distributed \(Bernoulli(\theta)\).
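Written out, the generative model is:

\[
\theta \sim \mathrm{Beta}(1,1), \qquad
y_n \mid \theta \sim \mathrm{Bernoulli}(\theta)
\quad \text{independently, for } n = 1,\dots,N.
\]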

(def model-code
  "
data {
      int<lower=0> N;
      array[N] int<lower=0,upper=1> y;
      }
parameters {
            real<lower=0,upper=1> theta;
            }
model {
       theta ~ beta(1,1);  // uniform prior on interval [0,1]
       y ~ bernoulli(theta);
}")

Now we may compile the model (this is skipped if the compiled model is already up to date).

(def model
  (stan/model model-code))

Here is the output of compiling our model:

(-> model
    :out
    kind/code)
make: '/tmp/scicloj-files/session-dir-9036255698493098978/file-6729332977990631866' is up to date.

Here are some toy data:

(def data
  {:N 10
   :y [0 1 0 0 0 0 0 0 0 1]})
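Before running MCMC, note that this model is conjugate: with a \(Beta(1,1)\) prior and Bernoulli observations, the posterior is \(Beta(1 + \sum y_n,\; 1 + N - \sum y_n)\). For the toy data we can compute it exactly, which gives a handy sanity check for the sampler:

```clojure
;; Exact conjugate posterior for the toy data defined above.
(let [{:keys [N y]} data
      successes (reduce + y)        ; 2 ones observed
      a (+ 1 successes)             ; posterior alpha = 3
      b (+ 1 (- N successes))]      ; posterior beta  = 9
  {:posterior [:Beta a b]
   :posterior-mean (double (/ a (+ a b)))})
;; => {:posterior [:Beta 3 9], :posterior-mean 0.25}
```

So the sampled values of \(\theta\) should concentrate around 0.25.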

Let us sample from the posterior of \(\theta\) given our observed \(y\) in the data. (Soon we will support more options for controlling the sampling process.)

(def sampling
  (stan/sample model data {:num-chains 4
                           :num-warmup 200
                           :num-samples 100}))
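With these options, each of the 4 chains runs 200 warmup iterations (discarded by default) and then keeps 100 draws, so the resulting samples table should have 4 × 100 rows:

```clojure
;; Expected number of rows in the samples table:
;; chains × post-warmup draws per chain.
(let [{:keys [num-chains num-samples]} {:num-chains 4
                                        :num-samples 100}]
  (* num-chains num-samples))
;; => 400
```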

Here is the output of the sampling process:

(-> sampling
    :out
    kind/code)
method = sample (Default)
  sample
    num_samples = 100
    num_warmup = 200
    save_warmup = false (Default)
    thin = 1 (Default)
    adapt
      engaged = true (Default)
      gamma = 0.05 (Default)
      delta = 0.8 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = false (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 4
id = 1 (Default)
data
  file = /tmp/scicloj-files/session-dir-9036255698493098978/file-1326435404593930035.json
init = 2 (Default)
random
  seed = 3957778942 (Default)
output
  file = /tmp/scicloj-files/session-dir-9036255698493098978/file-18377587113006538761.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
  save_cmdstan_config = false (Default)
num_threads = 1 (Default)


Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!


Chain [1] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [1] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [1] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [1] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [1] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.001 seconds (Total)

Chain [2] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [2] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [2] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [2] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [2] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0.001 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.002 seconds (Total)

Chain [3] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [3] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [3] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [3] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [3] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0.001 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.002 seconds (Total)

Chain [4] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [4] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [4] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [4] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [4] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0.001 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.002 seconds (Total)

Here are the samples:

(-> sampling
    :samples)

model samples [400 10]:

:lp__ :accept_stat__ :stepsize__ :treedepth__ :n_leapfrog__ :divergent__ :energy__ :theta :i :chain
-9.51599 0.805128 1.15844 1 3 0 9.69947 0.0496807 0 0
-8.85840 1.000000 1.15844 2 3 0 9.52782 0.0653906 1 0
-7.02438 1.000000 1.15844 2 3 0 8.50706 0.2239590 2 0
-7.04063 0.997923 1.15844 2 3 0 7.04199 0.2534120 3 0
-7.03067 1.000000 1.15844 1 1 0 7.03864 0.2457840 4 0
-7.02315 1.000000 1.15844 1 3 0 7.02901 0.2344560 5 0
-7.19824 0.847392 1.15844 2 3 0 7.99303 0.1667100 6 0
-7.02269 0.913278 1.15844 2 3 0 7.86257 0.2297840 7 0
-9.64694 0.459305 1.15844 2 7 0 11.37570 0.0471368 8 0
-9.96135 0.980419 1.15844 1 1 0 10.16410 0.0416407 9 0
-7.39333 0.979176 0.76039 1 3 0 10.07890 0.1413660 89 3
-7.03313 0.965367 0.76039 1 3 0 7.86278 0.2479700 90 3
-7.02413 0.998220 0.76039 1 3 0 7.05091 0.2244690 91 3
-7.13274 0.980484 0.76039 1 3 0 7.20265 0.1791500 92 3
-7.09009 0.993959 0.76039 3 7 0 7.22993 0.2754480 93 3
-7.09009 0.792667 0.76039 1 3 0 8.67380 0.2754480 94 3
-7.11105 0.996362 0.76039 1 1 0 7.11682 0.2821960 95 3
-7.18368 0.987158 0.76039 1 1 0 7.18537 0.3011620 96 3
-7.22798 0.893079 0.76039 1 3 0 8.10309 0.1619810 97 3
-7.22149 1.000000 0.76039 2 3 0 7.25067 0.1629750 98 3
-7.08568 0.960621 0.76039 1 3 0 7.73196 0.2739090 99 3
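As a quick check, we can summarize the draws per chain. This is a minimal sketch that assumes the `sampling` var above and that the samples dataset behaves like a standard tech.ml.dataset (columns are seqable); each per-chain mean of `:theta` should land near 0.25, the mean of the conjugate \(Beta(3,9)\) posterior:

```clojure
;; Mean of :theta within each chain, using plain Clojure
;; over the grouped datasets.
(-> sampling
    :samples
    (tc/group-by [:chain] {:result-type :as-map})
    (update-vals
     (fn [chain-samples]
       (let [thetas (chain-samples :theta)]
         (/ (reduce + thetas) (count thetas))))))
```

Large disagreement between the chain means would suggest the chains have not mixed.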

Histograms of \(\theta\), one per chain:

(-> sampling
    :samples
    (tc/group-by [:chain] {:result-type :as-map})
    (update-vals
     (fn [chain-samples]
       (-> chain-samples
           (plotly/layer-histogram {:=x :theta
                                    :=histogram-nbins 100})))))

(Four histograms of \(\theta\), one per group: {:chain 0}, {:chain 1}, {:chain 2}, {:chain 3}.)

The trace plot of \(\theta\):

(-> sampling
    :samples
    (plotly/layer-line {:=x :i
                        :=y :theta
                        :=color :chain
                        :=color-type :nominal
                        :=mark-opacity 0.5}))
source: notebooks/index.clj