cmdstan-clj

cmdstan-clj is a Clojure wrapper of the Stan probabilistic programming language that uses the CmdStan CLI.

Source:

Artifact:

Status: an evolving proof-of-concept.

(ns index
  (:require
   ;; cmdstan-clj API:
   [scicloj.cmdstan-clj.v1.api :as stan]
   ;; Tablecloth for table processing
   [tablecloth.api :as tc]
   ;; Tableplot for data visualization
   [scicloj.tableplot.v1.plotly :as plotly]
   ;; Kindly for specifying how to visualize things:
   [scicloj.kindly.v4.kind :as kind]))

Walkthrough

Here we follow the introductory example from Stan’ documentation.

Let us define our model.

In our probabilistic model here, we have we have an observed vector \(y\) of \(N\) samples. We have an unobserved parameter \(\theta \sim Beta(1,1)\), and the elements of \(y\) are conditionally independent given \(\theta\), and distributed \(Bernoulli(\theta)\) each.

(def model-code
  "
data {
      int<lower=0> N;
      array[N] int<lower=0,upper=1> y;
      }
parameters {
            real<lower=0,upper=1> theta;
            }
model {
       theta ~ beta(1,2);  // uniform prior on interval 0,1
       y ~ bernoulli(theta);
}")

Now we may compile the model, if this has not been done yet.

(def model
  (stan/model model-code))

Here is the output of compiling our model:

(-> model
    :out
    kind/code)

make: '/tmp/scicloj-files/session-dir-9036255698493098978/file-6729332977990631866' is up to date.

Here are some toy data:

(def data
  {:N 10
   :y [0 1 0 0 0 0 0 0 0 1]})

Let us sample from the posterior of \(\theta\) given out observed \(y\) in the data. (Soon we will support relevant options to control the sampling process.)

(def sampling
  (stan/sample model data {:num-chains 4
                           :num-warmup 200
                           :num-samples 100}))

Here is the output of sampling process.

(-> sampling
    :out
    kind/code)

method = sample (Default)
  sample
    num_samples = 100
    num_warmup = 200
    save_warmup = false (Default)
    thin = 1 (Default)
    adapt
      engaged = true (Default)
      gamma = 0.05 (Default)
      delta = 0.8 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = false (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 4
id = 1 (Default)
data
  file = /tmp/scicloj-files/session-dir-9036255698493098978/file-1326435404593930035.json
init = 2 (Default)
random
  seed = 3957778942 (Default)
output
  file = /tmp/scicloj-files/session-dir-9036255698493098978/file-18377587113006538761.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
  save_cmdstan_config = false (Default)
num_threads = 1 (Default)


Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 1e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Adjust your expectations accordingly!


Chain [1] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [1] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [1] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [1] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [1] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.001 seconds (Total)

Chain [2] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [2] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [2] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [2] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [2] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0.001 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.002 seconds (Total)

Chain [3] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [3] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [3] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [3] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [3] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0.001 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.002 seconds (Total)

Chain [4] Iteration:   1 / 300 [  0%]  (Warmup)
Chain [4] Iteration: 100 / 300 [ 33%]  (Warmup)
Chain [4] Iteration: 200 / 300 [ 66%]  (Warmup)
Chain [4] Iteration: 201 / 300 [ 67%]  (Sampling)
Chain [4] Iteration: 300 / 300 [100%]  (Sampling)

 Elapsed Time: 0.001 seconds (Warm-up)
               0.001 seconds (Sampling)
               0.002 seconds (Total)

Here are the sampels:

(-> sampling
    :samples)

model samples [400 10]:

:lp__	:accept_stat__	:stepsize__	:treedepth__	:n_leapfrog__	:divergent__	:energy__	:theta	:i	:chain
-9.51599	0.805128	1.15844	1	3	0	9.69947	0.0496807	0	0
-8.85840	1.000000	1.15844	2	3	0	9.52782	0.0653906	1	0
-7.02438	1.000000	1.15844	2	3	0	8.50706	0.2239590	2	0
-7.04063	0.997923	1.15844	2	3	0	7.04199	0.2534120	3	0
-7.03067	1.000000	1.15844	1	1	0	7.03864	0.2457840	4	0
-7.02315	1.000000	1.15844	1	3	0	7.02901	0.2344560	5	0
-7.19824	0.847392	1.15844	2	3	0	7.99303	0.1667100	6	0
-7.02269	0.913278	1.15844	2	3	0	7.86257	0.2297840	7	0
-9.64694	0.459305	1.15844	2	7	0	11.37570	0.0471368	8	0
-9.96135	0.980419	1.15844	1	1	0	10.16410	0.0416407	9	0
…	…	…	…	…	…	…	…	…	…
-7.39333	0.979176	0.76039	1	3	0	10.07890	0.1413660	89	3
-7.03313	0.965367	0.76039	1	3	0	7.86278	0.2479700	90	3
-7.02413	0.998220	0.76039	1	3	0	7.05091	0.2244690	91	3
-7.13274	0.980484	0.76039	1	3	0	7.20265	0.1791500	92	3
-7.09009	0.993959	0.76039	3	7	0	7.22993	0.2754480	93	3
-7.09009	0.792667	0.76039	1	3	0	8.67380	0.2754480	94	3
-7.11105	0.996362	0.76039	1	1	0	7.11682	0.2821960	95	3
-7.18368	0.987158	0.76039	1	1	0	7.18537	0.3011620	96	3
-7.22798	0.893079	0.76039	1	3	0	8.10309	0.1619810	97	3
-7.22149	1.000000	0.76039	2	3	0	7.25067	0.1629750	98	3
-7.08568	0.960621	0.76039	1	3	0	7.73196	0.2739090	99	3

The histogram of \(\theta\):

(-> sampling
    :samples
    (tc/group-by [:chain] {:result-type :as-map})
    (update-vals
     (fn [chain-samples]
       (-> chain-samples
           (plotly/layer-histogram {:=x :theta
                                    :=histogram-nbins 100})))))

{

{:chain 0}

{:chain 1}

{:chain 2}

{:chain 3}

}

The trace plot of \(\theta\):

(-> sampling
    :samples
    (plotly/layer-line {:=x :i
                        :=y :theta
                        :=color :chain
                        :=color-type :nominal
                        :=mark-opacity 0.5}))

source: notebooks/index.clj