clojisr.v1.codegen-test

clojisr.v1.codegen-test - created by notespace, Sat Aug 13 18:45:39 CEST 2022.
Checks: 74 PASSED

R code generation from the Clojure forms

R code in the clojisr library can be represented in three main ways:

  • as a string containing R code or a script
  • as an RObject
  • as a Clojure form

An RObject is a clojisr data structure that holds a reference to an R object. It can also act as a function when the referenced object is an R function. Executing R code always returns an RObject.

Let's see what is possible in detail.

First, require the necessary namespaces.

Also, let us make sure we are using a clean session.

(require '[clojisr.v1.r :as r :refer [r ->code r->clj]]
          '[notespace.v2.note :refer [check]]
          '[tech.v3.dataset :as dataset])
(r/set-default-session-type! :rserve)
(r/discard-all-sessions)

R code as a string

To run R code given as a string or as a Clojure form, we use the clojisr.v1.r/r function.

(r "mean(rnorm(100000,mean=1.0,sd=3.0))")
[1] 1.012373
(r
   "abc <- runif(1000);
          f <- function(x) {mean(log(x))};
          f(abc)")
[1] -0.9888527

As mentioned above, every call to r creates an RObject, together with an R variable that holds the result of the execution.

(def result (r "rnorm(10)"))
#'clojisr.v1.codegen-test/result
(class result)
clojisr.v1.robject.RObject
(:object-name result)
".MEM$x97fa6b3ba71241dd"

Let's use the var name string to see what it represents.

(r (:object-name result))
 [1] -0.4150646  0.1906046  0.4917641 -0.6419359 -1.1444925  0.5567565
 [7] -0.2977476 -3.0026305 -0.3829881  1.1909041

Now let us move on to the RObject data type.

RObject

Every RObject acts as Clojure reference to an R variable. All these variables are held in an R environment called .MEM. An RObject can represent anything and can be used for further evaluation, even acting as a function if it corresponds to an R function. Here are some examples:

An r-object holding some R data:

(def dataset (r "nhtemp"))
#'clojisr.v1.codegen-test/dataset

An r-object holding an R function:

(def function (r "mean"))
#'clojisr.v1.codegen-test/function

Printing the data:

dataset
Time Series:
Start = 1912 
End = 1971 
Frequency = 1 
 [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

Equivalently:

(r dataset)
Time Series:
Start = 1912 
End = 1971 
Frequency = 1 
 [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

We use r->clj to transfer data from R to Clojure (converting an R object to Clojure data):

(-> (r->clj dataset)
     (dataset/select-rows 0)
     (dataset/mapseq-reader)
     (->> (check = [{:$series 49.9, :$time 1912.0}])))
[:PASSED [{:$series 49.9, :$time 1912.0}]]

Creating an R object, applying the function to it, and converting the result to Clojure data (in this pipeline, both function and r return an RObject):

(->> "c(1,2,3,4,5,6)"
      r
      function
      r->clj
      (check = [3.5]))
[:PASSED [3.5]]

Clojure forms

Calling R with code as a string is quite limiting: you can't easily inject Clojure data into the code, and editor support for code written inside strings is poor. So clojisr lets us use Clojure forms as a DSL to simplify the construction of R code.

In generating R code from Clojure forms, clojisr operates on both the var and the symbol level, and can also digest primitive types and basic data structures. There are some special symbols which help in creating R formulas and defining R functions. We will go through all of these in detail.

The ->code function is responsible for turning Clojure forms into R code.

(->> [1 2 4]
      ->code
      (check = "c(1L,2L,4L)"))
[:PASSED "c(1L,2L,4L)"]

When the r function gets an argument that is not a string, it uses ->code behind the scenes to turn that argument into code as a string.

(r [1 2 4])
[1] 1 2 4
(->> [1 2 4]
      r
      r->clj
      (check = [1 2 4]))
[:PASSED [1 2 4]]

Equivalently:

(->> [1 2 4]
      ->code
      r
      r->clj
      (check = [1 2 4]))
[:PASSED [1 2 4]]

Primitive data types

(->> (r 1)
      r->clj
      (check = [1]))
[:PASSED [1]]
(->> (r 2.0)
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]
(->> (r 3/4)
      r->clj
      (check = [0.75]))
[:PASSED [0.75]]
(->> (r true)
      r->clj
      (check = [true]))
[:PASSED [true]]
(->> (r false)
      r->clj
      (check = [false]))
[:PASSED [false]]

nil is converted to NULL, or to NA when it appears inside a vector or a map.

(->> (r nil)
      r->clj
      (check = nil))
[:PASSED nil]
(->> (->code nil)
      (check = "NULL"))
[:PASSED "NULL"]

When you pass a string to r, it is treated as code. So we have to escape double quotes if we actually mean to represent an R string (or an R character object, as it is called in R). However, when a string is used inside a more complex form, it is escaped automatically.

(->> (->code "\"this is a string\"")
      (check = "\"\"this is a string\"\""))
[:PASSED "\"\"this is a string\"\""]
(->> (r "\"this is a string\"")
      r->clj
      (check = ["this is a string"]))
[:PASSED ["this is a string"]]
(->> (->code '(paste "this is a string"))
      (check = "paste(\"this is a string\")"))
[:PASSED "paste(\"this is a string\")"]
(->> (r '(paste "this is a string"))
      r->clj
      (check = ["this is a string"]))
[:PASSED ["this is a string"]]

Any Named Clojure object that is not a String (like a keyword or a symbol) is converted to an R symbol.

(->> (->code :keyword)
      (check = "keyword"))
[:PASSED "keyword"]
(->> (->code 'symb)
      (check = "symb"))
[:PASSED "symb"]

An RObject is converted to an R variable.

(->code (r "1+2"))
".MEM$xa98396ebec3e400c"

Date/time is converted to a string.

(->> #inst "2031-02-03T11:22:33"
      ->code
      (check re-matches #"'2031-02-03 1.:22:33'"))
[:PASSED "'2031-02-03 12:22:33'"]
(r #inst "2031-02-03T11:22:33")
[1] "2031-02-03 12:22:33"
(->> #inst "2031-02-03T11:22:33"
      r
      r->clj
      (check (fn [v]
               (and (vector? v)
                    (-> v
                        count
                        (= 1))
                    (->> v
                         first
                         (re-matches #"2031-02-03 1.:22:33"))))))
[:PASSED ["2031-02-03 12:22:33"]]
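The checks above match the hour with a regular expression rather than a fixed value, presumably because the printed time depends on the time zone of the machine running R. The following sketch uses only java.time (not clojisr) to show how the same instant reads at two fixed offsets; the names instant, at-utc and at-utc-plus-1 are illustrative, and the offsets are assumed stand-ins for a local zone.

```clojure
(import '(java.time LocalDateTime ZoneOffset))

;; #inst literals are read as java.util.Date values, i.e. instants in UTC.
(def instant (.toInstant #inst "2031-02-03T11:22:33"))

;; Render the same instant as wall-clock time at two fixed offsets.
;; These offsets are assumptions standing in for a machine's local zone.
(def at-utc
  (str (LocalDateTime/ofInstant instant ZoneOffset/UTC)))

(def at-utc-plus-1
  (str (LocalDateTime/ofInstant instant (ZoneOffset/ofHours 1))))

at-utc        ;; "2031-02-03T11:22:33"
at-utc-plus-1 ;; "2031-02-03T12:22:33"
```

The second value has the same hour as the output shown above, which is consistent with the examples having been run in a UTC+1 zone.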

Vectors

A Clojure vector is converted to an R vector created using the c function. That means that nested vectors are flattened. All the values inside are translated to R recursively.

(->> (->code [1 2 3])
      (check = "c(1L,2L,3L)"))
[:PASSED "c(1L,2L,3L)"]
(->> (r [[1] [2 [3]]])
      r->clj
      (check = [1 2 3]))
[:PASSED [1 2 3]]
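As a point of comparison, the flattening that results from R's c can be mimicked in plain Clojure with clojure.core/flatten. This is only an analogy for the result seen above, not a description of how clojisr generates the code; the names nested and flat are illustrative.

```clojure
;; The same nested input as in the example above.
(def nested [[1] [2 [3]]])

;; flatten removes every level of nesting, much like nested calls to c() in R.
(def flat (vec (flatten nested)))

flat ;; [1 2 3]
```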

Some Clojure sequences are interpreted as function calls, if it makes sense for their first element. However, sequences beginning with numbers or strings are treated as vectors.

(r (range 11))
 [1]  0  1  2  3  4  5  6  7  8  9 10
(r (map str (range 11)))
 [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Tagged vectors

When the first element of a vector or a sequence is a keyword starting with :!, some special conversion takes place.

keyword     meaning
:!string    vector of strings
:!boolean   vector of logicals
:!int       vector of integers
:!double    vector of doubles
:!named     named vector
:!list      partially named list
:!call      treat the rest of the vector as callable sequence
:!ct        vector of POSIXct classes
:!lt        vector of POSIXlt classes

Inside a vector, nil is converted to NA.

(->> (r [:!string 1 nil 3])
      r->clj
      (check = ["1" nil "3"]))
[:PASSED ["1" nil "3"]]
(->> (r [:!boolean 1 true nil false])
      r->clj
      (check = [true true nil false]))
[:PASSED [true true nil false]]
(->> (r [:!double 1.0 nil 3])
      r->clj
      (check = [1.0 nil 3.0]))
[:PASSED [1.0 nil 3.0]]
(->> (r [:!int 1.0 nil 3])
      r->clj
      (check = [1 nil 3]))
[:PASSED [1 nil 3]]
(->> (r [:!named 1 2 :abc 3])
      r->clj
      (check = [1 2 3]))
[:PASSED [1 2 3]]
(->> (r [:!list :a 1 :b [:!list 1 2 :c ["a" "b"]]])
      r->clj
      (check = {:a [1], :b {0 [1], 1 [2], :c ["a" "b"]}}))
[:PASSED {:a [1], :b {0 [1], 1 [2], :c ["a" "b"]}}]
(->> (r [:!ct #inst "2011-11-01T22:33:11"])
      r->clj)
[#object[java.time.LocalDateTime 0x18da2d74 "2011-11-01T23:33:11"]]
(->> (r [:!lt #inst "2011-11-01T22:33:11"])
      r->clj)
[#object[java.time.LocalDateTime 0x17b6737c "2011-11-01T23:33:11"]]

When a vector is big enough, it is not transferred directly as code. Instead, the generated code is the name of a newly created R variable that holds the vector data, converted via the Java conversion layer.

(->code (range 10000))
".MEM$x1cd1fd29dfc4412f"
(->> (r (conj (range 10000) :!string))
      r->clj
      first
      (check = "0"))
[:PASSED "0"]

The :!call tag treats the rest of the vector as a function call.

(->> (r [:!call 'mean [1 2 3 4]])
      r->clj
      (check = [2.5]))
[:PASSED [2.5]]

Maps

A Clojure map is transformed to an R named list. As with vectors, all data elements inside are processed recursively.

(r {:a 1, :b nil})
$a
[1] 1

$b
[1] NA
(->> (r {:a 1, :b nil, :c [2.0 3 4]})
      r->clj
      (check = {:a [1], :b [nil], :c [2.0 3.0 4.0]}))
[:PASSED {:a [1], :b [nil], :c [2.0 3.0 4.0]}]

Bigger maps are transferred to R variables via the Java conversion layer.

(->code (zipmap (map #(str "key" %) (range 100)) (range 1000 1100)))
".MEM$xb590b7ec808c495a"
(->> (r (zipmap (map #(str "key" %) (range 100)) (range 1000 1100)))
      r->clj
      :key23
      (check = [1023]))
[:PASSED [1023]]

Calls, operators and special symbols

Now we come to the most important part: using sequences to represent function calls. One way to do that is with a list whose first element is a symbol naming an R function, or an RObject corresponding to an R function. To create a function call we use the same structure as in Clojure. The first two examples below are equivalent.

Recall that symbols are converted to R variable names on the R side.

(r "mean(c(1,2,3))")
[1] 2
(r '(mean [1 2 3]))
[1] 2
(->> (->code '(mean [1 2 3]))
      (check = "mean(c(1L,2L,3L))"))
[:PASSED "mean(c(1L,2L,3L))"]

Here is another example.

(r '(<- x (mean [1 2 3])))
[1] 2
(->> (r 'x)
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]

Here is another example.

Recall that RObjects are converted to the names of the corresponding R objects.

(-> (list (r 'median) [1 2 4])
     ->code)
".MEM$x7e3cb1dfd933499d(c(1L,2L,4L))"
(->> (list (r 'median) [1 2 4])
      r
      r->clj
      (check = [2]))
[:PASSED [2]]

You can call functions with special names by passing the backquoted name as a string:

(->> (r '("`^`" 10 2))
      r->clj
      (check = [100.0]))
[:PASSED [100.0]]

Some symbols have a special meaning:

symbol              meaning
( )                 Wrap first element of the quoted list into parentheses
function            R function definition
do                  join all forms using ";" and wrap into {}
for                 for loop with multiple bindings
while               while loop
if                  if or if-else
tilde or formula    R formula
colon               colon (:)
rsymbol             qualified and/or backticked symbol wrapper
bra                 [
brabra              [[
bra<-               [<-
brabra<-            [[<-

Some R symbols contain spaces, and some need to be prefixed with a package name. Clojure's quote (') is not enough to express these, so use rsymbol instead.

(->> (r/->code '(rsymbol name))
      (check = "name"))
[:PASSED "name"]
(->> (r/->code '(rsymbol "name with spaces"))
      (check = "`name with spaces`"))
[:PASSED "`name with spaces`"]
(->> (r/->code '(rsymbol package name))
      (check = "package::name"))
[:PASSED "package::name"]
(->> (r/->code '(rsymbol "package with spaces" name))
      (check = "`package with spaces`::name"))
[:PASSED "`package with spaces`::name"]
(->> ((r/rsymbol 'base 'mean) [1 2 3 4])
      r->clj
      (check = [2.5]))
[:PASSED [2.5]]
(->> ((r/rsymbol "[") 'iris 1)
      r->clj
      dataset/mapseq-reader
      first
      :Sepal.Length
      (check = 5.1))
[:PASSED 5.1]
(->> ((r/rsymbol 'base "[") 'iris 1)
      r->clj
      dataset/mapseq-reader
      first
      :Sepal.Length
      (check = 5.1))
[:PASSED 5.1]

All bra... functions accept nil or empty-symbol to mark an empty selector.

(def m
   (r '(matrix (colon 1 6)
               :nrow 2
               :dimnames [:!list ["a" "b"]
                          (bra LETTERS (colon 1 3))])))
m
  A B C
a 1 3 5
b 2 4 6
(->> (r '(bra ~m nil 1))
      r->clj
      (check = [1 2]))
[:PASSED [1 2]]
(->> (r '(bra ~m 1 nil))
      r->clj
      (check = [1 3 5]))
[:PASSED [1 3 5]]
(->> (r '(bra ~m 1 nil :drop false))
      r->clj
      dataset/value-reader
      (check = [["a" 1 3 5]]))
[:PASSED [["a" 1 3 5]]]
(->> (r '(bra<- ~m 1 nil [11 22 33]))
      r->clj
      dataset/value-reader
      (check = [["a" 11 22 33] ["b" 2 4 6]]))
[:PASSED [["a" 11 22 33] ["b" 2 4 6]]]
(->> (r '(bra<- ~m nil 1 [22 33]))
      r->clj
      dataset/value-reader
      (check = [["a" 22 3 5] ["b" 33 4 6]]))
[:PASSED [["a" 22 3 5] ["b" 33 4 6]]]
(->> (r/bra m nil 1)
      r->clj
      (check = [1 2]))
[:PASSED [1 2]]
(->> (r/bra m 1 nil)
      r->clj
      (check = [1 3 5]))
[:PASSED [1 3 5]]
(->> (r/bra m 1 nil :drop false)
      r->clj
      dataset/value-reader
      (check = [["a" 1 3 5]]))
[:PASSED [["a" 1 3 5]]]
(->> (r/bra<- m 1 nil [11 22 33])
      r->clj
      dataset/value-reader
      (check = [["a" 11 22 33] ["b" 2 4 6]]))
[:PASSED [["a" 11 22 33] ["b" 2 4 6]]]
(->> (r/bra<- m nil 1 [22 33])
      r->clj
      dataset/value-reader
      (check = [["a" 22 3 5] ["b" 33 4 6]]))
[:PASSED [["a" 22 3 5] ["b" 33 4 6]]]
(def l (r [:!list "a" "b" "c"]))
l
[[1]]
[1] "a"

[[2]]
[1] "b"

[[3]]
[1] "c"
(->> (r '(brabra ~l 2))
      r->clj
      (check = ["b"]))
[:PASSED ["b"]]
(->> (r '(brabra<- ~l 2 nil))
      r->clj
      (check = [["a"] ["c"]]))
[:PASSED [["a"] ["c"]]]
(->> (r '(brabra<- ~l 5 "fifth"))
      r->clj
      (check = [["a"] ["b"] ["c"] nil ["fifth"]]))
[:PASSED [["a"] ["b"] ["c"] nil ["fifth"]]]
(->> (r/brabra l 2)
      r->clj
      (check = ["b"]))
[:PASSED ["b"]]
(->> (r/brabra<- l 2 nil)
      r->clj
      (check = [["a"] ["c"]]))
[:PASSED [["a"] ["c"]]]
(->> (r/brabra<- l 5 "fifth")
      r->clj
      (check = [["a"] ["b"] ["c"] nil ["fifth"]]))
[:PASSED [["a"] ["b"] ["c"] nil ["fifth"]]]

You can use if with an optional else form. Use do to create a block of operations.

(->> (r '(if true 11 22))
      r->clj
      (check = [11]))
[:PASSED [11]]
(->> (r '(if false 11 22))
      r->clj
      (check = [22]))
[:PASSED [22]]
(->> (r '(if true 11))
      r->clj
      (check = [11]))
[:PASSED [11]]
(->> (r '(if false 11))
      r->clj
      (check = nil))
[:PASSED nil]
(->> (r '(if true (do (<- x [1 2 3 4]) (mean x))))
      r->clj
      (check = [2.5]))
[:PASSED [2.5]]

do wraps everything in curly braces {}.

(check = (->code '(do (<- x 1) (<- x (+ x 1)))) "{x<-1L;x<-(x+1L)}")
[:PASSED "{x<-1L;x<-(x+1L)}"]

Loops

(->> (r '(do (<- v 3)
              (<- coll [v])
              (while (> v 0) (<- v (- v 1)) (<- coll [coll v]))
              coll))
      r->clj
      (check = [3 2 1 0]))
[:PASSED [3 2 1 0]]
(def for-form
   '(do (<- coll [])
        (for [a [1 2] b [3 4]] (<- coll [coll (* a b)]))
        coll))
(->code for-form)
"{coll<-c();for(a in c(1L,2L)){for(b in c(3L,4L)){coll<-c(coll,(a*b))\n}\n};coll}"
(->> (r for-form)
      r->clj
      (check = [3 4 6 8]))
[:PASSED [3 4 6 8]]
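The generated R code nests one loop per binding, with the first binding outermost. For comparison only, here is the plain Clojure for comprehension over the same bindings, which iterates in the same order; it does not involve clojisr, and the name products is illustrative.

```clojure
;; The rightmost binding (b) varies fastest, as in the nested R loops above.
(def products
  (vec (for [a [1 2]
             b [3 4]]
         (* a b))))

products ;; [3 4 6 8]
```

Note the difference in style: the Clojure version returns a sequence, while the R form has to accumulate results into coll by assignment.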

Sometimes wrapping into parentheses is needed.

(check = (->code '(:!wrap z)) "(z)")
[:PASSED "(z)"]
(check =
        (->code '[:!list 1.0 2.0 3.0 (:!wrap inside)])
        "list(1.0,2.0,3.0,(inside))")
[:PASSED "list(1.0,2.0,3.0,(inside))"]

Function definitions

To define a function, use the function symbol with a following vector of argument names, and then the body. Arguments are treated as a partially named list.

(r '(<- stat
         (function [x :median false ...]
                   (ifelse median (median x ...) (mean x ...)))))
function (x, median = FALSE, ...) 
{
    ifelse(median, median(x, ...), mean(x, ...))
}
(->> (r '(stat [100 33 22 44 55]))
      r->clj
      (check = [50.8]))
[:PASSED [50.8]]
(->> (r '(stat [100 33 22 44 55] :median true))
      r->clj
      (check = [44]))
[:PASSED [44]]
(->> (r '(stat [100 33 22 44 55 nil]))
      r->clj
      first
      (check nil?))
[:PASSED nil]
(->> (r '(stat [100 33 22 44 55 nil] :na.rm true))
      r->clj
      (check = [50.8]))
[:PASSED [50.8]]

Formulas

To create an R formula, use tilde or formula with two arguments, for the left and right sides (to skip one, just use nil).

(r '(formula y x))
y ~ x
(r '(formula y (| (+ a b c d) e)))
y ~ a + b + c + d | e
(r '(formula nil (| x y)))
~x | y

Operators

(->code '(+ 1 2 3 4 5))
"((((1L+2L)+3L)+4L)+5L)"
(->code '(/ 1 2 3 4 5))
"((((1L/2L)/3L)/4L)/5L)"
(->code '(- [1 2 3]))
"-(c(1L,2L,3L))"
(->code '(<- a b c 123))
"a<-b<-c<-123L"
(->code '($ a b c d))
"a$b$c$d"
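The arithmetic outputs above look like a left fold: each additional operand wraps the expression built so far in one more pair of parentheses. A minimal sketch of that pattern in plain Clojure follows. fold-binary is a hypothetical helper written for illustration and is not part of clojisr; it covers only the parenthesised arithmetic case, not the unparenthesised chains produced for <- and $.

```clojure
(defn fold-binary
  "Hypothetical helper, not clojisr's implementation: combines already
  generated operand strings left to right, wrapping each step in parentheses."
  [op operands]
  (reduce (fn [acc x] (str "(" acc op x ")")) operands))

(def folded-sum (fold-binary "+" ["1L" "2L" "3L" "4L" "5L"]))
(def folded-div (fold-binary "/" ["1L" "2L" "3L" "4L" "5L"]))

folded-sum ;; "((((1L+2L)+3L)+4L)+5L)"
folded-div ;; "((((1L/2L)/3L)/4L)/5L)"
```

The explicit parentheses matter for non-associative operators such as /, where grouping from the left determines the result.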

Unquoting

Sometimes we want to use objects created outside our form (defined earlier or bound in a let). For this case you can use the unquote (~) symbol. There are two options:

  • when using quoting ('), unquote evaluates the unquoted form using eval. eval has some constraints; the most important is that local bindings (let bindings) can't be used.
  • when using syntax quoting (backquote `), unquote acts as in Clojure macros: all unquoted forms are evaluated immediately.

(def v (r '(+ 1 2 3 4)))
(r '(* 22.0 ~v))
[1] 220
(let [local-v (r '(+ 1 2 3 4))
       local-list [4 5 6]]
   (r `(* 22.0 ~local-v ~@local-list)))
[1] 26400
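The difference between the two options comes from how the Clojure reader treats ~. The sketch below uses plain Clojure only, with the number 10 standing in for the RObject bound in the example above; the names quoted-form and syntax-quoted-form are illustrative.

```clojure
;; Inside a plain quote, ~v is not evaluated. The reader leaves a
;; (clojure.core/unquote v) list in the data, to be resolved later.
(def quoted-form '(* 22.0 ~v))

(nth quoted-form 2) ;; (clojure.core/unquote v)

;; Inside a syntax quote, ~ and ~@ are evaluated immediately, so local
;; bindings are visible. 10 is a stand-in for an RObject here.
(def syntax-quoted-form
  (let [local-v 10
        local-list [4 5 6]]
    `(* 22.0 ~local-v ~@local-list)))

syntax-quoted-form ;; (clojure.core/* 22.0 10 4 5 6)
```

Syntax quoting also namespace-qualifies symbols, which is why * appears as clojure.core/* in the second result.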

Calling R functions

You are not limited to using code forms. When an RObject corresponds to an R function, it can be called like a normal Clojure function.

(def square (r '(function [x] (* x x))))
#'clojisr.v1.codegen-test/square
(->> (square 123)
      r->clj
      first
      (check = 15129))
[:PASSED 15129]
