3  R code generation from Clojure forms

(ns clojisr.v1.tutorials.codegen
  (:require [clojisr.v1.r :as r :refer [r ->code r->clj]]
            [tech.v3.dataset :as dataset]
            [scicloj.kindly.v4.kind :as kind]
            [scicloj.kindly.v4.api :as kindly]))

R code in clojisr library can be represented in three main ways:

RObject is clojisr data structure which keeps reference to R objects. Also can act as a function when referenced object is R function. RObject is returned always when R code is executed.

Let’s see what is possible in detail.

First, require the necessary namespaces.

Also, let us make sure we are using a clean session.

(r/set-default-session-type! :rserve)
{:session-type :rserve}
(r/discard-all-sessions)
{}

3.1 R code as a string

To run any R code as string or Clojure form we use clojisr.v1.r/r function

(r "mean(rnorm(100000,mean=1.0,sd=3.0))")
[1] 0.995323
(r "abc <- runif(1000);
          f <- function(x) {mean(log(x))};
          f(abc)")
[1] -1.014055

As mentioned above, every r call creates RObject and R variable which keeps result of the execution.

(def result (r "rnorm(10)"))
(class result)
clojisr.v1.robject.RObject
(:object-name result)
".MEM$xeaf880945946451b"

Let’s use the var name string to see what it represents.

(r (:object-name result))
 [1] -0.98451042 -0.01969949  0.33094814 -0.93603276 -1.26566080 -1.38561694
 [7]  0.95174281  2.05033309 -0.95599948 -0.81838708

Now let us move to discussing the ROBject data type.

3.2 RObject

Every RObject acts as Clojure reference to an R variable. All these variables are held in an R environment called .MEM. An RObject can represent anything and can be used for further evaluation, even acting as a function if it corresponds to an R function. Here are some examples:

An r-object holding some R data:

(def dataset (r "nhtemp"))

An r-object holding an R function:

(def function (r "mean"))

Printing the data:

dataset
Time Series:
Start = 1912 
End = 1971 
Frequency = 1 
 [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

Equivalently:

(r dataset)
Time Series:
Start = 1912 
End = 1971 
Frequency = 1 
 [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

We use r->clj to transfer data from R to Clojure (converting an R object to Clojure data):

(-> (r->clj dataset)
    (dataset/select-rows 0)
    (dataset/mapseq-reader))
[{:$time 1912.0, :$series 49.9}]

Creating an R object, applying the function to it, and conveting to Clojure data (in this pipeline, both function and r return an RObject):

(-> "c(1,2,3,4,5,6)"
    r
    function
    r->clj)
[3.5]

3.3 Clojure forms

Calling R with the code as a string is quite limited. You can’t easily inject Clojure data into the code. Also, editor support is very limited for this way of writing. So we enable the use of Clojure forms as a DSL to simplify the construnction of R code.

In generating R code from Clojure forms, clojisr operates on both the var and the symbol level, and can also digest primitive types and basic data structures. There are some special symbols which help in creating R formulas and defining R functions. We will go through all of these in detail.

The ->code function is responsible for turning Clojure forms into R code.

(->code [1 2 4])
"c(1L,2L,4L)"

When the r function gets an argument that is not a string, it uses ->code behind the scenes to turn that argument into code as a string.

(r [1 2 4])
[1] 1 2 4
(-> [1 2 4]
    r
    r->clj)
[1 2 4]

Equivalently:

(-> [1 2 4]
    ->code
    r
    r->clj)
[1 2 4]

3.3.1 Primitive data types

(-> 1 r r->clj)
[1]
(-> 2.0 r r->clj)
[2.0]
(-> 3/4 r r->clj)
[0.75]
(-> true r r->clj)
[true]
(-> false r r->clj)
[false]

nil is converted to NULL or NA (in vectors or maps)

(-> nil r r->clj)
nil
(->code nil)
"NULL"

Infinities etc.

(-> ##Inf r r->clj)
[##Inf]
(->> ##-Inf r r->clj)
[##-Inf]
(->> ##NaN r r->clj first)
##NaN

When you pass a string to r, it is treated as code. So we have to escape double quotes if we actually mean to represent an R string (or an R character object, as it is called in R). However, when string is used inside a more complex form, it is escaped automatically.

(->code "\"this is a string\"")
"\"\"this is a string\"\""
(-> "\"this is a string\"" r r->clj)
["this is a string"]
(->code '(paste "this is a string"))
"paste(\"this is a string\")"
(-> '(paste "this is a string") r r->clj)
["this is a string"]

Any Named Clojure object that is not a String (like a keyword or a symbol) is converted to a R symbol.

(->code :keyword)
"keyword"
(->code 'symb)
"symb"

An RObject is converted to a R variable.

(->code (r "1+2"))
".MEM$x7ae96b8eb5c44428"

Date/time is converted to a string.

(->code #inst "2031-02-03T11:22:33")
"'2031-02-03 13:22:33'"
(r #inst "2031-02-03T11:22:33")
[1] "2031-02-03 13:22:33"
(-> #inst "2031-02-03T11:22:33"
    r
    r->clj)
["2031-02-03 13:22:33"]

3.3.2 Vectors

A Clojure vector is converted to an R vector created using the c function. That means that nested vectors are flattened. All the values inside are translated to R recursively.

(->code [1 2 3])
"c(1L,2L,3L)"
(-> [[1] [2 [3]]] r r->clj)
[1 2 3]

Some Clojure sequences are interpreted as function calls, if it makes sense for their first element. However, sequences beginning with numbers or strings are treated as vectors.

(r (range 11))
 [1]  0  1  2  3  4  5  6  7  8  9 10
(r (map str (range 11)))
 [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

3.3.2.1 Tagged vectors

When the first element of a vector or a sequence is a keyword starting with :!, some special conversion takes place.

keyword meaning
:!string vector of strings
:!boolean vector of logicals
:!int vector of integers
:!double vector of doubles
:!named named vector
:!list partially named list
:!call treat the rest of the vector as callable sequence
:!ct vector of POSIXct classes
:!lt vector of POSIXlt classes

nil in a vector is converted to NA

(-> [:!string 1 nil 3]
    r r->clj)
["1" nil "3"]
(-> [:!boolean 1 true nil false]
    r r->clj)
[true true nil false]
(-> [:!double 1.0 nil 3]
    r r->clj)
[1.0 nil 3.0]
(-> [:!int 1.0 nil 3]
    r r->clj)
[1 nil 3]
(-> [:!named 1 2 :abc 3]
    r r->clj)
[1 2 3]

I think here we should return map maybe?

(-> [:!list :a 1 :b [:!list 1 2 :c ["a" "b"]]]
    r r->clj)
{:a [1], :b {0 [1], 1 [2], :c ["a" "b"]}}
(-> [:!ct #inst "2011-11-01T22:33:11"]
    r r->clj)
[#object[java.time.LocalDateTime 0xec13b60 "2011-11-02T00:33:11"]]
(-> [:!lt #inst "2011-11-01T22:33:11"]
    r r->clj)
[#object[java.time.LocalDateTime 0x1de59d42 "2011-11-02T00:33:11"]]

When a vector is big enough, it is transfered not directly as code, but as the name of a newly created R variable holding the corresponding vector data, converted via the Java conversion layer.

(->code (range 10000))
".MEM$xebcb81eef5cf4400"
(-> (conj (range 10000) :!string)
    r r->clj first)
"0"

Treat vector as callable.

(-> [:!call 'mean [1 2 3 4]]
    r r->clj)
[2.5]

3.3.3 Maps

A Clojue Map is transformed to an R named list. As with vectors, all data elements inside are processed recursively.

(r {:a 1 :b nil})
$a
[1] 1

$b
[1] NA

(-> {:a 1 :b nil :c [2.0 3 4]}
    r r->clj)
{:a [1], :b [nil], :c [2.0 3.0 4.0]}

Bigger maps are transfered to R variables via the Java conversion layer.

(->code (zipmap (map #(str "key" %) (range 100))
                (range 1000 1100)))
".MEM$x39e413404a614806"
(-> (r (zipmap (map #(str "key" %) (range 100))
               (range 1000 1100)))
    r->clj
    :key23)
[1023]

3.3.4 Calls, operators and special symbols

Now we come to the most important part, using sequences to represent function calls. One way to do that is using a list, where the first element is a symbol corresponding to the name of an R function, or an RObject corresponding to an R function. To create a function call we use the same structure as in clojure. The two examples below are are equivalent.

Recall that symbols are converted to R variable names on the R side.

(r "mean(c(1,2,3))")
[1] 2
(r '(mean [1 2 3]))
[1] 2
(->code '(mean [1 2 3]))
"mean(c(1L,2L,3L))"

Here is another example.

(r '(<- x (mean [1 2 3])))
[1] 2
(->> 'x r r->clj)
[2.0]

Here is another example.

Recall that RObjects are converted to the names of the corresponding R objects.

(-> (list (r 'median) [1 2 4])
    ->code)
".MEM$xf90c011edb8d4b6c(c(1L,2L,4L))"
(-> (list (r 'median) [1 2 4])
    r
    r->clj)
[2]

You can call using special names (surrounded by backquote) as strings

(-> '("`^`" 10 2) r r->clj)
[100.0]

There are some special symbols which get a special meaning on,:

symbol meaning
'( ) Wrap first element of the quoted list into parentheses
function R function definition
do join all forms using “;” and wrap into {}
for for loop with multiple bindings
while while loop
if if or if-else
tilde or formula R formula
colon colon (:)
rsymbol qualified and/or backticked symbol wrapper
bra [
brabra [[
bra<- [<-
brabra<- [[<-

Sometimes symbols are represented as string with spaces inside, also can be prepend with package name. Tick ' in clojure is not enough for that, for that purpose you can use rsymbol.

(r/->code '(rsymbol name))
"name"
(r/->code '(rsymbol "name with spaces"))
"`name with spaces`"
(r/->code '(rsymbol package name))
"package::name"
(r/->code '(rsymbol "package with spaces" name))
"`package with spaces`::name"
(-> ((r/rsymbol 'base 'mean) [1 2 3 4])
    r->clj)
[2.5]
(-> ((r/rsymbol "[") 'iris 1) r->clj dataset/mapseq-reader first :Sepal.Length)
5.1
(-> ((r/rsymbol 'base "[") 'iris 1) r->clj dataset/mapseq-reader first :Sepal.Length)
5.1

All bra... functions accept nil or empty-symbol to mark empty selector.

(def m (r '(matrix (colon 1 6)
                   :nrow 2
                   :dimnames [:!list ["a" "b"] (bra LETTERS (colon 1 3))])))
m
  A B C
a 1 3 5
b 2 4 6
(-> '(bra ~m nil 1)
    r r->clj)
[1 2]
(-> '(bra ~m 1 nil)
    r r->clj)
[1 3 5]
(-> '(bra ~m 1 nil :drop false)
    r
    r->clj
    dataset/value-reader)
[["a" 1 3 5]]
(-> '(bra<- ~m 1 nil [11 22 33])
    r
    r->clj
    dataset/value-reader)
[["a" 11 22 33] ["b" 2 4 6]]
(-> '(bra<- ~m nil 1 [22 33])
    r
    r->clj
    dataset/value-reader)
[["a" 22 3 5] ["b" 33 4 6]]
(-> (r/bra m nil 1)
    r->clj)
[1 2]
(-> (r/bra m 1 nil)
    r->clj)
[1 3 5]
(-> (r/bra m 1 nil :drop false)
    r->clj
    dataset/value-reader)
[["a" 1 3 5]]
(-> (r/bra<- m 1 nil [11 22 33])
    r->clj
    dataset/value-reader)
[["a" 11 22 33] ["b" 2 4 6]]
(-> (r/bra<- m nil 1 [22 33])
    r->clj
    dataset/value-reader)
[["a" 22 3 5] ["b" 33 4 6]]
(def l (r [:!list "a" "b" "c"]))
l
[[1]]
[1] "a"

[[2]]
[1] "b"

[[3]]
[1] "c"

(-> '(brabra ~l 2)
    r r->clj)
["b"]
(-> '(brabra<- ~l 2 nil)
    r r->clj)
[["a"] ["c"]]
(-> '(brabra<- ~l 5 "fifth")
    r r->clj)
[["a"] ["b"] ["c"] nil ["fifth"]]
(-> (r/brabra l 2)
    r->clj)
["b"]
(-> (r/brabra<- l 2 nil)
    r->clj)
[["a"] ["c"]]
(-> (r/brabra<- l 5 "fifth")
    r->clj)
[["a"] ["b"] ["c"] nil ["fifth"]]

You can use if with optional else form. Use do to create block of operations

(-> '(if true 11 22)
    r r->clj)
[11]
(-> '(if false 11 22)
    r r->clj)
[22]
(-> '(if true 11)
    r r->clj)
[11]
(-> '(if false 11)
    r r->clj)
nil
(-> '(if true (do (<- x [1 2 3 4])
                  (mean x)))
    r r->clj)
[2.5]

do wraps everything into curly braces {}

(->code '(do (<- x 1)
             (<- x (+ x 1))))
"{x<-1L;x<-(x+1L)}"

Loops

(-> '(do
       (<- v 3)
       (<- coll [v])
       (while (> v 0)
         (<- v (- v 1))
         (<- coll [coll v]))
       coll)
    r
    r->clj)
[3 2 1 0]
(def for-form '(do
                 (<- coll [])
                 (for [a [1 2]
                       b [3 4]]
                   (<- coll [coll (* a b)]))
                 coll))
(->code for-form)
"{coll<-c();for(a in c(1L,2L)){for(b in c(3L,4L)){coll<-c(coll,(a*b))\n}\n};coll}"
(-> for-form r r->clj)
[3 4 6 8]

Sometimes wrapping into parentheses is needed.

(->code '(:!wrap z))
"(z)"
(->code '[:!list 1.0 2.0 3.0 (:!wrap inside)])
"list(1.0,2.0,3.0,(inside))"

3.3.4.1 Function definitions

To define a function, use the function symbol with a following vector of argument names, and then the body. Arguments are treated as a partially named list.

(r '(<- stat (function [x :median false ...]
                       (ifelse median
                               (median x ...)
                               (mean x ...)))))
function (x, median = FALSE, ...) 
{
    ifelse(median, median(x, ...), mean(x, ...))
}
(-> '(stat [100 33 22 44 55])
    r r->clj)
[50.8]
(-> '(stat [100 33 22 44 55] :median true)
    r r->clj)
[44]
(-> '(stat [100 33 22 44 55 nil])
    r r->clj)
[nil]
(-> '(stat [100 33 22 44 55 nil] :na.rm true)
    r r->clj)
[50.8]

3.3.4.2 Formulas

To create an R formula, use tilde or formula with two arguments, for the left and right sides (to skip one, just use nil).

(r '(formula y x))
y ~ x
(r '(formula y (| (+ a b c d) e)))
y ~ a + b + c + d | e
(r '(formula nil (| x y)))
~x | y

3.3.4.3 Operators

(->code '(+ 1 2 3 4 5))
"((((1L+2L)+3L)+4L)+5L)"
(->code '(/ 1 2 3 4 5))
"((((1L/2L)/3L)/4L)/5L)"
(->code '(- [1 2 3]))
"-(c(1L,2L,3L))"
(->code '(<- a b c 123))
"a<-b<-c<-123L"
(->code '($ a b c d))
"a$b$c$d"

3.3.4.4 Unquoting

Sometimes we want to use objects created outside our form (defined earlier or in let). For this case you can use the unqote (~) symbol. There are two options:

  • when using quoting ', unqote evaluates the uquoted form using eval. eval has some constrains, the most important is that local bindings (let bindings) can’t be use.
  • when using syntax quoting (backquote `), unqote acts as in clojure macros – all unquoted forms are evaluated instantly.
(def v (r '(+ 1 2 3 4)))
(-> '(* 22.0 ~v)
    r r->clj)
[220.0]
(let [local-v (r '(+ 1 2 3 4))
      local-list [4 5 6]]
  (-> `(* 22.0 ~local-v ~@local-list)
      r r->clj))
[26400.0]

3.4 Calling R functions

You are not limited to the use code forms. When an RObject correspinds to an R function, it can be used and called as normal Clojure functions.

(def square (r '(function [x] (* x x))))
(-> 123 square r->clj)
[15129]
source: notebooks/clojisr/v1/tutorials/codegen.clj