3 R code generation from Clojure forms
ns clojisr.v1.tutorials.codegen
(:require [clojisr.v1.r :as r :refer [r ->code r->clj]]
(:as dataset]
[tech.v3.dataset :as kind]
[scicloj.kindly.v4.kind :as kindly])) [scicloj.kindly.v4.api
R code in clojisr
library can be represented in three main ways:
- as string containing R code or script
- as RObject
- as Clojure form
RObject is clojisr
data structure which keeps reference to R objects. Also can act as a function when referenced object is R function. RObject is returned always when R code is executed.
Let’s see what is possible in detail.
First, require the necessary namespaces.
Also, let us make sure we are using a clean session.
:rserve) (r/set-default-session-type!
:session-type :rserve} {
(r/discard-all-sessions)
{}
3.1 R code as a string
To run any R code as string or Clojure form we use clojisr.v1.r/r
function
"mean(rnorm(100000,mean=1.0,sd=3.0))") (r
1] 0.995323
[
"abc <- runif(1000);
(r f <- function(x) {mean(log(x))};
f(abc)")
1] -1.014055
[
As mentioned above, every r
call creates RObject and R variable which keeps result of the execution.
def result (r "rnorm(10)")) (
class result) (
clojisr.v1.robject.RObject
:object-name result) (
".MEM$xeaf880945946451b"
Let’s use the var name string to see what it represents.
:object-name result)) (r (
1] -0.98451042 -0.01969949 0.33094814 -0.93603276 -1.26566080 -1.38561694
[7] 0.95174281 2.05033309 -0.95599948 -0.81838708
[
Now let us move to discussing the ROBject data type.
3.2 RObject
Every RObject acts as Clojure reference to an R variable. All these variables are held in an R environment called .MEM
. An RObject can represent anything and can be used for further evaluation, even acting as a function if it corresponds to an R function. Here are some examples:
An r-object holding some R data:
def dataset (r "nhtemp")) (
An r-object holding an R function:
def function (r "mean")) (
Printing the data:
dataset
Time Series:= 1912
Start = 1971
End = 1
Frequency 1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0
[
Equivalently:
(r dataset)
Time Series:= 1912
Start = 1971
End = 1
Frequency 1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0
[
We use r->clj
to transfer data from R to Clojure (converting an R object to Clojure data):
-> (r->clj dataset)
(0)
(dataset/select-rows (dataset/mapseq-reader))
1912.0, :$series 49.9}] [{:$time
Creating an R object, applying the function to it, and conveting to Clojure data (in this pipeline, both function
and r
return an RObject):
-> "c(1,2,3,4,5,6)"
(
r
function r->clj)
3.5] [
3.3 Clojure forms
Calling R with the code as a string is quite limited. You can’t easily inject Clojure data into the code. Also, editor support is very limited for this way of writing. So we enable the use of Clojure forms as a DSL to simplify the construnction of R code.
In generating R code from Clojure forms, clojisr
operates on both the var and the symbol level, and can also digest primitive types and basic data structures. There are some special symbols which help in creating R formulas and defining R functions. We will go through all of these in detail.
The ->code
function is responsible for turning Clojure forms into R code.
1 2 4]) (->code [
"c(1L,2L,4L)"
When the r
function gets an argument that is not a string, it uses ->code
behind the scenes to turn that argument into code as a string.
1 2 4]) (r [
1] 1 2 4
[
-> [1 2 4]
(
r r->clj)
1 2 4] [
Equivalently:
-> [1 2 4]
(
->code
r r->clj)
1 2 4] [
3.3.1 Primitive data types
-> 1 r r->clj) (
1] [
-> 2.0 r r->clj) (
2.0] [
-> 3/4 r r->clj) (
0.75] [
-> true r r->clj) (
true] [
-> false r r->clj) (
false] [
nil
is converted to NULL or NA (in vectors or maps)
-> nil r r->clj) (
nil
nil) (->code
"NULL"
Infinities etc.
-> ##Inf r r->clj) (
##Inf] [
->> ##-Inf r r->clj) (
##-Inf] [
->> ##NaN r r->clj first) (
##NaN
When you pass a string to r
, it is treated as code. So we have to escape double quotes if we actually mean to represent an R string (or an R character object, as it is called in R). However, when string is used inside a more complex form, it is escaped automatically.
"\"this is a string\"") (->code
"\"\"this is a string\"\""
-> "\"this is a string\"" r r->clj) (
"this is a string"] [
"this is a string")) (->code '(paste
"paste(\"this is a string\")"
-> '(paste "this is a string") r r->clj) (
"this is a string"] [
Any Named
Clojure object that is not a String (like a keyword or a symbol) is converted to a R symbol.
:keyword) (->code
"keyword"
'symb) (->code
"symb"
An RObject is converted to a R variable.
"1+2")) (->code (r
".MEM$x7ae96b8eb5c44428"
Date/time is converted to a string.
"2031-02-03T11:22:33") (->code #inst
"'2031-02-03 13:22:33'"
"2031-02-03T11:22:33") (r #inst
1] "2031-02-03 13:22:33"
[
-> #inst "2031-02-03T11:22:33"
(
r r->clj)
"2031-02-03 13:22:33"] [
3.3.2 Vectors
A Clojure vector is converted to an R vector created using the c
function. That means that nested vectors are flattened. All the values inside are translated to R recursively.
1 2 3]) (->code [
"c(1L,2L,3L)"
-> [[1] [2 [3]]] r r->clj) (
1 2 3] [
Some Clojure sequences are interpreted as function calls, if it makes sense for their first element. However, sequences beginning with numbers or strings are treated as vectors.
range 11)) (r (
1] 0 1 2 3 4 5 6 7 8 9 10
[
map str (range 11))) (r (
1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
[
3.3.2.1 Tagged vectors
When the first element of a vector or a sequence is a keyword starting with :!
, some special conversion takes place.
keyword | meaning |
---|---|
:!string |
vector of strings |
:!boolean |
vector of logicals |
:!int |
vector of integers |
:!double |
vector of doubles |
:!named |
named vector |
:!list |
partially named list |
:!call |
treat the rest of the vector as callable sequence |
:!ct |
vector of POSIXct classes |
:!lt |
vector of POSIXlt classes |
nil
in a vector is converted to NA
-> [:!string 1 nil 3]
( r r->clj)
"1" nil "3"] [
-> [:!boolean 1 true nil false]
( r r->clj)
true true nil false] [
-> [:!double 1.0 nil 3]
( r r->clj)
1.0 nil 3.0] [
-> [:!int 1.0 nil 3]
( r r->clj)
1 nil 3] [
-> [:!named 1 2 :abc 3]
( r r->clj)
1 2 3] [
I think here we should return map maybe?
-> [:!list :a 1 :b [:!list 1 2 :c ["a" "b"]]]
( r r->clj)
:a [1], :b {0 [1], 1 [2], :c ["a" "b"]}} {
-> [:!ct #inst "2011-11-01T22:33:11"]
( r r->clj)
time.LocalDateTime 0xec13b60 "2011-11-02T00:33:11"]] [#object[java.
-> [:!lt #inst "2011-11-01T22:33:11"]
( r r->clj)
time.LocalDateTime 0x1de59d42 "2011-11-02T00:33:11"]] [#object[java.
When a vector is big enough, it is transfered not directly as code, but as the name of a newly created R variable holding the corresponding vector data, converted via the Java conversion layer.
range 10000)) (->code (
".MEM$xebcb81eef5cf4400"
-> (conj (range 10000) :!string)
(first) r r->clj
"0"
Treat vector as callable.
-> [:!call 'mean [1 2 3 4]]
( r r->clj)
2.5] [
3.3.3 Maps
A Clojue Map is transformed to an R named list. As with vectors, all data elements inside are processed recursively.
:a 1 :b nil}) (r {
$a1] 1
[
$b1] NA
[
-> {:a 1 :b nil :c [2.0 3 4]}
( r r->clj)
:a [1], :b [nil], :c [2.0 3.0 4.0]} {
Bigger maps are transfered to R variables via the Java conversion layer.
zipmap (map #(str "key" %) (range 100))
(->code (range 1000 1100))) (
".MEM$x39e413404a614806"
-> (r (zipmap (map #(str "key" %) (range 100))
(range 1000 1100)))
(
r->clj:key23)
1023] [
3.3.4 Calls, operators and special symbols
Now we come to the most important part, using sequences to represent function calls. One way to do that is using a list, where the first element is a symbol corresponding to the name of an R function, or an RObject corresponding to an R function. To create a function call we use the same structure as in clojure. The two examples below are are equivalent.
Recall that symbols are converted to R variable names on the R side.
"mean(c(1,2,3))") (r
1] 2
[
1 2 3])) (r '(mean [
1] 2
[
1 2 3])) (->code '(mean [
"mean(c(1L,2L,3L))"
Here is another example.
1 2 3]))) (r '(<- x (mean [
1] 2
[
->> 'x r r->clj) (
2.0] [
Here is another example.
Recall that RObjects are converted to the names of the corresponding R objects.
-> (list (r 'median) [1 2 4])
( ->code)
".MEM$xf90c011edb8d4b6c(c(1L,2L,4L))"
-> (list (r 'median) [1 2 4])
(
r r->clj)
2] [
You can call using special names (surrounded by backquote) as strings
-> '("`^`" 10 2) r r->clj) (
100.0] [
There are some special symbols which get a special meaning on,:
symbol | meaning |
---|---|
'( ) |
Wrap first element of the quoted list into parentheses |
function |
R function definition |
do |
join all forms using “;” and wrap into {} |
for |
for loop with multiple bindings |
while |
while loop |
if |
if or if-else |
tilde or formula |
R formula |
colon |
colon (: ) |
rsymbol |
qualified and/or backticked symbol wrapper |
bra |
[ |
brabra |
[[ |
bra<- |
[<- |
brabra<- |
[[<- |
Sometimes symbols are represented as string with spaces inside, also can be prepend with package name. Tick '
in clojure is not enough for that, for that purpose you can use rsymbol
.
name)) (r/->code '(rsymbol
"name"
"name with spaces")) (r/->code '(rsymbol
"`name with spaces`"
name)) (r/->code '(rsymbol package
"package::name"
"package with spaces" name)) (r/->code '(rsymbol
"`package with spaces`::name"
-> ((r/rsymbol 'base 'mean) [1 2 3 4])
( r->clj)
2.5] [
-> ((r/rsymbol "[") 'iris 1) r->clj dataset/mapseq-reader first :Sepal.Length) (
5.1
-> ((r/rsymbol 'base "[") 'iris 1) r->clj dataset/mapseq-reader first :Sepal.Length) (
5.1
All bra...
functions accept nil
or empty-symbol
to mark empty selector.
def m (r '(matrix (colon 1 6)
(:nrow 2
:dimnames [:!list ["a" "b"] (bra LETTERS (colon 1 3))])))
m
A B C1 3 5
a 2 4 6
b
-> '(bra ~m nil 1)
( r r->clj)
1 2] [
-> '(bra ~m 1 nil)
( r r->clj)
1 3 5] [
-> '(bra ~m 1 nil :drop false)
(
r
r->clj dataset/value-reader)
"a" 1 3 5]] [[
-> '(bra<- ~m 1 nil [11 22 33])
(
r
r->clj dataset/value-reader)
"a" 11 22 33] ["b" 2 4 6]] [[
-> '(bra<- ~m nil 1 [22 33])
(
r
r->clj dataset/value-reader)
"a" 22 3 5] ["b" 33 4 6]] [[
-> (r/bra m nil 1)
( r->clj)
1 2] [
-> (r/bra m 1 nil)
( r->clj)
1 3 5] [
-> (r/bra m 1 nil :drop false)
(
r->clj dataset/value-reader)
"a" 1 3 5]] [[
-> (r/bra<- m 1 nil [11 22 33])
(
r->clj dataset/value-reader)
"a" 11 22 33] ["b" 2 4 6]] [[
-> (r/bra<- m nil 1 [22 33])
(
r->clj dataset/value-reader)
"a" 22 3 5] ["b" 33 4 6]] [[
def l (r [:!list "a" "b" "c"])) (
l
1]]
[[1] "a"
[
2]]
[[1] "b"
[
3]]
[[1] "c"
[
-> '(brabra ~l 2)
( r r->clj)
"b"] [
-> '(brabra<- ~l 2 nil)
( r r->clj)
"a"] ["c"]] [[
-> '(brabra<- ~l 5 "fifth")
( r r->clj)
"a"] ["b"] ["c"] nil ["fifth"]] [[
-> (r/brabra l 2)
( r->clj)
"b"] [
-> (r/brabra<- l 2 nil)
( r->clj)
"a"] ["c"]] [[
-> (r/brabra<- l 5 "fifth")
( r->clj)
"a"] ["b"] ["c"] nil ["fifth"]] [[
You can use if
with optional else
form. Use do
to create block of operations
-> '(if true 11 22)
( r r->clj)
11] [
-> '(if false 11 22)
( r r->clj)
22] [
-> '(if true 11)
( r r->clj)
11] [
-> '(if false 11)
( r r->clj)
nil
-> '(if true (do (<- x [1 2 3 4])
(
(mean x))) r r->clj)
2.5] [
do
wraps everything into curly braces {}
do (<- x 1)
(->code '(+ x 1)))) (<- x (
"{x<-1L;x<-(x+1L)}"
Loops
-> '(do
(3)
(<- v
(<- coll [v])while (> v 0)
(- v 1))
(<- v (
(<- coll [coll v]))
coll)
r r->clj)
3 2 1 0] [
def for-form '(do
(
(<- coll [])for [a [1 2]
(3 4]]
b [* a b)]))
(<- coll [coll ( coll))
(->code for-form)
"{coll<-c();for(a in c(1L,2L)){for(b in c(3L,4L)){coll<-c(coll,(a*b))\n}\n};coll}"
-> for-form r r->clj) (
3 4 6 8] [
Sometimes wrapping into parentheses is needed.
(->code '(:!wrap z))
"(z)"
1.0 2.0 3.0 (:!wrap inside)]) (->code '[:!list
"list(1.0,2.0,3.0,(inside))"
3.3.4.1 Function definitions
To define a function, use the function
symbol with a following vector of argument names, and then the body. Arguments are treated as a partially named list.
:median false ...]
(r '(<- stat (function [x
(ifelse median
(median x ...) (mean x ...)))))
= FALSE, ...)
function (x, median
{
ifelse(median, median(x, ...), mean(x, ...))
}
-> '(stat [100 33 22 44 55])
( r r->clj)
50.8] [
-> '(stat [100 33 22 44 55] :median true)
( r r->clj)
44] [
-> '(stat [100 33 22 44 55 nil])
( r r->clj)
nil] [
-> '(stat [100 33 22 44 55 nil] :na.rm true)
( r r->clj)
50.8] [
3.3.4.2 Formulas
To create an R formula, use tilde
or formula
with two arguments, for the left and right sides (to skip one, just use nil
).
(r '(formula y x))
y ~ x
+ a b c d) e))) (r '(formula y (| (
+ b + c + d | e
y ~ a
nil (| x y))) (r '(formula
~x | y
3.3.4.3 Operators
+ 1 2 3 4 5)) (->code '(
"((((1L+2L)+3L)+4L)+5L)"
/ 1 2 3 4 5)) (->code '(
"((((1L/2L)/3L)/4L)/5L)"
- [1 2 3])) (->code '(
"-(c(1L,2L,3L))"
123)) (->code '(<- a b c
"a<-b<-c<-123L"
(->code '($ a b c d))
"a$b$c$d"
3.3.4.4 Unquoting
Sometimes we want to use objects created outside our form (defined earlier or in let
). For this case you can use the unqote
(~
) symbol. There are two options:
- when using quoting
'
, unqote evaluates the uquoted form usingeval
.eval
has some constrains, the most important is that local bindings (let
bindings) can’t be use. - when using syntax quoting (backquote `), unqote acts as in clojure macros – all unquoted forms are evaluated instantly.
def v (r '(+ 1 2 3 4))) (
-> '(* 22.0 ~v)
( r r->clj)
220.0] [
let [local-v (r '(+ 1 2 3 4))
(4 5 6]]
local-list [-> `(* 22.0 ~local-v ~@local-list)
( r r->clj))
26400.0] [
3.4 Calling R functions
You are not limited to the use code forms. When an RObject correspinds to an R function, it can be used and called as normal Clojure functions.
def square (r '(function [x] (* x x)))) (
-> 123 square r->clj) (
15129] [