9  Computation tape

The tape namespace records every la/, t/, and el/ operation as a directed acyclic graph (DAG). This serves two purposes: it powers automatic differentiation (reverse-mode autodiff walks the tape backward), and it helps you inspect what the runtime actually computed — which results share memory, which are lazy, and where copies happen.
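The backward walk that powers autodiff is simple to sketch. Below is a minimal, hypothetical Python illustration (not lalinea code): each tape entry carries a `vjp` function that maps the adjoint of its output to contributions for its inputs, and gradients accumulate as the tape is walked in reverse.

```python
# Hypothetical sketch of reverse-mode autodiff over a recorded tape.
# Each entry's vjp maps the output adjoint g to (input-id, contribution) pairs.
def backward(tape, seed=1.0):
    grads = {tape[-1]["id"]: seed}  # adjoint of the final result
    for entry in reversed(tape):
        g = grads.get(entry["id"], 0.0)
        for inp_id, partial in entry["vjp"](g):
            grads[inp_id] = grads.get(inp_id, 0.0) + partial
    return grads

# y = (x * 2) + x, with x recorded as entry "t1"
tape = [
    {"id": "t1", "vjp": lambda g: []},                      # leaf input
    {"id": "t2", "vjp": lambda g: [("t1", g * 2.0)]},       # t2 = t1 * 2
    {"id": "t3", "vjp": lambda g: [("t1", g), ("t2", g)]},  # t3 = t2 + t1
]
grads = backward(tape)
# dy/dx = 3.0: the *2 path contributes 2, the direct path contributes 1
```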

Setup

(ns lalinea-book.computation-tape
  (:require [scicloj.lalinea.linalg :as la]
            [scicloj.lalinea.tensor :as t]
            [scicloj.lalinea.elementwise :as el]
            [scicloj.lalinea.tape :as tape]
            [scicloj.kindly.v4.kind :as kind])
  (:import [org.ejml.data DMatrixRMaj]))

Inspecting memory status

tape/memory-status classifies a tensor’s memory backing without needing a tape. Three states:

  • :contiguous — backed by a double[], fast, mutable
  • :strided — shares a double[] but with reordered strides
  • :lazy — no backing array, recomputes on each access

(def A (t/matrix [[1 2] [3 4]]))
(tape/memory-status A)
:contiguous

la/transpose returns a strided view — same backing data, different stride order.

(tape/memory-status (la/transpose A))
:strided

el/+ returns a lazy reader — no allocation, recomputes on every access.

(def B (t/matrix [[5 6] [7 8]]))
(tape/memory-status (el/+ A B))
:lazy

la/mmul goes through EJML and returns a fresh contiguous tensor.

(tape/memory-status (la/mmul A B))
:contiguous
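The contiguous/strided distinction mirrors view semantics in other array libraries. A quick NumPy analogy (NumPy, not lalinea; NumPy has no lazy state):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])   # fresh allocation: contiguous
T = A.T                              # transpose: a strided view, no copy
print(A.flags["C_CONTIGUOUS"])       # True
print(T.flags["C_CONTIGUOUS"])       # False: same data, reordered strides
print(T.base is A)                   # True: T reads A's backing buffer
```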

Memory relation

tape/memory-relation classifies the relationship between two tensors:

  • :shared — same backing double[]
  • :independent — separate backing arrays
  • :unknown-lazy — at least one is lazy; relationship indeterminate

A matrix and its transpose share backing memory:

(tape/memory-relation A (la/transpose A))
:shared

Two separate matrices are independent:

(tape/memory-relation A B)
:independent

Two t/column calls on the same double[] share memory:

(def arr (double-array [10 20 30]))
(tape/memory-relation (t/column arr) (t/column arr))
:shared
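NumPy makes the same shared/independent call for its own arrays with np.shares_memory; the analogy again (NumPy, not lalinea):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(np.shares_memory(A, A.T))   # True: the view reuses A's buffer
print(np.shares_memory(A, B))     # False: independent allocations
```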

A lazy result (from el/+) actually reads through to its inputs on every access — but without a tape, dtype-next does not expose that dependency chain:

(tape/memory-relation A (el/+ A B))
:unknown-lazy

With a tape, detect-memory-status gives the complete answer. It knows the inputs of each operation, so it can report :reads-through instead of :unknown-lazy:

(let [tr (tape/with-tape
           (let [M (t/matrix [[1 2] [3 4]])
                 S (el/+ M M)]
             S))]
  (tape/detect-memory-status (last (:entries tr))))
:reads-through

Recording a computation tape

tape/with-tape records all la/, t/, and el/ operations within its scope. It returns {:result ... :entries ...}.

(def tape-result
  (tape/with-tape
    (let [M (t/matrix [[1 2] [3 4]])
          S (el/scale M 2.0)
          I (t/eye 2)
          C (el/+ S I)
          D (la/mmul C (la/transpose M))]
      D)))
(dissoc tape-result :registry)
{:entries
 [{:id "t1",
   :op :t/matrix,
   :inputs [{:external true}],
   :input-tensors [[[1 2] [3 4]]],
   :output #la/R [:float64 [2 2]
       [[1.000 2.000]
        [3.000 4.000]]]
,
   :shape [2 2],
   :complex? false}
  {:id "t2",
   :op :el/scale,
   :inputs [{:id "t1"} {:external true}],
   :input-tensors [#la/R [:float64 [2 2]
       [[1.000 2.000]
        [3.000 4.000]]]
 2.0],
   :output #la/R [:float64 [2 2]
       [[2.000 4.000]
        [6.000 8.000]]]
,
   :shape [2 2],
   :complex? false}
  {:id "t3",
   :op :t/eye,
   :inputs [{:external true}],
   :input-tensors [2],
   :output #la/R [:float64 [2 2]
       [[1.000 0.000]
        [0.000 1.000]]]
,
   :shape [2 2],
   :complex? false}
  {:id "t4",
   :op :el/+,
   :inputs [{:id "t2"} {:id "t3"}],
   :input-tensors [#la/R [:float64 [2 2]
       [[2.000 4.000]
        [6.000 8.000]]]
 #la/R [:float64 [2 2]
       [[1.000 0.000]
        [0.000 1.000]]]
],
   :output #la/R [:float64 [2 2]
       [[3.000 4.000]
        [6.000 9.000]]]
,
   :shape [2 2],
   :complex? false}
  {:id "t5",
   :op :la/transpose,
   :inputs [{:id "t1"}],
   :input-tensors [#la/R [:float64 [2 2]
       [[1.000 2.000]
        [3.000 4.000]]]
],
   :output #la/R [:float64 [2 2]
       [[1.000 3.000]
        [2.000 4.000]]]
,
   :shape [2 2],
   :complex? false}
  {:id "t6",
   :op :la/mmul,
   :inputs [{:id "t4"} {:id "t5"}],
   :input-tensors [#la/R [:float64 [2 2]
       [[3.000 4.000]
        [6.000 9.000]]]
 #la/R [:float64 [2 2]
       [[1.000 3.000]
        [2.000 4.000]]]
],
   :output #la/R [:float64 [2 2]
       [[11.00 25.00]
        [24.00 54.00]]]
,
   :shape [2 2],
   :complex? false}],
 :result #la/R [:float64 [2 2]
       [[11.00 25.00]
        [24.00 54.00]]]
}

The result is a map with :result (the computed value) and :entries (the recorded operations). Each entry tracks the operation, its inputs (linked by ID or marked :external), and the output shape.
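The bookkeeping behind those entries can be sketched generically. A hypothetical Python recorder (illustrative only, not lalinea's implementation): each instrumented call appends an entry, linking arguments to earlier entries by identity or marking them external.

```python
# Hypothetical tape recorder: instrumented ops append an entry that links
# each argument to an earlier entry by id, or marks it {"external": True}.
tape = []
ids = {}  # maps object identity of a result -> its tape id

def record(op, fn, *args):
    entry_id = f"t{len(tape) + 1}"
    inputs = [{"id": ids[id(a)]} if id(a) in ids else {"external": True}
              for a in args]
    out = fn(*args)
    tape.append({"id": entry_id, "op": op, "inputs": inputs})
    ids[id(out)] = entry_id
    return out

M = record("t/matrix", lambda rows: [list(r) for r in rows], [[1, 2], [3, 4]])
S = record("el/scale", lambda m, k: [[k * x for x in r] for r in m], M, 2.0)
# tape: t1 has an external input; t2 links to t1, and 2.0 is external
```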

External inputs

The tape tracks la/, t/, and el/ operations. Inputs that originate outside La Linea (raw arrays, Clojure data structures, dtype-next operations, EJML objects) appear as {:external true}.

Double arrays

A double[] passed to t/column is external — the tape records t/column but not the array construction.

(def array-tape
  (tape/with-tape
    (let [v (t/column [1 2 3])
          w (el/scale v 5.0)]
      w)))
(mapv (fn [e] (select-keys e [:id :op :inputs]))
      (:entries array-tape))
[{:id "t1", :op :t/column, :inputs [{:external true}]}
 {:id "t2", :op :el/scale, :inputs [{:id "t1"} {:external true}]}]

Clojure vectors and sequences

Clojure data structures are external too. t/matrix wraps them into tensors — the tape records that wrapping.

(def seq-tape
  (tape/with-tape
    (let [M (t/matrix (for [i (range 3)]
                        (for [j (range 3)]
                          (* (inc i) (inc j)))))
          v (t/column (repeat 3 1.0))]
      (la/mmul M v))))
(mapv :op (:entries seq-tape))
[:t/matrix :t/column :la/mmul]

dtype-next operations

Using el/* instead of raw dfn/* means the tape captures the full chain. The t/matrix wrapper around the result is also recorded.

(def mul-tape
  (tape/with-tape
    (let [A (t/matrix [[1 2] [3 4]])
          doubled (el/* A 2.0)
          result (el/+ (t/matrix doubled) A)]
      result)))
(mapv (fn [e] (select-keys e [:id :op :inputs]))
      (:entries mul-tape))
[{:id "t1", :op :t/matrix, :inputs [{:external true}]}
 {:id "t2", :op :el/*, :inputs [{:id "t1"} {:external true}]}
 {:id "t3", :op :t/matrix, :inputs [{:id "t2"}]}
 {:id "t4", :op :el/+, :inputs [{:id "t3"} {:id "t1"}]}]

el/* computes the element-wise product, and t/matrix wraps its result. All four operations (t/matrix, el/*, t/matrix, el/+) are tracked on the tape.

EJML structures

EJML’s DMatrixRMaj can be converted to a tensor via t/dmat->tensor. The conversion itself is not instrumented (it is a low-level interop function), so the resulting tensor is external to the tape.

(def ejml-tape
  (tape/with-tape
    (let [dm (doto (DMatrixRMaj. 2 2)
               (.setData (double-array [1 0 0 1])))
          I  (t/dmat->tensor dm)
          result (el/+ (t/matrix [[5 6] [7 8]]) I)]
      result)))
(mapv (fn [e] (select-keys e [:id :op :inputs]))
      (:entries ejml-tape))
[{:id "t1", :op :t/matrix, :inputs [{:external true}]}
 {:id "t2", :op :el/+, :inputs [{:id "t1"} {:id "x3", :external true}]}]

The tensor from t/dmat->tensor enters the tape as external input to el/+.

Complex operations

Complex constructors are recorded as t/ operations on the tape. The tape shows the full chain: t/matrix → t/complex-tensor → el/+.

(def complex-tape
  (tape/with-tape
    (let [z1 (t/complex-tensor (t/matrix [[1 0] [0 1]]))
          z2 (t/complex-tensor (t/matrix [[0 1] [1 0]]))
          s  (el/+ z1 z2)]
      s)))
(mapv :op (:entries complex-tape))
[:t/matrix :t/complex-tensor :t/matrix :t/complex-tensor :el/+]

The polymorphic el/+ works for both real and complex inputs; the tape records it as :el/+ regardless of the input type.

Complex inputs produce the same :el/+ tape key:

(mapv :op (:entries (tape/with-tape
                      (el/+ (t/complex-tensor [1 2])
                            (t/complex-tensor [3 4])))))
[:t/complex-tensor :t/complex-tensor :el/+]

Tape summary

tape/summary aggregates statistics about the tape: operation counts and memory status breakdown.

(tape/summary tape-result)
{:total 6,
 :by-op
 {:t/matrix 1,
  :el/scale 1,
  :t/eye 1,
  :el/+ 1,
  :la/transpose 1,
  :la/mmul 1},
 :by-memory {:independent 2, :reads-through 3, :shared 1}}

The summary shows:

  • :by-op — counts per operation (here, one of each)
  • :by-memory — which operations read through, share memory, or are independent
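Producing such a summary from the entry list is a simple aggregation; a Python sketch (hypothetical, not lalinea code) using collections.Counter:

```python
from collections import Counter

# A simplified stand-in for the tape's :entries vector
entries = [{"op": ":t/matrix"}, {"op": ":el/scale"}, {"op": ":t/eye"},
           {"op": ":el/+"}, {"op": ":la/transpose"}, {"op": ":la/mmul"}]

summary = {"total": len(entries),
           "by-op": dict(Counter(e["op"] for e in entries))}
print(summary["total"])           # 6
print(summary["by-op"][":el/+"])  # 1
```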

Origin DAG

tape/origin walks the tape backwards from a value to reconstruct its computation DAG.

(tape/origin tape-result (:result tape-result))
{:id "t6",
 :op :la/mmul,
 :shape [2 2],
 :inputs
 [{:id "t4",
   :op :el/+,
   :shape [2 2],
   :inputs
   [{:id "t2",
     :op :el/scale,
     :shape [2 2],
     :inputs
     [{:id "t1",
       :op :t/matrix,
       :shape [2 2],
       :inputs [{:external true}]}
      {:external true}]}
    {:id "t3", :op :t/eye, :shape [2 2], :inputs [{:external true}]}]}
  {:id "t5", :op :la/transpose, :shape [2 2], :inputs [{:ref "t1"}]}]}

The DAG shows that mmul depends on add and transpose, which both trace back to the original matrix. The shared reference to matrix appears as {:ref "t1"} in the transpose branch — a proper DAG, not a tree.
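The :ref mechanism falls out of a seen-set during the backward walk. A hypothetical Python sketch (not lalinea's implementation): the first visit to a node expands it in full, and any later visit emits a reference instead.

```python
# Sketch of an origin-style backward walk: recurse into inputs by id,
# emitting {"ref": id} on a repeat visit so shared subcomputations stay
# a DAG instead of being duplicated into a tree.
def origin(entries, entry_id, seen=None):
    seen = set() if seen is None else seen
    if entry_id in seen:
        return {"ref": entry_id}
    seen.add(entry_id)
    entry = next(e for e in entries if e["id"] == entry_id)
    return {"id": entry_id, "op": entry["op"],
            "inputs": [origin(entries, i["id"], seen) if "id" in i else i
                       for i in entry["inputs"]]}

entries = [
    {"id": "t1", "op": "matrix", "inputs": [{"external": True}]},
    {"id": "t2", "op": "transpose", "inputs": [{"id": "t1"}]},
    {"id": "t3", "op": "mmul", "inputs": [{"id": "t1"}, {"id": "t2"}]},
]
dag = origin(entries, "t3")
# t1 is expanded once; its second occurrence, via transpose, is {"ref": "t1"}
```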

Mermaid visualization

tape/mermaid renders the DAG as a Mermaid flowchart. Memory status annotations:

  • ~ reads-through (lazy, recomputes on access from inputs)
  • = shared (same backing array as an input)
  • + independent (fresh allocation)

(tape/mermaid tape-result (:result tape-result))
flowchart TD
t6["mmul [2 2] +"]
t4["+ [2 2] ~"]
t2["scale [2 2] ~"]
t1["matrix [2 2] +"]
ext4[/"external"/]
ext4 --> t1
t1 --> t2
ext7[/"external"/]
ext7 --> t2
t2 --> t4
t3["eye [2 2] ~"]
ext11[/"external"/]
ext11 --> t3
t3 --> t4
t4 --> t6
t5["transpose [2 2] ="]
t1 --> t5
t5 --> t6
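Rendering such a flowchart from tape entries is mostly string assembly; a hypothetical Python sketch (node labels simplified, memory annotations omitted):

```python
# Sketch of a mermaid emitter: one node line per entry, one edge per input,
# with anonymous trapezoid nodes for external inputs.
def mermaid(entries):
    lines, ext = ["flowchart TD"], 0
    for e in entries:
        lines.append(f'{e["id"]}["{e["op"]} {e["shape"]}"]')
        for inp in e["inputs"]:
            if "id" in inp:
                lines.append(f'{inp["id"]} --> {e["id"]}')
            else:
                ext += 1
                lines.append(f'ext{ext}[/"external"/]')
                lines.append(f'ext{ext} --> {e["id"]}')
    return "\n".join(lines)

entries = [{"id": "t1", "op": "matrix", "shape": [2, 2],
            "inputs": [{"external": True}]},
           {"id": "t2", "op": "transpose", "shape": [2, 2],
            "inputs": [{"id": "t1"}]}]
print(mermaid(entries))
```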

Practical example: tracing a pipeline

A common question: “after this chain of operations, what is materialized and what is lazy?”

(def pipeline-result
  (tape/with-tape
    (let [data (t/matrix [[1 0 2]
                          [0 3 0]
                          [4 0 5]])
          centered (el/- data (el/scale (t/matrix [[1 1 1]
                                                   [1 1 1]
                                                   [1 1 1]])
                                        (/ (double (la/trace data)) 3.0)))
          {:keys [U S Vt]} (la/svd centered)
          projection (la/mmul (la/transpose Vt) (t/column [1 0 0]))]
      projection)))
(tape/summary pipeline-result)
{:total 9,
 :by-op
 {:t/matrix 2,
  :la/trace 1,
  :el/scale 1,
  :el/- 1,
  :la/svd 1,
  :la/transpose 1,
  :t/column 1,
  :la/mmul 1},
 :by-memory {:independent 4, :reads-through 5}}

The summary reveals the mix of lazy intermediate operations and EJML-backed materializations.

(tape/mermaid pipeline-result (:result pipeline-result))
flowchart TD
t10["mmul [3 1] +"]
t7["transpose [3 3] +"]
x8-ext[/"external"/]
x8-ext --> t7
t7 --> t10
t9["column [3 1] ~"]
ext6[/"external"/]
ext6 --> t9
t9 --> t10
source: notebooks/lalinea_book/computation_tape.clj