This is part of the Scicloj Clojure Data Scrapbook.

Reading HDF files

Original discussion at the Clojurians Zulip chat: #data-science > import hdf files.

(ns index
  (:require [babashka.fs :as fs]
            [tech.v3.tensor :as tensor]
            [clojure.java.io :as io]
            [clojure.string :as string]
            [scicloj.noj.v1.vis.image :as vis.image]
            [tech.v3.datatype.functional :as fun])
  (:import io.jhdf.HdfFile
           java.io.File))
(set! *warn-on-reflection* true)
true

We will use the following function to read an HDF file with jHDF and convert each of its top-level datasets to a dtype-next tensor.

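(defn hdf5->tensors
  "Reads every top-level dataset of the HDF5 file at `path` and returns
  a vector of maps holding the dataset name and its data as a tensor."
  [path]
  (let [^File file (io/file path)]
    ;; with-open closes the file after all datasets have been read;
    ;; mapv is eager, so nothing is read after the file is closed.
    (with-open [hdf-file (HdfFile. file)]
      (let [^java.util.Map children (.getChildren hdf-file)]
        (->> children
             keys
             (mapv (fn [key]
                     ;; Assumes every child is a dataset, not a group.
                     (let [^io.jhdf.api.Dataset child (.get children key)]
                       {:key key
                        :data (-> child
                                  .getData
                                  tensor/->tensor)}))))))))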

Let us apply the function to a test file:

(def tensors
  (hdf5->tensors "data/test.h5"))
(count tensors)
20
(take 3 tensors)
({:key "0", :data #tech.v3.tensor<int32>[600 800]
[[112 110 111 ... 114 118 124]
 [105 112 106 ... 115 120 119]
 [107 109 108 ... 117 123 115]
 ...
 [132 138 134 ... 109 107 109]
 [130 129 129 ... 115 111 103]
 [134 135 134 ... 110 111 110]]}
 {:key "1", :data #tech.v3.tensor<int32>[600 800]
[[124 120 121 ... 117 121 124]
 [118 123 117 ... 118 122 121]
 [120 123 117 ... 119 123 118]
 ...
 [133 139 136 ... 111 106 108]
 [133 132 131 ... 116 111 103]
 [136 137 135 ... 114 111 108]]}
 {:key "10", :data #tech.v3.tensor<int32>[600 800]
[[125 120 120 ... 116 121 127]
 [116 122 116 ... 117 121 119]
 [120 121 115 ... 120 123 118]
 ...
 [132 139 133 ... 112 107 108]
 [135 132 132 ... 115 112 104]
 [134 135 135 ... 114 112 108]]})
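
Note that the keys above come back as strings, and "10" appears right after "1", which is string order rather than numeric order. If numeric order matters, something like the following sketch could be used. The helper name `sort-by-numeric-key` is hypothetical (it is not part of the original notebook), and it assumes that every key consists of decimal digits only.

```clojure
(defn sort-by-numeric-key
  "Sorts a sequence of {:key ... :data ...} maps by the integer value
  of :key. Assumes every :key is a string of decimal digits."
  [rows]
  (sort-by (fn [row] (Long/parseLong (:key row))) rows))

;; Small stand-in rows; in the notebook one would pass `tensors` instead.
(->> [{:key "0"} {:key "1"} {:key "10"} {:key "2"}]
     sort-by-numeric-key
     (map :key))
;; => ("0" "1" "2" "10")
```

Calling `(sort-by-numeric-key tensors)` would then presumably yield the datasets in the order 0, 1, 2, and so on, provided the assumption about the keys holds for this file.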

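Let us visualize a few of the tensors as images. The values printed above lie roughly between 100 and 140, so we multiply them by 200 before rendering them as 16-bit grayscale (:ushort-gray); otherwise the images would be nearly black: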

(->> tensors
     (take 3)
     (mapcat (fn [row]
               [row
                (-> row
                    :data
                    (fun/* 200)
                    (vis.image/tensor->image
                     :ushort-gray))])))

({:key "0", :data #tech.v3.tensor<int32>[600 800]
[[112 110 111 ... 114 118 124]
 [105 112 106 ... 115 120 119]
 [107 109 108 ... 117 123 115]
 ...
 [132 138 134 ... 109 107 109]
 [130 129 129 ... 115 111 103]
 [134 135 134 ... 110 111 110]]}
 [rendered grayscale image of tensor "0"]
 {:key "1", :data #tech.v3.tensor<int32>[600 800]
[[124 120 121 ... 117 121 124]
 [118 123 117 ... 118 122 121]
 [120 123 117 ... 119 123 118]
 ...
 [133 139 136 ... 111 106 108]
 [133 132 131 ... 116 111 103]
 [136 137 135 ... 114 111 108]]}
 [rendered grayscale image of tensor "1"]
 {:key "10", :data #tech.v3.tensor<int32>[600 800]
[[125 120 120 ... 116 121 127]
 [116 122 116 ... 117 121 119]
 [120 121 115 ... 120 123 118]
 ...
 [132 139 133 ... 112 107 108]
 [135 132 132 ... 115 112 104]
 [134 135 135 ... 114 112 108]]}
 [rendered grayscale image of tensor "10"]
)
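
The factor of 200 used above can be sanity-checked with a little arithmetic. The following is only a sketch in plain Clojure: `ushort-max` and `scaled-range` are hypothetical names introduced for illustration, and the bounds 103 and 139 are merely the smallest and largest values visible in the truncated printouts, so the full tensors may contain values outside that range.

```clojure
;; Largest value an unsigned 16-bit (ushort) pixel can hold.
(def ushort-max 65535)

(defn scaled-range
  "Returns the bounds lo and hi after multiplying by `factor`, and
  whether the scaled bounds still fit in an unsigned 16-bit pixel."
  [lo hi factor]
  (let [lo* (* lo factor)
        hi* (* hi factor)]
    {:lo lo* :hi hi* :fits? (<= 0 lo* hi* ushort-max)}))

;; 103 and 139 are taken from the visible part of the printouts only.
(scaled-range 103 139 200)
;; => {:lo 20600, :hi 27800, :fits? true}

;; Largest integer factor that keeps 139 within the ushort range.
(quot ushort-max 139)
;; => 471
```

Under these assumptions, a factor of 200 brightens the image while leaving plenty of headroom below 65535.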

source: projects/data-formats/hdf/notebooks/index.clj