This is part of the Scicloj Clojure Data Scrapbook.
Reading HDF files
Original discussion at the Clojurians Zulip chat: #data-science > import hdf files.
(ns index
  (:require [babashka.fs :as fs]
            [tech.v3.tensor :as tensor]
            [clojure.java.io :as io]
            [clojure.string :as string]
            [scicloj.noj.v1.vis.image :as vis.image]
            [tech.v3.datatype.functional :as fun])
  (:import io.jhdf.HdfFile
           java.io.File))

(set! *warn-on-reflection* true)
true
We will use the following function to read an HDF file using jHDF and convert its datasets to dtype-next tensors.
(defn hdf5->tensors [path]
  (let [file ^File (io/file path)
        hdf-file ^HdfFile (HdfFile. file)
        ;; top-level children of the file, keyed by name
        children ^java.util.Map (.getChildren hdf-file)]
    (->> children
         keys
         (mapv (fn [key]
                 (let [child ^io.jhdf.dataset.ContiguousDatasetImpl (.get children key)
                       knew (keyword (first (string/split key #" ")))]
                   ;; read the dataset into a Java array and wrap it as a tensor
                   {:key key
                    :data (-> child
                              .getData
                              tensor/->tensor)}))))))
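The function above leaves the HDF file open and assumes every child is a contiguous dataset. As a minimal sketch of one possible variant (not the approach used in this notebook), the file could be opened with `with-open` and the children filtered by jHDF's `Dataset` interface. The name `hdf5->tensors-closing` and the namespace `hdf-sketch` are hypothetical. The sketch assumes that `HdfFile` is closeable, that `.getData` reads each dataset eagerly into a Java array, and that every dataset holds numeric array data.

```clojure
(ns hdf-sketch
  (:require [clojure.java.io :as io]
            [tech.v3.tensor :as tensor])
  (:import (io.jhdf HdfFile)
           (io.jhdf.api Dataset)
           (java.io File)))

(defn hdf5->tensors-closing
  "Hypothetical variant of hdf5->tensors: closes the file when done and
  skips children that are not datasets (for example, groups)."
  [path]
  (with-open [hdf-file (HdfFile. ^File (io/file path))]
    ;; `vec` forces the lazy `for` inside `with-open`, so all data is read
    ;; before the file is closed.
    (vec
     (for [[k node] (.getChildren hdf-file)
           :when (instance? Dataset node)]
       {:key k
        :data (tensor/->tensor (.getData ^Dataset node))}))))
```

Calling `(hdf5->tensors-closing "data/test.h5")` should then return the same kind of vector of `{:key ... :data ...}` maps, provided the assumptions above hold for the file.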
Let us apply the function to a test file:
(def tensors
  (hdf5->tensors "data/test.h5"))

(count tensors)
20
(take 3 tensors)
(
{:key "0", :data #tech.v3.tensor<int32>[600 800]
[[112 110 111 ... 114 118 124]
[105 112 106 ... 115 120 119]
[107 109 108 ... 117 123 115]
...
[132 138 134 ... 109 107 109]
[130 129 129 ... 115 111 103]
[134 135 134 ... 110 111 110]]}
{:key "1", :data #tech.v3.tensor<int32>[600 800]
[[124 120 121 ... 117 121 124]
[118 123 117 ... 118 122 121]
[120 123 117 ... 119 123 118]
...
[133 139 136 ... 111 106 108]
[133 132 131 ... 116 111 103]
[136 137 135 ... 114 111 108]]}
{:key "10", :data #tech.v3.tensor<int32>[600 800]
[[125 120 120 ... 116 121 127]
[116 122 116 ... 117 121 119]
[120 121 115 ... 120 123 118]
...
[132 139 133 ... 112 107 108]
[135 132 132 ... 115 112 104]
[134 135 135 ... 114 112 108]]}
)
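The first three keys in the output above are "0", "1", and "10", which looks like string order rather than numeric order. If numeric order matters (for example, when the datasets are frames of a sequence), the entries could be re-sorted. The sketch below is a minimal illustration: `sort-by-numeric-key` and `example-entries` are hypothetical names, and it assumes every `:key` is a string of decimal digits, which the source shows only for the first three entries.

```clojure
(defn sort-by-numeric-key
  "Hypothetical helper: orders entries by the integer value of :key.
  Assumes every :key is a string of decimal digits."
  [entries]
  (sort-by #(Long/parseLong ^String (:key %)) entries))

;; Stand-in entries with the same shape as the output above, minus :data.
(def example-entries
  [{:key "0"} {:key "1"} {:key "10"} {:key "2"}])

(map :key (sort-by-numeric-key example-entries))
;; => ("0" "1" "2" "10")
```

Applied to the real data, this would be `(sort-by-numeric-key tensors)`.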
Let us visualize a few of the tensors as images:
(->> tensors
     (take 3)
     (mapcat (fn [row]
               [row
                (-> row
                    :data
                    (fun/* 200)
                    (vis.image/tensor->image :ushort-gray))])))
(
{:key "0", :data #tech.v3.tensor<int32>[600 800]
[[112 110 111 ... 114 118 124]
[105 112 106 ... 115 120 119]
[107 109 108 ... 117 123 115]
...
[132 138 134 ... 109 107 109]
[130 129 129 ... 115 111 103]
[134 135 134 ... 110 111 110]]}
{:key "1", :data #tech.v3.tensor<int32>[600 800]
[[124 120 121 ... 117 121 124]
[118 123 117 ... 118 122 121]
[120 123 117 ... 119 123 118]
...
[133 139 136 ... 111 106 108]
[133 132 131 ... 116 111 103]
[136 137 135 ... 114 111 108]]}
{:key "10", :data #tech.v3.tensor<int32>[600 800]
[[125 120 120 ... 116 121 127]
[116 122 116 ... 117 121 119]
[120 121 115 ... 120 123 118]
...
[132 139 133 ... 112 107 108]
[135 132 132 ... 115 112 104]
[134 135 135 ... 114 112 108]]}
)
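The visualization code multiplies each tensor by 200 before rendering it as `:ushort-gray`. The source does not say why. One plausible reading, offered here only as an assumption, is that the raw values (roughly 100 to 140 in the printed excerpts) would look nearly black on a 16-bit gray scale that runs from 0 to 65535, so they are scaled up while staying in range. The arithmetic can be checked with a small sketch; `ushort-max` and `max-integer-gain` are hypothetical names, and 139 is just the largest value visible in the truncated output, not necessarily the true maximum.

```clojure
;; Largest value representable as an unsigned 16-bit integer.
(def ushort-max 65535)

(defn max-integer-gain
  "Hypothetical helper: the largest integer factor that keeps max-value
  within the unsigned 16-bit range."
  [max-value]
  (quot ushort-max max-value))

;; 139 is the largest value visible in the printed excerpts (an assumption
;; about the data, since most values are elided).
(* 139 200)
;; => 27800

(max-integer-gain 139)
;; => 471
```

Under these assumptions, a factor of 200 stays comfortably inside the 16-bit range.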