8 Data manipulation
ns chapter-3-data-manipulation.3-data-manipulation
(;; {:nextjournal.clerk/visibility {:code :hide}
;; :nextjournal.clerk/toc true}
:require [tablecloth.api :as tc]
(:as fun]
[tech.v3.datatype.functional :as tdsc]
[tech.v3.dataset.column :as rolling]
[tech.v3.datatype.rolling :as str]
[clojure.string :as stats]
[fastmath.stats :as kind-clerk])) [scicloj.kind-clerk.api
(kind-clerk/setup!)
:ok
This is a work in progress of the code examples that will make up chapter 3 of the Clojure data cookbook
Once data is loaded and ready to work with, here’s how to do some of the most common data manipulation tasks.
8.1 Sorting
def dataset (tc/dataset [{:country "Canada"
(:size 10000000}
:country "USA"
{:size 9000000}
:country "Germany"
{:size 80000}]))
8.1.1 Sorting columns
Give the column headers in the order you want
-> dataset
(:country :size])) (tc/reorder-columns [
_unnamed [3 2]:
:country | :size |
---|---|
Canada | 10000000 |
USA | 9000000 |
Germany | 80000 |
8.1.2 Sorting rows
-> dataset
(:size] [:desc])) (tc/order-by [
_unnamed [3 2]:
:country | :size |
---|---|
Canada | 10000000 |
USA | 9000000 |
Germany | 80000 |
8.1.3 Custom sorting functions
e.g. length of the country name
-> dataset
(fn [row] (-> row :country count))
(tc/order-by (:desc))
_unnamed [3 2]:
:country | :size |
---|---|
Germany | 80000 |
Canada | 10000000 |
USA | 9000000 |
8.2 Selecting one column or multiple columns
-> dataset
(:country])) (tc/select-columns [
_unnamed [3 1]:
:country |
---|
Canada |
USA |
Germany |
8.3 Randomizing order
-> dataset
( tc/shuffle)
_unnamed [3 2]:
:country | :size |
---|---|
USA | 9000000 |
Canada | 10000000 |
Germany | 80000 |
8.4 Repeatable randomisation
-> dataset
(:seed 100})) (tc/shuffle {
_unnamed [3 2]:
:country | :size |
---|---|
Canada | 10000000 |
Germany | 80000 |
USA | 9000000 |
Finding unique rows
def dupes (tc/dataset [{:country "Canada"
(:size 10000000}
:country "Canada"
{:size 10000303}
:country "United states"
{:size 9000000}
:country "United States"
{:size 9000000}
:country "Germany"
{:size 80000}]))
(def “USA” #{“USA” “United States” “United states of America”}) https://scicloj.github.io/tablecloth/index.html#Unique
-> dupes
( tc/unique-by)
_unnamed [5 2]:
:country | :size |
---|---|
Canada | 10000000 |
Canada | 10000303 |
United states | 9000000 |
United States | 9000000 |
Germany | 80000 |
-> dupes
(:size)) (tc/unique-by
_unnamed [4 2]:
:country | :size |
---|---|
Canada | 10000000 |
Canada | 10000303 |
United states | 9000000 |
Germany | 80000 |
-> dupes
(:country)) (tc/unique-by
_unnamed [4 2]:
:country | :size |
---|---|
Canada | 10000000 |
United states | 9000000 |
United States | 9000000 |
Germany | 80000 |
-> dupes
(-> % :country str/lower-case))) (tc/unique-by #(
_unnamed [3 2]:
:country | :size |
---|---|
Canada | 10000000 |
United states | 9000000 |
Germany | 80000 |
-> dupes
(-> % :country str/lower-case)
(tc/unique-by #(:strategy (fn [vals]
{case (tdsc/column-name vals)
(:size (apply max vals)
:country (last vals)))}))
_unnamed [3 2]:
:country | :size |
---|---|
Canada | 10000303 |
United States | 9000000 |
Germany | 80000 |
could use this to rename vals to a canonical one (e.g. convert everything that matches set of USA to “USA”) Adding computed columns to data “lengthening” or “widening” data, making it “tidy” e.g. converting a column with numbers to a category (>5 “yes”, <5 “no”), summing multiple columns into a new one
-> dataset
(:area [9000000 8000000 1000000])) (tc/add-column
_unnamed [3 3]:
:country | :size | :area |
---|---|---|
Canada | 10000000 | 9000000 |
USA | 9000000 | 8000000 |
Germany | 80000 | 1000000 |
-> dataset
(:population [40000000 100000000 80000000])
(tc/add-column :size :area})
(tc/rename-columns {:population :double)
(tc/convert-types :density (fn [d]
(tc/add-column :population d) (:area d))))) (fun// (
_unnamed [3 4]:
:country | :area | :population | :density |
---|---|---|---|
Canada | 10000000 | 4.0e07 | 4.00000000 |
USA | 9000000 | 1.0e08 | 11.11111111 |
Germany | 80000 | 8.0e07 | 1000.00000000 |
vs, probably preferable
-> dataset
(:population [40000000 100000000 80000000])
(tc/add-column :size :area})
(tc/rename-columns {:density (fn [ds]
(tc/add-column 1.0 (:population ds)) (:area ds))))) (fun// (fun/*
_unnamed [3 4]:
:country | :area | :population | :density |
---|---|---|---|
Canada | 10000000 | 40000000 | 4.00000000 |
USA | 9000000 | 100000000 | 11.11111111 |
Germany | 80000 | 80000000 | 1000.00000000 |
- Removing columns
-> dataset
(:size)) (tc/drop-columns
_unnamed [3 1]:
:country |
---|
Canada |
USA |
Germany |
- Transforming values
- Working with nested data structures, really nice libraries in Clojure for doing this (specter, meander)
- All values in a column
- Conditional transformation (e.g. “truncate only 11 digit phone numbers to 10 digits”)
- Rearranging order of columns
- Renaming columns
- Filtering rows
- Single filter, multiple filters
-> dataset
(fn [row]
(tc/select-rows (< 1000000 (:size row))))) (
_unnamed [2 2]:
:country | :size |
---|---|
Canada | 10000000 |
USA | 9000000 |
- Aggregating rows (counts, groups)
def co2-over-time (tc/dataset "data/co2_over_time.csv")) (
-> co2-over-time
(:average-co2 (fn [ds]
(tc/aggregate {/ (reduce + (get ds "CO2"))
(count (get ds "CO2"))))})) (
_unnamed [1 1]:
:average-co2 |
---|
355.31093117 |
Add a column for year
-> co2-over-time
("Year" "Date" (memfn getYear))) (tc/map-columns
data/co2_over_time.csv [741 4]:
Date | CO2 | adjusted CO2 | Year |
---|---|---|---|
1958-03-01 | 315.70 | 314.44 | 1958 |
1958-04-01 | 317.46 | 315.16 | 1958 |
1958-05-01 | 317.51 | 314.71 | 1958 |
1958-07-01 | 315.86 | 315.19 | 1958 |
1958-08-01 | 314.93 | 316.19 | 1958 |
1958-09-01 | 313.21 | 316.08 | 1958 |
1958-11-01 | 313.33 | 315.20 | 1958 |
1958-12-01 | 314.67 | 315.43 | 1958 |
1959-01-01 | 315.58 | 315.54 | 1959 |
1959-02-01 | 316.49 | 315.86 | 1959 |
… | … | … | … |
2019-06-01 | 413.96 | 411.38 | 2019 |
2019-07-01 | 411.85 | 411.03 | 2019 |
2019-08-01 | 410.08 | 411.62 | 2019 |
2019-09-01 | 408.55 | 412.06 | 2019 |
2019-10-01 | 408.43 | 412.06 | 2019 |
2019-11-01 | 410.29 | 412.56 | 2019 |
2019-12-01 | 411.85 | 412.78 | 2019 |
2020-01-01 | 413.37 | 413.32 | 2020 |
2020-02-01 | 414.09 | 413.33 | 2020 |
2020-03-01 | 414.51 | 412.94 | 2020 |
2020-04-01 | 416.18 | 413.35 | 2020 |
Group by year
-> co2-over-time
(fn [row]
(tc/group-by (get row "Date"))))) (.getYear (
_unnamed [63 3]:
:name | :group-id | :data |
---|---|---|
1958 | 0 | Group: 1958 [8 3]: |
1959 | 1 | Group: 1959 [12 3]: |
1960 | 2 | Group: 1960 [12 3]: |
1961 | 3 | Group: 1961 [12 3]: |
1962 | 4 | Group: 1962 [12 3]: |
1963 | 5 | Group: 1963 [12 3]: |
1964 | 6 | Group: 1964 [9 3]: |
1965 | 7 | Group: 1965 [12 3]: |
1966 | 8 | Group: 1966 [12 3]: |
1967 | 9 | Group: 1967 [12 3]: |
… | … | … |
2010 | 52 | Group: 2010 [12 3]: |
2011 | 53 | Group: 2011 [12 3]: |
2012 | 54 | Group: 2012 [12 3]: |
2013 | 55 | Group: 2013 [12 3]: |
2014 | 56 | Group: 2014 [12 3]: |
2015 | 57 | Group: 2015 [12 3]: |
2016 | 58 | Group: 2016 [12 3]: |
2017 | 59 | Group: 2017 [12 3]: |
2018 | 60 | Group: 2018 [12 3]: |
2019 | 61 | Group: 2019 [12 3]: |
2020 | 62 | Group: 2020 [4 3]: |
Get average temp per year tablecloth applies the aggregate fn to every groups dataset
defn round2
("Round a double to the given precision (number of significant digits)"
[precision d]let [factor (Math/pow 10 precision)]
(/ (Math/round (* d factor)) factor))) (
-> co2-over-time
(fn [row]
(tc/group-by (get row "Date"))))
(.getYear (:average-co2 (fn [ds]
(tc/aggregate {2
(round2 / (reduce + (get ds "CO2"))
(count (get ds "CO2")))))})) (
_unnamed [63 2]:
:$group-name | :average-co2 |
---|---|
1958 | 315.33 |
1959 | 315.98 |
1960 | 316.91 |
1961 | 317.65 |
1962 | 318.45 |
1963 | 318.99 |
1964 | 319.20 |
1965 | 320.04 |
1966 | 321.37 |
1967 | 322.18 |
… | … |
2010 | 389.90 |
2011 | 391.65 |
2012 | 393.87 |
2013 | 396.57 |
2014 | 398.61 |
2015 | 400.89 |
2016 | 404.28 |
2017 | 406.58 |
2018 | 408.59 |
2019 | 411.50 |
2020 | 414.54 |
Can rename the column to be more descriptive
-> co2-over-time
(fn [row]
(tc/group-by (get row "Date"))))
(.getYear (:average-co2 (fn [ds]
(tc/aggregate {/ (reduce + (get ds "CO2"))
(count (get ds "CO2"))))})
(:year})) (tc/rename-columns {:$group-name
_unnamed [63 2]:
:year | :average-co2 |
---|---|
1958 | 315.33375000 |
1959 | 315.98166667 |
1960 | 316.90916667 |
1961 | 317.64500000 |
1962 | 318.45416667 |
1963 | 318.99250000 |
1964 | 319.20111111 |
1965 | 320.03583333 |
1966 | 321.36916667 |
1967 | 322.18083333 |
… | … |
2010 | 389.90083333 |
2011 | 391.64833333 |
2012 | 393.87000000 |
2013 | 396.56666667 |
2014 | 398.61416667 |
2015 | 400.88500000 |
2016 | 404.27750000 |
2017 | 406.58416667 |
2018 | 408.58750000 |
2019 | 411.49500000 |
2020 | 414.53750000 |
Concatenating datasets
def ds1 (tc/dataset [{:id "id1" :b "val1"}
(:id "id2" :b "val2"}
{:id "id3" :b "val3"}])) {
def ds2 (tc/dataset [{:id "id1" :b "val4"}
(:id "id5" :b "val5"}
{:id "id6" :b "val6"}])) {
Naively concats rows
:id "id3" :b "other value"}])) (tc/concat ds1 ds2 (tc/dataset [{
_unnamed [7 2]:
:id | :b |
---|---|
id1 | val1 |
id2 | val2 |
id3 | val3 |
id1 | val4 |
id5 | val5 |
id6 | val6 |
id3 | other value |
:b "val4" :c "text"}
(tc/concat ds1 (tc/dataset [{:b "val5" :c "hi"}
{:b "val6" :c "test"}])) {
_unnamed [6 3]:
:id | :b | :c |
---|---|---|
id1 | val1 | |
id2 | val2 | |
id3 | val3 | |
val4 | text | |
val5 | hi | |
val6 | test |
De-duping
(tc/union ds1 ds2)
union [6 2]:
:id | :b |
---|---|
id1 | val1 |
id2 | val2 |
id3 | val3 |
id1 | val4 |
id5 | val5 |
id6 | val6 |
- Merging datasets
- When column headers are the same or different, on multiple columns TODO explain set logic and SQL joins
def ds3 (tc/dataset {:id [1 2 3 4]
(:b ["val1" "val2" "val3" "val4"]}))
def ds4 (tc/dataset {:id [1 2 3 4]
(:c ["val1" "val2" "val3" "val4"]}))
Keep all columns
:id) (tc/full-join ds3 ds4
full-join [4 4]:
:id | :b | :right.id | :c |
---|---|---|---|
1 | val1 | 1 | val1 |
2 | val2 | 2 | val2 |
3 | val3 | 3 | val3 |
4 | val4 | 4 | val4 |
“Merge” datasets on a given column where rows have a value
:id) (tc/inner-join ds3 ds4
inner-join [4 3]:
:id | :b | :c |
---|---|---|
1 | val1 | val1 |
2 | val2 | val2 |
3 | val3 | val3 |
4 | val4 | val4 |
Drop rows missing a value
:id [1 2 3 4]
(tc/inner-join (tc/dataset {:b ["val1" "val2" "val3"]})
:id [1 2 3 4]
(tc/dataset {:c ["val1" "val2" "val3" "val4"]})
:id)
inner-join [4 3]:
:id | :b | :c |
---|---|---|
1 | val1 | val1 |
2 | val2 | val2 |
3 | val3 | val3 |
4 | val4 |
:id [1 2 3 ]
(tc/right-join (tc/dataset {:b ["val1" "val2" "val3"]})
:id [1 2 3 4]
(tc/dataset {:c ["val1" "val2" "val3" "val4"]})
:id)
right-outer-join [4 4]:
:id | :b | :right.id | :c |
---|---|---|---|
1 | val1 | 1 | val1 |
2 | val2 | 2 | val2 |
3 | val3 | 3 | val3 |
4 | val4 |
scratch
:email ["asdf"]
(tc/left-join (tc/dataset {:name ["asdfads"]
:entry-id [1 2 3]})
:entry-id [1 2 3]
(tc/dataset {:upload-count [2 3 4]
:catgory ["art" "science"]})
:entry-id)
left-outer-join [3 6]:
:entry-id | :name | :right.entry-id | :upload-count | :catgory | |
---|---|---|---|---|---|
1 | asdf | asdfads | 1 | 2 | art |
2 | 2 | 3 | science | ||
3 | 3 | 4 |
:email ["asdf"]
(tc/dataset {:name ["asdfads"]
:entry-id [1 2 3]})
_unnamed [3 3]:
:name | :entry-id | |
---|---|---|
asdf | asdfads | 1 |
2 | ||
3 |
:entry-id [1 2 3]
(tc/dataset {:upload-count [2 3 4]
:catgory ["art" "science"]})
_unnamed [3 3]:
:entry-id | :upload-count | :catgory |
---|---|---|
1 | 2 | art |
2 | 3 | science |
3 | 4 |
see tablecloth join stuff Inner join, only keeps rows with the specified column value in common
:id) (tc/inner-join ds1 ds2
inner-join [1 3]:
:id | :b | :right.b |
---|---|---|
id1 | val1 | val4 |
- Converting between wide and long formats? Signal processing/time series analysis
- Compute rolling average to be able to plot a trend line
def exp-moving-avg
(let [data (get co2-over-time "adjusted CO2")
(
moving-avg->> data
(reduce (fn [acc next]
(conj acc (+ (* 0.9 (last acc)) (* 0.1 next))))
(first data)])
[(rest)]
"Exponential moving average" moving-avg]]))) (tc/dataset [[
- widen dataset to include new row that’s already in order
(tc/append co2-over-time exp-moving-avg)
data/co2_over_time.csv [741 4]:
Date | CO2 | adjusted CO2 | Exponential moving average |
---|---|---|---|
1958-03-01 | 315.70 | 314.44 | 314.44000000 |
1958-04-01 | 317.46 | 315.16 | 314.51200000 |
1958-05-01 | 317.51 | 314.71 | 314.53180000 |
1958-07-01 | 315.86 | 315.19 | 314.59762000 |
1958-08-01 | 314.93 | 316.19 | 314.75685800 |
1958-09-01 | 313.21 | 316.08 | 314.88917220 |
1958-11-01 | 313.33 | 315.20 | 314.92025498 |
1958-12-01 | 314.67 | 315.43 | 314.97122948 |
1959-01-01 | 315.58 | 315.54 | 315.02810653 |
1959-02-01 | 316.49 | 315.86 | 315.11129588 |
… | … | … | … |
2019-06-01 | 413.96 | 411.38 | 409.42307506 |
2019-07-01 | 411.85 | 411.03 | 409.58376755 |
2019-08-01 | 410.08 | 411.62 | 409.78739079 |
2019-09-01 | 408.55 | 412.06 | 410.01465172 |
2019-10-01 | 408.43 | 412.06 | 410.21918654 |
2019-11-01 | 410.29 | 412.56 | 410.45326789 |
2019-12-01 | 411.85 | 412.78 | 410.68594110 |
2020-01-01 | 413.37 | 413.32 | 410.94934699 |
2020-02-01 | 414.09 | 413.33 | 411.18741229 |
2020-03-01 | 414.51 | 412.94 | 411.36267106 |
2020-04-01 | 416.18 | 413.35 | 411.56140396 |
- Rolling average over a 12 point range
def rolling-average
("Rolling average"
(tc/dataset [[-> co2-over-time
(get "adjusted CO2")
(12
(rolling/fixed-rolling-window
fun/mean:relative-window-position :left}))]])) {
(tc/append co2-over-time rolling-average)
data/co2_over_time.csv [741 4]:
Date | CO2 | adjusted CO2 | Rolling average |
---|---|---|---|
1958-03-01 | 315.70 | 314.44 | 314.44000000 |
1958-04-01 | 317.46 | 315.16 | 314.50000000 |
1958-05-01 | 317.51 | 314.71 | 314.52250000 |
1958-07-01 | 315.86 | 315.19 | 314.58500000 |
1958-08-01 | 314.93 | 316.19 | 314.73083333 |
1958-09-01 | 313.21 | 316.08 | 314.86750000 |
1958-11-01 | 313.33 | 315.20 | 314.93083333 |
1958-12-01 | 314.67 | 315.43 | 315.01333333 |
1959-01-01 | 315.58 | 315.54 | 315.10500000 |
1959-02-01 | 316.49 | 315.86 | 315.22333333 |
… | … | … | … |
2019-06-01 | 413.96 | 411.38 | 410.14000000 |
2019-07-01 | 411.85 | 411.03 | 410.38583333 |
2019-08-01 | 410.08 | 411.62 | 410.63500000 |
2019-09-01 | 408.55 | 412.06 | 410.88333333 |
2019-10-01 | 408.43 | 412.06 | 411.08750000 |
2019-11-01 | 410.29 | 412.56 | 411.26916667 |
2019-12-01 | 411.85 | 412.78 | 411.48833333 |
2020-01-01 | 413.37 | 413.32 | 411.69250000 |
2020-02-01 | 414.09 | 413.33 | 411.89500000 |
2020-03-01 | 414.51 | 412.94 | 412.10166667 |
2020-04-01 | 416.18 | 413.35 | 412.32083333 |
- Train a model to predict the next 10 years
-> co2-over-time
( )
data/co2_over_time.csv [741 3]:
Date | CO2 | adjusted CO2 |
---|---|---|
1958-03-01 | 315.70 | 314.44 |
1958-04-01 | 317.46 | 315.16 |
1958-05-01 | 317.51 | 314.71 |
1958-07-01 | 315.86 | 315.19 |
1958-08-01 | 314.93 | 316.19 |
1958-09-01 | 313.21 | 316.08 |
1958-11-01 | 313.33 | 315.20 |
1958-12-01 | 314.67 | 315.43 |
1959-01-01 | 315.58 | 315.54 |
1959-02-01 | 316.49 | 315.86 |
… | … | … |
2019-06-01 | 413.96 | 411.38 |
2019-07-01 | 411.85 | 411.03 |
2019-08-01 | 410.08 | 411.62 |
2019-09-01 | 408.55 | 412.06 |
2019-10-01 | 408.43 | 412.06 |
2019-11-01 | 410.29 | 412.56 |
2019-12-01 | 411.85 | 412.78 |
2020-01-01 | 413.37 | 413.32 |
2020-02-01 | 414.09 | 413.33 |
2020-03-01 | 414.51 | 412.94 |
2020-04-01 | 416.18 | 413.35 |
- Summarizing data (mean, standard deviation, confidence intervals etc.)
- Standard deviation using fastmath
def avg-co2-by-year
(-> co2-over-time
(fn [row]
(tc/group-by (get row "Date"))))
(.getYear (:average-co2 (fn [ds]
(tc/aggregate {get ds "adjusted CO2"))
(stats/mean (;; (/ (reduce + (get ds "CO2"))
;; (count (get ds "CO2")))
):standard-deviation (fn [ds]
get ds "adjusted CO2")))})
(stats/stddev (;; (tc/rename-columns {:$group-name :year})
))
- Overall average
:average-co2 avg-co2-by-year)) (stats/mean (
355.56414902998233
- Long term average 1991-2020
-> avg-co2-by-year
(;; (tc/select-rows (fn [row] (< 1990 (:year row))))
;; :average-co2
;; mean
)
_unnamed [63 3]:
:$group-name | :average-co2 | :standard-deviation |
---|---|---|
1958 | 315.30000000 | 0.60318204 |
1959 | 315.97750000 | 0.47259679 |
1960 | 316.90750000 | 0.42004599 |
1961 | 317.63833333 | 0.45170049 |
1962 | 318.44833333 | 0.37201743 |
1963 | 318.98750000 | 0.28813270 |
1964 | 319.67888889 | 0.20127372 |
1965 | 320.03083333 | 0.50883929 |
1966 | 321.36250000 | 0.37363388 |
1967 | 322.17500000 | 0.32326460 |
… | … | … |
2010 | 389.89333333 | 0.67686891 |
2011 | 391.64500000 | 0.71908401 |
2012 | 393.86500000 | 0.87383689 |
2013 | 396.55833333 | 0.72002315 |
2014 | 398.60500000 | 0.68076828 |
2015 | 400.87833333 | 1.02130784 |
2016 | 404.27416667 | 0.95601881 |
2017 | 406.57750000 | 0.64441834 |
2018 | 408.58166667 | 0.99862481 |
2019 | 411.48833333 | 0.74410206 |
2020 | 413.23500000 | 0.19706175 |
- Working with sequential data
- Smoothing out data
- Calculating a moving average
- Averaging a sequence in blocks
- Run length encoding?
- Filling
nil
s with last non-nil
value?
def sparse-dataset
(:a [nil 2 3 4 nil nil 7 8]
(tc/dataset {:b [10 11 12 nil nil nil 16 nil]}))
-> sparse-dataset
(:up)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
2 | 10 |
2 | 11 |
3 | 12 |
4 | 16 |
7 | 16 |
7 | 16 |
7 | 16 |
8 |
-> sparse-dataset
(:updown)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
2 | 10 |
2 | 11 |
3 | 12 |
4 | 16 |
7 | 16 |
7 | 16 |
7 | 16 |
8 | 16 |
-> sparse-dataset
(:down)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
10 | |
2 | 11 |
3 | 12 |
4 | 12 |
4 | 12 |
4 | 12 |
7 | 16 |
8 | 16 |
-> sparse-dataset
(:downup)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
2 | 10 |
2 | 11 |
3 | 12 |
4 | 12 |
4 | 12 |
4 | 12 |
7 | 16 |
8 | 16 |
-> sparse-dataset
(:lerp)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
2.0 | 10.0 |
2.0 | 11.0 |
3.0 | 12.0 |
4.0 | 13.0 |
5.0 | 14.0 |
6.0 | 15.0 |
7.0 | 16.0 |
8.0 | 16.0 |
-> sparse-dataset
(:all :value 100)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
100 | 10 |
2 | 11 |
3 | 12 |
4 | 100 |
100 | 100 |
100 | 100 |
7 | 16 |
8 | 100 |
-> sparse-dataset
(:a :value 100)) (tc/replace-missing
_unnamed [8 2]:
:a | :b |
---|---|
100 | 10 |
2 | 11 |
3 | 12 |
4 | |
100 | |
100 | |
7 | 16 |
8 |