scicloj.tcutils.api
between
(between ds col-name low high)
(between ds col-selector low high {:keys [missing-default]})
Detect where values fall in a specified range in a numeric column. This is a shortcut for (< low x high), so both bounds are exclusive.
Usage
(between ds col-name low high)
(between ds col-name low high {:missing-default val})
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
column-name - Name of the column to use in the comparison
low - Lower bound for values of column-name
high - Upper bound for values of column-name
options - Optional. Options map containing the key :missing-default, which specifies the value to use when the value of (col-name row) is nil. Throws an error if there are any missing values in the column and this option is not provided.
Returns
A dataset with only the rows that contain values between low and high in column col-name.
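The filtering rule described above can be sketched in plain Clojure. This is only an illustration of the documented semantics, not the tcutils implementation: between-rows is a hypothetical helper, and a vector of row maps stands in for a dataset.

```clojure
;; Hypothetical stand-in for `between`, operating on a vector of row maps
;; instead of a tablecloth dataset.
(defn between-rows
  ([rows col-name low high]
   (between-rows rows col-name low high nil))
  ([rows col-name low high {:keys [missing-default]}]
   (filterv (fn [row]
              (let [v (get row col-name)
                    ;; substitute the default only when the value is missing
                    x (if (nil? v) missing-default v)]
                (when (nil? x)
                  (throw (ex-info "Missing value and no :missing-default given"
                                  {:column col-name :row row})))
                ;; strict comparison: low and high themselves are excluded
                (< low x high)))
            rows)))

(def rows [{:x 1} {:x 5} {:x 10} {:x nil}])

(between-rows rows :x 1 10 {:missing-default 0})
;; => [{:x 5}]
```

Calling (between-rows rows :x 1 10) without the options map throws in this sketch, because the last row has no value for :x.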
clean-column-names
(clean-column-names ds)
Convert column names of a dataset into ASCII-only, kebab-cased keywords. Throws an error if any column would be left with no name, e.g. one that was an all non-ASCII string.
Usage
(clean-column-names ds)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
Returns
A dataset with the column names converted to ASCII-only, kebab-cased keywords.
cumsum
(cumsum ds column-name)
(cumsum ds new-column-name column-name)
Compute the cumulative sum of a column.
Usage
(cumsum ds column-name)
(cumsum ds new-column-name column-name)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
new-column-name - Optional. Name for the column where newly computed values will go. When omitted, the new column name defaults to the keyword <old-column-name>-cumulative-sum
column-name - Name of the column to use to compute the cumulative sum
Returns
A dataset with the additional column containing the cumulative sum.
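As a rough illustration of what a cumulative-sum column contains, the sketch below uses clojure.core/reductions on a vector of row maps. cumsum-rows is a hypothetical helper written for this example, not the library's code; the default column name follows the <old-column-name>-cumulative-sum pattern described above.

```clojure
;; Hypothetical stand-in for `cumsum`, operating on a vector of row maps.
(defn cumsum-rows
  ([rows column-name]
   ;; default name: <old-column-name>-cumulative-sum, as a keyword
   (cumsum-rows rows
                (keyword (str (name column-name) "-cumulative-sum"))
                column-name))
  ([rows new-column-name column-name]
   (mapv (fn [row total] (assoc row new-column-name total))
         rows
         ;; running totals: each element is the sum of all values so far
         (reductions + (map #(get % column-name) rows)))))

(cumsum-rows [{:n 1} {:n 2} {:n 3}] :n)
;; => [{:n 1, :n-cumulative-sum 1}
;;     {:n 2, :n-cumulative-sum 3}
;;     {:n 3, :n-cumulative-sum 6}]
```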
duplicate-rows
(duplicate-rows ds)
Filter a dataset for only duplicated rows.
Usage
(duplicate-rows ds)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
Returns
A dataset containing only rows that are exact duplicates.
lag
(lag ds column-name lag-size)
(lag ds new-column-name column-name lag-size)
Compute previous (lagged) values from one column in a new column. This can be used, for example, to compare the current value with values behind it.
Usage
(lag ds column-name lag-size)
(lag ds new-column-name column-name lag-size)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
new-column-name - Optional. Name for the column where newly computed values will go. When omitted, the new column name defaults to the keyword <old-column-name>-lag-<lag-size>
column-name - Name of the column to use to compute the lagged values
lag-size - Positive integer indicating how many rows to skip over to compute the lag
Returns
A dataset with the new column populated with the lagged values.
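To make the idea of a lagged column concrete, here is a small plain-Clojure sketch on a vector of values. lag-values is a hypothetical helper, and it assumes that positions with no earlier row are filled with nil; the documentation above does not state how the library fills them.

```clojure
;; Hypothetical helper: shift values down by lag-size positions.
;; Assumption: the first lag-size positions have no earlier value and become nil.
(defn lag-values [xs lag-size]
  (vec (take (count xs)
             (concat (repeat lag-size nil) xs))))

(lag-values [10 20 30 40] 1)
;; => [nil 10 20 30]

;; Default column name pattern, <old-column-name>-lag-<lag-size>:
(keyword (str (name :price) "-lag-" 1))
;; => :price-lag-1
```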
lead
(lead ds column-name lead-size)
(lead ds new-column-name column-name lead-size)
Compute next (lead) values from one column in a new column. This can be used, for example, to compare the current value with values ahead of it.
Usage
(lead ds column-name lead-size)
(lead ds new-column-name column-name lead-size)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
new-column-name - Optional. Name for the column where newly computed values will go. When omitted, the new column name defaults to the keyword <old-column-name>-lead-<lead-size>
column-name - Name of the column to use to compute the lead values
lead-size - Positive integer indicating how many rows to skip over to compute the lead
Returns
A dataset with the new column populated with the lead values.
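A lead column can be pictured with a small plain-Clojure sketch on a vector of values. lead-values is a hypothetical helper, and it assumes that positions with no later row are filled with nil; the documentation above does not state how the library fills them.

```clojure
;; Hypothetical helper: shift values up by lead-size positions.
;; Assumption: the last lead-size positions have no later value and become nil.
(defn lead-values [xs lead-size]
  (vec (take (count xs)
             (concat (drop lead-size xs) (repeat nil)))))

(lead-values [10 20 30 40] 1)
;; => [20 30 40 nil]

;; Default column name pattern, <old-column-name>-lead-<lead-size>:
(keyword (str (name :price) "-lead-" 1))
;; => :price-lead-1
```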