scicloj.tcutils.api
between
(between ds col-name low high)
(between ds col-selector low high {:keys [missing-default]})
Detect where values fall in a specified range in a numeric column. This is a shortcut for (< low x high).
Usage
(between ds col-name low high)
(between ds col-name low high {:missing-default val})
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
col-name - Name of the column to use in the comparison
low - Lower bound for values of col-name
high - Upper bound for values of col-name
options - Optional options map containing the key missing-default, which specifies the value to use when the value of (col-name row) is nil. Throws an error if there are any missing values in the column and this option is not provided.
Returns
A dataset containing only the rows whose value in column col-name lies between low and high.
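The filtering described above can be modeled in plain Clojure over a vector of row maps. This is only a sketch of the documented behavior, not the library's implementation: `between-rows` and the sample rows are hypothetical, and the strict bounds follow the `(< low x high)` wording above.

```clojure
;; Hypothetical model of the documented semantics; rows are plain maps,
;; not a tablecloth dataset.
(defn between-rows
  ([rows col-name low high]
   (between-rows rows col-name low high {}))
  ([rows col-name low high {:keys [missing-default] :as options}]
   ;; Mirror the documented error: missing values without a default.
   (when (and (some #(nil? (get % col-name)) rows)
              (not (contains? options :missing-default)))
     (throw (ex-info "Missing values in column and no :missing-default given"
                     {:column col-name})))
   (filterv (fn [row]
              (let [v (get row col-name)
                    x (if (nil? v) missing-default v)]
                ;; strict on both ends, as in (< low x high)
                (< low x high)))
            rows)))

(def sample-rows [{:x 1} {:x 5} {:x 10} {:x nil}])

(between-rows (take 3 sample-rows) :x 1 10)              ; => [{:x 5}]
(between-rows sample-rows :x 1 10 {:missing-default 0})  ; => [{:x 5}]
```

In this model the rows with `:x` equal to 1 and 10 are dropped because the comparison is strict, and the nil row is dropped because its substituted value 0 lies outside the range.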
clean-column-names
(clean-column-names ds)
Convert column names of a dataset into ASCII-only, kebab-cased keywords. Throws an error if any column would be left with no name, e.g. one whose name was a string of only non-ASCII characters.
Usage
(clean-column-names ds)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
Returns
A dataset with the column names converted to ASCII-only, kebab-cased keywords.
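As a rough illustration of what "ASCII-only, kebab-cased keywords" could mean for a single name, here is a plain-Clojure sketch. The exact cleaning rules are an assumption (the library may, for instance, transliterate characters or split camelCase differently), and `clean-name` is a hypothetical helper, not part of the API.

```clojure
(require '[clojure.string :as str])

;; Hypothetical approximation of the name cleaning; the library's
;; actual rules may differ.
(defn clean-name [column-name]
  (let [cleaned (-> (name column-name)
                    (str/replace #"[^\x00-\x7F]" "")  ; drop non-ASCII characters
                    str/lower-case
                    (str/replace #"[^a-z0-9]+" "-")   ; runs of other characters become one dash
                    (str/replace #"^-+|-+$" ""))]     ; trim leading and trailing dashes
    (when (empty? cleaned)
      ;; mirrors the documented error for names that end up empty
      (throw (ex-info "Column would be left with no name"
                      {:column column-name})))
    (keyword cleaned)))

(clean-name "Total Sales (€)")  ; => :total-sales
```

Under these assumed rules, a name made up entirely of non-ASCII characters reduces to the empty string and triggers the error.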
cumsum
(cumsum ds column-name)
(cumsum ds new-column-name column-name)
Compute the cumulative sum of a column.
Usage
(cumsum ds column-name)
(cumsum ds new-column-name column-name)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
new-column-name - Optional name for the column where the newly computed values will go. When omitted, the new column name defaults to the keyword <old-column-name>-cumulative-sum
column-name - Name of the column to use to compute the cumulative sum
Returns
A dataset with the additional column containing the cumulative sum.
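The running total and the default column name can be modeled in plain Clojure with `reductions`. This is a sketch of the documented behavior over row maps, not the library's implementation; `cumsum-rows` is a hypothetical name.

```clojure
;; Hypothetical model: rows are plain maps, not a tablecloth dataset.
(defn cumsum-rows
  ([rows column-name]
   ;; default name: <old-column-name>-cumulative-sum, as a keyword
   (cumsum-rows rows
                (keyword (str (name column-name) "-cumulative-sum"))
                column-name))
  ([rows new-column-name column-name]
   (mapv (fn [row total] (assoc row new-column-name total))
         rows
         ;; running totals of the source column
         (reductions + (map #(get % column-name) rows)))))

(cumsum-rows [{:x 1} {:x 2} {:x 3}] :x)
;; => [{:x 1, :x-cumulative-sum 1}
;;     {:x 2, :x-cumulative-sum 3}
;;     {:x 3, :x-cumulative-sum 6}]
```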
duplicate-rows
(duplicate-rows ds)
Filter a dataset for only duplicated rows.
Usage
(duplicate-rows ds)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
Returns
A dataset containing only rows that are exact duplicates.
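One way to picture this filter in plain Clojure is to count how often each row occurs and keep the rows that occur more than once. The sketch below assumes every copy of a duplicated row is kept; the text above does not say whether the first occurrence is included, so treat that as an assumption. `duplicate-rows-model` is a hypothetical name.

```clojure
;; Hypothetical model over row maps; keeps every copy of a row that
;; appears more than once (an assumption, see above).
(defn duplicate-rows-model [rows]
  (let [counts (frequencies rows)]
    (filterv #(> (get counts %) 1) rows)))

(duplicate-rows-model [{:a 1 :b 2} {:a 3 :b 4} {:a 1 :b 2}])
;; => [{:a 1, :b 2} {:a 1, :b 2}]
```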
lag
(lag ds column-name lag-size)
(lag ds new-column-name column-name lag-size)
Compute previous (lagged) values from one column in a new column. This can be used, e.g., to compare the current value with values behind it.
Usage
(lag ds column-name lag-size)
(lag ds new-column-name column-name lag-size)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
new-column-name - Optional name for the column where the newly computed values will go. When omitted, the new column name defaults to the keyword <old-column-name>-lag-<lag-size>
column-name - Name of the column to use to compute the lagged values
lag-size - Positive integer indicating how many rows to skip over to compute the lag
Returns
A dataset with the new column populated with the lagged values.
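The shift can be modeled in plain Clojure by prepending `lag-size` placeholders and dropping the same number of values from the end. This is a sketch, not the library's implementation: `lag-values` and `lag-rows` are hypothetical names, and filling the first rows with nil is an assumption, since the text above does not say what the leading rows contain.

```clojure
;; Hypothetical model; nil stands in for "no earlier value" (assumed).
(defn lag-values [xs lag-size]
  (vec (take (count xs)
             (concat (repeat lag-size nil) (drop-last lag-size xs)))))

(defn lag-rows
  ([rows column-name lag-size]
   ;; default name: <old-column-name>-lag-<lag-size>, as a keyword
   (lag-rows rows
             (keyword (str (name column-name) "-lag-" lag-size))
             column-name
             lag-size))
  ([rows new-column-name column-name lag-size]
   (mapv (fn [row lagged] (assoc row new-column-name lagged))
         rows
         (lag-values (map #(get % column-name) rows) lag-size))))

(lag-values [10 20 30 40] 2)
;; => [nil nil 10 20]

(lag-rows [{:x 10} {:x 20} {:x 30}] :x 1)
;; => [{:x 10, :x-lag-1 nil} {:x 20, :x-lag-1 10} {:x 30, :x-lag-1 20}]
```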
lead
(lead ds column-name lead-size)
(lead ds new-column-name column-name lead-size)
Compute next (lead) values from one column in a new column. This can be used, e.g., to compare the current value with values ahead of it.
Usage
(lead ds column-name lead-size)
(lead ds new-column-name column-name lead-size)
Arguments
ds - A tech.ml.dataset (i.e. a tablecloth dataset)
new-column-name - Optional name for the column where the newly computed values will go. When omitted, the new column name defaults to the keyword <old-column-name>-lead-<lead-size>
column-name - Name of the column to use to compute the lead values
lead-size - Positive integer indicating how many rows to skip over to compute the lead
Returns
A dataset with the new column populated with the lead values.
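The lead shift is the mirror image of a lag: drop `lead-size` values from the front and pad the end. The plain-Clojure sketch below models this on a vector of values; `lead-values` is a hypothetical name, and padding the trailing rows with nil is an assumption not stated above.

```clojure
;; Hypothetical model; nil stands in for "no later value" (assumed).
(defn lead-values [xs lead-size]
  (vec (take (count xs)
             (concat (drop lead-size xs) (repeat nil)))))

(lead-values [10 20 30 40] 1)
;; => [20 30 40 nil]

;; Comparing each value with the one ahead of it:
(map (fn [current ahead] (when ahead (- ahead current)))
     [10 20 30 40]
     (lead-values [10 20 30 40] 1))
;; => (10 10 10 nil)
```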