13 MALDI API Reference
Complete reference for scicloj.ripple.maldi — the public API for MALDI-TOF mass spectrometry signal processing.
Setup
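A synthetic spectrum for the examples below.
;; Assumed aliases for the namespaces used throughout this page.
(require '[scicloj.ripple.maldi :as maldi]
         '[tablecloth.api :as tc]
         '[tech.v3.datatype.functional :as dfn]
         '[fastmath.stats :as fstats])
(def masses (double-array (range 2000.0 4000.0 1.0)))
(def n-points (count masses))
Gaussian peak helper.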
(defn gaussian [x center height width]
  (* height (Math/exp (- (/ (Math/pow (- x center) 2)
                            (* 2 width width))))))
Intensities: three Gaussian peaks on a sloped baseline with noise.
(def intensities
  (let [rng (java.util.Random. 42)]
    (double-array
     (map (fn [m]
            (+ (* 0.005 (- m 2000.0))
               (gaussian m 2500.0 100.0 30.0)
               (gaussian m 3000.0 200.0 50.0)
               (gaussian m 3500.0 80.0 20.0)
               (* 2.0 (.nextGaussian rng))))
          (seq masses)))))
(def spectrum (tc/dataset {:mass masses :intensity intensities}))
Preprocessing
sqrt-transform
[intensities]
Apply square root transformation to intensities for variance stabilization.
Args:
- intensities: sequence of intensity values
Returns: transformed intensities
(def step-sqrt (maldi/sqrt-transform intensities))
(count step-sqrt)
2000
savitzky-golay-smooth
[intensities opts]
Apply Savitzky-Golay smoothing to spectrum intensities (MALDIquant compatible).
Negative smoothed values are clamped to 0.0.
Args:
- intensities: intensity values
- opts: map with optional keys:
- :window-size (default 11, must be odd)
- :polynomial-order (default 2)
Returns: smoothed intensities
(def step-sg (maldi/savitzky-golay-smooth intensities {:window-size 11 :polynomial-order 2}))
(count step-sg)
2000
moving-average-smooth
[intensities opts]
Apply moving average smoothing (MALDIquant compatible).
Interior points use the mean of a symmetric window. Edge points replicate the first/last full-window mean. Negative values are clamped to 0.0.
Args:
- intensities: intensity values
- opts: map with optional keys:
- :half-window-size (default 2)
Returns: smoothed intensities
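The edge rule described above can be sketched in plain Clojure. This is a minimal illustration, not the library's implementation; the name moving-average-sketch is hypothetical, and the input is assumed to hold at least one full window.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn moving-average-sketch
  [xs half-window]
  (let [v       (vec xs)
        n       (count v)
        width   (inc (* 2 half-window))
        _       (assert (>= n width) "need at least one full window")
        ;; mean of the full window centred on index i
        mean-at (fn [i]
                  (/ (reduce + (subvec v (- i half-window) (+ i half-window 1)))
                     (double width)))
        lo      half-window             ; first index with a full window
        hi      (- n half-window 1)]    ; last index with a full window
    (mapv (fn [i]
            ;; clamp the index so edge points reuse the nearest full-window
            ;; mean, then clamp negative results to 0.0
            (max 0.0 (mean-at (min hi (max lo i)))))
          (range n))))

(moving-average-sketch [1.0 2.0 3.0 4.0 5.0] 1)
;; => [2.0 2.0 3.0 4.0 4.0]
```

The first and last outputs repeat their neighbours because those positions have no full window of their own.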
(def step-ma (maldi/moving-average-smooth intensities {:half-window-size 5}))
(count step-ma)
2000
median-filter
[intensities opts]
Apply median filter for noise reduction.
Args:
- intensities: intensity values
- opts: map with optional keys:
- :window-size (default 5, must be odd)
Returns: median-filtered intensities
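A median filter replaces each point with the median of its neighbourhood, which removes isolated spikes while keeping edges sharper than a mean would. The sketch below is plain Clojure with hypothetical names; truncating the window at the array edges is an assumption, since the edge behaviour of maldi/median-filter is not stated here.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn median-of
  "Median of a non-empty collection of numbers."
  [xs]
  (let [s   (vec (sort xs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (s mid)
      (/ (+ (s (dec mid)) (s mid)) 2.0))))

(defn median-filter-sketch
  [xs window-size]
  (let [v (vec xs)
        n (count v)
        h (quot window-size 2)]
    (mapv (fn [i]
            ;; assumption: the window is truncated at the array edges
            (median-of (subvec v (max 0 (- i h)) (min n (+ i h 1)))))
          (range n))))

(median-filter-sketch [1.0 1.0 9.0 1.0 1.0] 3)
;; => [1.0 1.0 1.0 1.0 1.0]
```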
(def step-mf (maldi/median-filter intensities {:window-size 5}))
(count step-mf)
2000
snip-baseline-removal
[intensities opts]
Remove baseline using SNIP algorithm (MALDIquant compatible).
Args:
- intensities: intensity values
- opts: map with optional keys:
- :iterations (default 25)
- :decreasing (default true, MALDIquant default: large→small windows)
Returns: baseline-corrected intensities
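The core SNIP idea is to shrink each point toward the average of its two neighbours at distance w, for a sequence of window sizes w. The sketch below shows that idea in plain Clojure under stated assumptions: the function names are hypothetical, points without both neighbours are left unchanged, and the library's actual implementation may differ in detail.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn snip-baseline-sketch
  "Estimate a baseline by iterative clipping."
  [xs iterations decreasing?]
  (let [n       (count xs)
        windows (if decreasing?
                  (range iterations 0 -1)        ; large -> small windows
                  (range 1 (inc iterations)))]   ; small -> large windows
    (reduce
     (fn [b w]
       (mapv (fn [i]
               (if (and (>= (- i w) 0) (< (+ i w) n))
                 ;; clip to the mean of the two neighbours at distance w
                 (min (b i) (/ (+ (b (- i w)) (b (+ i w))) 2.0))
                 ;; assumption: points without both neighbours stay as they are
                 (b i)))
             (range n)))
     (vec xs)
     windows)))

(defn snip-remove-sketch
  [xs iterations]
  (mapv - xs (snip-baseline-sketch xs iterations true)))

(snip-remove-sketch [1.0 1.0 5.0 1.0 1.0] 2)
;; => [0.0 0.0 4.0 0.0 0.0]
```

Because each step takes a minimum against the current value, the estimated baseline never exceeds the signal, so the corrected values cannot be negative.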
(def step-snip (maldi/snip-baseline-removal step-sg {:iterations 25}))
SNIP-corrected values are non-negative.
(>= (dfn/reduce-min step-snip) 0.0)
true
tophat-baseline
[intensities opts]
Remove baseline using TopHat morphological filter (MALDIquant compatible).
The TopHat operation is: signal - opening(signal), where opening = dilation(erosion(signal)).
Args:
- intensities: intensity values
- opts: map with optional keys:
- :half-window-size (default 100)
Returns: baseline-corrected intensities
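The formula above (signal minus opening, with opening as dilation of erosion) can be sketched with a sliding minimum and a sliding maximum. This is a plain-Clojure illustration with hypothetical names; truncating windows at the edges is an assumption.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn sliding
  "Apply f (min or max) over a symmetric window around each point.
  Assumption: windows are truncated at the array edges."
  [f xs half-window]
  (let [v (vec xs)
        n (count v)]
    (mapv (fn [i]
            (apply f (subvec v (max 0 (- i half-window))
                             (min n (+ i half-window 1)))))
          (range n))))

(defn tophat-sketch
  [xs half-window]
  (let [erosion (sliding min xs half-window)        ; sliding minimum
        opening (sliding max erosion half-window)]  ; sliding maximum of the erosion
    (mapv - xs opening)))

(tophat-sketch [1.0 1.0 5.0 1.0 1.0] 1)
;; => [0.0 0.0 4.0 0.0 0.0]
```

The opening never exceeds the signal, which is why the corrected intensities are non-negative.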
(def step-tophat (maldi/tophat-baseline intensities {:half-window-size 100}))
(>= (dfn/reduce-min step-tophat) 0.0)
true
tic-normalize
[masses intensities opts]
Normalize intensities using Total Ion Current (MALDIquant compatible).
Normalizes the area under the curve (trapezoid rule) to target-area.
Args:
- masses: mass values
- intensities: intensity values
- opts: map with optional keys:
- :target-area (default 1.0)
Returns: normalized intensities
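As a worked example of the arithmetic, the sketch below computes the trapezoid area and rescales the intensities so that the area equals the target. The function names are hypothetical and the code is plain Clojure, independent of the library.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn trapezoid-area-sketch
  "Area under the curve by the trapezoid rule."
  [ms is]
  (reduce + 0.0
          (map (fn [m0 m1 i0 i1] (* 0.5 (- m1 m0) (+ i0 i1)))
               ms (rest ms) is (rest is))))

(defn tic-normalize-sketch
  [ms is target-area]
  (let [scale (/ target-area (trapezoid-area-sketch ms is))]
    (mapv #(* scale %) is)))

;; Area of the input is 0.5*1*(1+3) + 0.5*1*(3+1) = 4.0, so the scale is 0.25.
(tic-normalize-sketch [0.0 1.0 2.0] [1.0 3.0 1.0] 1.0)
;; => [0.25 0.75 0.25]
```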
TIC normalization requires both masses and intensities.
(def step-tic (maldi/tic-normalize masses step-snip {:target-area 1.0}))
Verify the area under the normalized curve is close to the target.
(defn trapezoid-area [ms is]
  (dfn/sum (dfn/* (dfn/- (dfn/shift ms -1) ms)
                  (dfn/+ is (dfn/shift is -1))
                  0.5)))
(< (Math/abs (- (trapezoid-area masses step-tic) 1.0)) 1e-10)
true
median-calibrate
[intensities opts]
Normalize intensities by dividing by the median (MALDIquant compatible).
Args:
- intensities: intensity values
- opts: (reserved for future options)
Returns: calibrated intensities
(def step-median (maldi/median-calibrate intensities {}))
After median calibration the median intensity is 1.0.
(< (Math/abs (- (fstats/median step-median) 1.0)) 1e-10)
true
Peak Detection
find-local-maxima-logical
[intensities opts]
Find local maxima using sliding window approach (MALDIquant algorithm).
Args:
- intensities: intensity values
- opts: map with optional keys:
- :half-window-size (default 20)
Returns: boolean array where true indicates a local maximum
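A sliding-window local-maximum test can be sketched as follows. This is plain Clojure with a hypothetical name; treating ties as maxima and truncating the window at the edges are both assumptions, and the library may handle them differently.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn local-maxima-sketch
  [xs half-window]
  (let [v (vec xs)
        n (count v)]
    (mapv (fn [i]
            (let [window (subvec v (max 0 (- i half-window))
                                 (min n (+ i half-window 1)))]
              ;; a point is a local maximum when it equals its window's maximum
              (== (v i) (apply max window))))
          (range n))))

(local-maxima-sketch [0.0 2.0 1.0 3.0 0.5] 1)
;; => [false true false true false]
```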
(def local-max (maldi/find-local-maxima-logical step-snip {:half-window-size 20}))
Returns a boolean array with one entry per point.
(count local-max)
2000
estimate-noise-mad
[intensities opts]
Estimate noise level using MAD (Median Absolute Deviation).
Args:
- intensities: intensity values
- opts: map with optional keys:
- :half-window-size (nil for global, integer for local)
Returns: scalar noise estimate or array of local noise estimates
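The global MAD estimate is the median of the absolute deviations from the median. The sketch below uses hypothetical names and scales by 1.4826, as R's stats::mad does by default; whether maldi/estimate-noise-mad applies the same constant is an assumption.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn median-of
  "Median of a non-empty collection of numbers."
  [xs]
  (let [s   (vec (sort xs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (s mid)
      (/ (+ (s (dec mid)) (s mid)) 2.0))))

(defn mad-noise-sketch
  [xs]
  (let [m (median-of xs)]
    ;; assumption: scaled by 1.4826 for consistency with a normal distribution
    (* 1.4826 (median-of (map #(Math/abs (double (- % m))) xs)))))

;; Median is 3.0; absolute deviations are [2 1 0 1 97], whose median is 1.0.
(mad-noise-sketch [1.0 2.0 3.0 4.0 100.0])
;; => 1.4826
```

The outlier at 100.0 barely affects the result, which is the reason MAD is preferred over the standard deviation for spectra that contain peaks.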
Global noise estimate (single scalar).
(def noise-global (maldi/estimate-noise-mad step-snip {:half-window-size nil}))
(pos? noise-global)
true
Local noise estimate (one value per point).
(def noise-local (maldi/estimate-noise-mad step-snip {:half-window-size 100}))
(count noise-local)
2000
filter-peaks-by-snr
[intensities is-local-maxima noise opts]
Filter peak candidates by Signal-to-Noise Ratio threshold.
Args:
- intensities: intensity values
- is-local-maxima: boolean array from find-local-maxima-logical
- noise: noise estimate (scalar or array)
- opts: map with optional keys:
- :snr-threshold (default 2)
Returns: vector of indices above SNR threshold
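The filtering step can be sketched as keeping each local maximum whose intensity exceeds the threshold times the noise at that point. The name snr-filter-sketch is hypothetical, and the strict comparison (greater than, not greater than or equal) is an assumption.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn snr-filter-sketch
  [xs maxima? noise snr-threshold]
  (let [;; noise may be one scalar or one value per point
        noise-at (if (number? noise)
                   (constantly noise)
                   #(nth noise %))]
    (vec (keep-indexed
          (fn [i x]
            (when (and (nth maxima? i)
                       (> x (* snr-threshold (noise-at i))))
              i))
          xs))))

;; With noise 1.5 and threshold 2, a peak must exceed 3.0 to survive.
(snr-filter-sketch [0.0 2.0 1.0 9.0 0.5]
                   [false true false true false]
                   1.5
                   2)
;; => [3]
```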
(def peak-indices
  (maldi/filter-peaks-by-snr step-snip local-max noise-local {:snr-threshold 2}))
Returns a subset of the local-maximum indices, so the count never exceeds the number of local maxima.
(<= (count peak-indices) (dfn/sum local-max))
true
detect-peaks
[intensities opts]
Detect peaks using MALDIquant-compatible algorithm.
Pipeline: local maxima → MAD noise estimation → SNR filtering.
Args:
- intensities: intensity values
- opts: map with optional keys:
- :half-window-size (default 20)
- :snr (default 2)
- :noise-method (default :mad-global)
Returns: buffer of peak indices
detect-peaks combines local maxima, noise estimation, and SNR filtering.
(def peaks (maldi/detect-peaks step-snip {:half-window-size 20 :snr 2}))
We expect a small number of peaks from our 3-peak synthetic spectrum.
(<= 1 (count peaks) 50)
true
Pipeline
preprocess-spectrum-data
[spectrum opts]
Apply full preprocessing pipeline to spectrum data.
Args:
- spectrum: tablecloth dataset or map with :mass and :intensity keys
- opts: map with optional keys:
- :should-sqrt-transform (default true)
- :smooth-window (default 11, must be odd)
- :smooth-polynomial (default 2)
- :baseline-iterations (default 20)
- :baseline-repetitions (default 2) — number of times to apply SNIP
- :should-tic-normalize (default true)
- :tic-target (default 1.0)
Returns: tablecloth dataset with :mass and :intensity columns
(def preprocessed
  (maldi/preprocess-spectrum-data spectrum
                                  {:should-sqrt-transform true
                                   :smooth-window 11
                                   :smooth-polynomial 2
                                   :baseline-iterations 20
                                   :should-tic-normalize true
                                   :tic-target 1.0}))
(tc/dataset? preprocessed)
true
The result has the same columns as the input.
(sort (tc/column-names preprocessed))
(:intensity :mass)
Spectrum Manipulation
trim-spectrum
[spectrum opts]
Restrict a spectrum to a given mass range.
Args:
- spectrum: tablecloth dataset with :mass and :intensity columns
- opts: map with :range [min-mass max-mass]
Returns: filtered tablecloth dataset
(def trimmed (maldi/trim-spectrum spectrum {:range [2500 3500]}))
All masses are within the specified range.
(let [ms (:mass trimmed)]
  (and (>= (dfn/reduce-min ms) 2500.0)
       (<= (dfn/reduce-max ms) 3500.0)))
true
Binning
calculate-n-bins
[params]
Calculate number of bins for given range and step.
Args:
- params: map with keys:
- :range [min-mass max-mass]
- :step bin width
Returns: integer number of bins
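The bin count appears to be the range width divided by the step, rounded down. Rounding down is inferred from the two results shown on this page (6000 and 666) rather than from the library source, and the name n-bins-sketch is hypothetical.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn n-bins-sketch
  [{[min-mass max-mass] :range step :step}]
  ;; assumption: a partial bin at the upper end is dropped
  (long (Math/floor (/ (- max-mass min-mass) (double step)))))

(n-bins-sketch {:range [2000 20000] :step 3})
;; => 6000
(n-bins-sketch {:range [2000 4000] :step 3})
;; => 666
```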
DRIAMS parameters: range [2000, 20000], step 3.
(maldi/calculate-n-bins {:range [2000 20000] :step 3})
6000
bin-spectrum
[spectrum params]
Bin a preprocessed spectrum into fixed-width m/z bins.
Args:
- spectrum: tablecloth dataset with :mass and :intensity columns
- params: map with keys:
- :range [min-mass max-mass]
- :step bin width
Returns: double array of binned intensities
(def binned (maldi/bin-spectrum preprocessed {:range [2000 4000] :step 3}))
Returns a double array with the expected number of bins.
(count binned)
666
Multi-Sample Workflow
Three small peak lists for alignment examples.
(def peak-lists
  [{:masses (double-array [3000.1 3500.2 4000.3])
    :intensities (double-array [100.0 200.0 150.0])}
   {:masses (double-array [3000.3 3500.0 4000.1])
    :intensities (double-array [110.0 190.0 160.0])}
   {:masses (double-array [3000.2 3500.1])
    :intensities (double-array [105.0 195.0])}])
bin-peaks-multi
[peak-lists opts]
Bin peaks across multiple spectra by clustering nearby m/z values.
Implements MALDIquant’s binPeaks(method="strict"): pools all peak masses, recursively splits at the largest gap until each group is homogeneous within tolerance, then replaces each group’s masses with the group mean.
Args:
- peak-lists: vector of peak maps {:masses double-array, :intensities double-array}
- opts: map with optional keys:
- :tolerance (default 0.002) — relative tolerance for grouping
Returns: vector of peak maps with binned (aligned) masses
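The recursive splitting described above can be sketched on a single sorted list of pooled masses. This is a simplified illustration with a hypothetical name: it checks only the relative-tolerance condition and does not enforce the other rule of MALDIquant's strict method, that a group may hold at most one peak per spectrum.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn bin-masses-sketch
  "Replace each mass in a sorted sequence with the mean of its group."
  [sorted-masses tolerance]
  (let [v (vec sorted-masses)
        n (count v)]
    (if (zero? n)
      []
      (let [mean    (/ (reduce + v) (double n))
            within? (fn [m]
                      (<= (Math/abs (double (- m mean)))
                          (* tolerance mean)))]
        (if (every? within? v)
          ;; homogeneous group: every member takes the group mean
          (vec (repeat n mean))
          (let [gaps   (map - (rest v) v)
                ;; index of the largest gap between neighbouring masses
                widest (first (apply max-key second (map-indexed vector gaps)))
                split  (inc widest)]
            (into (bin-masses-sketch (subvec v 0 split) tolerance)
                  (bin-masses-sketch (subvec v split) tolerance))))))))

(bin-masses-sketch [3000.1 3000.2 3000.3 3500.0 3500.1 3500.2] 0.002)
```

For this input the first three results share one mean near 3000.2 and the last three share one mean near 3500.1, because the only gap wider than the tolerance lies between 3000.3 and 3500.0.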
(def binned-peaks (maldi/bin-peaks-multi peak-lists {:tolerance 0.002}))
After binning, mass positions within each group are aligned to the group mean.
(count binned-peaks)
3
filter-peaks-multi
[peak-lists opts]
Filter peaks by minimum frequency across spectra.
Keeps only peaks at mass positions that appear in at least min-frequency fraction of spectra.
Args:
- peak-lists: vector of peak maps {:masses double-array, :intensities double-array}
- opts: map with keys:
- :min-frequency (default 0.25)
Returns: filtered vector of peak maps
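The frequency rule can be sketched by counting, for each aligned mass, the fraction of spectra that contain it. The sketch uses a hypothetical name, plain vectors instead of double arrays, and assumes masses were already aligned so that exact equality identifies a shared position.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn frequency-filter-sketch
  [peak-lists min-frequency]
  (let [n      (count peak-lists)
        ;; number of spectra in which each mass position occurs
        counts (frequencies (mapcat #(distinct (seq (:masses %))) peak-lists))
        keep?  (fn [m] (>= (/ (counts m) (double n)) min-frequency))]
    (mapv (fn [{:keys [masses intensities]}]
            (let [kept (filter #(keep? (nth masses %)) (range (count masses)))]
              {:masses      (mapv #(nth masses %) kept)
               :intensities (mapv #(nth intensities %) kept)}))
          peak-lists)))

(def aligned
  [{:masses [3000.0 3500.0 4000.0] :intensities [100.0 200.0 150.0]}
   {:masses [3000.0 3500.0 4000.0] :intensities [110.0 190.0 160.0]}
   {:masses [3000.0 3500.0]        :intensities [105.0 195.0]}])

;; 4000.0 occurs in 2 of 3 spectra (about 0.667), so 0.66 keeps it and 0.75 drops it.
(mapv :masses (frequency-filter-sketch aligned 0.66))
;; => [[3000.0 3500.0 4000.0] [3000.0 3500.0 4000.0] [3000.0 3500.0]]
(mapv :masses (frequency-filter-sketch aligned 0.75))
;; => [[3000.0 3500.0] [3000.0 3500.0] [3000.0 3500.0]]
```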
Keep only peaks present in at least 2/3 of spectra.
(def filtered-peaks (maldi/filter-peaks-multi binned-peaks {:min-frequency 0.66}))
The ~3000 and ~3500 peaks appear in all 3 spectra; ~4000 appears in only 2, a frequency of 2/3 ≈ 0.667. Since 0.667 ≥ 0.66, all three mass positions survive.
(count (:masses (first filtered-peaks)))
3
intensity-matrix
[peak-lists]
Build a samples × features intensity matrix from binned peaks.
After binning and filtering, converts peak lists into a rectangular matrix. Each row is a spectrum, each column is a unique mass position.
Args:
- peak-lists: vector of peak maps {:masses double-array, :intensities double-array}
Returns: map with :matrix (vector of double-arrays), :masses (double-array), :n-spectra, :n-features
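The shape of the result can be sketched as follows: collect the sorted unique masses as columns, then look up each spectrum's intensity at every column. The name is hypothetical, plain vectors stand in for double arrays, and filling a missing peak with 0.0 is an assumption, since the fill value is not documented here.

```clojure
;; Illustrative sketch only; not part of scicloj.ripple.maldi.
(defn intensity-matrix-sketch
  [peak-lists]
  (let [masses (vec (sort (distinct (mapcat #(seq (:masses %)) peak-lists))))]
    {:masses     masses
     :matrix     (mapv (fn [{ms :masses is :intensities}]
                         (let [lookup (zipmap (seq ms) (seq is))]
                           ;; assumption: a peak absent from a spectrum becomes 0.0
                           (mapv #(get lookup % 0.0) masses)))
                       peak-lists)
     :n-spectra  (count peak-lists)
     :n-features (count masses)}))

(intensity-matrix-sketch
 [{:masses [3000.0 3500.0 4000.0] :intensities [100.0 200.0 150.0]}
  {:masses [3000.0 3500.0]        :intensities [105.0 195.0]}])
;; => {:masses [3000.0 3500.0 4000.0]
;;     :matrix [[100.0 200.0 150.0] [105.0 195.0 0.0]]
;;     :n-spectra 2
;;     :n-features 3}
```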
(def imat (maldi/intensity-matrix filtered-peaks))
The matrix has one row per spectrum and one column per feature.
(:n-spectra imat)
3
(pos? (:n-features imat))
true