13  MALDI API Reference

Complete reference for scicloj.ripple.maldi — the public API for MALDI-TOF mass spectrometry signal processing.

Setup

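The examples below use the namespace aliases maldi, tc, dfn, and fstats. The rendered page does not show the notebook's ns form, so the targets below are inferred from the calls used; in particular, fstats is assumed to be fastmath.stats.

(require '[scicloj.ripple.maldi :as maldi]
         '[tablecloth.api :as tc]
         '[tech.v3.datatype.functional :as dfn]
         '[fastmath.stats :as fstats])

A synthetic spectrum for the examples below.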

(def masses (double-array (range 2000.0 4000.0 1.0)))
(def n-points (count masses))

Gaussian peak helper.

(defn gaussian [x center height width]
  (* height (Math/exp (- (/ (Math/pow (- x center) 2)
                            (* 2 width width))))))

Intensities: three Gaussian peaks on a sloped baseline with noise.

(def intensities
  (let [rng (java.util.Random. 42)]
    (double-array
     (map (fn [m]
            (+ (* 0.005 (- m 2000.0))
               (gaussian m 2500.0 100.0 30.0)
               (gaussian m 3000.0 200.0 50.0)
               (gaussian m 3500.0 80.0 20.0)
               (* 2.0 (.nextGaussian rng))))
          (seq masses)))))
(def spectrum (tc/dataset {:mass masses :intensity intensities}))

Preprocessing

sqrt-transform

[intensities]

Apply square root transformation to intensities for variance stabilization.

Args:

  • intensities: sequence of intensity values

Returns: transformed intensities

(def step-sqrt (maldi/sqrt-transform intensities))
(count step-sqrt)
2000

savitzky-golay-smooth

[intensities opts]

Apply Savitzky-Golay smoothing to spectrum intensities (MALDIquant compatible).

Negative smoothed values are clamped to 0.0.

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :window-size (default 11, must be odd)
    • :polynomial-order (default 2)

Returns: smoothed intensities

(def step-sg (maldi/savitzky-golay-smooth intensities {:window-size 11 :polynomial-order 2}))
(count step-sg)
2000
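
To make the idea concrete, here is a minimal sketch of Savitzky-Golay smoothing for one fixed case: window size 5 and polynomial order 2, whose convolution coefficients are the well-known (-3, 12, 17, 12, -3)/35. The function name sg-smooth-sketch is hypothetical, and leaving the two points at each edge unsmoothed is an assumption of this sketch, not a statement about how the library treats edges.

```clojure
;; Savitzky-Golay coefficients for window 5, quadratic fit (exact ratios).
(def sg-5-2 [-3/35 12/35 17/35 12/35 -3/35])

(defn sg-smooth-sketch
  "Smooth xs with the fixed 5-point quadratic Savitzky-Golay kernel.
  Interior points get the weighted sum; the two points at each edge are
  passed through (an assumption of this sketch). Negatives clamp to 0.0."
  [xs]
  (let [v (vec xs)
        n (count v)]
    (mapv (fn [i]
            (if (and (>= i 2) (< i (- n 2)))
              (max 0.0 (double (reduce + (map * sg-5-2 (subvec v (- i 2) (+ i 3))))))
              (max 0.0 (double (v i)))))
          (range n))))

;; A quadratic signal is reproduced exactly by a quadratic fit.
(sg-smooth-sketch [0 1 4 9 16 25])
;; => [0.0 1.0 4.0 9.0 16.0 25.0]
```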

moving-average-smooth

[intensities opts]

Apply moving average smoothing (MALDIquant compatible).

Interior points use the mean of a symmetric window. Edge points replicate the first/last full-window mean. Negative values are clamped to 0.0.

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :half-window-size (default 2)

Returns: smoothed intensities

(def step-ma (maldi/moving-average-smooth intensities {:half-window-size 5}))
(count step-ma)
2000
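
The edge rule described above can be sketched in plain Clojure. The name moving-average-sketch is hypothetical, and the sketch assumes the input has at least 2 × half-window + 1 points; it illustrates the description, not the library's implementation.

```clojure
(defn moving-average-sketch
  "Symmetric moving average with half-window hw.
  Edge points reuse the nearest full-window mean; negatives clamp to 0.0."
  [xs hw]
  (let [v       (vec xs)
        n       (count v)
        width   (inc (* 2 hw))
        ;; mean of the full window centered on i (valid for hw <= i < n - hw)
        mean-at (fn [i] (/ (reduce + (subvec v (- i hw) (+ i hw 1)))
                           (double width)))]
    (mapv (fn [i]
            ;; clamp the center to the nearest index that has a full window
            (let [center (-> i (max hw) (min (- n hw 1)))]
              (max 0.0 (mean-at center))))
          (range n))))

(moving-average-sketch [1.0 2.0 3.0 4.0 5.0 6.0] 1)
;; => [2.0 2.0 3.0 4.0 5.0 5.0]
```

The first and last outputs repeat their neighbors because no full window is centered on them.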

median-filter

[intensities opts]

Apply median filter for noise reduction.

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :window-size (default 5, must be odd)

Returns: median-filtered intensities

(def step-mf (maldi/median-filter intensities {:window-size 5}))
(count step-mf)
2000
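
A sliding median can be sketched as follows. The names median-of and median-filter-sketch are hypothetical, and truncating the window at the edges is an assumption of this sketch; the reference does not state how the library handles edge points.

```clojure
(defn median-of
  "Median of a non-empty collection of numbers."
  [xs]
  (let [s   (vec (sort xs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (double (nth s mid))
      (/ (+ (nth s mid) (nth s (dec mid))) 2.0))))

(defn median-filter-sketch
  "Replace each point with the median of the window centered on it.
  Windows are truncated at the edges (assumption)."
  [xs window-size]
  (let [v  (vec xs)
        n  (count v)
        hw (quot window-size 2)]
    (mapv (fn [i] (median-of (subvec v (max 0 (- i hw)) (min n (+ i hw 1)))))
          (range n))))

;; A single-point spike is removed entirely.
(median-filter-sketch [1.0 1.0 9.0 1.0 1.0] 3)
;; => [1.0 1.0 1.0 1.0 1.0]
```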

snip-baseline-removal

[intensities opts]

Remove baseline using SNIP algorithm (MALDIquant compatible).

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :iterations (default 25)
    • :decreasing (default true, matching MALDIquant: the clipping window shrinks from large to small)

Returns: baseline-corrected intensities

(def step-snip (maldi/snip-baseline-removal step-sg {:iterations 25}))

SNIP-corrected values are non-negative.

(>= (dfn/reduce-min step-snip) 0.0)
true
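
For intuition, here is a minimal sketch of the SNIP clipping loop with a decreasing window. The function names are hypothetical. Each pass replaces a point with the mean of its two neighbors at distance p whenever that mean is lower, so the estimated baseline never exceeds the signal and the corrected values cannot go negative. Any transform the library might apply before clipping is not modeled here.

```clojure
(defn snip-baseline-sketch
  "Estimate a baseline by iterative clipping with half-widths
  iterations, iterations-1, ..., 1 (decreasing windows)."
  [xs iterations]
  (let [n (count xs)]
    (reduce
     (fn [y p]
       ;; one clipping pass at half-width p, reading from the previous pass
       (mapv (fn [i]
               (if (and (>= i p) (< i (- n p)))
                 (min (y i) (/ (+ (y (- i p)) (y (+ i p))) 2.0))
                 (y i)))
             (range n)))
     (mapv double xs)
     (range iterations 0 -1))))

(defn snip-remove-sketch
  "Subtract the sketched SNIP baseline from the signal."
  [xs iterations]
  (mapv - (map double xs) (snip-baseline-sketch xs iterations)))

;; A peak of height 10 on the line 1, 2, 3, 4, 5: only the peak excess survives.
(snip-remove-sketch [1.0 2.0 10.0 4.0 5.0] 2)
;; => [0.0 0.0 7.0 0.0 0.0]
```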

tophat-baseline

[intensities opts]

Remove baseline using TopHat morphological filter (MALDIquant compatible).

The TopHat operation is: signal - opening(signal), where opening = dilation(erosion(signal)).

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :half-window-size (default 100)

Returns: baseline-corrected intensities

(def step-tophat (maldi/tophat-baseline intensities {:half-window-size 100}))
(>= (dfn/reduce-min step-tophat) 0.0)
true
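
The formula above can be sketched with a naive sliding minimum and maximum. The names sliding and tophat-sketch are hypothetical, and the windows are truncated at the edges as an assumption. Because the opening never exceeds the signal, the result is non-negative.

```clojure
(defn sliding
  "Apply the binary reducer f (min or max) over a window of
  half-width hw centered on each point; windows shrink at the edges."
  [f xs hw]
  (let [v (vec xs)
        n (count v)]
    (mapv (fn [i] (reduce f (subvec v (max 0 (- i hw)) (min n (+ i hw 1)))))
          (range n))))

(defn tophat-sketch
  "signal - opening(signal), where opening = dilation(erosion(signal))."
  [xs hw]
  (let [erosion (sliding min xs hw)       ; sliding minimum
        opening (sliding max erosion hw)] ; sliding maximum of the erosion
    (mapv - xs opening)))

(tophat-sketch [1.0 1.0 5.0 1.0 1.0] 1)
;; => [0.0 0.0 4.0 0.0 0.0]
```

This version costs O(n × window) and is meant only to show the operation.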

tic-normalize

[masses intensities opts]

Normalize intensities using Total Ion Current (MALDIquant compatible).

Normalizes the area under the curve (trapezoid rule) to target-area.

Args:

  • masses: mass values
  • intensities: intensity values
  • opts: map with optional keys:
    • :target-area (default 1.0)

Returns: normalized intensities

TIC normalization requires both masses and intensities.

(def step-tic (maldi/tic-normalize masses step-snip {:target-area 1.0}))

Verify the area under the normalized curve is close to the target.

(defn trapezoid-area [ms is]
  (dfn/sum (dfn/* (dfn/- (dfn/shift ms -1) ms)
                  (dfn/+ is (dfn/shift is -1))
                  0.5)))
(< (Math/abs (- (trapezoid-area masses step-tic) 1.0)) 1e-10)
true
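
The normalization itself is a single scale factor, target-area divided by the trapezoid area. A dependency-free sketch, with hypothetical names and no guard against a zero-area spectrum:

```clojure
(defn trapezoid-area-sketch
  "Area under the piecewise-linear curve through (ms, is)."
  [ms is]
  (reduce + (map (fn [m0 m1 i0 i1] (* 0.5 (- m1 m0) (+ i0 i1)))
                 ms (rest ms) is (rest is))))

(defn tic-normalize-sketch
  "Scale intensities so the trapezoid area equals target-area."
  [ms is target-area]
  (let [scale (/ target-area (trapezoid-area-sketch ms is))]
    (mapv #(* scale %) is)))

;; A triangle with base 2 and height 2 has area 2, so the scale is 0.5.
(tic-normalize-sketch [0.0 1.0 2.0] [0.0 2.0 0.0] 1.0)
;; => [0.0 1.0 0.0]
```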

median-calibrate

[intensities opts]

Normalize intensities by dividing by the median (MALDIquant compatible).

Args:

  • intensities: intensity values
  • opts: (reserved for future options)

Returns: calibrated intensities

(def step-median (maldi/median-calibrate intensities {}))

After median calibration the median intensity is 1.0.

(< (Math/abs (- (fstats/median step-median) 1.0)) 1e-10)
true

Peak Detection

find-local-maxima-logical

[intensities opts]

Find local maxima using sliding window approach (MALDIquant algorithm).

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :half-window-size (default 20)

Returns: boolean array where true indicates a local maximum

(def local-max (maldi/find-local-maxima-logical step-snip {:half-window-size 20}))

Returns a boolean array with one entry per point.

(count local-max)
2000
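
The sliding-window test can be sketched as: a point is a local maximum when it equals the maximum of the window centered on it. The name local-maxima-sketch is hypothetical. This sketch truncates windows at the edges and flags every point of a flat plateau, both of which are simplifications that may differ from the library.

```clojure
(defn local-maxima-sketch
  "Vector of booleans: true where xs[i] is the maximum of
  the window [i - hw, i + hw] (truncated at the edges)."
  [xs hw]
  (let [v (vec xs)
        n (count v)]
    (mapv (fn [i]
            (let [window (subvec v (max 0 (- i hw)) (min n (+ i hw 1)))]
              (= (v i) (reduce max window))))
          (range n))))

(local-maxima-sketch [0.0 3.0 1.0 2.0 5.0 2.0] 1)
;; => [false true false false true false]
```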

estimate-noise-mad

[intensities opts]

Estimate noise level using MAD (Median Absolute Deviation).

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :half-window-size (nil for global, integer for local)

Returns: scalar noise estimate or array of local noise estimates

Global noise estimate (single scalar).

(def noise-global (maldi/estimate-noise-mad step-snip {:half-window-size nil}))
(pos? noise-global)
true

Local noise estimate (one value per point).

(def noise-local (maldi/estimate-noise-mad step-snip {:half-window-size 100}))
(count noise-local)
2000
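
As a reminder of what MAD computes, here is a sketch of the global case: the median of absolute deviations from the median. The factor 1.4826 is the usual consistency constant (the default in R's stats::mad); whether this library applies it is an assumption. The function names are hypothetical.

```clojure
(defn median-sketch
  "Median of a non-empty collection of numbers."
  [xs]
  (let [s   (vec (sort xs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (double (nth s mid))
      (/ (+ (nth s mid) (nth s (dec mid))) 2.0))))

(defn mad-noise-sketch
  "Global noise estimate: 1.4826 * median(|x - median(x)|).
  The scaling constant is an assumption of this sketch."
  [xs]
  (let [m (median-sketch xs)]
    (* 1.4826 (median-sketch (map #(Math/abs (double (- % m))) xs)))))

;; The outlier 100.0 barely affects the estimate.
(mad-noise-sketch [1.0 2.0 3.0 4.0 100.0])
;; => 1.4826
```

A local estimate would apply the same computation to a window around each point.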

filter-peaks-by-snr

[intensities is-local-maxima noise opts]

Filter peak candidates by Signal-to-Noise Ratio threshold.

Args:

  • intensities: intensity values
  • is-local-maxima: boolean array from find-local-maxima-logical
  • noise: noise estimate (scalar or array)
  • opts: map with optional keys:
    • :snr-threshold (default 2)

Returns: vector of indices above SNR threshold

(def peak-indices
  (maldi/filter-peaks-by-snr step-snip local-max noise-local {:snr-threshold 2}))

Returns a subset of the local-maximum indices, so the result can never contain more entries than there are local maxima.

(<= (count peak-indices) (dfn/sum local-max))
true
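
The filtering step can be sketched as keeping the indices that are local maxima and whose intensity exceeds snr-threshold times the noise. The name filter-by-snr-sketch is hypothetical, and the strict comparison is an assumption; the library may use a non-strict one.

```clojure
(defn filter-by-snr-sketch
  "Indices i where is-max[i] is true and
  intensities[i] > snr-threshold * noise (scalar or per-point)."
  [intensities is-max noise snr-threshold]
  (let [noise-at (if (number? noise)
                   (constantly noise)   ; global estimate
                   #(nth noise %))]     ; local estimate, one value per point
    (filterv (fn [i]
               (and (nth is-max i)
                    (> (nth intensities i) (* snr-threshold (noise-at i)))))
             (range (count intensities)))))

;; Two candidates at indices 1 and 3; only index 1 clears 2 * 2.0 = 4.0.
(filter-by-snr-sketch [1.0 10.0 1.0 3.0 1.0]
                      [false true false true false]
                      2.0
                      2)
;; => [1]
```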

detect-peaks

[intensities opts]

Detect peaks using MALDIquant-compatible algorithm.

Pipeline: local maxima → MAD noise estimation → SNR filtering.

Args:

  • intensities: intensity values
  • opts: map with optional keys:
    • :half-window-size (default 20)
    • :snr (default 2)
    • :noise-method (default :mad-global)

Returns: buffer of peak indices

detect-peaks combines local maxima, noise estimation, and SNR filtering.

(def peaks (maldi/detect-peaks step-snip {:half-window-size 20 :snr 2}))

We expect a small number of peaks from our 3-peak synthetic spectrum.

(<= 1 (count peaks) 50)
true

Pipeline

preprocess-spectrum-data

[spectrum opts]

Apply full preprocessing pipeline to spectrum data.

Args:

  • spectrum: tablecloth dataset or map with :mass and :intensity keys
  • opts: map with optional keys:
    • :should-sqrt-transform (default true)
    • :smooth-window (default 11, must be odd)
    • :smooth-polynomial (default 2)
    • :baseline-iterations (default 20)
    • :baseline-repetitions (default 2) — number of times to apply SNIP
    • :should-tic-normalize (default true)
    • :tic-target (default 1.0)

Returns: tablecloth dataset with :mass and :intensity columns

(def preprocessed
  (maldi/preprocess-spectrum-data spectrum
    {:should-sqrt-transform true
     :smooth-window 11
     :smooth-polynomial 2
     :baseline-iterations 20
     :should-tic-normalize true
     :tic-target 1.0}))
(tc/dataset? preprocessed)
true

The result has the same columns as the input.

(sort (tc/column-names preprocessed))
(:intensity :mass)

Spectrum Manipulation

trim-spectrum

[spectrum opts]

Restrict a spectrum to a given mass range.

Args:

  • spectrum: tablecloth dataset with :mass and :intensity columns
  • opts: map with :range [min-mass max-mass]

Returns: filtered tablecloth dataset

(def trimmed (maldi/trim-spectrum spectrum {:range [2500 3500]}))

All masses are within the specified range.

(let [ms (:mass trimmed)]
  (and (>= (dfn/reduce-min ms) 2500.0)
       (<= (dfn/reduce-max ms) 3500.0)))
true

Binning

calculate-n-bins

[params]

Calculate number of bins for given range and step.

Args:

  • params: map with keys:
    • :range [min-mass max-mass]
    • :step bin width

Returns: integer number of bins

DRIAMS parameters: range [2000, 20000], step 3.

(maldi/calculate-n-bins {:range [2000 20000] :step 3})
6000
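
The outputs in this reference (6000 here, and 666 for the range [2000 4000] used in the bin-spectrum example) are consistent with dividing the range width by the step and truncating. A sketch under that assumption, with a hypothetical function name:

```clojure
(defn n-bins-sketch
  "floor((max-mass - min-mass) / step); truncation is inferred from
  the documented outputs, not confirmed from the library source."
  [{[lo hi] :range step :step}]
  (long (Math/floor (/ (- hi lo) (double step)))))

(n-bins-sketch {:range [2000 20000] :step 3})
;; => 6000
(n-bins-sketch {:range [2000 4000] :step 3})
;; => 666
```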

bin-spectrum

[spectrum params]

Bin a preprocessed spectrum into fixed-width m/z bins.

Args:

  • spectrum: tablecloth dataset with :mass and :intensity columns
  • params: map with keys:
    • :range [min-mass max-mass]
    • :step bin width

Returns: double array of binned intensities

(def binned (maldi/bin-spectrum preprocessed {:range [2000 4000] :step 3}))

Returns a double array with the expected number of bins.

(count binned)
666
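
Fixed-width binning maps each mass to the bin index floor((mass - min-mass) / step). The sketch below sums the intensities that fall into each bin and drops points outside the range. Summation and the function name are assumptions; the reference does not say how the library aggregates points within a bin.

```clojure
(defn bin-spectrum-sketch
  "Vector of per-bin intensity sums over [lo, lo + n-bins * step)."
  [masses intensities {[lo hi] :range step :step}]
  (let [n-bins (long (Math/floor (/ (- hi lo) (double step))))]
    (reduce (fn [acc [m i]]
              (let [b (long (Math/floor (/ (- m lo) (double step))))]
                (if (< -1 b n-bins)
                  (update acc b + i)   ; accumulate into bin b
                  acc)))               ; outside the binned range: ignore
            (vec (repeat n-bins 0.0))
            (map vector masses intensities))))

;; Bins: [2000, 2003), [2003, 2006), [2006, 2009); 2010.0 falls outside.
(bin-spectrum-sketch [2000.0 2001.0 2003.5 2010.0]
                     [1.0 2.0 4.0 8.0]
                     {:range [2000 2009] :step 3})
;; => [3.0 4.0 0.0]
```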

Multi-Sample Workflow

Three small peak lists for alignment examples.

(def peak-lists
  [{:masses (double-array [3000.1 3500.2 4000.3])
    :intensities (double-array [100.0 200.0 150.0])}
   {:masses (double-array [3000.3 3500.0 4000.1])
    :intensities (double-array [110.0 190.0 160.0])}
   {:masses (double-array [3000.2 3500.1])
    :intensities (double-array [105.0 195.0])}])

bin-peaks-multi

[peak-lists opts]

Bin peaks across multiple spectra by clustering nearby m/z values.

Implements MALDIquant’s binPeaks(method="strict"): pools all peak masses, recursively splits at the largest gap until every group is homogeneous within tolerance, then replaces each group’s masses with the group mean.

Args:

  • peak-lists: vector of peak maps {:masses double-array, :intensities double-array}
  • opts: map with optional keys:
    • :tolerance (default 0.002) — relative tolerance for grouping

Returns: vector of peak maps with binned (aligned) masses

(def binned-peaks (maldi/bin-peaks-multi peak-lists {:tolerance 0.002}))

After binning, mass positions within each group are aligned to the group mean. The result still holds one peak map per input spectrum.

(count binned-peaks)
3
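
The recursive splitting can be sketched as below. The function names are hypothetical. A group counts as homogeneous when every mass lies within the relative tolerance of the group mean and, following the strict rule in MALDIquant, no spectrum contributes more than one peak. For brevity the sketch returns only the group means rather than rewritten peak lists.

```clojure
(defn- homogeneous?
  "True when all masses are within tolerance of the group mean
  and each spectrum appears at most once."
  [group tolerance]
  (let [ms   (map :mass group)
        mean (/ (reduce + ms) (count ms))]
    (and (every? #(<= (/ (Math/abs (double (- % mean))) mean) tolerance) ms)
         (apply distinct? (map :sample group)))))

(defn- split-groups
  "Recursively split a mass-sorted vector of peaks at its largest gap."
  [group tolerance]
  (if (or (< (count group) 2) (homogeneous? group tolerance))
    [group]
    (let [gaps (map (fn [a b] (- (:mass b) (:mass a))) group (rest group))
          ;; index of the first element to the right of the largest gap
          cut  (inc (first (apply max-key second (map-indexed vector gaps))))]
      (concat (split-groups (subvec group 0 cut) tolerance)
              (split-groups (subvec group cut) tolerance)))))

(defn bin-masses-sketch
  "Pool masses from all spectra and return the mean mass of each group."
  [mass-lists tolerance]
  (let [pooled (vec (sort-by :mass
                             (for [[s ms] (map-indexed vector mass-lists)
                                   m      ms]
                               {:sample s :mass m})))]
    (mapv (fn [g] (/ (reduce + (map :mass g)) (count g)))
          (split-groups pooled tolerance))))

;; Same masses as peak-lists above: three groups near 3000.2, 3500.1, 4000.2.
(bin-masses-sketch [[3000.1 3500.2 4000.3]
                    [3000.3 3500.0 4000.1]
                    [3000.2 3500.1]]
                   0.002)
```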

filter-peaks-multi

[peak-lists opts]

Filter peaks by minimum frequency across spectra.

Keeps only peaks at mass positions that appear in at least min-frequency fraction of spectra.

Args:

  • peak-lists: vector of peak maps {:masses double-array, :intensities double-array}
  • opts: map with optional keys:
    • :min-frequency (default 0.25)

Returns: filtered vector of peak maps

Keep only peaks present in at least 2/3 of spectra.

(def filtered-peaks (maldi/filter-peaks-multi binned-peaks {:min-frequency 0.66}))

The ~3000 and ~3500 peaks appear in all 3 spectra; ~4000 appears in only 2, a frequency of 2/3 ≈ 0.667. That still meets the 0.66 threshold, so all three mass positions survive and the first spectrum keeps its 3 peaks.

(count (:masses (first filtered-peaks)))
3
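
The frequency rule can be sketched on plain vectors of masses. The name filter-by-frequency-sketch is hypothetical, and the sketch assumes masses were already aligned by binning, so that the same peak has exactly the same mass value in every spectrum.

```clojure
(defn filter-by-frequency-sketch
  "Keep masses that occur in at least min-frequency of the spectra."
  [mass-lists min-frequency]
  (let [n      (count mass-lists)
        ;; number of spectra in which each mass occurs
        counts (frequencies (mapcat distinct mass-lists))
        keep?  (fn [m] (>= (/ (counts m) (double n)) min-frequency))]
    (mapv #(filterv keep? %) mass-lists)))

(def aligned [[3000.2 3500.1 4000.2]
              [3000.2 3500.1 4000.2]
              [3000.2 3500.1]])

;; 4000.2 occurs in 2 of 3 spectra (about 0.667), so it passes 0.66 ...
(filter-by-frequency-sketch aligned 0.66)
;; => [[3000.2 3500.1 4000.2] [3000.2 3500.1 4000.2] [3000.2 3500.1]]

;; ... but not 0.7.
(filter-by-frequency-sketch aligned 0.7)
;; => [[3000.2 3500.1] [3000.2 3500.1] [3000.2 3500.1]]
```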

intensity-matrix

[peak-lists]

Build a samples × features intensity matrix from binned peaks.

After binning and filtering, converts peak lists into a rectangular matrix. Each row is a spectrum, each column is a unique mass position.

Args:

  • peak-lists: vector of peak maps {:masses double-array, :intensities double-array}

Returns: map with :matrix (vector of double-arrays), :masses (double-array), :n-spectra, :n-features

(def imat (maldi/intensity-matrix filtered-peaks))

The matrix has one row per spectrum and one column per feature.

(:n-spectra imat)
3
(pos? (:n-features imat))
true
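
The shape of the result can be illustrated with a small sketch on plain vectors. The name intensity-matrix-sketch is hypothetical. Filling a missing peak with 0.0 is an assumption of this sketch; the reference does not say what value the library uses when a spectrum lacks a feature.

```clojure
(defn intensity-matrix-sketch
  "Rows are spectra, columns are the sorted unique masses.
  Missing peaks are filled with 0.0 (assumption)."
  [peak-lists]
  (let [masses (vec (sort (distinct (mapcat :masses peak-lists))))
        row    (fn [{ms :masses is :intensities}]
                 (let [lookup (zipmap ms is)]   ; mass -> intensity
                   (mapv #(get lookup % 0.0) masses)))]
    {:matrix     (mapv row peak-lists)
     :masses     masses
     :n-spectra  (count peak-lists)
     :n-features (count masses)}))

(:matrix
 (intensity-matrix-sketch
  [{:masses [3000.2 3500.1 4000.2] :intensities [100.0 200.0 150.0]}
   {:masses [3000.2 3500.1]        :intensities [105.0 195.0]}]))
;; => [[100.0 200.0 150.0] [105.0 195.0 0.0]]
```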

source: notebooks/ripple_book/maldi_api_reference.clj