11 MALDIquant Test Vectors

This notebook validates Ripple against MALDIquant’s own (Gibb & Strimmer, 2016) testthat test suite (version 1.22.3).

Rather than designing our own test cases, we extract the concrete expected values that MALDIquant’s developers use for their unit tests and verify that Ripple produces identical results. This is the strongest form of compatibility validation — if we pass MALDIquant’s own tests, our implementations are faithful to the reference.

Note: This notebook requires impl namespaces to test individual internal functions at a finer granularity than the public API exposes.

(ns ripple-book.maldiquant-test-vectors
  (:require
   ;; Ripple MALDI signal processing internals:
   [scicloj.ripple.maldi.impl.signal :as signal]
   ;; R interop (https://github.com/scicloj/clojisr):
   [clojisr.v1.r :as r :refer [r r->clj]]
   ;; Annotating kinds of visualizations (https://scicloj.github.io/kindly-noted/):
   [scicloj.kindly.v4.kind :as kind]))

Setup

(r "library(MALDIquant)")

[1] "MALDIquant" "Rserve"     "stats"      "graphics"   "grDevices" 
[6] "utils"      "datasets"   "methods"    "base"

1. Local Maxima Detection

A local maximum is a point whose intensity is the highest within its sliding window. MALDIquant pads the input with zeros on both sides before running a C-level sliding-window scan. The internal R function .localMaxima is called by detectPeaks.

Test vector from test_localMaxima-functions.R

y = c(1, 1, 2, 1, 1, 3, 4) with halfWindowSize=1 Expected: c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)

(def maxima-test-1
  (let [y [1.0 1.0 2.0 1.0 1.0 3.0 4.0]
        result (signal/find-local-maxima-logical y {:half-window-size 1})]
    (vec (for [i (range (count y))] (aget result i)))))

maxima-test-1

[true false true false false false true]

Test vectors from test_findLocalMaximaLogical-methods.R

intensity = c(1, 2, 1, 2, 1):

halfWindowSize=1 → c(FALSE, TRUE, FALSE, TRUE, FALSE)
halfWindowSize=2 → c(FALSE, TRUE, FALSE, FALSE, FALSE)

(def maxima-test-hws1
  (let [y [1.0 2.0 1.0 2.0 1.0]
        result (signal/find-local-maxima-logical y {:half-window-size 1})]
    (vec (for [i (range 5)] (aget result i)))))

maxima-test-hws1

[false true false true false]

(def maxima-test-hws2
  (let [y [1.0 2.0 1.0 2.0 1.0]
        result (signal/find-local-maxima-logical y {:half-window-size 2})]
    (vec (for [i (range 5)] (aget result i)))))

maxima-test-hws2

[false true false false false]

2. MAD Noise Estimation

MALDIquant uses R’s stats::mad for Median Absolute Deviation: median(|x - median(x)|) * 1.4826.

A subtle but critical detail: the median must use the standard interpolated definition (average of two middle values for even-length sequences). See the even-length test below.

Test vector from test_estimateNoise-methods.R

intensity = rep(10:1, 2) → noise = stats::mad(rep(10:1, 2))

(def mad-test-data (vec (concat (range 10 0 -1) (range 10 0 -1))))

(def r-mad (first (r->clj (r (str "mad(c(" (clojure.string/join "," mad-test-data) "))")))))

(def clj-mad (signal/estimate-noise-mad (mapv double mad-test-data) {}))

{:r r-mad :clj clj-mad :diff (Math/abs (- clj-mad r-mad))}

{:r 3.7064999999999997, :clj 3.7064999999999997, :diff 0.0}

Even-length MAD (exercises the median fix)

For [1, 2, 3, 4]:

R median = 2.5 (average of 2 and 3)
abs deviations = [1.5, 0.5, 0.5, 1.5] → sorted [0.5, 0.5, 1.5, 1.5]
median of abs deviations = 1.0
MAD = 1.0 * 1.4826 = 1.4826

(def mad-even-test
  (let [x [1.0 2.0 3.0 4.0]
        clj-result (signal/estimate-noise-mad x {})]
    {:clj clj-result :expected 1.4826}))

mad-even-test

{:clj 1.4826, :expected 1.4826}

3. Peak Detection

MALDIquant’s detectPeaks combines three steps:

Find local maxima (sliding window)
Estimate noise using MAD
Keep peaks where intensity > (SNR * noise)

When noise is zero (e.g. a constant-valued spectrum), the threshold becomes zero, so all positive local maxima pass. This matches MALDIquant’s isAboveNoise check.

Test vector from test_detectPeaks-methods.R

createMassSpectrum(mass=1:5, intensity=c(1, 2, 1, 2, 1)) detectPeaks(halfWindowSize=1) → peaks at masses 2 and 4

(def peaks-test
  (let [intensities [1.0 2.0 1.0 2.0 1.0]
        peaks (signal/detect-peaks intensities {:half-window-size 1 :snr 2})]
    peaks))

peaks-test

[1 3]

Verify peak intensities

(mapv #(nth [1.0 2.0 1.0 2.0 1.0] %) peaks-test)

[2.0 2.0]

4. Savitzky-Golay Smoothing

Savitzky-Golay smoothing fits local polynomials in a sliding window. MALDIquant computes custom boundary coefficients for edge handling, while jDSP (our Java dependency) uses a different padding strategy. Interior values match exactly; edge values differ slightly.

Test vector from test_smoothingFilters-functions.R

Scrambled 20-element vector with halfWindowSize=2, polynomialOrder=3

(def sg-test-values
  [8.0 1.0 7.0 6.0 3.0 13.0 5.0 2.0 19.0 11.0
   15.0 18.0 9.0 10.0 20.0 12.0 17.0 14.0 16.0 4.0])

(def sg-r-expected
  [7.64285714285716 2.42857142857143 4.85714285714286
   5.14285714285714 6.94285714285714 8.37142857142857
   5.68571428571428 7.14285714285714 11.9714285714286
   15.2857142857143 14.8285714285714 15.1714285714286
   10.9714285714286 12.2285714285714 15.0285714285714
   16.4571428571429 14.0857142857143 16.7428571428571
   14.1714285714286 4.45714285714283])

(def sg-clj-result
  (vec (signal/savitzky-golay-smooth sg-test-values {:window-size 5 :polynomial-order 3})))

Interior values (indices 2–17) should match exactly

(def sg-interior-diffs
  (mapv (fn [i] {:index i
                 :diff (Math/abs (- (nth sg-clj-result i) (nth sg-r-expected i)))})
        (range 2 18)))

(every? #(< (:diff %) 1e-10) sg-interior-diffs)

true

Edge values (indices 0, 1, 18, 19) differ due to boundary handling

(def sg-edge-diffs
  (mapv (fn [i] {:index i
                 :clj (nth sg-clj-result i)
                 :r (nth sg-r-expected i)
                 :diff (Math/abs (- (nth sg-clj-result i) (nth sg-r-expected i)))})
        [0 1 18 19]))

(kind/table sg-edge-diffs)

index	clj	r	diff
0	5.685714285714287	7.64285714285716	1.9571428571428733
1	4.42857142857143	2.42857142857143	1.9999999999999996
18	12.142857142857142	14.1714285714286	2.028571428571457
19	7.257142857142858	4.45714285714283	2.8000000000000282

SG order-0 = moving average (MALDIquant’s internal check)

(def sg-ma-r-expected
  (r->clj (r "MALDIquant:::.movingAverage(1:10, halfWindowSize=2)")))

(def sg-ma-clj
  (vec (signal/savitzky-golay-smooth (mapv double (range 1 11))
                                     {:window-size 5 :polynomial-order 0})))

Interior match for SG order-0 vs moving average

(def sg-ma-interior-match
  (every? #(< (Math/abs %) 1e-10)
          (map - (subvec sg-ma-clj 2 8) (subvec (vec sg-ma-r-expected) 2 8))))

sg-ma-interior-match

true

5. Negative Clamping

MALDIquant’s .transformIntensity replaces negative values with zeros after any transformation (including smoothing). Our SG implementation must also clamp negatives.

(def clamp-test
  (let [intensities [0.0 0.0 0.0 10.0 0.0 0.0 0.0]
        smoothed (vec (signal/savitzky-golay-smooth intensities {:window-size 5 :polynomial-order 2}))]
    {:all-non-negative? (every? #(>= % 0.0) smoothed)
     :smoothed smoothed}))

(:all-non-negative? clamp-test)

true

6. SNIP Baseline Removal

The SNIP algorithm (Ryan et al. 1988) iteratively clips peaks to estimate the baseline. removeBaseline subtracts the estimated baseline from the original. MALDIquant supports both decreasing (default) and increasing window order.

Test vector from test_removeBaseline-methods.R

intensity = rep(10:1, 2), iterations=2, decreasing=TRUE: Expected corrected = c(rep(0, 10), 7.5, 5, 2.5, rep(0, 7))

(def snip-test-intensities
  (mapv double (concat (range 10 0 -1) (range 10 0 -1))))

(def snip-dec-result
  (vec (signal/snip-baseline-removal snip-test-intensities {:iterations 2 :decreasing true})))

(def snip-dec-expected
  (vec (concat (repeat 10 0.0) [7.5 5.0 2.5] (repeat 7 0.0))))

snip-dec-result

[0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 7.5
 5.0
 2.5
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0]

Increasing window order

(def snip-inc-result
  (vec (signal/snip-baseline-removal snip-test-intensities {:iterations 2 :decreasing false})))

(def snip-inc-expected
  [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 5.0 2.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0])

snip-inc-result

[0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 5.0
 5.0
 2.5
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0]

With default iterations (100)

(def snip-dec-100
  (vec (signal/snip-baseline-removal snip-test-intensities {:iterations 100 :decreasing true})))

(def snip-dec-100-expected
  (r->clj (r "as.numeric(intensity(removeBaseline(createMassSpectrum(1:20, rep(10:1, 2)), iterations=100, decreasing=TRUE)))")))

(def snip-dec-100-max-diff
  (apply max (map #(Math/abs (- %1 %2)) snip-dec-100 snip-dec-100-expected)))

snip-dec-100-max-diff

0.0

7. TIC Normalization

MALDIquant normalizes using the trapezoid rule for area calculation: sum((left + right) / 2 * diff(mass)).

Test vector from test_totalIonCurrent-methods.R

mass=1:10, intensity=1:10 → TIC = 49.5

(def tic-masses (mapv double (range 1 11)))

(def tic-intensities (mapv double (range 1 11)))

(def tic-area
  (reduce + (map (fn [i1 i2 m1 m2]
                   (* (/ (+ i1 i2) 2.0) (- m2 m1)))
                 tic-intensities (rest tic-intensities)
                 tic-masses (rest tic-masses))))

tic-area

49.5

After TIC normalization, area should be 1.0

(def tic-normalized
  (vec (signal/tic-normalize tic-masses tic-intensities {:target-area 1.0})))

(def tic-normalized-area
  (reduce + (map (fn [i1 i2 m1 m2]
                   (* (/ (+ i1 i2) 2.0) (- m2 m1)))
                 tic-normalized (rest tic-normalized)
                 tic-masses (rest tic-masses))))

tic-normalized-area

1.0

Exact values match R’s calibrateIntensity

(def r-tic-normalized
  (r->clj (r "intensity(calibrateIntensity(createMassSpectrum(1:10, as.double(1:10)), method='TIC'))")))

(def tic-max-diff
  (apply max (map #(Math/abs (- %1 %2)) tic-normalized r-tic-normalized)))

tic-max-diff

2.7755575615628914E-17

8. Square Root Transformation

MALDIquant applies sqrt then clamps negatives to zero. Since sqrt of non-negative values is always non-negative, this is effectively just sqrt.

Test vector from test_transformIntensity-methods.R

intensity = (1:10)^2 → transformIntensity(method="sqrt") → 1:10

(def sqrt-result (vec (signal/sqrt-transform (mapv #(* % %) (range 1.0 11.0)))))

sqrt-result

[1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0]

9. Full Pipeline vs R

End-to-end pipeline comparison on MALDIquant’s fiedler2009subset real MALDI-TOF data (Fiedler et al. 2009).

(r "data('fiedler2009subset', package='MALDIquant')")

[1] "fiedler2009subset"

(def real-masses (r->clj (r "mass(fiedler2009subset[[1]])")))

(def real-intensities (r->clj (r "intensity(fiedler2009subset[[1]])")))

(kind/table
 {:column-names ["Metric" "Value"]
  :row-vectors [["Data Points" (count real-masses)]
                ["Mass Range" (str (first real-masses) " – " (last real-masses))]]})

Metric	Value
Data Points	42388
Mass Range	1000.0150470845815 – 9999.734225176804

R pipeline result

(def r-pipeline
  (r->clj (r "intensity(calibrateIntensity(
                removeBaseline(
                  smoothIntensity(
                    transformIntensity(fiedler2009subset[[1]], method='sqrt'),
                    method='SavitzkyGolay',
                    halfWindowSize=5),
                  method='SNIP',
                  iterations=25),
                method='TIC'))")))

Clojure pipeline result

(def clj-pipeline
  (let [processed (signal/preprocess-spectrum-data
                   {:mass real-masses :intensity real-intensities}
                   {:should-sqrt-transform true
                    :smooth-window 11
                    :smooth-polynomial 2
                    :baseline-iterations 25
                    :baseline-repetitions 1
                    :should-tic-normalize true
                    :tic-target 1.0})]
    (vec (:intensity processed))))

Comparison

Edge handling differences in SG smoothing cause small deviations. With thousands of data points, these are proportionally tiny.

(def pipeline-max-diff
  (apply max (map #(Math/abs %) (map - clj-pipeline r-pipeline))))

pipeline-max-diff

4.607880556342297E-6

Summary

(kind/table
 {:column-names ["Function" "Status" "Notes"]
  :row-vectors [["Local Maxima" "exact match" "All test vectors pass"]
                ["MAD Noise" "exact match" "Fixed: uses R-compatible median"]
                ["Peak Detection" "exact match" "Fixed: noise=0 threshold logic"]
                ["Savitzky-Golay" "interior match" "Edge handling differs (jDSP vs custom coefficients)"]
                ["Negative Clamping" "correct" "Smoothed values clamped to 0.0"]
                ["SNIP Baseline" "exact match" "Both decreasing and increasing orders"]
                ["TIC Normalization" "exact match" "Trapezoid area calculation"]
                ["Sqrt Transform" "exact match" "Trivial"]
                ["Full Pipeline" "< 1e-5" "Edge effects only, real data"]]})

Function	Status	Notes
Local Maxima	exact match	All test vectors pass
MAD Noise	exact match	Fixed: uses R-compatible median
Peak Detection	exact match	Fixed: noise=0 threshold logic
Savitzky-Golay	interior match	Edge handling differs (jDSP vs custom coefficients)
Negative Clamping	correct	Smoothed values clamped to 0.0
SNIP Baseline	exact match	Both decreasing and increasing orders
TIC Normalization	exact match	Trapezoid area calculation
Sqrt Transform	exact match	Trivial
Full Pipeline	< 1e-5	Edge effects only, real data

source: notebooks/ripple_book/maldiquant_test_vectors.clj