11 MALDIquant Test Vectors
This notebook validates Ripple against MALDIquant’s own (Gibb & Strimmer, 2016) testthat test suite (version 1.22.3).
Rather than designing our own test cases, we extract the concrete expected values that MALDIquant’s developers use for their unit tests and verify that Ripple produces identical results. This is the strongest form of compatibility validation — if we pass MALDIquant’s own tests, our implementations are faithful to the reference.
Note: This notebook requires impl namespaces to test individual internal functions at a finer granularity than the public API exposes.
(ns ripple-book.maldiquant-test-vectors
(:require
;; Ripple MALDI signal processing internals:
[scicloj.ripple.maldi.impl.signal :as signal]
;; R interop (https://github.com/scicloj/clojisr):
[clojisr.v1.r :as r :refer [r r->clj]]
;; Annotating kinds of visualizations (https://scicloj.github.io/kindly-noted/):
[scicloj.kindly.v4.kind :as kind]))Setup
(r "library(MALDIquant)")[1] "MALDIquant" "Rserve" "stats" "graphics" "grDevices"
[6] "utils" "datasets" "methods" "base"
1. Local Maxima Detection
A local maximum is a point whose intensity is the highest within its sliding window. MALDIquant pads the input with zeros on both sides before running a C-level sliding-window scan. The internal R function .localMaxima is called by detectPeaks.
Test vector from test_localMaxima-functions.R
y = c(1, 1, 2, 1, 1, 3, 4) with halfWindowSize=1 Expected: c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
(def maxima-test-1
(let [y [1.0 1.0 2.0 1.0 1.0 3.0 4.0]
result (signal/find-local-maxima-logical y {:half-window-size 1})]
(vec (for [i (range (count y))] (aget result i)))))maxima-test-1[true false true false false false true]Test vectors from test_findLocalMaximaLogical-methods.R
intensity = c(1, 2, 1, 2, 1):
halfWindowSize=1→c(FALSE, TRUE, FALSE, TRUE, FALSE)halfWindowSize=2→c(FALSE, TRUE, FALSE, FALSE, FALSE)
(def maxima-test-hws1
(let [y [1.0 2.0 1.0 2.0 1.0]
result (signal/find-local-maxima-logical y {:half-window-size 1})]
(vec (for [i (range 5)] (aget result i)))))maxima-test-hws1[false true false true false](def maxima-test-hws2
(let [y [1.0 2.0 1.0 2.0 1.0]
result (signal/find-local-maxima-logical y {:half-window-size 2})]
(vec (for [i (range 5)] (aget result i)))))maxima-test-hws2[false true false false false]2. MAD Noise Estimation
MALDIquant uses R’s stats::mad for Median Absolute Deviation: median(|x - median(x)|) * 1.4826.
A subtle but critical detail: the median must use the standard interpolated definition (average of two middle values for even-length sequences). See the even-length test below.
Test vector from test_estimateNoise-methods.R
intensity = rep(10:1, 2) → noise = stats::mad(rep(10:1, 2))
(def mad-test-data (vec (concat (range 10 0 -1) (range 10 0 -1))))(def r-mad (first (r->clj (r (str "mad(c(" (clojure.string/join "," mad-test-data) "))")))))(def clj-mad (signal/estimate-noise-mad (mapv double mad-test-data) {})){:r r-mad :clj clj-mad :diff (Math/abs (- clj-mad r-mad))}{:r 3.7064999999999997, :clj 3.7064999999999997, :diff 0.0}Even-length MAD (exercises the median fix)
For [1, 2, 3, 4]:
- R median = 2.5 (average of 2 and 3)
- abs deviations = [1.5, 0.5, 0.5, 1.5] → sorted [0.5, 0.5, 1.5, 1.5]
- median of abs deviations = 1.0
- MAD = 1.0 * 1.4826 = 1.4826
(def mad-even-test
(let [x [1.0 2.0 3.0 4.0]
clj-result (signal/estimate-noise-mad x {})]
{:clj clj-result :expected 1.4826}))mad-even-test{:clj 1.4826, :expected 1.4826}3. Peak Detection
MALDIquant’s detectPeaks combines three steps:
- Find local maxima (sliding window)
- Estimate noise using MAD
- Keep peaks where
intensity > (SNR * noise)
When noise is zero (e.g. a constant-valued spectrum), the threshold becomes zero, so all positive local maxima pass. This matches MALDIquant’s isAboveNoise check.
Test vector from test_detectPeaks-methods.R
createMassSpectrum(mass=1:5, intensity=c(1, 2, 1, 2, 1)) detectPeaks(halfWindowSize=1) → peaks at masses 2 and 4
(def peaks-test
(let [intensities [1.0 2.0 1.0 2.0 1.0]
peaks (signal/detect-peaks intensities {:half-window-size 1 :snr 2})]
peaks))peaks-test[1 3]Verify peak intensities
(mapv #(nth [1.0 2.0 1.0 2.0 1.0] %) peaks-test)[2.0 2.0]4. Savitzky-Golay Smoothing
Savitzky-Golay smoothing fits local polynomials in a sliding window. MALDIquant computes custom boundary coefficients for edge handling, while jDSP (our Java dependency) uses a different padding strategy. Interior values match exactly; edge values differ slightly.
Test vector from test_smoothingFilters-functions.R
Scrambled 20-element vector with halfWindowSize=2, polynomialOrder=3
(def sg-test-values
[8.0 1.0 7.0 6.0 3.0 13.0 5.0 2.0 19.0 11.0
15.0 18.0 9.0 10.0 20.0 12.0 17.0 14.0 16.0 4.0])(def sg-r-expected
[7.64285714285716 2.42857142857143 4.85714285714286
5.14285714285714 6.94285714285714 8.37142857142857
5.68571428571428 7.14285714285714 11.9714285714286
15.2857142857143 14.8285714285714 15.1714285714286
10.9714285714286 12.2285714285714 15.0285714285714
16.4571428571429 14.0857142857143 16.7428571428571
14.1714285714286 4.45714285714283])(def sg-clj-result
(vec (signal/savitzky-golay-smooth sg-test-values {:window-size 5 :polynomial-order 3})))Interior values (indices 2–17) should match exactly
(def sg-interior-diffs
(mapv (fn [i] {:index i
:diff (Math/abs (- (nth sg-clj-result i) (nth sg-r-expected i)))})
(range 2 18)))(every? #(< (:diff %) 1e-10) sg-interior-diffs)trueEdge values (indices 0, 1, 18, 19) differ due to boundary handling
(def sg-edge-diffs
(mapv (fn [i] {:index i
:clj (nth sg-clj-result i)
:r (nth sg-r-expected i)
:diff (Math/abs (- (nth sg-clj-result i) (nth sg-r-expected i)))})
[0 1 18 19]))(kind/table sg-edge-diffs)| index | clj | r | diff |
|---|---|---|---|
| 0 | 5.685714285714287 | 7.64285714285716 | 1.9571428571428733 |
| 1 | 4.42857142857143 | 2.42857142857143 | 1.9999999999999996 |
| 18 | 12.142857142857142 | 14.1714285714286 | 2.028571428571457 |
| 19 | 7.257142857142858 | 4.45714285714283 | 2.8000000000000282 |
SG order-0 = moving average (MALDIquant’s internal check)
(def sg-ma-r-expected
(r->clj (r "MALDIquant:::.movingAverage(1:10, halfWindowSize=2)")))(def sg-ma-clj
(vec (signal/savitzky-golay-smooth (mapv double (range 1 11))
{:window-size 5 :polynomial-order 0})))Interior match for SG order-0 vs moving average
(def sg-ma-interior-match
(every? #(< (Math/abs %) 1e-10)
(map - (subvec sg-ma-clj 2 8) (subvec (vec sg-ma-r-expected) 2 8))))sg-ma-interior-matchtrue5. Negative Clamping
MALDIquant’s .transformIntensity replaces negative values with zeros after any transformation (including smoothing). Our SG implementation must also clamp negatives.
(def clamp-test
(let [intensities [0.0 0.0 0.0 10.0 0.0 0.0 0.0]
smoothed (vec (signal/savitzky-golay-smooth intensities {:window-size 5 :polynomial-order 2}))]
{:all-non-negative? (every? #(>= % 0.0) smoothed)
:smoothed smoothed}))(:all-non-negative? clamp-test)true6. SNIP Baseline Removal
The SNIP algorithm (Ryan et al. 1988) iteratively clips peaks to estimate the baseline. removeBaseline subtracts the estimated baseline from the original. MALDIquant supports both decreasing (default) and increasing window order.
Test vector from test_removeBaseline-methods.R
intensity = rep(10:1, 2), iterations=2, decreasing=TRUE: Expected corrected = c(rep(0, 10), 7.5, 5, 2.5, rep(0, 7))
(def snip-test-intensities
(mapv double (concat (range 10 0 -1) (range 10 0 -1))))(def snip-dec-result
(vec (signal/snip-baseline-removal snip-test-intensities {:iterations 2 :decreasing true})))(def snip-dec-expected
(vec (concat (repeat 10 0.0) [7.5 5.0 2.5] (repeat 7 0.0))))snip-dec-result[0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
7.5
5.0
2.5
0.0
0.0
0.0
0.0
0.0
0.0
0.0]Increasing window order
(def snip-inc-result
(vec (signal/snip-baseline-removal snip-test-intensities {:iterations 2 :decreasing false})))(def snip-inc-expected
[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 5.0 2.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0])snip-inc-result[0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
5.0
5.0
2.5
0.0
0.0
0.0
0.0
0.0
0.0
0.0]With default iterations (100)
(def snip-dec-100
(vec (signal/snip-baseline-removal snip-test-intensities {:iterations 100 :decreasing true})))(def snip-dec-100-expected
(r->clj (r "as.numeric(intensity(removeBaseline(createMassSpectrum(1:20, rep(10:1, 2)), iterations=100, decreasing=TRUE)))")))(def snip-dec-100-max-diff
(apply max (map #(Math/abs (- %1 %2)) snip-dec-100 snip-dec-100-expected)))snip-dec-100-max-diff0.07. TIC Normalization
MALDIquant normalizes using the trapezoid rule for area calculation: sum((left + right) / 2 * diff(mass)).
Test vector from test_totalIonCurrent-methods.R
mass=1:10, intensity=1:10 → TIC = 49.5
(def tic-masses (mapv double (range 1 11)))(def tic-intensities (mapv double (range 1 11)))(def tic-area
(reduce + (map (fn [i1 i2 m1 m2]
(* (/ (+ i1 i2) 2.0) (- m2 m1)))
tic-intensities (rest tic-intensities)
tic-masses (rest tic-masses))))tic-area49.5After TIC normalization, area should be 1.0
(def tic-normalized
(vec (signal/tic-normalize tic-masses tic-intensities {:target-area 1.0})))(def tic-normalized-area
(reduce + (map (fn [i1 i2 m1 m2]
(* (/ (+ i1 i2) 2.0) (- m2 m1)))
tic-normalized (rest tic-normalized)
tic-masses (rest tic-masses))))tic-normalized-area1.0Exact values match R’s calibrateIntensity
(def r-tic-normalized
(r->clj (r "intensity(calibrateIntensity(createMassSpectrum(1:10, as.double(1:10)), method='TIC'))")))(def tic-max-diff
(apply max (map #(Math/abs (- %1 %2)) tic-normalized r-tic-normalized)))tic-max-diff2.7755575615628914E-178. Square Root Transformation
MALDIquant applies sqrt then clamps negatives to zero. Since sqrt of non-negative values is always non-negative, this is effectively just sqrt.
Test vector from test_transformIntensity-methods.R
intensity = (1:10)^2 → transformIntensity(method="sqrt") → 1:10
(def sqrt-result (vec (signal/sqrt-transform (mapv #(* % %) (range 1.0 11.0)))))sqrt-result[1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0]9. Full Pipeline vs R
End-to-end pipeline comparison on MALDIquant’s fiedler2009subset real MALDI-TOF data (Fiedler et al. 2009).
(r "data('fiedler2009subset', package='MALDIquant')")[1] "fiedler2009subset"
(def real-masses (r->clj (r "mass(fiedler2009subset[[1]])")))(def real-intensities (r->clj (r "intensity(fiedler2009subset[[1]])")))(kind/table
{:column-names ["Metric" "Value"]
:row-vectors [["Data Points" (count real-masses)]
["Mass Range" (str (first real-masses) " – " (last real-masses))]]})| Metric | Value |
|---|---|
| Data Points | 42388 |
| Mass Range | 1000.0150470845815 – 9999.734225176804 |
R pipeline result
(def r-pipeline
(r->clj (r "intensity(calibrateIntensity(
removeBaseline(
smoothIntensity(
transformIntensity(fiedler2009subset[[1]], method='sqrt'),
method='SavitzkyGolay',
halfWindowSize=5),
method='SNIP',
iterations=25),
method='TIC'))")))Clojure pipeline result
(def clj-pipeline
(let [processed (signal/preprocess-spectrum-data
{:mass real-masses :intensity real-intensities}
{:should-sqrt-transform true
:smooth-window 11
:smooth-polynomial 2
:baseline-iterations 25
:baseline-repetitions 1
:should-tic-normalize true
:tic-target 1.0})]
(vec (:intensity processed))))Comparison
Edge handling differences in SG smoothing cause small deviations. With thousands of data points, these are proportionally tiny.
(def pipeline-max-diff
(apply max (map #(Math/abs %) (map - clj-pipeline r-pipeline))))pipeline-max-diff4.607880556342297E-6Summary
(kind/table
{:column-names ["Function" "Status" "Notes"]
:row-vectors [["Local Maxima" "exact match" "All test vectors pass"]
["MAD Noise" "exact match" "Fixed: uses R-compatible median"]
["Peak Detection" "exact match" "Fixed: noise=0 threshold logic"]
["Savitzky-Golay" "interior match" "Edge handling differs (jDSP vs custom coefficients)"]
["Negative Clamping" "correct" "Smoothed values clamped to 0.0"]
["SNIP Baseline" "exact match" "Both decreasing and increasing orders"]
["TIC Normalization" "exact match" "Trapezoid area calculation"]
["Sqrt Transform" "exact match" "Trivial"]
["Full Pipeline" "< 1e-5" "Edge effects only, real data"]]})| Function | Status | Notes |
|---|---|---|
| Local Maxima | exact match | All test vectors pass |
| MAD Noise | exact match | Fixed: uses R-compatible median |
| Peak Detection | exact match | Fixed: noise=0 threshold logic |
| Savitzky-Golay | interior match | Edge handling differs (jDSP vs custom coefficients) |
| Negative Clamping | correct | Smoothed values clamped to 0.0 |
| SNIP Baseline | exact match | Both decreasing and increasing orders |
| TIC Normalization | exact match | Trapezoid area calculation |
| Sqrt Transform | exact match | Trivial |
| Full Pipeline | < 1e-5 | Edge effects only, real data |