20 Least Squares and Curve Fitting
Prerequisites: matrix multiplication and linear systems (covered in the Linear Systems chapter).
Given noisy data, how do we find the “best” curve that fits it? Least squares minimises the sum of squared residuals. The solution is a linear algebra problem.
This notebook derives the normal equations, solves them with la/solve, connects the solution to QR and SVD decompositions, and applies the technique to polynomial and trigonometric fitting.
(ns lalinea-book.least-squares
  (:require
   ;; La Linea (https://github.com/scicloj/lalinea):
   [scicloj.lalinea.linalg :as la]
   [scicloj.lalinea.tensor :as t]
   [scicloj.lalinea.elementwise :as el]
   ;; Dataset manipulation (https://scicloj.github.io/tablecloth/):
   [tablecloth.api :as tc]
   ;; Interactive Plotly charts (https://scicloj.github.io/tableplot/):
   [scicloj.tableplot.v1.plotly :as plotly]
   ;; Visualization annotations (https://scicloj.github.io/kindly-noted/):
   [scicloj.kindly.v4.kind :as kind]
   [clojure.math :as math]))

The problem
We observe \(m\) data points \((x_i, y_i)\) and want to fit a model \(y = c_0 \phi_0(x) + c_1 \phi_1(x) + \cdots + c_{n-1} \phi_{n-1}(x)\) where \(\phi_j\) are known basis functions and \(c_j\) are unknown coefficients.
Stacking all observations into a matrix equation:
\[A \mathbf{c} = \mathbf{y}\]
where \(A_{ij} = \phi_j(x_i)\). When \(m > n\) (more data than unknowns), the system is overdetermined. The least squares solution minimises \(\|A\mathbf{c} - \mathbf{y}\|^2\).
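For intuition, the same minimisation can be sketched outside this notebook's stack with NumPy (an illustrative example with made-up data; `np.linalg.lstsq` minimises \(\|A\mathbf{c} - \mathbf{y}\|^2\) directly):

```python
import numpy as np

# Five hypothetical observations lying roughly on y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])

# Design matrix A with A[i, j] = phi_j(x_i); here phi_0(x) = 1, phi_1(x) = x.
A = np.column_stack([np.ones_like(x), x])

# m = 5 rows, n = 2 unknowns: overdetermined, so we minimise ||A c - y||^2.
c, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
```

The recovered coefficients land close to the generating values 2 and 3.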
Linear fit: the simplest case
For a line \(y = c_0 + c_1 x\), the basis functions are \(\phi_0(x) = 1\) and \(\phi_1(x) = x\).
Some noisy data along the line \(y = 2 + 3x\):
(def x-data
  (t/make-reader :float64 20
    (* 0.1 idx)))

(def noise-linear
  (t/matrix [0.343 0.276 -0.285 -0.332 0.084 0.205 -0.245 -0.419
             -0.057 0.446 0.241 -0.036 0.423 -0.192 -0.363 0.106
             -0.147 0.165 -0.361 0.096]))

(def y-linear
  (el/+ (el/+ 2.0 (el/scale x-data 3.0)) noise-linear))

Build the design matrix \(A\):
(def A-linear
  (t/compute-matrix (count x-data) 2
    (fn [i j] (if (zero? j) 1.0 (double (x-data i))))))

A-linear

#la/R [:float64 [20 2]
[[1.000 0.000]
[1.000 0.1000]
[1.000 0.2000]
[1.000 0.3000]
[1.000 0.4000]
[1.000 0.5000]
[1.000 0.6000]
[1.000 0.7000]
[1.000 0.8000]
[1.000 0.9000]
[1.000 1.000]
[1.000 1.100]
[1.000 1.200]
[1.000 1.300]
[1.000 1.400]
[1.000 1.500]
[1.000 1.600]
[1.000 1.700]
[1.000 1.800]
[1.000 1.900]]]
The observation vector \(\mathbf{y}\) as a column:
(def y-col (t/column y-linear))

The normal equations
The least-squares solution satisfies:
\[A^T A \mathbf{c} = A^T \mathbf{y}\]
This is a square system that we solve with la/solve:
(def c-linear
  (la/solve (la/mmul (la/transpose A-linear) A-linear)
            (la/mmul (la/transpose A-linear) y-col)))

c-linear

#la/R [:float64 [2 1]
[[2.046]
[2.949]]]
The fitted coefficients are close to the true values \(c_0 \approx 2\), \(c_1 \approx 3\).
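The normal-equation route can be cross-checked in plain NumPy (an illustrative sketch with fresh synthetic data, not this notebook's stack): solving the square system \(A^T A \mathbf{c} = A^T \mathbf{y}\) agrees with the library least-squares routine.

```python
import numpy as np

# Synthetic line y = 2 + 3x with Gaussian noise (stand-in for y-linear).
x = np.arange(20) * 0.1
rng = np.random.default_rng(42)
y = 2.0 + 3.0 * x + 0.3 * rng.standard_normal(20)

A = np.column_stack([np.ones_like(x), x])

# Normal equations: (A^T A) c = A^T y -- a 2x2 square system.
c_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Reference: the library least-squares solver on the rectangular system.
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Both routes give the same coefficients for this well-conditioned problem.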
The residual measures how well the fit matches the data:
(def residual-linear
  (el/- (la/mmul A-linear c-linear) y-col))

(def rms-linear
  (math/sqrt (/ (el/sum (el/* residual-linear residual-linear))
                (count x-data))))

rms-linear

0.2702909141474731

Visualise the fit:
(let [c0 (c-linear 0 0)
      c1 (c-linear 1 0)
      x-fit (t/make-reader :float64 100 (* 0.019 idx))
      y-fit (el/+ c0 (el/scale x-fit c1))]
  (-> (tc/dataset {:x x-data
                   :y y-linear
                   :type (repeat (count x-data) "data")})
      (tc/concat (tc/dataset {:x x-fit
                              :y y-fit
                              :type (repeat 100 "fit")}))
      (plotly/base {:=x :x :=y :y :=color :type})
      (plotly/layer-point {:=mark-size 8
                           :=mark-opacity 0.6})
      (plotly/layer-line)
      plotly/plot))

Polynomial fit
For a polynomial of degree \(d\), the basis functions are \(\phi_j(x) = x^j\), \(j = 0, \ldots, d\).
The Vandermonde matrix has \(A_{ij} = x_i^j\).
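As an illustrative aside outside this notebook's stack, NumPy builds the same matrix with `np.vander`; passing `increasing=True` orders the columns as \(x^0, x^1, \ldots\), matching \(A_{ij} = x_i^j\):

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5])

# A[i, j] = x_i ** j for j = 0..2, i.e. columns 1, x, x^2.
A = np.vander(x, N=3, increasing=True)
```

Each row holds the powers of one sample point, so row 3 is [1.0, 1.5, 2.25].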
Some data along \(y = 1 - 2x + x^2\):
(def x-poly
  (t/make-reader :float64 30
    (- (* 0.2 idx) 3.0)))

(def noise-poly
  (t/matrix [0.132 -0.541 0.269 0.370 0.067 0.284 -0.096 -0.651
             -0.228 0.055 0.483 -0.301 0.088 -0.104 0.612 0.073
             -0.318 0.247 -0.428 0.518 -0.092 0.146 -0.190 0.398
             0.031 -0.263 0.195 -0.416 0.347 0.010]))

(def y-poly
  (el/+ (el/+ 1.0 (el/scale x-poly -2.0))
        (el/+ (el/* x-poly x-poly) noise-poly)))

(defn vandermonde [xs degree]
  (t/compute-matrix (count xs) (inc degree)
    (fn [i j] (math/pow (double (xs i)) (double j)))))

(def A-poly (vandermonde x-poly 2))

(def c-poly
  (la/solve (la/mmul (la/transpose A-poly) A-poly)
            (la/mmul (la/transpose A-poly) (t/column y-poly))))

c-poly

#la/R [:float64 [3 1]
[[ 1.033]
[-1.995]
[0.9969]]]
The fitted coefficients recover the true polynomial \(1 - 2x + x^2\).
Visualise:
(let [c0 (c-poly 0 0)
      c1 (c-poly 1 0)
      c2 (c-poly 2 0)
      x-fit (t/make-reader :float64 100 (- (* 0.06 idx) 3.0))
      y-fit (el/+ c0 (el/+ (el/scale x-fit c1) (el/scale (el/* x-fit x-fit) c2)))]
  (-> (tc/dataset {:x x-poly
                   :y y-poly
                   :type (repeat 30 "data")})
      (tc/concat (tc/dataset {:x x-fit
                              :y y-fit
                              :type (repeat 100 "fit")}))
      (plotly/base {:=x :x :=y :y :=color :type})
      (plotly/layer-point {:=mark-size 8
                           :=mark-opacity 0.6})
      (plotly/layer-line)
      plotly/plot))

QR-based least squares
The normal equations \(A^T A \mathbf{c} = A^T \mathbf{y}\) can be numerically unstable when \(A\) is ill-conditioned. A better approach uses the QR decomposition: \(A = QR\) where \(Q\) is orthogonal and \(R\) is upper triangular.
Substituting gives \(R \mathbf{c} = Q^T \mathbf{y}\), a triangular system that is easy to solve.
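The same pipeline can be sketched in NumPy for comparison (illustrative only, with synthetic noise-free quadratic data): `np.linalg.qr` returns the reduced ("thin") factorization by default, and the triangular system is then solved directly.

```python
import numpy as np

# Synthetic quadratic data, a stand-in for the notebook's x-poly/y-poly.
x = np.linspace(-3.0, 2.8, 30)
y = 1.0 - 2.0 * x + x**2

A = np.vander(x, N=3, increasing=True)

# Reduced QR: Q1 is 30x3 with orthonormal columns, R1 is 3x3 upper triangular.
Q1, R1 = np.linalg.qr(A)

# R1 c = Q1^T y; R1 is tiny, so a generic solve is fine here even though
# a dedicated triangular solver would be cheaper.
c = np.linalg.solve(R1, Q1.T @ y)
```

With noise-free data the coefficients come back as exactly 1, -2, 1 up to machine precision.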
(def qr-result (la/qr A-poly))

EJML returns the full QR, where \(Q\) is \(m \times m\). For least squares we only need the first \(n\) columns — the “thin” (economy) form: \(Q_1\) is \(m \times n\), \(R_1\) is \(n \times n\).
(def ncols (second (t/shape A-poly)))

(def Q1 (t/submatrix (:Q qr-result) :all (range ncols)))

(def R1 (t/submatrix (:R qr-result) (range ncols) :all))

Verify \(A = Q_1 R_1\):
(la/norm (el/- (la/mmul Q1 R1) A-poly))

3.790726059255168E-15

Solve \(R_1 \mathbf{c} = Q_1^T \mathbf{y}\):
(def c-qr
  (la/solve R1 (la/mmul (la/transpose Q1) (t/column y-poly))))

Compare with the normal-equation solution:

(la/norm (el/- c-qr c-poly))

1.159106867033638E-15

Both methods give the same answer (to machine precision), but QR is numerically more stable.
SVD-based least squares
The pseudoinverse via SVD provides the most robust solution, especially when columns of \(A\) are nearly linearly dependent.
If \(A = U \Sigma V^T\), the least-squares solution is \(\mathbf{c} = V \Sigma^{-1} U^T \mathbf{y}\).
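This formula can be checked in NumPy (an illustrative sketch, not this notebook's stack): with the thin SVD, \(\Sigma^{-1}\) is just an elementwise division by the singular values, so \(A^T A\) is never formed.

```python
import numpy as np

# Noise-free quadratic data, a stand-in for the notebook's x-poly/y-poly.
x = np.linspace(-3.0, 2.8, 30)
y = 1.0 - 2.0 * x + x**2
A = np.vander(x, N=3, increasing=True)

# Thin SVD: U is 30x3, s holds the 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse solution c = V diag(1/s) U^T y.
c = Vt.T @ ((U.T @ y) / s)
```

Dividing by `s` elementwise applies \(\Sigma^{-1}\) without building a diagonal matrix.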
(def svd-result (la/svd A-poly))

EJML returns the full SVD. We extract the thin form (first \(n\) columns of \(U\)) for the pseudoinverse.

(def S-svd (:S svd-result))

(def U-thin (t/submatrix (:U svd-result) :all (range (count S-svd))))

(def Vt-svd (:Vt svd-result))

The singular values:

S-svd

#la/R [:float64 [3] [22.55 9.408 3.588]]

Compute the pseudoinverse solution: \(\mathbf{c} = V \Sigma^{-1} U_1^T \mathbf{y}\)
(def c-svd
  (let [k (count S-svd)
        S-inv (t/diag (t/make-reader :float64 k
                        (/ 1.0 (S-svd idx))))
        Ut-y (la/mmul (la/transpose U-thin) (t/column y-poly))]
    (la/mmul (la/transpose Vt-svd) (la/mmul S-inv Ut-y))))

Compare:

(la/norm (el/- c-svd c-poly))

1.1322097734007351E-15

The la/pinv convenience wrapper computes the pseudoinverse in one call:

(la/close? c-svd (la/mmul (la/pinv A-poly) (t/column y-poly)))

true

All three methods — normal equations, QR, and SVD — agree.
Trigonometric fitting
Least squares is not limited to polynomials. We can fit any linear combination of basis functions.
Let us fit a sum of sines and cosines to a periodic signal: \(y = a_0 + a_1 \cos(x) + b_1 \sin(x) + a_2 \cos(2x) + b_2 \sin(2x)\)
(def x-trig
  (t/make-reader :float64 40
    (* (/ (* 2.0 math/PI) 40.0) idx)))

(def noise-trig
  (t/matrix [-0.058 0.218 -0.109 0.034 0.192 -0.277 0.138 0.065
             -0.193 0.145 -0.044 0.291 -0.180 0.077 0.253 -0.116
             0.162 -0.207 0.098 -0.151 0.184 -0.023 0.211 -0.138
             0.073 -0.246 0.119 0.047 -0.183 0.264 -0.091 0.156
             -0.228 0.103 0.189 -0.144 0.268 -0.076 0.131 -0.201]))

(def y-trig
  (el/+ (el/+ 3.0 (el/scale (el/cos x-trig) 2.0))
        (el/+ (el/scale (el/sin x-trig) -1.5)
              (el/+ (el/scale (el/cos (el/scale x-trig 2.0)) 0.5)
                    noise-trig))))

(def A-trig
  (t/compute-matrix (count x-trig) 5
    (fn [i j]
      (let [xi (double (x-trig i))]
        (case (int j)
          0 1.0
          1 (math/cos xi)
          2 (math/sin xi)
          3 (math/cos (* 2.0 xi))
          4 (math/sin (* 2.0 xi)))))))

(def c-trig
  (la/solve (la/mmul (la/transpose A-trig) A-trig)
            (la/mmul (la/transpose A-trig) (t/column y-trig))))

c-trig

#la/R [:float64 [5 1]
[[ 3.019]
[ 2.001]
[ -1.494]
[ 0.4900]
[-0.02023]]]
The recovered coefficients are close to the true values: \(a_0 \approx 3\), \(a_1 \approx 2\), \(b_1 \approx -1.5\), \(a_2 \approx 0.5\), and \(b_2 \approx 0\) (the signal contains no \(\sin(2x)\) term).
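The same custom-basis fit can be sketched in NumPy (illustrative only, with its own synthetic noise): the design matrix is just one column per basis function, after which any least-squares solver applies.

```python
import numpy as np

# Synthetic periodic signal: 3 + 2 cos x - 1.5 sin x + 0.5 cos 2x + noise.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
y = (3.0 + 2.0 * np.cos(x) - 1.5 * np.sin(x)
     + 0.5 * np.cos(2.0 * x) + 0.1 * rng.standard_normal(40))

# One column per basis function: 1, cos x, sin x, cos 2x, sin 2x.
A = np.column_stack([np.ones_like(x),
                     np.cos(x), np.sin(x),
                     np.cos(2.0 * x), np.sin(2.0 * x)])

c, *_ = np.linalg.lstsq(A, y, rcond=None)
```

On a uniform grid over one period these columns are mutually orthogonal, which keeps the problem well conditioned.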
Visualise:
(let [x-fit (t/make-reader :float64 200
              (* (/ (* 2.0 math/PI) 200.0) idx))
      y-fit (el/+ (c-trig 0 0)
                  (el/+ (el/scale (el/cos x-fit) (c-trig 1 0))
                        (el/+ (el/scale (el/sin x-fit) (c-trig 2 0))
                              (el/+ (el/scale (el/cos (el/scale x-fit 2.0)) (c-trig 3 0))
                                    (el/scale (el/sin (el/scale x-fit 2.0)) (c-trig 4 0))))))]
  (-> (tc/dataset {:x x-trig
                   :y y-trig
                   :type (repeat 40 "data")})
      (tc/concat (tc/dataset {:x x-fit
                              :y y-fit
                              :type (repeat 200 "fit")}))
      (plotly/base {:=x :x :=y :y :=color :type})
      (plotly/layer-point {:=mark-size 6
                           :=mark-opacity 0.6})
      (plotly/layer-line)
      plotly/plot))

Condition number and numerical stability
The condition number \(\kappa(A) = \sigma_{\max} / \sigma_{\min}\) measures how sensitive the solution is to perturbations in the data. A large condition number means the problem is ill-conditioned.
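The definition can be verified directly in NumPy (an illustrative check, not this notebook's stack): compute the singular values and take the ratio, which is what `np.linalg.cond` returns for the 2-norm.

```python
import numpy as np

# Degree-8 Vandermonde matrix on the integers 0..29, deliberately
# ill-conditioned (x^8 reaches ~5e11 while x^0 stays at 1).
A = np.vander(np.arange(30, dtype=float), N=9, increasing=True)

# kappa(A) = sigma_max / sigma_min, computed from the singular values.
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]
```

The resulting condition number is enormous, so the normal equations would lose most of their significant digits on this matrix.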
The Vandermonde matrix for our polynomial fit:
(def condition-number (la/condition-number A-poly))

condition-number

6.284088296800099

For comparison, a higher-degree Vandermonde matrix on a wider interval has a much larger condition number:
(def A-high
  (vandermonde (t/make-reader :float64 30 (* 1.0 idx)) 8))

(la/condition-number A-high)

1.6231454034419954E12

For double-precision arithmetic, condition numbers above \(10^6\) suggest the normal equations will lose accuracy, since forming \(A^T A\) squares the condition number; QR or SVD should be preferred.
Summary
This notebook covered:
- The normal equations \(A^T A \mathbf{c} = A^T \mathbf{y}\)
- Linear and polynomial fitting via Vandermonde matrices
- QR-based least squares for numerical stability
- SVD-based least squares (pseudoinverse)
- Trigonometric fitting with custom basis functions
- Condition number as a diagnostic for ill-conditioned problems