13  Random Transpositions and the Cutoff Phenomenon

How many random transpositions does it take to mix up a deck of cards?

At each step, we pick uniformly from the set of all \(\binom{n}{2}\) transpositions \((i\; j)\) together with the identity (a “do nothing” step), and apply it. After \(k\) steps, how close is the resulting distribution to uniform?

Diaconis and Shahshahani (1981) proved a remarkable result: the distribution stays far from uniform until approximately \(\tfrac{1}{2}n\ln n\) steps, then rapidly converges. This sharp transition is called the cutoff phenomenon.

The analysis uses Fourier analysis on the symmetric group — exactly the character tables we computed via the Murnaghan-Nakayama rule.

(ns harmonica-book.random-transpositions
  (:require
   [scicloj.harmonica :as hm]
   [scicloj.harmonica.linalg.complex :as cx]
   [harmonica-book.book-helpers :refer [allclose?]]
   [tablecloth.api :as tc]
   [scicloj.tableplot.v1.plotly :as plotly]
   [scicloj.kindly.v4.kind :as kind]))

Character table of \(S_5\)

Let’s start by looking at the character table for a small symmetric group. Rows are indexed by partitions (labeling irreducible representations), columns by conjugacy classes (also indexed by partitions = cycle types).

(let [ct (hm/character-table (hm/symmetric-group 5))]
  (kind/table
   {:column-names (into ["Irrep \u03bb"]
                        (map #(str %) (:classes ct)))
    :row-vectors (mapv (fn [label row]
                         (into [(str label)]
                               (map #(long (cx/re %)) row)))
                       (:irrep-labels ct) (:table ct))}))
Irrep λ [1 1 1 1 1] [2 1 1 1] [2 2 1] [3 1 1] [3 2] [4 1] [5]
[5] 1 1 1 1 1 1 1
[4 1] 4 2 0 1 -1 0 -1
[3 2] 5 1 1 -1 1 -1 0
[3 1 1] 6 0 -2 0 0 0 1
[2 2 1] 5 -1 1 -1 -1 1 0
[2 1 1 1] 4 -2 0 1 1 0 -1
[1 1 1 1 1] 1 -1 1 1 -1 -1 1

Each entry \(\chi_\lambda(\mu)\) is an integer. The first column (identity class) gives the dimension \(d_\lambda\) of each irrep.
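A quick sanity check: the squared dimensions sum to the group order, \(\sum_\lambda d_\lambda^2 = |S_5| = 120\). Hardcoding the first-column values from the table above:

```clojure
;; Dimensions d_lambda, read off the identity column of the S_5 table above.
(def s5-dims [1 4 5 6 5 4 1])

;; Sum of d^2 over all irreps equals |S_5| = 5! = 120.
(reduce + (map #(* % %) s5-dims))
;; => 120
```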

(count (:irrep-labels (hm/character-table (hm/symmetric-group 5))))
7

Row orthogonality

The character table satisfies a fundamental orthogonality relation:

\[\langle \chi_\lambda, \chi_\mu \rangle = \frac{1}{|G|} \sum_C |C|\;\chi_\lambda(C)\;\overline{\chi_\mu(C)} = \delta_{\lambda\mu}\]

(let [ct (hm/character-table (hm/symmetric-group 5))
      table (:table ct)
      sizes (:class-sizes ct)
      order 120]
  (kind/table
   {:column-names (into [""] (map str (:irrep-labels ct)))
    :row-vectors (mapv (fn [i]
                         (into [(str ((:irrep-labels ct) i))]
                               (map (fn [j]
                                      (let [v (hm/character-inner-product
                                               (table i) (table j) sizes order)]
                                        (format "%.0f" (cx/re v))))
                                    (range (count table)))))
                       (range (count table)))}))
[5] [4 1] [3 2] [3 1 1] [2 2 1] [2 1 1 1] [1 1 1 1 1]
[5] 1 0 0 0 0 0 0
[4 1] 0 1 0 0 0 0 0
[3 2] 0 0 1 0 0 0 0
[3 1 1] 0 0 0 1 0 0 0
[2 2 1] 0 0 0 0 1 0 0
[2 1 1 1] 0 0 0 0 0 1 0
[1 1 1 1 1] 0 0 0 0 0 0 1

The result is the identity matrix — the irreducible characters form an orthonormal set.
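The same inner product can also be verified by hand for a pair of rows. Here the character values and the \(S_5\) class sizes (1, 10, 15, 20, 20, 30, 24, in the table’s column order) are hardcoded so the snippet stands alone:

```clojure
;; S_5 class sizes, one per cycle type, in the table's column order:
;; [1^5] [2,1^3] [2^2,1] [3,1,1] [3,2] [4,1] [5]
(def s5-class-sizes [1 10 15 20 20 30 24])

;; Character rows for irreps [4 1] and [3 2], copied from the table above.
(def chi-41 [4 2 0 1 -1 0 -1])
(def chi-32 [5 1 1 -1 1 -1 0])

(defn char-inner
  "<chi, psi> = (1/|G|) sum_C |C| chi(C) psi(C), for real-valued characters."
  [chi psi]
  (/ (reduce + (map * s5-class-sizes chi psi)) 120))

[(char-inner chi-41 chi-41)   ;; => 1
 (char-inner chi-41 chi-32)]  ;; => 0
```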

(let [ct (hm/character-table (hm/symmetric-group 5))
      table (:table ct)
      sizes (:class-sizes ct)
      k (count (:irrep-labels ct))
      errors (double-array
              (for [i (range k) j (range k)]
                (- (cx/re (hm/character-inner-product
                           (table i) (table j) sizes 120))
                   (if (= i j) 1.0 0.0))))]
  (allclose? errors 0.0))
true

The random transposition walk

A step of the random transposition walk on \(S_n\) picks uniformly from the \(\binom{n}{2}\) distinct transpositions \((i\; j)\), \(0 \le i < j \le n-1\), together with the identity (“do nothing”).

This gives \(M = \binom{n}{2} + 1\) equally weighted outcomes. Since the step distribution depends only on cycle type, it is a class function — its Fourier transform gives a scalar for each irrep.
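To make the walk concrete, here is a minimal self-contained sketch (the helper names are ours, not part of harmonica) that picks one of the \(M\) outcomes uniformly and applies it to a deck stored as a vector:

```clojure
;; One step: choose uniformly among the C(n,2) transpositions plus the
;; identity (M outcomes in all), and apply it to the deck.
(defn transposition-step [deck]
  (let [n (count deck)
        pairs (vec (for [i (range n), j (range (inc i) n)] [i j]))
        outcomes (conj pairs :identity)   ; M = C(n,2) + 1 equally likely
        choice (rand-nth outcomes)]
    (if (= choice :identity)
      deck
      (let [[i j] choice]
        ;; swap positions i and j
        (assoc deck i (deck j) j (deck i))))))

;; Run k steps starting from the sorted deck (range n).
(defn random-transposition-walk [n k]
  (nth (iterate transposition-step (vec (range n))) k))
```

For example, `(random-transposition-walk 5 20)` returns a (random) permutation of `(range 5)`.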

Fourier coefficients via characters

For a class function \(f\), the Fourier coefficient at irrep \(\lambda\) is:

\[\hat{f}(\lambda) = \frac{1}{d_\lambda}\sum_\mu |C_\mu|\;f(\mu)\;\chi_\lambda(\mu)\]

For random transpositions, only the identity class and the transposition class \([2,1^{n-2}]\) contribute:

\[\beta_\lambda = \frac{1 + \binom{n}{2}\;\chi_\lambda([2,1^{n-2}])/d_\lambda}{M}\]

Closed-form eigenvalues

A key result gives the ratio \(\chi_\lambda([2,1^{n-2}])/d_\lambda\) in closed form:

\[\frac{\chi_\lambda([2,1^{n-2}])}{d_\lambda} = \frac{n(\lambda) - n(\lambda')}{\binom{n}{2}}\]

where \(n(\lambda) = \sum_i \binom{\lambda_i}{2}\) and \(\lambda'\) is the conjugate partition. The eigenvalue simplifies to:

\[\beta_\lambda = \frac{1 + n(\lambda) - n(\lambda')}{\binom{n}{2} + 1}\]
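For example, take \(\lambda = [4,1]\) in \(S_5\), with conjugate \(\lambda' = [2,1,1,1]\):

\[n(\lambda) = \binom{4}{2} + \binom{1}{2} = 6, \qquad n(\lambda') = \binom{2}{2} + 3\binom{1}{2} = 1,\]

\[\beta_{[4,1]} = \frac{1 + 6 - 1}{\binom{5}{2} + 1} = \frac{6}{11} \approx 0.5455,\]

which agrees with \(\chi_{[4,1]}([2,1,1,1])/d_{[4,1]} = 2/4\) from the character table.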

This formula only needs partition shapes — no character table — so it scales to arbitrarily large \(n\).

(defn n-stat
  "n(lambda) = sum_i C(lambda_i, 2) = sum_i lambda_i*(lambda_i-1)/2."
  [lambda]
  (reduce + (map (fn [p] (/ (* p (dec p)) 2)) lambda)))
(defn eigenvalue
  "Eigenvalue of the random transposition operator on irrep lambda of S_n."
  [n lambda]
  (let [M (inc (/ (* n (dec n)) 2))]
    (/ (double (+ 1 (n-stat lambda) (- (n-stat (hm/partition-conjugate lambda)))))
       (double M))))
(defn hook-length-dim
  "Dimension of irrep lambda via the hook-length formula: n! / prod h(i,j)."
  [lambda]
  (if (empty? lambda)
    1
    (let [n (reduce + lambda)
          lambda' (hm/partition-conjugate lambda)]
      (/ (reduce *' (range 1 (inc n)))
         (reduce *' (for [i (range (count lambda))
                          j (range (lambda i))]
                      ;; hook length at cell (i, j), 0-indexed:
                      ;; arm + leg + 1 = (lambda_i - j - 1) + (lambda'_j - i - 1) + 1
                      (+ (- (lambda i) j)
                         (- (lambda' j) i)
                         -1)))))))

Let’s verify the closed-form eigenvalues against the character table for \(S_5\).

(let [n 5
      ct (hm/character-table (hm/symmetric-group n))
      table (:table ct)
      M (inc (/ (* n (dec n)) 2))
      trans-idx (.indexOf ^clojure.lang.PersistentVector (:classes ct)
                          (into [2] (repeat (- n 2) 1)))]
  (kind/table
   {:column-names ["\u03bb" "d\u2097" "\u03b2 (table)" "\u03b2 (closed)"]
    :row-vectors
    (mapv (fn [i]
            (let [lam ((:irrep-labels ct) i)
                  d (long (cx/re ((table i) 0)))
                  chi-t (long (cx/re ((table i) trans-idx)))
                  from-table (/ (+ 1.0 (* (/ (* n (dec n)) 2) (/ (double chi-t) d))) M)]
              [(str lam) d (format "%.4f" from-table) (format "%.4f" (eigenvalue n lam))]))
          (range (count table)))}))
λ dₗ β (table) β (closed)
[5] 1 1.0000 1.0000
[4 1] 4 0.5455 0.5455
[3 2] 5 0.2727 0.2727
[3 1 1] 6 0.0909 0.0909
[2 2 1] 5 -0.0909 -0.0909
[2 1 1 1] 4 -0.3636 -0.3636
[1 1 1 1 1] 1 -0.8182 -0.8182

The two columns match exactly.

The trivial representation \([n]\) always has eigenvalue 1:

(eigenvalue 5 [5])
1.0

The sign representation \([1,1,1,1,1]\) has the smallest eigenvalue:

(eigenvalue 5 [1 1 1 1 1])
-0.8181818181818182

The Upper Bound Lemma

Diaconis’s Upper Bound Lemma bounds the total variation distance between \(Q^{*k}\) (the distribution after \(k\) steps) and the uniform distribution \(U\):

\[4\,\|Q^{*k} - U\|_{\mathrm{TV}}^2 \;\le\; \sum_{\lambda \neq \mathrm{trivial}} d_\lambda^2\;|\beta_\lambda|^{2k}\]

The trivial representation (partition \([n]\)) is excluded because \(\beta = 1\).

(defn tv-upper-bound
  "Upper bound on ||Q^{*k} - U||_TV via Diaconis's Upper Bound Lemma."
  [partitions-data k]
  (let [ub (reduce + (map (fn [{:keys [dim eigenvalue]}]
                            (* dim dim (Math/pow (Math/abs (double eigenvalue)) (* 2 k))))
                          partitions-data))]
    (min 1.0 (* 0.5 (Math/sqrt ub)))))
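A quick usage sketch with the non-trivial \(S_5\) dimensions and eigenvalues hardcoded (values from the verification table above; the bound is restated locally as `tv-bound` so the snippet stands alone):

```clojure
;; Non-trivial irreps of S_5: {:dim d, :eigenvalue beta}, from the table above.
(def s5-spectrum
  [{:dim 4 :eigenvalue (/  6.0 11)}
   {:dim 5 :eigenvalue (/  3.0 11)}
   {:dim 6 :eigenvalue (/  1.0 11)}
   {:dim 5 :eigenvalue (/ -1.0 11)}
   {:dim 4 :eigenvalue (/ -4.0 11)}
   {:dim 1 :eigenvalue (/ -9.0 11)}])

;; Same formula as tv-upper-bound: min(1, (1/2) sqrt(sum d^2 |beta|^{2k})).
(defn tv-bound [data k]
  (let [s (reduce + (map (fn [{:keys [dim eigenvalue]}]
                           (* dim dim
                              (Math/pow (Math/abs (double eigenvalue)) (* 2 k))))
                         data))]
    (min 1.0 (* 0.5 (Math/sqrt s)))))

[(tv-bound s5-spectrum 1)    ;; => 1.0 (the bound is vacuous after one step)
 (tv-bound s5-spectrum 25)]  ;; small: essentially mixed
```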

The cutoff phenomenon

The predicted cutoff is at \(k \approx \tfrac{1}{2}n\ln n\). Let’s compute the upper bound for several deck sizes and observe the sharp transition.

(let [ns-to-plot [10 20 30 40]
      rows (vec (for [n ns-to-plot
                      :let [parts (hm/partitions n)
                            data (mapv (fn [p] {:dim (double (hook-length-dim p))
                                                :eigenvalue (eigenvalue n p)})
                                       (rest parts))
                            cutoff (* 0.5 n (Math/log n))
                            k-max (long (* 3 cutoff))]
                      k (range 0 (inc k-max) (max 1 (long (/ k-max 100))))]
                  {:n (str "n=" n) :k k :tv-bound (tv-upper-bound data k)}))]
  (-> (tc/dataset rows)
      (plotly/base {:=x :k :=y :tv-bound :=color :n})
      (plotly/layer-line)
      (plotly/update-data
       (fn [d] (assoc d
                      :=layout {:title "Random Transpositions: TV Distance Upper Bound"
                                :xaxis {:title "Steps (k)"}
                                :yaxis {:title "TV upper bound" :range [0 1.05]}})))
      plotly/plot))

All curves show the cutoff shape: near 1 for a while, then a sharp drop. The larger the deck, the sharper the transition. One caveat for this equal-weight walk: the sign representation has \(\beta_{[1^n]} = (1-\binom{n}{2})/(\binom{n}{2}+1)\), very close to \(-1\) for large \(n\), so its (dimension-1) term decays slowly and leaves a small residual tail after the main drop.

Verify: after 1 step, the walk is far from uniform (TV bound > 0.5):

(let [n 10
      parts (hm/partitions n)
      data (mapv (fn [p] {:dim (double (hook-length-dim p))
                          :eigenvalue (eigenvalue n p)})
                 (rest parts))]
  (> (tv-upper-bound data 1) 0.5))
true

After many steps, the bound approaches 0:

(let [n 10
      parts (hm/partitions n)
      data (mapv (fn [p] {:dim (double (hook-length-dim p))
                          :eigenvalue (eigenvalue n p)})
                 (rest parts))]
  (< (tv-upper-bound data 100) 0.01))
true

Cutoff in normalized time

To confirm the \(\tfrac{1}{2}n\ln n\) scaling, we normalize the x-axis by dividing \(k\) by \(\tfrac{1}{2}n\ln n\). All curves should drop near \(x = 1\).

(let [ns-to-plot [10 20 30 40]
      rows (vec (for [n ns-to-plot
                      :let [parts (hm/partitions n)
                            data (mapv (fn [p] {:dim (double (hook-length-dim p))
                                                :eigenvalue (eigenvalue n p)})
                                       (rest parts))
                            cutoff (* 0.5 n (Math/log n))
                            k-max (long (* 3 cutoff))]
                      k (range 0 (inc k-max) (max 1 (long (/ k-max 100))))]
                  {:n (str "n=" n) :k-normalized (/ (double k) cutoff)
                   :tv-bound (tv-upper-bound data k)}))]
  (-> (tc/dataset rows)
      (plotly/base {:=x :k-normalized :=y :tv-bound :=color :n})
      (plotly/layer-line)
      (plotly/update-data
       (fn [d] (assoc d
                      :=layout {:title "Cutoff at k = (1/2) n ln(n)"
                                :xaxis {:title "k / ((1/2) n ln n)"}
                                :yaxis {:title "TV upper bound" :range [0 1.05]}})))
      plotly/plot))

For every deck size, the main drop lands near \(x \approx 1\), confirming the \(\tfrac{1}{2}n\ln n\) scaling law.

Scaling to larger decks

(kind/table
 {:column-names ["n" "|S\u2099|" "\u00bdn ln n" "# partitions"]
  :row-vectors (mapv (fn [n]
                       [n
                        (str (reduce *' (range 1 (inc n))))
                        (format "%.1f" (* 0.5 n (Math/log n)))
                        (count (hm/partitions n))])
                     [5 10 15 20 30 40 52])})
n |Sₙ| ½n ln n # partitions
5 120 4.0 7
10 3628800 11.5 42
15 1307674368000 20.3 176
20 2432902008176640000 30.0 627
30 265252859812191058636308480000000 51.0 5604
40 815915283247897734345611269596115894272000000000 73.8 37338
52 80658175170943878571660636856403766975289505440883277824000000000000 102.7 281589

For a standard 52-card deck, the cutoff is at about \(\tfrac{1}{2} \cdot 52 \cdot \ln 52 \approx 103\) random transpositions.
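The arithmetic behind that estimate:

```clojure
;; Cutoff estimate for a standard deck: (1/2) * 52 * ln(52)
(* 0.5 52 (Math/log 52))
;; => ~102.7
```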

What drives the cutoff

The Fourier transform decomposes the random walk into independent components — one per irreducible representation. Each component decays at rate \(|\beta_\lambda|^{2k}\). The slowest-decaying non-trivial component controls the mixing time.

For random transpositions, the largest non-trivial eigenvalue belongs to \(\lambda = [n-1,1]\): \(\beta_{[n-1,1]} = \binom{n-1}{2}\big/\bigl(\binom{n}{2}+1\bigr) \approx 1 - \tfrac{2}{n}\), which is close to 1 for large \(n\) (for \(n = 5\) it is \(6/11 \approx 0.5455\), as in the table above). But because the sum includes \(d_\lambda^2\) factors (which grow combinatorially with the partition), the overall mixing time is \(\tfrac{1}{2}n\ln n\) — governed by the interplay between eigenvalue decay and multiplicity.
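A self-contained check of \(\beta_{[n-1,1]}\) via the general formula, with a local conjugate-partition helper (these re-implement `n-stat` and `eigenvalue` from above so the snippet stands alone):

```clojure
;; Conjugate partition: column lengths of the Young diagram.
(defn conj-part [lambda]
  (mapv (fn [j] (count (filter #(> % j) lambda)))
        (range (first lambda))))

;; n(lambda) = sum_i C(lambda_i, 2).
(defn n-stat* [lambda]
  (reduce + (map #(quot (* % (dec %)) 2) lambda)))

;; beta_lambda = (1 + n(lambda) - n(lambda')) / (C(n,2) + 1).
(defn beta [n lambda]
  (/ (double (+ 1 (n-stat* lambda) (- (n-stat* (conj-part lambda)))))
     (inc (quot (* n (dec n)) 2))))

;; For each n: [n, beta_{[n-1,1]}, 1 - 2/n] -- the two agree up to O(1/n^2).
(def beta-top
  (mapv (fn [n] [n (beta n [(dec n) 1]) (- 1.0 (/ 2.0 n))])
        [5 10 20 40]))
beta-top
```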

Connection to coupon collecting

The \(\tfrac{1}{2}n\ln n\) threshold has an intuitive explanation via the coupon collector problem. For the permutation to be fully mixed, every position \(\{0,\ldots,n{-}1\}\) must be “touched” by at least one transposition. The expected number of random events to cover all \(n\) positions is \(n H_n \sim n\ln n\), where \(H_n\) is the \(n\)-th harmonic number. Since each transposition touches two positions, we divide by 2, yielding the \(\tfrac{1}{2}n\ln n\) cutoff.
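The heuristic is easy to simulate (helper names are ours; each trial counts steps until every position has been touched by a transposition):

```clojure
;; Count random steps until every position 0..n-1 has been touched.
;; A draw with i = j counts as a step but touches nothing.
(defn steps-to-touch-all [n]
  (loop [touched #{}, steps 0]
    (if (= n (count touched))
      steps
      (let [i (rand-int n), j (rand-int n)]
        (recur (if (= i j) touched (conj touched i j))
               (inc steps))))))

;; Average over many trials; the coupon-collector heuristic predicts the
;; (1/2) n ln n scale (identity draws and double-touches shift the constant).
(defn mean-touch-steps [n trials]
  (/ (reduce + (repeatedly trials #(steps-to-touch-all n)))
     (double trials)))
```

For \(n = 10\), `(mean-touch-steps 10 2000)` lands within a small constant factor of \(\tfrac{1}{2}\,n\ln n \approx 11.5\).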

The Fourier analysis makes this precise and quantitative.

References

  • Diaconis, P. & Shahshahani, M. (1981). “Generating a random permutation with random transpositions.” Z. Wahrscheinlichkeitstheorie 57, 159–179.

  • Diaconis, P. (1988). Group Representations in Probability and Statistics. IMS Lecture Notes.

  • Sagan, B. (2001). The Symmetric Group. Springer GTM 203.

What comes next

This analysis used characters — the scalar-valued traces of representation matrices. For the riffle shuffle, characters alone are not enough. The next chapter builds the full matrix representations and uses the matrix-valued Fourier transform:

source: notebooks/harmonica_book/random_transpositions.clj