15  Riffle Shuffles and the “Seven Shuffles” Theorem

How many times should you riffle-shuffle a deck of cards?

In 1992, Bayer and Diaconis proved a remarkable result: for a standard 52-card deck, the distribution after \(k\) riffle shuffles stays far from uniform until \(k \approx 7\), then rapidly converges. This sharp transition is the cutoff phenomenon.

Unlike random transpositions (which are a class function), the riffle shuffle distribution is not a class function — it depends on the specific permutation, not just its cycle type. Analyzing it requires the full machinery of matrix representations and the matrix-valued Fourier transform.

In this notebook we:

(ns harmonica-book.riffle-shuffle
  (:require
   [scicloj.harmonica :as hm]
   [tablecloth.api :as tc]
   [scicloj.tableplot.v1.plotly :as plotly]
   [scicloj.kindly.v4.kind :as kind]
   [fastmath.matrix :as fm]))

Young’s Orthogonal Form

Every irreducible representation of \(S_n\) can be realized as an orthogonal matrix representation, indexed by a partition \(\lambda\) of \(n\). The basis vectors are standard Young tableaux (SYTs) — fillings of the Young diagram of \(\lambda\) with \(1, \ldots, n\) that increase along rows and columns.

For \(S_4\) with partition \([3\;1]\), the SYTs are:

(hm/standard-young-tableaux [3 1])
[[[1 2 3] [4]] [[1 2 4] [3]] [[1 3 4] [2]]]

The dimension is \(d_{[3,1]} = 3\), matching the hook-length formula:

(hm/hook-length-dimension [3 1])
3

This matches the number of standard Young tableaux:

(count (hm/standard-young-tableaux [3 1]))
3

The representation matrices for the generators \(s_1, s_2, s_3\) (adjacent transpositions swapping values \(i\) and \(i+1\)):

(let [ir (hm/irrep [3 1])
      gens (hm/rep-generators ir)]
  (kind/table
   {:column-names ["Generator" "Matrix"]
    :row-vectors (mapv (fn [i g]
                         [(str "$s_" (inc i) "$") (str g)])
                       (range) gens)}))
Generator Matrix
$s_1$ Array2DRowRealMatrix{{1.0,0.0,0.0},{0.0,1.0,0.0},{0.0,0.0,-1.0}}
$s_2$ Array2DRowRealMatrix{{1.0,0.0,0.0},{0.0,-0.5,0.8660254038},{0.0,0.8660254038,0.5}}
$s_3$ Array2DRowRealMatrix{{-0.3333333333,0.9428090416,0.0},{0.9428090416,0.3333333333,0.0},{0.0,0.0,1.0}}

These matrices are orthogonal (\(\rho(\sigma)^T \rho(\sigma) = I\)) and satisfy the homomorphism property (\(\rho(\sigma\tau) = \rho(\sigma)\rho(\tau)\)).

Dimension Table for \(S_5\)

Each partition \(\lambda\) of \(n\) gives an irrep of dimension \(d_\lambda\). The sum \(\sum_\lambda d_\lambda^2 = n!\) (a consequence of Maschke’s theorem).

(let [n 5
      parts (hm/partitions n)]
  (kind/table
   {:column-names ["\u03bb" "d\u2097" "d\u2097\u00b2"]
    :row-vectors (conj (mapv (fn [lam]
                               (let [d (hm/hook-length-dimension lam)]
                                 [(str lam) d (* d d)]))
                             parts)
                       ["**Total**" ""
                        (reduce + (map #(let [d (hm/hook-length-dimension %)]
                                          (* d d))
                                       parts))])}))
λ dₗ dₗ²
[5] 1 1
[4 1] 4 16
[3 2] 5 25
[3 1 1] 6 36
[2 2 1] 5 25
[2 1 1 1] 4 16
[1 1 1 1 1] 1 1
**Total** 120

The Gilbert-Shannon-Reeds Riffle Shuffle

The GSR model for a single riffle shuffle:

  1. Cut the deck into two packets according to a binomial distribution
  2. Interleave the packets, preserving relative order within each

After \(k\) shuffles (equivalently, an \(a\)-shuffle with \(a = 2^k\)), the probability of permutation \(\sigma\) is:

\[Q_a(\sigma) = \frac{\binom{a + n - r(\sigma)}{n}}{a^n}\]

where \(r(\sigma)\) is the number of rising sequences of \(\sigma\) — maximal consecutive increasing runs. Equivalently, \(r(\sigma) = 1 + \text{descents}(\sigma^{-1})\).

For example, \([2\;0\;3\;1]\) has \(\sigma^{-1} = [1\;3\;0\;2]\), with a descent at position 1 (3 > 0), so \(r = 1 + 1 = 2\) rising sequences:

(hm/rising-sequences [2 0 3 1])
2

The identity always has exactly 1 rising sequence:

(hm/rising-sequences (hm/identity-perm 6))
1

The identity permutation always has \(r = 1\) (one rising sequence), making it the most likely outcome:

(hm/gsr-probability (hm/identity-perm 4) 1)
0.3125

GSR probabilities sum to 1 over all permutations:

(let [G (hm/symmetric-group 4)
      elts (vec (hm/elements G))
      probs (hm/gsr-distribution-vec elts 2)]
  (< (Math/abs (- (reduce + (map #(probs %) (range (count elts)))) 1.0)) 1e-10))
true

Total Variation Distance: Exact Computation

For small \(n\), we can compute the exact total variation distance between the \(k\)-shuffle distribution and uniform:

\[\|Q^{*k} - U\|_{TV} = \frac{1}{2}\sum_{\sigma \in S_n} \left|Q_a(\sigma) - \frac{1}{n!}\right|\]

(let [n 5
      G (hm/symmetric-group n)
      elts (vec (hm/elements G))
      n-elts (count elts)
      uniform (/ 1.0 n-elts)
      tv-data (mapv (fn [k]
                      (let [probs (hm/gsr-distribution-vec elts k)
                            tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform))
                                                     (range n-elts))))]
                        {:k k :tv tv}))
                    (range 1 15))]
  (-> (tc/dataset tv-data)
      (plotly/base {:=x :k :=y :tv})
      (plotly/layer-line {:=mark-color "steelblue"})
      (plotly/layer-point {:=mark-color "steelblue" :=mark-size 8})
      (plotly/update-data
       (fn [d] (assoc d
                      :=layout {:title "TV distance from uniform after k riffle shuffles (S₅)"
                                :xaxis {:title "k (number of shuffles)"}
                                :yaxis {:title "Total variation distance"
                                        :range [0 1.05]}})))
      plotly/plot))

After 1 shuffle, distribution is far from uniform:

(let [n 5
      G (hm/symmetric-group n)
      elts (vec (hm/elements G))
      n-elts (count elts)
      uniform (/ 1.0 n-elts)
      probs (hm/gsr-distribution-vec elts 1)
      tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform)) (range n-elts))))]
  (> tv 0.5))
true

After many shuffles, distribution converges to uniform:

(let [n 5
      G (hm/symmetric-group n)
      elts (vec (hm/elements G))
      n-elts (count elts)
      uniform (/ 1.0 n-elts)
      probs (hm/gsr-distribution-vec elts 14)
      tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform)) (range n-elts))))]
  (< tv 0.01))
true

The cutoff is at \(k \approx \tfrac{3}{2}\log_2(n)\). For \(n = 5\), that is \(\approx 3.5\) shuffles.

Matrix Fourier Transform of the GSR Distribution

We compute \(\hat{Q}(\rho_\lambda) = \sum_\sigma Q(\sigma) \cdot \rho_\lambda(\sigma)\) for all irreps of \(S_4\).

(let [n 4
      G (hm/symmetric-group n)
      parts (hm/partitions n)
      irreps (mapv hm/irrep parts)
      k 2
      f (fn [sigma] (hm/gsr-probability sigma k))
      f-hats (hm/matrix-fourier-transform-all G f irreps)]
  (kind/table
   {:column-names ["\u03bb" "d\u2097"
                   "\u2016Q\u0302(\u03c1\u2097)\u2016"]
    :row-vectors (mapv (fn [lam ir fh]
                         [(str lam)
                          (:dimension ir)
                          (format "%.6f" (hm/frobenius-norm fh))])
                       parts irreps f-hats)}))
λ dₗ ‖Q̂(ρₗ)‖
[4] 1 1.000000
[3 1] 3 0.269729
[2 2] 2 0.088388
[2 1 1] 3 0.269729
[1 1 1 1] 1 0.062500

Plancherel Identity Verification

The Plancherel identity states:

\[\sum_{\sigma \in G} |f(\sigma)|^2 = \frac{1}{|G|} \sum_\lambda d_\lambda \|\hat{f}(\rho_\lambda)\|_F^2\]

(let [n 4
      G (hm/symmetric-group n)
      parts (hm/partitions n)
      irreps (mapv hm/irrep parts)
      f (fn [sigma] (hm/gsr-probability sigma 2))
      f-hats (hm/matrix-fourier-transform-all G f irreps)
      lhs (reduce + (map (fn [sigma] (let [v (f sigma)] (* v v)))
                         (hm/elements G)))
      rhs (* (/ 1.0 (hm/order G))
             (reduce + (map (fn [ir f-hat]
                              (* (double (hm/rep-dimension ir))
                                 (fm/trace (fm/mulm (fm/transpose f-hat) f-hat))))
                            irreps f-hats)))]
  {:lhs lhs :rhs rhs :difference (Math/abs (- lhs rhs))})
{:lhs 0.0606689453125, :rhs 0.0606689453125, :difference 0.0}

Convergence Rates: Comparing \(S_4\), \(S_5\), \(S_6\)

The cutoff at \(k \approx \tfrac{3}{2}\log_2(n)\) means larger groups need more shuffles. Let’s compare.

(let [tv-data (vec
               (for [n [4 5 6]
                     k (range 1 15)]
                 (let [G (hm/symmetric-group n)
                       elts (vec (hm/elements G))
                       n-elts (count elts)
                       uniform (/ 1.0 n-elts)
                       probs (hm/gsr-distribution-vec elts k)
                       tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform))
                                                (range n-elts))))]
                   {:k k :tv tv :n (str "n=" n)})))]
  (-> (tc/dataset tv-data)
      (plotly/base {:=x :k :=y :tv :=color :n})
      (plotly/layer-line)
      (plotly/layer-point {:=mark-size 6})
      (plotly/update-data
       (fn [d] (assoc d
                      :=layout {:title "Riffle shuffle cutoff phenomenon"
                                :xaxis {:title "k (number of shuffles)"}
                                :yaxis {:title "Total variation distance"
                                        :range [0 1.05]}})))
      plotly/plot))

Toward \(n = 52\): The “Seven Shuffles” Result

For a standard 52-card deck, the exact computation is infeasible (\(52! \approx 8 \times 10^{67}\) permutations). However, Bayer and Diaconis computed the exact total variation distance using the formula for \(a\)-shuffles. Their result:

k TV distance
1 1.000
3 1.000
5 0.924
6 0.614
7 0.334
8 0.167
10 0.039
12 0.009

The cutoff at \(k \approx \tfrac{3}{2}\log_2(52) \approx 8.5\) is clearly visible. After 7 shuffles the deck is “nearly random” (TV \(\approx 1/3\)), while 6 shuffles leave it decidedly non-random (TV \(> 0.6\)).

Summary

This notebook demonstrated:

  • Standard Young Tableaux as the basis for irreducible representations
  • Young’s orthogonal form: explicit orthogonal matrices for generators
  • The GSR riffle shuffle distribution \(Q_a(\sigma) = \binom{a+n-r}{n}/a^n\)
  • Matrix Fourier transform for non-class functions
  • The Plancherel identity as a verification tool
  • The cutoff phenomenon: a sharp transition from “far from random” to “nearly uniform” at \(k \approx \tfrac{3}{2}\log_2(n)\)

For the theory behind random transpositions — where character-level analysis suffices — see Random Transpositions.

For the representation matrices used here, see Representation Matrices.

source: notebooks/harmonica_book/riffle_shuffle.clj