15 Riffle Shuffles and the “Seven Shuffles” Theorem
How many times should you riffle-shuffle a deck of cards?
In 1992, Bayer and Diaconis proved a remarkable result: for a standard 52-card deck, the distribution after \(k\) riffle shuffles stays far from uniform until \(k \approx 7\), then rapidly converges. This sharp transition is the cutoff phenomenon.
Unlike random transpositions (which are a class function), the riffle shuffle distribution is not a class function — it depends on the specific permutation, not just its cycle type. Analyzing it requires the full machinery of matrix representations and the matrix-valued Fourier transform.
In this notebook we:
- Build irreducible representations of \(S_n\) via Young’s orthogonal form
- Compute the Gilbert-Shannon-Reeds (GSR) riffle shuffle distribution
- Use the matrix Fourier transform to analyze convergence
- Reproduce the cutoff phenomenon for small groups
(ns harmonica-book.riffle-shuffle
(:require
[scicloj.harmonica :as hm]
[tablecloth.api :as tc]
[scicloj.tableplot.v1.plotly :as plotly]
[scicloj.kindly.v4.kind :as kind]
[fastmath.matrix :as fm]))Young’s Orthogonal Form
Every irreducible representation of \(S_n\) can be realized as an orthogonal matrix representation, indexed by a partition \(\lambda\) of \(n\). The basis vectors are standard Young tableaux (SYTs) — fillings of the Young diagram of \(\lambda\) with \(1, \ldots, n\) that increase along rows and columns.
For \(S_4\) with partition \([3\;1]\), the SYTs are:
(hm/standard-young-tableaux [3 1])[[[1 2 3] [4]] [[1 2 4] [3]] [[1 3 4] [2]]]The dimension is \(d_{[3,1]} = 3\), matching the hook-length formula:
(hm/hook-length-dimension [3 1])3This matches the number of standard Young tableaux:
(count (hm/standard-young-tableaux [3 1]))3The representation matrices for the generators \(s_1, s_2, s_3\) (adjacent transpositions swapping values \(i\) and \(i+1\)):
(let [ir (hm/irrep [3 1])
gens (hm/rep-generators ir)]
(kind/table
{:column-names ["Generator" "Matrix"]
:row-vectors (mapv (fn [i g]
[(str "$s_" (inc i) "$") (str g)])
(range) gens)}))| Generator | Matrix |
|---|---|
| $s_1$ | Array2DRowRealMatrix{{1.0,0.0,0.0},{0.0,1.0,0.0},{0.0,0.0,-1.0}} |
| $s_2$ | Array2DRowRealMatrix{{1.0,0.0,0.0},{0.0,-0.5,0.8660254038},{0.0,0.8660254038,0.5}} |
| $s_3$ | Array2DRowRealMatrix{{-0.3333333333,0.9428090416,0.0},{0.9428090416,0.3333333333,0.0},{0.0,0.0,1.0}} |
These matrices are orthogonal (\(\rho(\sigma)^T \rho(\sigma) = I\)) and satisfy the homomorphism property (\(\rho(\sigma\tau) = \rho(\sigma)\rho(\tau)\)).
Dimension Table for \(S_5\)
Each partition \(\lambda\) of \(n\) gives an irrep of dimension \(d_\lambda\). The sum \(\sum_\lambda d_\lambda^2 = n!\) (a consequence of Maschke’s theorem).
(let [n 5
parts (hm/partitions n)]
(kind/table
{:column-names ["\u03bb" "d\u2097" "d\u2097\u00b2"]
:row-vectors (conj (mapv (fn [lam]
(let [d (hm/hook-length-dimension lam)]
[(str lam) d (* d d)]))
parts)
["**Total**" ""
(reduce + (map #(let [d (hm/hook-length-dimension %)]
(* d d))
parts))])}))| λ | dₗ | dₗ² |
|---|---|---|
| [5] | 1 | 1 |
| [4 1] | 4 | 16 |
| [3 2] | 5 | 25 |
| [3 1 1] | 6 | 36 |
| [2 2 1] | 5 | 25 |
| [2 1 1 1] | 4 | 16 |
| [1 1 1 1 1] | 1 | 1 |
| **Total** | 120 |
The Gilbert-Shannon-Reeds Riffle Shuffle
The GSR model for a single riffle shuffle:
- Cut the deck into two packets according to a binomial distribution
- Interleave the packets, preserving relative order within each
After \(k\) shuffles (equivalently, an \(a\)-shuffle with \(a = 2^k\)), the probability of permutation \(\sigma\) is:
\[Q_a(\sigma) = \frac{\binom{a + n - r(\sigma)}{n}}{a^n}\]
where \(r(\sigma)\) is the number of rising sequences of \(\sigma\) — maximal consecutive increasing runs. Equivalently, \(r(\sigma) = 1 + \text{descents}(\sigma^{-1})\).
For example, \([2\;0\;3\;1]\) has \(\sigma^{-1} = [1\;3\;0\;2]\), with a descent at position 1 (3 > 0), so \(r = 1 + 1 = 2\) rising sequences:
(hm/rising-sequences [2 0 3 1])2The identity always has exactly 1 rising sequence:
(hm/rising-sequences (hm/identity-perm 6))1The identity permutation always has \(r = 1\) (one rising sequence), making it the most likely outcome:
(hm/gsr-probability (hm/identity-perm 4) 1)0.3125GSR probabilities sum to 1 over all permutations:
(let [G (hm/symmetric-group 4)
elts (vec (hm/elements G))
probs (hm/gsr-distribution-vec elts 2)]
(< (Math/abs (- (reduce + (map #(probs %) (range (count elts)))) 1.0)) 1e-10))trueTotal Variation Distance: Exact Computation
For small \(n\), we can compute the exact total variation distance between the \(k\)-shuffle distribution and uniform:
\[\|Q^{*k} - U\|_{TV} = \frac{1}{2}\sum_{\sigma \in S_n} \left|Q_a(\sigma) - \frac{1}{n!}\right|\]
(let [n 5
G (hm/symmetric-group n)
elts (vec (hm/elements G))
n-elts (count elts)
uniform (/ 1.0 n-elts)
tv-data (mapv (fn [k]
(let [probs (hm/gsr-distribution-vec elts k)
tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform))
(range n-elts))))]
{:k k :tv tv}))
(range 1 15))]
(-> (tc/dataset tv-data)
(plotly/base {:=x :k :=y :tv})
(plotly/layer-line {:=mark-color "steelblue"})
(plotly/layer-point {:=mark-color "steelblue" :=mark-size 8})
(plotly/update-data
(fn [d] (assoc d
:=layout {:title "TV distance from uniform after k riffle shuffles (S₅)"
:xaxis {:title "k (number of shuffles)"}
:yaxis {:title "Total variation distance"
:range [0 1.05]}})))
plotly/plot))After 1 shuffle, distribution is far from uniform:
(let [n 5
G (hm/symmetric-group n)
elts (vec (hm/elements G))
n-elts (count elts)
uniform (/ 1.0 n-elts)
probs (hm/gsr-distribution-vec elts 1)
tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform)) (range n-elts))))]
(> tv 0.5))trueAfter many shuffles, distribution converges to uniform:
(let [n 5
G (hm/symmetric-group n)
elts (vec (hm/elements G))
n-elts (count elts)
uniform (/ 1.0 n-elts)
probs (hm/gsr-distribution-vec elts 14)
tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform)) (range n-elts))))]
(< tv 0.01))trueThe cutoff is at \(k \approx \tfrac{3}{2}\log_2(n)\). For \(n = 5\), that is \(\approx 3.5\) shuffles.
Matrix Fourier Transform of the GSR Distribution
We compute \(\hat{Q}(\rho_\lambda) = \sum_\sigma Q(\sigma) \cdot \rho_\lambda(\sigma)\) for all irreps of \(S_4\).
(let [n 4
G (hm/symmetric-group n)
parts (hm/partitions n)
irreps (mapv hm/irrep parts)
k 2
f (fn [sigma] (hm/gsr-probability sigma k))
f-hats (hm/matrix-fourier-transform-all G f irreps)]
(kind/table
{:column-names ["\u03bb" "d\u2097"
"\u2016Q\u0302(\u03c1\u2097)\u2016"]
:row-vectors (mapv (fn [lam ir fh]
[(str lam)
(:dimension ir)
(format "%.6f" (hm/frobenius-norm fh))])
parts irreps f-hats)}))| λ | dₗ | ‖Q̂(ρₗ)‖ |
|---|---|---|
| [4] | 1 | 1.000000 |
| [3 1] | 3 | 0.269729 |
| [2 2] | 2 | 0.088388 |
| [2 1 1] | 3 | 0.269729 |
| [1 1 1 1] | 1 | 0.062500 |
Plancherel Identity Verification
The Plancherel identity states:
\[\sum_{\sigma \in G} |f(\sigma)|^2 = \frac{1}{|G|} \sum_\lambda d_\lambda \|\hat{f}(\rho_\lambda)\|_F^2\]
(let [n 4
G (hm/symmetric-group n)
parts (hm/partitions n)
irreps (mapv hm/irrep parts)
f (fn [sigma] (hm/gsr-probability sigma 2))
f-hats (hm/matrix-fourier-transform-all G f irreps)
lhs (reduce + (map (fn [sigma] (let [v (f sigma)] (* v v)))
(hm/elements G)))
rhs (* (/ 1.0 (hm/order G))
(reduce + (map (fn [ir f-hat]
(* (double (hm/rep-dimension ir))
(fm/trace (fm/mulm (fm/transpose f-hat) f-hat))))
irreps f-hats)))]
{:lhs lhs :rhs rhs :difference (Math/abs (- lhs rhs))}){:lhs 0.0606689453125, :rhs 0.0606689453125, :difference 0.0}Convergence Rates: Comparing \(S_4\), \(S_5\), \(S_6\)
The cutoff at \(k \approx \tfrac{3}{2}\log_2(n)\) means larger groups need more shuffles. Let’s compare.
(let [tv-data (vec
(for [n [4 5 6]
k (range 1 15)]
(let [G (hm/symmetric-group n)
elts (vec (hm/elements G))
n-elts (count elts)
uniform (/ 1.0 n-elts)
probs (hm/gsr-distribution-vec elts k)
tv (* 0.5 (reduce + (map #(Math/abs (- (probs %) uniform))
(range n-elts))))]
{:k k :tv tv :n (str "n=" n)})))]
(-> (tc/dataset tv-data)
(plotly/base {:=x :k :=y :tv :=color :n})
(plotly/layer-line)
(plotly/layer-point {:=mark-size 6})
(plotly/update-data
(fn [d] (assoc d
:=layout {:title "Riffle shuffle cutoff phenomenon"
:xaxis {:title "k (number of shuffles)"}
:yaxis {:title "Total variation distance"
:range [0 1.05]}})))
plotly/plot))Toward \(n = 52\): The “Seven Shuffles” Result
For a standard 52-card deck, the exact computation is infeasible (\(52! \approx 8 \times 10^{67}\) permutations). However, Bayer and Diaconis computed the exact total variation distance using the formula for \(a\)-shuffles. Their result:
| k | TV distance |
|---|---|
| 1 | 1.000 |
| 3 | 1.000 |
| 5 | 0.924 |
| 6 | 0.614 |
| 7 | 0.334 |
| 8 | 0.167 |
| 10 | 0.039 |
| 12 | 0.009 |
The cutoff at \(k \approx \tfrac{3}{2}\log_2(52) \approx 8.5\) is clearly visible. After 7 shuffles the deck is “nearly random” (TV \(\approx 1/3\)), while 6 shuffles leave it decidedly non-random (TV \(> 0.6\)).
Summary
This notebook demonstrated:
- Standard Young Tableaux as the basis for irreducible representations
- Young’s orthogonal form: explicit orthogonal matrices for generators
- The GSR riffle shuffle distribution \(Q_a(\sigma) = \binom{a+n-r}{n}/a^n\)
- Matrix Fourier transform for non-class functions
- The Plancherel identity as a verification tool
- The cutoff phenomenon: a sharp transition from “far from random” to “nearly uniform” at \(k \approx \tfrac{3}{2}\log_2(n)\)
For the theory behind random transpositions — where character-level analysis suffices — see Random Transpositions.
For the representation matrices used here, see Representation Matrices.