Grasping · Statistics

Confidence intervals, felt.

A 95% confidence interval is not a 95% probability. Seven sections: from a single sample through Cumming's dance of CIs, the 1/√N price of precision and the historical origin of 95%, to the Bayesian counterpart that does mean what people think and the misreadings that survive among researchers.

01 · One sample

One sample, one interval.

Draw a sample of N observations from a Normal(0, 1) distribution. The readout shows the sample mean and the 95% interval x̄ ± 1.96 · σ/√N. Slide N: the interval shrinks, but the mean wanders.

Sample size N = 30 · Mean x̄ = 0.136 · SE = 0.183 · Half-width = ±0.358 · 95% CI = [-0.22, 0.49]

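The readout can be reproduced in a few lines. This is a minimal sketch, not the page's own code: it assumes σ = 1 is known, uses Python's `random.gauss` as the sampler, and the helper name `z_interval` and the seed are made up for illustration.

```python
import math
import random

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def z_interval(sample, sigma=1.0, z=Z_95):
    """Return (mean, standard error, half-width, (lower, upper)) with sigma known."""
    n = len(sample)
    mean = sum(sample) / n
    se = sigma / math.sqrt(n)   # depends only on N, not on the data
    half_width = z * se
    return mean, se, half_width, (mean - half_width, mean + half_width)


rng = random.Random(1)  # arbitrary seed, only so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(30)]
mean, se, half_width, (lower, upper) = z_interval(sample)
print(f"mean={mean:.3f}  SE={se:.3f}  half-width=±{half_width:.3f}  "
      f"CI=[{lower:.2f}, {upper:.2f}]")
```

With σ known, SE and half-width are fixed by N alone (0.183 and ±0.358 at N = 30); only the mean, and with it the position of the interval, changes from sample to sample.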
02 · Dance of CIs

Watch the intervals dance.

Every second a new sample of size N is drawn and its 95% interval lands on top. Hits keep the accent, misses turn ember. What looks chaotic per row stabilises around 95% over time — that is what 'nominal coverage' actually feels like. (After Cumming 2014.)

N = 30 · Runs: 0 · Hits: 0 / 0 · Empirical rate: (none yet)

03 · Sixty replications

Repeat the experiment sixty times.

Each row is one experiment of size N with its own 95% interval. The nominal rate is 95%, so if the model is correct, about 57 of 60 intervals should contain μ and about 3 should miss. The empirical hit rate fluctuates around that number.

Sample size N = 30 · Hits: 57 / 60 · Empirical rate: 95.0% · Nominal: 95.0%

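The replication experiment is easy to simulate. The sketch below assumes the same known-σ z-interval and a true mean of 0; the function name `count_hits` and the seeds are illustrative choices, not taken from the page.

```python
import math
import random

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def count_hits(replications, n, mu=0.0, sigma=1.0, seed=0):
    """Count how many of `replications` z-intervals contain the true mean mu."""
    rng = random.Random(seed)
    half_width = Z_95 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(replications):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # the interval mean ± half_width covers mu exactly when |mean - mu| <= half_width
        if abs(mean - mu) <= half_width:
            hits += 1
    return hits


sixty = count_hits(60, n=30, seed=1)
many = count_hits(20_000, n=30, seed=2)
print(f"60 replications:     {sixty} / 60")
print(f"20,000 replications: {many / 20_000:.2%}")
```

Sixty rows rarely land on exactly 57 hits; the count moves by a few intervals from seed to seed. Only the long run settles near the nominal rate.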
04 · Half-width vs N

Why doubling N does not halve the interval.

The CI half-width scales with 1/√N. To halve it, you need four times as many observations. The curve makes the cost of precision visible.

N = 50
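The square-root law can be checked numerically. A small sketch, assuming σ = 1 and the 95% z-interval; `half_width` and `n_for_half_width` are hypothetical helper names.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def half_width(n, sigma=1.0, z=Z_95):
    """Half-width of the z-interval for a sample of size n."""
    return z * sigma / math.sqrt(n)


def n_for_half_width(target, sigma=1.0, z=Z_95):
    """Smallest N whose half-width is at most `target`."""
    return math.ceil((z * sigma / target) ** 2)


for n in (50, 100, 200):
    print(f"N={n:4d}  half-width=±{half_width(n):.3f}")

# Halving the target half-width roughly quadruples the required N.
print(n_for_half_width(0.10), n_for_half_width(0.05))
```

Going from N = 50 to N = 100 shrinks the half-width by a factor of 1/√2 (about 29%); the full halving only arrives at N = 200.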
05 · Why 95?

The 95 is a convention, not a law.

Slide the confidence level — z stretches with it, the half-width stretches too, the cost of certainty becomes visible. The shaded middle of the bell is the chosen coverage; the two thin tails are what you are willing to be wrong about.

Confidence = 95.0% · z = 1.9600 · Half-width = ±0.358 · vs 95%: 100%

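The link between confidence level, z and half-width can be sketched as follows. This uses `statistics.NormalDist().inv_cdf` from Python's standard library as the inverse normal CDF, which is a substitute for whatever approximation the page itself runs; N = 30 and σ = 1 are assumed to match the readout.

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical value: the central `level` of the bell lies within ±z."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def half_width(level, n, sigma=1.0):
    """Half-width of the z-interval at the given confidence level."""
    return z_for_confidence(level) * sigma / math.sqrt(n)


base = half_width(0.95, n=30)
for level in (0.80, 0.90, 0.95, 0.99):
    hw = half_width(level, n=30)
    print(f"{level:.0%}  z={z_for_confidence(level):.4f}  "
          f"half-width=±{hw:.3f}  vs 95%: {hw / base:.0%}")
```

The last column is the "vs 95%" readout: the interval's width relative to the conventional choice, which grows quickly as the tails are squeezed.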
Where 95% came from

Fisher fixed the cutoff at 5% in his 1925 textbook, 'Statistical Methods for Research Workers'. It was an editorial decision driven by table size and printing economy, not a derivation. Cowles & Davis traced it in 1982 and Hurlbert & Lombardi reopened the case in 2009. Every CI we draw today inherits that choice. Fields have since drifted apart: clinical trials still hold to 95%, particle physics demands 99.99994% (5σ, two-sided), and epidemiology often reports 90% and 95% side by side. The number says more about a discipline's risk tolerance than about nature.

06 · Frequency and belief

Two intervals that look alike, one that does what people think.

Same data, same N. The dashed bar is the frequentist 95% CI, a property of the procedure. The solid bar is the 95% credible interval from a Bayesian posterior, a probability statement about μ given a prior. Slide the prior scale τ₀ (prior precision 1/τ₀²) from broad to tight and watch the posterior pull toward zero.

Prior precision (τ₀): τ₀ = 2.00 · x̄ = 0.094 · μ_post = 0.093 · 95% CI = [-0.34, 0.53] · 95% credible = [-0.34, 0.53]

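Both bars can be computed side by side. This is a sketch under stated assumptions: a Normal prior with mean 0 and standard deviation τ₀ (so prior precision 1/τ₀²), known σ = 1, and a sample size of N = 20. The sample size is a guess that happens to reproduce the readout, not a value given in the text, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def frequentist_ci(xbar, n, sigma=1.0):
    """Known-sigma 95% z-interval around the sample mean."""
    half_width = Z_95 * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def credible_interval(xbar, n, tau0, prior_mean=0.0, sigma=1.0):
    """Posterior mean and central 95% interval under a Normal(prior_mean, tau0^2) prior."""
    prior_precision = 1.0 / tau0**2
    data_precision = n / sigma**2
    post_precision = prior_precision + data_precision
    # precision-weighted average of prior mean and sample mean
    post_mean = (prior_precision * prior_mean + data_precision * xbar) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, (post_mean - Z_95 * post_sd, post_mean + Z_95 * post_sd)


xbar, n = 0.094, 20  # n = 20 is an assumption, not stated in the text
ci_lo, ci_hi = frequentist_ci(xbar, n)
print(f"frequentist CI = [{ci_lo:.2f}, {ci_hi:.2f}]")
for tau0 in (2.0, 0.5, 0.1):  # broad to tight prior
    post_mean, (lo, hi) = credible_interval(xbar, n, tau0)
    print(f"tau0={tau0}  mu_post={post_mean:.3f}  credible=[{lo:.2f}, {hi:.2f}]")
```

At τ₀ = 2 the prior carries the weight of a quarter of an observation, so the two intervals agree to two decimals. As τ₀ shrinks, the posterior mean is pulled toward 0 and the credible interval narrows, while the frequentist interval does not move.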
Why this matters

The 'there is a 95% probability μ is in the interval' reading is correct — but only for the Bayesian credible interval, and only given the prior you used. The frequentist CI does not make that statement about this interval; it makes it about the long-run behaviour of the procedure. With a flat prior and lots of data the two intervals converge, which is why the confusion survives. With a tight prior or small N they diverge, which is exactly when it matters.

07 · Common misreadings

What 95% does not mean.

Hoekstra et al. (2014) showed that the most common readings of a confidence interval are wrong, even among researchers. Three of them, side by side with the careful statement.

Misreading

There is a 95% probability that the true mean lies in this interval.

Careful

Either the interval contains μ or it doesn't. The 95% is the long-run rate at which the procedure produces intervals that cover μ — not a probability about this one interval.

Misreading

We are 95% confident that the true mean is between the bounds.

Careful

Confidence is a property of the method, not of the data at hand. If you repeated the study many times, about 95% of the constructed intervals would cover μ.

Misreading

A narrower interval means a more precise estimate of μ.

Careful

A narrower interval reflects more data or less noise, not more truth. Bias is invisible to the CI: a biased estimator with tight bounds is precisely wrong.
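The last point can be made concrete with a simulation. The sketch below is a hypothetical setup, not part of the original page: every observation carries the same systematic offset `bias`, and we count how often the usual z-interval still covers the true mean.

```python
import math
import random

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def coverage(n, bias, replications=2000, mu=0.0, sigma=1.0, seed=0):
    """Fraction of z-intervals covering mu when each observation is shifted by `bias`."""
    rng = random.Random(seed)
    half_width = Z_95 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(replications):
        # the measurement process is centred on mu + bias, not on mu
        mean = sum(rng.gauss(mu + bias, sigma) for _ in range(n)) / n
        if abs(mean - mu) <= half_width:
            hits += 1
    return hits / replications


for n in (30, 300):
    print(f"N={n:3d}  unbiased: {coverage(n, 0.0):.1%}  biased by 0.3: {coverage(n, 0.3):.1%}")
```

More data tightens the interval around the wrong value, so under a fixed bias the coverage falls as N grows. Nothing in the interval itself signals that.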

How the visualisation works

Samples are drawn via Box-Muller from N(μ=0, σ=1) with σ assumed known, so the 95% z-interval is x̄ ± 1.959964 · σ/√N. With unknown σ, the correct interval uses Student's t; for N ≥ 30 the difference is small (the t-interval is roughly 4% wider at N = 30). The dance loop adds one fresh sample per second while the oldest falls off; the running rate stabilises around the nominal 95% only after many runs. The confidence-level slider uses the Beasley-Springer/Moro inverse-normal-CDF approximation (≈5 decimal-place accuracy). The Bayesian section assumes Normal-Normal conjugacy with prior mean 0 and prior precision 1/τ₀², so τ₀ acts as the prior standard deviation; in that section the hidden true mean is set to μ=0.4 so the data are mildly informative against the prior, which makes the posterior shift visible.
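For readers who have not met it, the Box-Muller step looks roughly like this. It is a sketch of the basic (cosine) form of the transform, written in Python with a hypothetical function name; the page's actual implementation may differ in detail.

```python
import math
import random


def box_muller(rng, mu=0.0, sigma=1.0):
    """One Normal(mu, sigma) draw built from two independent uniform draws."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z


rng = random.Random(42)  # arbitrary seed, only so the run is repeatable
draws = [box_muller(rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
print(f"mean={mean:.3f}  variance={var:.3f}")
```

The sample mean and variance should sit close to 0 and 1. The sine counterpart of the same pair of uniforms yields a second independent draw, which this sketch simply discards.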

Sources