Confidence intervals, felt.
A 95% confidence interval is not a 95% probability. Seven sections from a single sample through Cumming's dance of CIs, the 1/√N price of precision, the historical origin of 95%, the Bayesian counterpart that does mean what people think, to the misreadings that survive among researchers.
One sample, one interval.
Draw N samples from a Normal(0, 1) distribution. The sample mean and the 95% interval x̄ ± 1.96 · σ/√N. Slide N — the interval shrinks, but the mean wanders.
Watch the intervals dance.
Every second a new sample of size N is drawn and its 95% interval lands on top. Hits keep the accent, misses turn ember. What looks chaotic per row stabilises around 95% over time — that is what 'nominal coverage' actually feels like. (After Cumming 2014.)
Repeat the experiment sixty times.
Each row is one experiment of size N with its own 95% interval. The nominal rate is 95%, so if the model is correct, about 57 of 60 intervals should contain μ and about 3 should miss. The empirical hit rate fluctuates around that number.
Why doubling N does not halve the interval.
The CI half-width scales with 1/√N. To halve it, you need four times as many observations. The curve makes the cost of precision visible.
The 95 is a convention, not a law.
Slide the confidence level — z stretches with it, the half-width stretches too, the cost of certainty becomes visible. The shaded middle of the bell is the chosen coverage; the two thin tails are what you are willing to be wrong about.
Fisher fixed the cutoff at 5% in his 1925 textbook, 'Statistical Methods for Research Workers'. It was an editorial decision driven by table size and printing economy, not a derivation. Cowles & Davis traced it in 1982 and Hurlbert & Lombardi reopened the case in 2009. Every CI we draw today inherits that choice. Different fields have softly moved: clinical trials still hold 95%, particle physics demands 99.99994% (5σ), epidemiology often reports both 90% and 95% side by side. The number says more about a discipline's risk tolerance than about nature.
Two intervals that look alike, one that does what people think.
Same data, same N. The dashed bar is the frequentist 95% CI — a property of the procedure. The solid bar is the 95% credible interval from a Bayesian posterior — a probability statement about μ given a prior. Slide the prior precision τ₀ from broad to tight and watch the posterior pull toward zero.
The 'there is a 95% probability μ is in the interval' reading is correct — but only for the Bayesian credible interval, and only given the prior you used. The frequentist CI does not make that statement about this interval; it makes it about the long-run behaviour of the procedure. With a flat prior and lots of data the two intervals converge, which is why the confusion survives. With a tight prior or small N they diverge, which is exactly when it matters.
What 95% does not mean.
Hoekstra et al. (2014) showed that the most common readings of a confidence interval are wrong, even among researchers. Three of them, side by side with the careful statement.
There is a 95% probability that the true mean lies in this interval.
Either the interval contains μ or it doesn't. The 95% is the long-run rate at which the procedure produces intervals that cover μ — not a probability about this one interval.
We are 95% confident that the true mean is between the bounds.
Confidence is a property of the method, not of the data at hand. If you repeated the study, 95% of the constructed intervals would cover μ.
A narrower interval means a more precise estimate of μ.
A narrower interval reflects more data, not more truth. Bias is invisible to the CI — a biased estimator with tight bounds is precisely wrong.
Samples are drawn via Box-Muller from N(μ=0, σ=1) with σ assumed known, so the 95% z-interval is x̄ ± 1.959964 · σ/√N. With unknown σ, the correct interval uses Student's t. For N ≥ 30, the difference is small. The dance loop adds one fresh sample per second, oldest falls off; the running rate stabilises around the nominal 95% only after many runs. The confidence-level slider uses the Beasley-Springer/Moro inverse-normal-CDF approximation (≈5 decimal-place accuracy). The Bayesian section assumes Normal-Normal conjugacy with prior mean 0 and prior precision 1/τ₀²; the hidden true mean is set to μ=0.4 so the data are mildly informative against the prior, which makes the posterior shift visible.
- Fisher, R. A. (1925) — Statistical Methods for Research Workers. Oliver & Boyd. The book that fixed the 5% cutoff.
- Neyman, J. (1937) — Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability. Phil. Trans. Royal Society A 236, 333-380.
- Cowles, M. & Davis, C. (1982) — On the Origins of the .05 Level of Statistical Significance. American Psychologist 37, 553-558.
- Cohen, J. (1994) — The earth is round (p < .05). American Psychologist 49, 997-1003.
- Hurlbert, S. H. & Lombardi, C. M. (2009) — Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian. Annales Zoologici Fennici 46, 311-349.
- Hoekstra, R., Morey, R. D., Rouder, J. N. & Wagenmakers, E.-J. (2014) — Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review 21, 1157-1164.
- Cumming, G. (2014) — The New Statistics: Why and How. Psychological Science 25, 7-29. Origin of the dance-of-CIs visualisation.
- Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D. & Wagenmakers, E.-J. (2016) — The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review 23, 103-123.
- Greenland, S. et al. (2016) — Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology 31, 337-350.
- Wasserstein, R. L. & Lazar, N. A. (2016) — The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician 70, 129-133.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013) — Bayesian Data Analysis, 3rd ed., CRC Press.
