This post is based on Chapter 7 of Edwin Jaynes’ Probability Theory.
The Normal distribution is probably the most widely used probability distribution, known for its remarkable properties. Like many others, I used to wonder why such a special (and in some ways unusual) distribution is called ‘Normal’. Historically, when Gauss was working on linear regression, he referred to the system of equations used to estimate the regression coefficients as the ‘Normal equations’. Presumably, he meant ‘Normal’ in the mathematical sense of perpendicular. This meaning persists today: for example, we call a vector perpendicular to a surface its normal vector. In regression, the normal equations express the condition that the vector of errors (the residuals) is perpendicular to the fitting surface, that is, to every predictor used in the fit. Later on, people began associating the word normal with the distribution of the errors themselves.
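The perpendicularity behind the normal equations can be checked numerically. The following is a minimal sketch for a straight-line fit, not anything taken from Jaynes: the data points are made up, and `fit_line` and `dot` are hypothetical helper names. It solves the two normal equations in closed form and then takes the dot product of the residuals with each predictor (a column of ones for the intercept, and the `xs` values for the slope).

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope, i.e. the solution of the two normal equations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Made-up illustrative data (an assumption, not from the source).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The predictors are a constant column and the x values themselves.
ones = [1.0] * len(xs)
print("residuals . ones =", dot(residuals, ones))
print("residuals . xs   =", dot(residuals, xs))
```

Both dot products should come out as zero up to floating-point rounding, which is exactly the sense in which the errors are ‘normal’ to the fitting surface.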
Outside the field of statistics, this distribution is more commonly called the Gaussian, which illustrates Stigler’s law of eponymy: no scientific discovery is named after its original discoverer. In fact, the core properties of the Normal distribution were explored by Laplace when Gauss was only six years old, and the distribution itself had been discovered by de Moivre even before Laplace was born. Still, it was Gauss’s work that popularised it.
One of the most well-known results in probability theory is the Central Limit Theorem. It states that, under mild conditions (for instance, independent draws with finite variance), the distribution of suitably standardised sample averages tends toward a Normal distribution, regardless of the original distribution of the data. Interestingly, the term ‘central limit theorem’ was introduced by George Pólya in a German paper whose title begins “Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung” (“On the central limit theorem of probability calculus”). His intention was that the adjective ‘central’ should modify the noun ‘theorem’: it is the limit theorem that is ‘central’ to probability theory. Today, however, most people assume that central modifies limit, making it a theorem about some sort of central limit, a phrase that doesn’t really make sense. Jaynes notes the stability and equilibrium properties of the Normal distribution, toward which many other distributions gravitate under operations like summation, convolution, or random transformations, and suggests a more meaningful name: the central distribution. That way, even if someone interprets central limit theorem as describing a ‘central limit’, it would at least clearly refer to convergence toward this central distribution.
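The pull toward the Normal distribution is easy to see in a small simulation. The sketch below is only an illustration under stated assumptions: the choice of the strongly skewed Exponential(1) distribution, the sample sizes, the seed, and the helper names `standardized_means` and `fraction_within_one` are all mine rather than Jaynes’. It standardises the mean of `n` draws and measures how often it falls within one standard deviation of zero. For a Normal distribution that fraction is about 0.683; for a single Exponential(1) draw it is 1 − e⁻² ≈ 0.865.

```python
import random


def standardized_means(n, trials, rng):
    """Return `trials` standardized means, each computed from n Exponential(1) draws."""
    # Exponential(1) has mean 1 and standard deviation 1, so the mean of
    # n independent draws has mean 1 and standard deviation 1 / sqrt(n).
    scale = n ** 0.5
    means = []
    for _ in range(trials):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        means.append((sample_mean - 1.0) * scale)
    return means


def fraction_within_one(values):
    """Fraction of values lying within one standard deviation of zero."""
    return sum(abs(v) <= 1.0 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n in (1, 5, 50):
    z = standardized_means(n, 5000, rng)
    print(f"n = {n:2d}: fraction within one standard deviation = {fraction_within_one(z):.3f}")
```

As `n` grows, the printed fraction should drift from roughly 0.865 toward roughly 0.683, even though each individual draw comes from a distribution that looks nothing like a bell curve.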