Six Sigma

Published: Monday, April 3, 2023 - 12:03

As the foundations of modern science were being laid, the need for a model for the uncertainty in a measurement became apparent. Here we look at the development of the theory of measurement error and discover its consequences.

The problem may be expressed as follows: Repeated measurements of one and the same item, if they are not rounded off too much, will frequently yield a whole range of observed values. If we let X denote one of these observed values, then it is logical to think of X as having two components. Let Y denote the actual, but unknown, value of the item measured, and let E denote the measurement error associated with that observation. Then X = Y + E, and we want to use our repeated measurements to find an estimate for Y in spite of the uncertainties introduced by the error terms E. As it turns out, the best estimate will depend on the properties of the distribution of the error terms E.
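As a minimal sketch of the X = Y + E model (the normal error distribution and the specific values here are illustrative assumptions, not part of the historical argument):

```python
import random

random.seed(1)

true_value = 10.0   # Y: the actual, but unknown, value of the item
n = 1000            # number of repeated measurements

# Each observation X = Y + E, with E a zero-centered measurement error.
observations = [true_value + random.gauss(0.0, 0.5) for _ in range(n)]

# The average of the observations as the estimate of Y.
estimate = sum(observations) / n
print(round(estimate, 2))
```

Even though no single observation equals Y, the average recovers it closely; whether the average is the *best* such estimate is exactly the question the error distribution must answer.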

Measurement errors are generally thought of as consisting of the combined effects of the different “environmental” conditions surrounding the measurement process, such as operators, equipment, and techniques, plus all other sundry factors known or unknown. Since these conditions are logically independent of each other, we think of the measurement error for a single observation as the sum of the effects of these various environmental conditions. Even though we may not be able to define the effects of all these conditions, we still need to characterize how they all combine to produce errors of measurement.

In the post-Newtonian 18th century, the only credible models were those that could be derived from first principles. But no one had developed such a model for characterizing measurement error. Various attempts had been made, but these were essentially *ad hoc* models, selected in a subjective manner for the specific problem considered. About all that could be agreed upon was that these models should be symmetric about zero, and that the likelihood of an error should decrease as the size of the error increased. And these two broad conditions would fit many different probability models, making the choice of any model rather arbitrary. Since the results of an analysis could change depending on the choice of the model for uncertainty, this subjectivity was unsatisfactory.

Finally, Pierre-Simon Laplace decided to derive, from first principles, a probability model for the uncertainties of measurement. In 1774 he proposed an elegant and deceptively simple curve as an error distribution.

Today this curve is known as the Laplace distribution; in standardized form its density is f(x) = ½ e^(−|x|). It is shown, in standardized form, in Figure 1. This curve is very difficult to work with, and it suffers another serious drawback, one that made Laplace reluctant to publish his result.

The drawback of the Laplace distribution is the following: Given multiple observations of a single quantity, it is intuitive to use the average of these observations as the estimate of that quantity. But does the average represent the best estimate? The answer to this question will depend on the probability model for the errors of observation; Laplace’s model of 1774 does not support the average as the best estimate.
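A small simulation can illustrate this failing. For the double-exponential curve of 1774, now called the Laplace distribution, the maximum-likelihood estimate of the measured quantity is the sample median, not the average; the scale, sample size, and trial count below are illustrative choices:

```python
import random
import statistics

random.seed(4)

def laplace_error():
    # The difference of two independent exponential variables
    # has a Laplace (double-exponential) distribution.
    return random.expovariate(1.0) - random.expovariate(1.0)

n, trials = 25, 4000
means, medians = [], []
for _ in range(trials):
    sample = [laplace_error() for _ in range(n)]  # errors around a true value of 0
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Under Laplace errors the median scatters less around the
# true value than the average does.
print(round(statistics.pstdev(means), 3), round(statistics.pstdev(medians), 3))
```

The median's spread comes out noticeably smaller than the average's, which is what it means to say the 1774 model "does not support the average as the best estimate."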

Between this failing, and the intractability of the curve itself, the 1774 model for the errors of observation was of no practical use, even though the derivation of the curve from first principles was itself a remarkable technical achievement.

Being dissatisfied with his first attempt at an error distribution, Laplace continued to work on this problem and in 1777 came up with another model. The argument supporting this new model was very complex, running to a total of 20 pages in the first published version. This 1777 model for the error function used the curve defined by:

f(x) = (1/2a) ln(a/|x|), for −a ≤ x ≤ a

This distribution is symmetric about 0. It gives a decreasing likelihood for an error as you move away from 0. And it places a bound, *a*, on the magnitude of an error. It is shown in standardized form in Figure 2.

The error distribution of 1777 had one redeeming feature that the distribution of 1774 did not have: The new error curve would allow the use of the average of the observations as the best estimate of a quantity. However, in spite of the triumph of the 1777 model, the new curve was still not a practical error function—a fact that Laplace recognized and acknowledged in the original work. Following these two unsuccessful attempts to characterize measurement uncertainty from first principles, Laplace quit working on this problem. However, 33 years later, Laplace found the solution to this problem as he extended the work of Abraham de Moivre.

In the 1730s, de Moivre had published an approximation to the central terms of a binomial probability model. This approximation was offered as an aid to computation. Today it is known as the *normal distribution*.
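De Moivre's approximation can be checked directly. The sketch below compares the exact binomial probabilities near the center with the normal curve having the same mean and variance (the choices n = 100 and p = ½ are illustrative):

```python
import math

def binom_pmf(n, p, k):
    # Exact binomial probability of k successes in n trials.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_approx(n, p, k):
    # Normal density with the binomial's mean np and variance np(1-p).
    mu, var = n * p, n * p * (1 - p)
    return math.exp(-(k - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 100, 0.5
for k in (45, 50, 55):
    print(k, round(binom_pmf(n, p, k), 4), round(normal_approx(n, p, k), 4))
```

The central terms agree to three decimal places, which is why the approximation was such a useful aid to computation in an era of hand calculation.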

In April of 1810, Laplace read a paper to the French Academy of Sciences in Paris that presented a major generalization of de Moivre’s result. While de Moivre had proven that:

*The total number of successes in n trials will, if n is large, be approximately normally distributed.*

Laplace had extended this to become:

*Any sum or average will, if the number of terms in the sum is large, be approximately normally distributed.*

While some exceptions and side conditions would be added later, this theorem is the fundamental limit theorem of probability theory. This result of Laplace’s is commonly referred to as the *central limit theorem*, since it establishes the central role of the normal distribution in probability theory.

Laplace’s central limit theorem creates order out of chaos. Given observations from a system that is subject to a homogeneous set of causes, we do not have to know the characteristics of the individual observations in order to know how sums or averages of large numbers of those observations will behave—they will always be approximately normally distributed! And for purposes of this theorem, “large” can be as few as 5 to 10 terms in the sum or average.

Moreover, if each of the individual observations can be thought of as being the sum of a large number of effects arising from a set of homogeneous causes (where no one cause will have a predominating effect) then the individual observations themselves will be approximately normally distributed. Thus, with the central limit theorem, Laplace opened the door to the development of techniques of statistical analysis. And this was the thrust of Laplace’s original paper.
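The theorem is easy to demonstrate by simulation. In this sketch each "observation" is a strongly skewed exponential effect; averages of just 10 of them are already nearly symmetric (the exponential distribution and the sample sizes are illustrative choices):

```python
import random
import statistics

random.seed(2)

def skewness(xs):
    # Standardized third moment: 0 for a symmetric distribution.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# Individual observations: strongly right-skewed exponential effects.
singles = [random.expovariate(1.0) for _ in range(20000)]

# Averages of 10 such observations: already close to bell-shaped.
averages = [statistics.fmean(random.expovariate(1.0) for _ in range(10))
            for _ in range(20000)]

skew_singles = skewness(singles)
skew_averages = skewness(averages)
print(round(skew_singles, 2), round(skew_averages, 2))
```

The individual observations have skewness near 2, while the averages of 10 have skewness near 0.6 and falling; the order that Laplace's theorem promises emerges with remarkably few terms.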

Gauss’s book *The Theory of the Motion of Heavenly Bodies…* reached Paris in May of 1810. In this book, Gauss justified his least squares estimates by assuming that the normal distribution modeled measurement error. Upon reading Gauss’s book, Laplace realized that his theorem filled the gap in Gauss’s argument—it provided a rationale for the choice of the normal distribution. So Laplace quickly prepared an appendix to his central limit theorem paper in which he justified the role of the normal curve as an error distribution.

With the weight of Gauss and Laplace behind it, the normal curve quickly found acceptance as the appropriate model for measurement error. Today, after more than 200 years of use, mathematical tables simply refer to the normal distribution as *the error function*.

Today, some practitioners are suggesting the use of various other probability models for measurement error. Among the proposals we find lognormal distributions, Student-t distributions with small degrees of freedom (df), and even the Cauchy distribution (known to mathematicians as the Witch of Agnesi). With these proposals, we have effectively gone back to the chaos of the 18th century.

The first problem with these proposals is that they impose inappropriate structures upon the data. We think of measurement errors as the combined result of the effects of various environmental conditions. In order for a combined result to be lognormally distributed, the various effects would have to combine in a multiplicative manner. This is inconsistent with the idea that these various conditions operate independently of each other. (Probability theory requires that independent effects be combined in an additive manner.)

Likewise for the Student-t distributions: A Cauchy or Student-t variable describes the behavior of a ratio of two independent quantities, and a ratio requires a multiplicative model, not an additive one. Thus, our conceptual model for measurement error is inconsistent with these proposed alternate distributions for measurement error.
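The distinction between additive and multiplicative combination can be seen in a short simulation (the uniform "effects" and their number are illustrative choices): sums of independent effects come out roughly normal, while products of the very same effects come out right-skewed and lognormal-like.

```python
import math
import random
import statistics

random.seed(3)

def skewness(xs):
    # Standardized third moment: 0 for a symmetric distribution.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# Twenty independent positive "environmental" effects per observation.
def effects():
    return [random.uniform(0.5, 1.5) for _ in range(20)]

sums = [sum(effects()) for _ in range(10000)]          # additive combination
products = [math.prod(effects()) for _ in range(10000)]  # multiplicative combination

skew_sums = skewness(sums)
skew_products = skewness(products)
print(round(skew_sums, 2), round(skew_products, 2))
```

The sums have skewness near zero (approximately normal), while the products are heavily right-skewed. A lognormal error model therefore presumes a multiplicative mechanism that the additive picture of independent environmental effects does not provide.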

In addition, all of the proposed alternatives have less uncertainty than the normal distribution. Figure 4 shows this for a couple of standardized t-distributions. The normal distribution is the distribution of maximum uncertainty.

Figure 4: Standardized Student’s t vs. normal

The second problem with these proposed alternate distributions for measurement error is that *if any of these alternate distributions were correct, the central limit theorem would no longer be true!*

The normal distribution is the distribution of maximum entropy. This means that the central limit theorem is a restatement of the second law of thermodynamics. Until we repeal the second law of thermodynamics, the central limit theorem will be true, and as long as the central limit theorem is true, the normal distribution will be *the* error function.
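The maximum-entropy property can be checked numerically. The sketch below integrates the differential entropy −∫ f ln f dx for the standard normal and for a Student-t rescaled to unit variance, so the comparison is between distributions of equal variance (the choice of 5 degrees of freedom and the integration range are illustrative):

```python
import math

def t_density(x, df):
    # Student-t density with df degrees of freedom.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def entropy(pdf, lo=-60.0, hi=60.0, steps=100000):
    # Crude midpoint-rule evaluation of  -integral of f ln f dx.
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        f = pdf(lo + (i + 0.5) * dx)
        if f > 0.0:
            total -= f * math.log(f) * dx
    return total

df = 5
scale = math.sqrt((df - 2) / df)  # rescale the t variable to unit variance
std_t = lambda x: t_density(x / scale, df) / scale
normal = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

h_norm = entropy(normal)
h_t = entropy(std_t)
print(round(h_norm, 3), round(h_t, 3))
```

The normal entropy comes out larger than that of the unit-variance t, consistent with the normal being the maximum-entropy distribution for a given variance.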

Only the normal distribution will satisfy the logical assumption that measurement errors are the sum of various environmental conditions that operate independently of each other. Any proposal of an alternate model for measurement error reveals a lack of understanding of one of the fundamental theorems of mathematical statistics.

So the measurement error problem has been solved. It was solved over 200 years ago. The solution has a firm foundation in mathematics and physics, and has withstood the test of time.

The normal distribution is the error function.

Accept no substitutes.

Drink no snake oil.

## Comments

## Measurement Error

This article reminds me of an early undergraduate physics experiment conducted with my lab partner at Montana State University a long time ago. The purpose of the experiment was to measure the weight of an electron "n" times and report the result. I don't remember the value of "n" nor the variation of our individual measurements. What I do remember is that our report was downgraded because our average was too close to the accepted value. Measurement error was a constant during my 40 years in manufacturing.

Thanks Don and Quality Digest for this continuing series.

## Electron mass measurement

Hi, Bill Pound (Pound? What a suggestive name for a metrologist!). When I was at college (Institute of Physics, University of Sao Paulo, Brazil), I had to determine the mass/charge relation of the electron too, via the famous Millikan experiment. And the groups that got results very close to the accepted value were always considered suspicious by the professor.

We had the accumulated results from the previous classes on the wall of the lab, and the Gaussian shape was pretty clear.