Patrick Runkel

Statistics

So Why Is It Called ‘Regression,’ Anyway?

There’s nothing particularly regressive about it

Published: Wednesday, March 22, 2017 - 11:02

Did you ever wonder why statistical analyses and concepts often have such weird, cryptic names?

One conspiracy theory points to the workings of a secret committee called the ICSSNN. The International Committee for Sadistic Statistical Nomenclature and Numerophobia was formed solely to befuddle and subjugate the masses. Its mission: To select the most awkward, obscure, and confusing name possible for each statistical concept.

A whistle-blower recently released the following transcript of a secretly recorded ICSSNN meeting:

“This statistical analysis seems pretty straightforward....”

“What does it do?”

“It describes the relationship between one or more ‘input’ variables and an ‘output’ variable. It gives you an equation to predict values for the ‘output’ variable, by plugging in values for the input variables.”

“Oh dear. That sounds disturbingly transparent.”

“Yes. We need to fix that—call it something grey and nebulous. What do you think of ‘regression’?”

“What’s ‘regressive’ about it?”

“Nothing at all. That’s the point!”

“Re-gres-sion. It does sound intimidating. I’d be afraid to try that alone.”

“Are you sure it’s completely unrelated to anything? Sounds a lot like ‘digression.’ Maybe it’s what happens when you add up umpteen sums of squares; you forget what you were talking about.”

“Maybe it makes you regress and relive your traumatic memories of high-school math... until you revert to a fetal position?”

“No, no. It’s not connected with anything concrete at all.”

“Then it’s perfect!”

“I don’t know... it only has three syllables. I’d feel better if it were at least seven syllables and hyphenated.”

“I agree. Phonetically, it’s too easy. People are even likely to pronounce it correctly. Could we add a uvular fricative, or an interdental retroflex followed by a sustained turbulent trill?”

The real story: how regression got its name

Conspiracy theories aside, the term “regression” in statistics was probably not a result of the workings of the ICSSNN. Instead, the term is usually attributed to Sir Francis Galton.


Sir Francis Galton, wearer of many hats

Galton was a 19th-century English Victorian who wore many hats: explorer, inventor, meteorologist, anthropologist, and—most important for the field of statistics—an inveterate measurement nut. You might call him a statistician’s statistician. Galton just couldn’t stop measuring anything and everything around him.

During a meeting of the Royal Geographical Society, Galton devised a way to roughly quantify boredom: He counted the number of fidgets of the audience in relation to the number of breaths he took (he didn’t want to attract attention using a timepiece). Galton then converted the results on a time scale to obtain a mean rate of one fidget per minute per person. Decreases or increases in the rate could then be used to gauge audience interest levels. (That mean fidget rate was calculated in 1885. I’d guess the mean fidget rate is astronomically higher today—especially if glancing at an electronic device counts as a fidget.)

Galton also noted the importance of considering sampling bias in his fidget experiment. “These observations should be confined to persons of middle age,” Galton wrote. “Children are rarely still, while elderly philosophers will sometimes remain rigid for minutes.”

But I regress....

Galton was also keenly interested in heredity. In one experiment, he collected data on the heights of 205 sets of parents with adult children. To make male and female heights directly comparable, he rescaled the female heights, multiplying them by a factor of 1.08. Then he calculated the average of the two parents’ heights (which he called the “mid-parent height”) and divided the parents into groups based on the range of their mid-parent heights. The results are shown below, replicated on a Minitab graph.

For each group of parents, Galton then measured the heights of their adult children and plotted their median heights on the same graph.

Galton fit a line to each set of heights, and added a reference line to show the average adult height (68.25 inches).

Like most statisticians, Galton was all about deviance. So he represented his results in terms of deviance from the average adult height.

 

Based on these results, Galton concluded that as heights of the parents deviated from the average height (that is, as they became taller or shorter than the average adult), their children tended to be less extreme in height. That is, the heights of the children regressed to the average height of an adult.

He calculated the rate of regression as 2/3 of the deviance value. So if the average height of the two parents was, say, 3 in. taller than the average adult height, their children would tend to be (on average) approximately 2/3*3 = 2 in. taller than the average adult height.
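To make the arithmetic concrete, here is a minimal Python sketch of the two steps described above: forming a mid-parent height with the 1.08 rescaling, then applying the 2/3 rate to its deviation from the 68.25-inch average. The function names are hypothetical, chosen only for illustration, and the sketch assumes all heights are on the rescaled (male-equivalent) scale.

```python
AVERAGE_ADULT_HEIGHT_IN = 68.25  # reference height quoted in the article
FEMALE_SCALE = 1.08              # factor Galton applied to female heights
REGRESSION_RATE = 2 / 3          # Galton's estimated rate of regression


def mid_parent_height(father_in: float, mother_in: float) -> float:
    """Average of the father's height and the rescaled mother's height."""
    return (father_in + FEMALE_SCALE * mother_in) / 2


def expected_child_height(mid_parent_in: float) -> float:
    """Shrink the mid-parent deviation from the average by the 2/3 rate."""
    deviation = mid_parent_in - AVERAGE_ADULT_HEIGHT_IN
    return AVERAGE_ADULT_HEIGHT_IN + REGRESSION_RATE * deviation


# Hypothetical parents whose mid-parent height is 3 in. above average,
# matching the example in the text: the children land about 2 in. above.
print(expected_child_height(AVERAGE_ADULT_HEIGHT_IN + 3))
```

The result is an expected average for a group of children, not a forecast for any one child.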

Galton published his results in a paper called “Regression Towards Mediocrity in Hereditary Stature.”

So here’s the irony: The term “regression,” as Galton used it, didn’t refer to the statistical procedure he used to determine the fit lines for the plotted data points. In fact, Galton didn’t even use the least-squares method that we now most commonly associate with the term regression. (The least-squares method had already been developed some 80 years previously by Gauss and Legendre, but wasn’t called regression yet.) In his study, Galton just “eyeballed” the data values to draw the fit line.
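For contrast with Galton’s eyeballed line, the following is a minimal sketch of an ordinary least-squares fit for a single predictor. The data points are synthetic, constructed to follow the 2/3 rule exactly; they are not Galton’s measurements, and the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing squared vertical error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic deviations (inches from the average), NOT real data:
# each child deviation is exactly 2/3 of the mid-parent deviation.
parent_dev = [-3.0, -1.5, 0.0, 1.5, 3.0]
child_dev = [2 / 3 * d for d in parent_dev]

slope, intercept = least_squares_line(parent_dev, child_dev)
print(slope, intercept)  # slope close to 2/3, intercept close to 0
```

Because the synthetic points lie exactly on a line, the fit simply recovers the slope that was built in; with real, scattered data the fitted slope would be an estimate.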

For Galton, regression referred only to the tendency of extreme data values to “revert” to the overall mean value. In a biological sense, this meant a tendency for offspring to revert to average size (“mediocrity”) as their parentage became more extreme in size. In a statistical sense, it meant that, with repeated sampling, a variable that is measured to have an extreme value the first time tends to be closer to the mean when you measure it a second time. 
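The repeated-sampling idea can be illustrated with a small simulation. This sketch assumes each measurement is a stable true value plus independent random noise, both drawn from a standard normal distribution; that model is an illustrative assumption, not something taken from Galton’s study.

```python
import random


def simulate_two_measurements(n=10_000, seed=0):
    """Measure n simulated subjects twice: true value plus fresh noise each time."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_value = rng.gauss(0, 1)
        first.append(true_value + rng.gauss(0, 1))
        second.append(true_value + rng.gauss(0, 1))
    return first, second


def mean(values):
    return sum(values) / len(values)


first, second = simulate_two_measurements()

# Select the subjects whose FIRST measurement fell in the top 10 percent.
cutoff = sorted(first)[int(0.9 * len(first))]
extreme = [(a, b) for a, b in zip(first, second) if a >= cutoff]

mean_first = mean([a for a, _ in extreme])
mean_second = mean([b for _, b in extreme])
print(mean_first, mean_second)  # the second mean sits closer to the overall mean of 0
```

Nothing about the subjects changes between measurements. The extreme group was selected partly because its noise happened to be large the first time, and that luck does not repeat.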

Later, as he and other statisticians built on the methodology to quantify correlation relationships and to fit lines to data values, “regression” became associated with the statistical analysis that we now call regression. But it was just by chance that Galton’s original results using a fit line happened to show a regression of heights. If his study had shown increasing deviance of children’s heights from the average compared to their parents, perhaps we’d be calling it “progression” instead.

So, you see, there’s nothing particularly “regressive” about a regression analysis.

And that makes the ICSSNN very happy.

Don’t regress… progress

Never let intimidating terminology deter you from using a statistical analysis. The sign on the door is often much scarier than what’s behind it. Regression is an intuitive, practical, statistical tool with broad and powerful applications.

If you’ve never performed a regression analysis before, a good place to start is the Minitab Assistant. See Jim Frost’s post on using the Assistant to perform a multiple regression analysis. He has also compiled a helpful compendium of blog posts on regression.

And don’t forget Minitab Help. In Minitab, choose Help > Help. Then click Tutorials > Regression, or Stat Menu > Regression.


About The Author


Patrick Runkel

Patrick Runkel is a statistical communication specialist at Minitab LLC. He’s spent most of his professional life teaching and writing about mathematics, from helping kids divide fractions in elementary school to helping Ph.D. researchers apply logistic regression analysis in their fields. In The Minitab Blog, he likes to focus on mining and presenting statistical “gems” that give you quick, practical insights about statistics.

Comments

Interesting

Very interesting article.  I've always wondered where the term "regression analysis" originated.  Now we all know.

The downside:  Now it's going to be more difficult to impress my colleagues with such a simple technique.  Please don't explain 'studentized residuals' to them.  That still sounds scary enough that they won't question me on it!