Joel Smith


Gauging Gage, Part 1

How accurately can you assess your measurement system with 10 parts?

Published: Tuesday, May 9, 2017 - 11:02

“You take 10 parts and have three operators measure each two times.”

This standard approach to a gage repeatability and reproducibility (GR&R) experiment is so common, so accepted, so ubiquitous, that few people ever question whether it is effective. Obviously, one could look at whether three is an adequate number of operators or two an adequate number of replicates, but in this first of a series of posts about “gauging gage,” I want to look at 10. Just 10 parts. How accurately can you assess your measurement system with 10 parts?

Assessing a measurement system with 10 parts

I’m going to use a simple scenario as an example. I’m going to simulate the results of 1,000 GR&R studies with the following underlying characteristics:
1. There are no operator-to-operator differences, and no operator/part interaction.
2. The measurement system variance and part-to-part variance used would result in a %Contribution of 5.88 percent, between the popular guidelines of <1% being excellent and >9% being poor.
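The setup above can be sketched in code. Here is a minimal simulation of crossed GR&R studies under those assumptions: a part-to-part SD of 4 and a measurement SD of 1 give the true %Contribution of 1/17 ≈ 5.88 percent, and the variance components are estimated with the standard two-way random-effects ANOVA formulas. (The function and parameter names are mine, for illustration; this is not Minitab's implementation.)

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_grr(n_parts=10, n_opers=3, n_reps=2,
                 sd_part=4.0, sd_meas=1.0):
    """Simulate one crossed GR&R study (no operator or interaction
    effects) and return the ANOVA estimate of %Contribution."""
    p, o, r = n_parts, n_opers, n_reps
    part_means = rng.normal(0.0, sd_part, size=p)
    # data[i, j, k]: part i, operator j, replicate k
    data = part_means[:, None, None] + rng.normal(0.0, sd_meas, size=(p, o, r))

    grand = data.mean()
    pm = data.mean(axis=(1, 2))   # part means
    om = data.mean(axis=(0, 2))   # operator means
    pom = data.mean(axis=2)       # part-by-operator cell means

    # Sums of squares for the two-way crossed design
    ss_p = o * r * ((pm - grand) ** 2).sum()
    ss_o = p * r * ((om - grand) ** 2).sum()
    ss_po = r * ((pom - pm[:, None] - om[None, :] + grand) ** 2).sum()
    ss_e = ((data - pom[:, :, None]) ** 2).sum()

    ms_p = ss_p / (p - 1)
    ms_o = ss_o / (o - 1)
    ms_po = ss_po / ((p - 1) * (o - 1))
    ms_e = ss_e / (p * o * (r - 1))

    # Variance components (negative estimates truncated to zero)
    var_rep = ms_e
    var_po = max((ms_po - ms_e) / r, 0.0)
    var_oper = max((ms_o - ms_po) / (p * r), 0.0)
    var_part = max((ms_p - ms_po) / (o * r), 0.0)

    var_ms = var_rep + var_po + var_oper   # total gage variance
    return 100.0 * var_ms / (var_ms + var_part)

results = np.array([simulate_grr() for _ in range(1000)])
print(f"true %Contribution:  {100 / 17:.2f}")
print(f"median estimate:     {np.median(results):.2f}")
print(f"share failing (>9%): {(results > 9).mean():.2%}")
```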

So (no looking ahead here) based on my 1,000 simulated gage studies, what do you think the distribution of %Contribution looks like across all studies? Specifically, do you think it is centered near the true value (5.88%), or do you think the distribution is skewed, and if so, how much do you think the estimates vary?

Go ahead and think about it... I’ll just wait here for a minute.

OK, ready?

Here is the distribution, with the guidelines and true value indicated:

The good news is that the distribution is roughly centered on the true value.

However, the distribution is highly skewed—a decent number of observations estimated %Contribution to be at least double the true value, with one estimating it at about six times the true value! And the variation is huge. In fact, about one in four gage studies would have resulted in failing this gage.

Now a standard gage study is no small undertaking. A total of 60 data points must be collected, and once randomization and “masking” of the parts are done, it can be quite tedious (and possibly annoying to the operators). So just how many parts would be needed for a more accurate assessment of %Contribution?

Assessing a measurement system with 30 parts

I repeated 1,000 simulations, this time using 30 parts (if you’re keeping score, that’s 180 data points). And then for kicks, I went ahead and did 100 parts (that’s 600 data points). So now consider the same questions from before for these counts—mean, skewness, and variation.
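A back-of-envelope calculation suggests why increasing the part count helps less than you might hope. This is an approximation, not the full GR&R math: the sample variance of n normal part values has a coefficient of variation of roughly sqrt(2/(n − 1)), so the part-to-part variance estimate tightens only with the square root of the part count.

```python
import math

# Rough relative spread of a variance estimate from n normal values:
# coefficient of variation ~ sqrt(2 / (n - 1))
cvs = {n: math.sqrt(2 / (n - 1)) for n in (10, 30, 100)}
for n, cv in cvs.items():
    print(f"{n:>3} parts: relative SD of the variance estimate ~ {cv:.0%}")
```

Going from 10 to 30 parts only cuts this rough measure of spread roughly in half, consistent with the slower-than-expected improvement in the simulated distributions.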

Mean is probably easy: If it was centered before, it’s probably centered still.

So let’s really look at skewness and how much we were able to reduce variation:

Skewness and variation have clearly decreased, but I suspect you thought variation would have decreased more than it did. Keep in mind that %Contribution is affected by your estimates of repeatability and reproducibility as well, so you can only tighten this distribution so much by increasing the number of parts. Even with 30 parts, an enormous experiment to undertake, this gage still fails 7 percent of the time.

So what is a quality practitioner to do?

I have two recommendations for you. First, let’s talk about %Process. Often the measurement system we are evaluating has been in place for some time, and we are simply verifying its effectiveness. In this case, rather than relying on your small sampling of parts to estimate the overall variation, you can use the historical standard deviation as your estimate and eliminate much of the variation caused by the small sample of parts. Just enter your historical standard deviation in the Options subdialog in Minitab:

Then your output will include an additional column of information called %Process. This column is the equivalent of the %StudyVar column, but uses the historical standard deviation (which comes from a much larger sample) instead of the overall standard deviation estimated from the data collected in your experiment:
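To make the %Process idea concrete, here is the arithmetic behind that column, with all numbers assumed for illustration (Minitab computes this for you): the gage variation is expressed as a percentage of a historical process standard deviation rather than the overall SD estimated from the study itself.

```python
# Hypothetical variance-component SDs from a gage study (assumed values)
sd_repeatability = 0.12
sd_reproducibility = 0.08
sd_gage = (sd_repeatability**2 + sd_reproducibility**2) ** 0.5

# Assumed long-run process SD, estimated from a much larger sample
sd_historical = 0.45

# %Process: total gage variation as a percentage of historical variation,
# analogous to %StudyVar but with the historical SD as the denominator
pct_process = 100 * sd_gage / sd_historical
print(f"%Process (total gage) = {pct_process:.1f}%")
```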

My second recommendation is to include confidence intervals in your output. This can be done in the Conf Int subdialog:

Including confidence intervals in your output doesn’t inherently improve the wide variation of estimates the standard gage study provides, but it does force you to recognize just how much uncertainty there is in your estimate. For example, consider this output from the gageaiag.mtw sample dataset in Minitab with confidence intervals turned on:

For some processes you might accept this gage based on the %Contribution being less than 9 percent. But for most processes you really need to trust your data, and the 95 percent CI of (2.14, 66.18) is a red flag that you really shouldn’t be very confident that you have an acceptable measurement system.

So the next time you run a GR&R study, put some thought into how many parts you use and how confident you are in your results.

In part two, we’ll take a look at what happens when you increase the number of operators or replicates.


About The Author


Joel Smith

Joel Smith is a senior business development representative at Minitab LLC, developer of statistical software headquartered in Pennsylvania.



Whether you measure 5, 10, or 10,000 parts, you are already assuming that the data set is homogeneous when you go on to calculate the summary statistics to estimate the process parameters. The first task is to construct a control chart to understand whether you have homogeneity in the data set.

If you don’t have homogeneity, you will need to deal with it before you even contemplate moving forward, because any statistics you compute will be meaningless.