Joel Smith


Gauging Gage, Part 3

How do you choose parts for a GR&R study?

Published: Tuesday, May 23, 2017 - 11:02

In parts one and two of “Gauging Gage,” we looked at the numbers of parts, operators, and replicates used in a gage repeatability and reproducibility (GR&R) study and how accurately we could estimate %Contribution based on the choice for each. In doing so, I hoped to provide you with valuable and interesting information, but mostly I hoped to make you like me. I mean like me so much that if I told you that you were doing something flat-out wrong and had been for years and probably screwed some things up, you would hear me out and hopefully just revert to being indifferent toward me.

For the third (and maybe final) installment, I want to talk about something that drives me crazy. It really gets under my skin. I see it all of the time, maybe more often than not. You might even do it. If you do, I’m going to try to convince you that you are very, very wrong. If you’re an instructor, you may even have to contact past students with groveling apologies and admit you steered them wrong. And that’s the best-case scenario. Maybe instead of admitting error, you will post scathing comments at the end of this column insisting I am wrong and maybe even insulting me despite the evidence I provide here that I am, in fact, right.

Let me ask you a question:

When you choose parts to use in a GR&R study, how do you choose them?

If your answer to that question required any more than a few words—and it can be done in one word—then I’m afraid you may have been making a very popular but very bad decision. If you’re in that group, I bet you’re already reciting your rebuttal in your head now, without even hearing what I have to say. You’ve had this argument before, haven’t you? Consider whether your response was some variation on the following popular schemes:
1. Sample parts at regular intervals across the range of measurements typically seen
2. Sample parts at regular intervals across the process tolerance (lower spec to upper spec)
3. Sample randomly but pull a part from outside of either spec

No. 1 is wrong. No. 2 is wrong. No. 3 is wrong.

You see, the statistics you use to qualify your measurement system are all reported relative to the part-to-part variation, and none of the schemes I just listed accurately estimates your true part-to-part variation. The answer to the question that would have provided the most reasonable estimate? “Randomly.”


But enough with the small talk. This is a statistics column, so let’s see what the statistics say.

In part one I described a simulated GR&R experiment, which I will repeat here using the standard design of 10 parts, three operators, and two replicates. The difference is that in only one set of 1,000 simulations will I randomly pull parts, and we’ll consider that our baseline. The other schemes I will simulate are as follows:
1. An “exact” sampling. While not practical in real life, this pulls parts corresponding to the fifth, 15th, 25th, ..., and 95th percentiles of the underlying normal distribution, and forms a (nearly) “exact” normal distribution as a means of seeing how much the randomness of sampling affects our estimates.
2. Parts are selected uniformly (at equal intervals) across a typical range of parts seen in production (from the fifth to the 95th percentile).
3. Parts are selected uniformly (at equal intervals) across the range of the specs, in this case assuming the process is centered with a Ppk of 1.
4. Eight of the 10 parts are selected randomly, and the remaining two are chosen to lie one-half of a standard deviation outside of the specs, one beyond each limit.
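For readers who want to try this themselves, here is a minimal sketch of how the five part-selection schemes (random plus the four above) could be generated. It is not the code behind the simulations in this column. The standard-normal part distribution, the spec limits at ±3 (a centered process with Ppk of 1), and every function name are assumptions made for illustration.

```python
import random
from statistics import NormalDist

# Assumptions (not from the article): part values follow a standard normal
# distribution, and the process is centered with Ppk = 1, which puts the
# spec limits three standard deviations from the mean.
PARTS = NormalDist(mu=0.0, sigma=1.0)
LSL, USL = -3.0, 3.0
N_PARTS = 10


def evenly_spaced(low, high, n=N_PARTS):
    """Return n values at equal intervals from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def random_parts(rng, n=N_PARTS):
    """Baseline: parts pulled at random from the process."""
    return [rng.gauss(PARTS.mean, PARTS.stdev) for _ in range(n)]


def exact_parts(n=N_PARTS):
    """Scheme 1: the 5th, 15th, ..., 95th percentiles of the distribution."""
    return [PARTS.inv_cdf((i + 0.5) / n) for i in range(n)]


def typical_range_parts(n=N_PARTS):
    """Scheme 2: equal intervals from the 5th to the 95th percentile."""
    return evenly_spaced(PARTS.inv_cdf(0.05), PARTS.inv_cdf(0.95), n)


def spec_range_parts(n=N_PARTS):
    """Scheme 3: equal intervals from the lower spec to the upper spec."""
    return evenly_spaced(LSL, USL, n)


def outside_spec_parts(rng, n=N_PARTS):
    """Scheme 4: n - 2 random parts plus one part beyond each spec limit."""
    half_sd = 0.5 * PARTS.stdev
    return random_parts(rng, n - 2) + [LSL - half_sd, USL + half_sd]


rng = random.Random(2017)  # fixed seed so the run is repeatable
for name, parts in [
    ("random", random_parts(rng)),
    ("exact", exact_parts()),
    ("typical range", typical_range_parts()),
    ("spec range", spec_range_parts()),
    ("outside spec", outside_spec_parts(rng)),
]:
    print(f"{name:>14}: " + " ".join(f"{x:6.2f}" for x in sorted(parts)))
```

Printing the sorted values side by side makes it easy to see how much wider the spec-range and outside-spec selections are than anything the process would normally hand you.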

Keep in mind that we know with absolute certainty that the underlying %Contribution is 5.88325 percent.

Random sampling for gage

Let’s use “random” as the default to compare to, which, as you recall from parts one and two, already does not provide a particularly accurate estimate:

On several occasions I’ve had people tell me that you can’t just sample randomly because you might get parts that don’t really match the underlying distribution.

Sample 10 parts that match the distribution

So let’s compare the results of random sampling from above with our results if we could magically pull 10 parts that follow the underlying part distribution almost perfectly, thereby eliminating the effect of randomness:

There’s obviously something to the idea that the randomness that comes from random sampling has a big impact on our estimate of %Contribution... the “exact” distribution of parts shows much less skewness and variation and is considerably less likely to incorrectly reject the measurement system. To be sure, implementing an “exact” sample scheme is impossible in most cases: Since you don’t yet know how much measurement error you have, there’s no way to know that you’re pulling an exact distribution. What we have here is a statistical version of chicken-and-the-egg.
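The random-versus-exact comparison can be roughly reproduced with a short simulation. The sketch below is only an approximation of the experiment described here: it assumes a part standard deviation of 1, a repeatability standard deviation of 0.25, no true operator effect, and a reduced ANOVA model without the part-by-operator interaction. Those values are assumptions that put the true %Contribution at about 5.88 percent; the actual simulation settings are not given in this column, and all names are hypothetical.

```python
import random
from statistics import NormalDist, mean, median, stdev

# Assumed variance components (not stated in the article).
SD_PART, SD_REPEAT = 1.0, 0.25
N_PARTS, N_OPERATORS, N_REPS = 10, 3, 2
TRUE_PCT = 100 * SD_REPEAT**2 / (SD_REPEAT**2 + SD_PART**2)


def random_parts(rng):
    return [rng.gauss(0.0, SD_PART) for _ in range(N_PARTS)]


def exact_parts(_rng=None):
    dist = NormalDist(0.0, SD_PART)
    return [dist.inv_cdf((i + 0.5) / N_PARTS) for i in range(N_PARTS)]


def simulate_study(part_values, rng):
    """data[i][j] holds the replicate readings of part i by operator j."""
    return [[[value + rng.gauss(0.0, SD_REPEAT) for _ in range(N_REPS)]
             for _ in range(N_OPERATORS)]
            for value in part_values]


def pct_contribution(data):
    """Estimate %Contribution from a crossed study (no interaction term)."""
    p, o, r = len(data), len(data[0]), len(data[0][0])
    readings = [y for part in data for cell in part for y in cell]
    grand = mean(readings)
    part_means = [mean(y for cell in part for y in cell) for part in data]
    oper_means = [mean(y for part in data for y in part[j]) for j in range(o)]

    ss_total = sum((y - grand) ** 2 for y in readings)
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_means)

    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_error = (ss_total - ss_part - ss_oper) / (p * o * r - p - o + 1)

    var_repeat = ms_error
    var_oper = max(0.0, (ms_oper - ms_error) / (p * r))  # reproducibility
    var_part = max(0.0, (ms_part - ms_error) / (o * r))
    var_gage = var_repeat + var_oper
    return 100 * var_gage / (var_gage + var_part)


def run(scheme, n_sims, rng):
    return [pct_contribution(simulate_study(scheme(rng), rng))
            for _ in range(n_sims)]


rng = random.Random(2017)
random_results = run(random_parts, 1000, rng)
exact_results = run(exact_parts, 1000, rng)

print(f"true %Contribution under these assumptions: {TRUE_PCT:.2f}")
for name, res in [("random", random_results), ("exact", exact_results)]:
    print(f"{name:>6}: median {median(res):.2f}, "
          f"std dev {stdev(res):.2f}, max {max(res):.2f}")
```

Under these assumptions the exact scheme should show a visibly tighter spread than the random one, because the only thing left to vary from run to run is the measurement noise itself.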

Sampling uniformly across a typical range of values

Let’s move on. Next up, we will compare the random scheme to scheme No. 2, sampling uniformly across a typical range of values:

So here we have a different situation: There is a very clear reduction in variation, but also a very clear bias. So while pulling parts uniformly across the typical part range gives much more consistent estimates, those estimates are likely telling you that the measurement system is much better than it really is.

Sampling uniformly across the spec range

How about collecting uniformly across the range of the specs?

This scheme results in an even more extreme bias. It makes qualifying this measurement system a certainty and, in some cases, even classifies it as excellent. Needless to say, it does not result in an accurate assessment.

Selectively sampling outside the spec limits

Finally, how about that scheme where most of the points are taken randomly, but just one part is pulled from just outside of each spec limit? Surely just taking two of the 10 points from outside of the spec limits wouldn’t make a substantial difference, right?

Actually, those two points make a huge difference and render the study’s results meaningless! This process had a Ppk of 1, and a higher-quality process would make this result even more extreme. Clearly this is not a reasonable sampling scheme.
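The bias in these schemes follows directly from how spread out the chosen parts are. A GR&R study estimates part-to-part variation from the parts in the study, so parts that are more spread out than the process inflate that estimate and shrink %Contribution. The sketch below computes the %Contribution each scheme would imply if the variance components were estimated with no noise at all. The gage standard deviation of 0.25, the standard-normal parts, and the spec limits at ±3 are assumptions, as are the names.

```python
import random
from statistics import NormalDist, mean, variance

# Assumptions (not from the article): standard-normal parts, a total gage
# variance of 0.25**2, and spec limits at -3 and +3 (centered, Ppk = 1).
PARTS = NormalDist(mu=0.0, sigma=1.0)
VAR_GAGE = 0.25 ** 2
LSL, USL = -3.0, 3.0
N_PARTS = 10
TRUE_PCT = 100 * VAR_GAGE / (VAR_GAGE + PARTS.variance)


def evenly_spaced(low, high, n=N_PARTS):
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def implied_pct(part_values):
    """%Contribution implied by the sample variance of the chosen parts."""
    return 100 * VAR_GAGE / (VAR_GAGE + variance(part_values))


exact = [PARTS.inv_cdf((i + 0.5) / N_PARTS) for i in range(N_PARTS)]
typical = evenly_spaced(PARTS.inv_cdf(0.05), PARTS.inv_cdf(0.95))
spec = evenly_spaced(LSL, USL)

rng = random.Random(2017)


def outside_spec():
    """Eight random parts plus one part half an SD beyond each spec limit."""
    inside = [rng.gauss(0.0, 1.0) for _ in range(N_PARTS - 2)]
    return inside + [LSL - 0.5, USL + 0.5]


# The outside-spec scheme still has a random component, so average it.
outside_avg = mean(implied_pct(outside_spec()) for _ in range(2000))

for name, pct in [
    ("true value", TRUE_PCT),
    ("exact", implied_pct(exact)),
    ("typical range", implied_pct(typical)),
    ("spec range", implied_pct(spec)),
    ("outside spec (avg)", outside_avg),
]:
    print(f"{name:>18}: {pct:5.2f}%")
```

With these assumed values, the typical-range scheme implies roughly 4.9 percent and the spec-range scheme roughly 1.5 percent, against a true value near 5.9 percent. The two out-of-spec parts alone are enough to pull the last scheme far below the truth.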

Why these sampling schemes?

If you were taught to sample randomly, you might be wondering why so many people would use one of these other schemes (or similar ones). They actually all have something in common that explains their use: All of them allow a practitioner to assess the measurement system across a range of possible values. After all, if you almost always produce values between 8.2 and 8.3 and the process goes out of control, how do you know that you can adequately measure a part at 8.4 if you never evaluated the measurement system at that point?

Those who choose these schemes for that reason are smart to think about that issue, but just aren’t using the right tool for it. Gage R&R evaluates your measurement system’s ability to measure relative to the current process. To assess your measurement system across a range of potential values, the correct tool to use is a bias and linearity study, which is found in the Gage Study menu in Minitab. This tool establishes for you whether you have bias across the entire range (i.e., consistently measuring high or low) or bias that depends on the value measured (for example, measuring smaller parts larger than they are and larger parts smaller than they are).

To really assess a measurement system, I advise performing both a bias and linearity study as well as a Gage R&R.
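To give a feel for what a bias and linearity study looks at, here is a bare-bones sketch: measure parts with known reference values, compute the bias (measured minus reference) for each, and fit a line to bias versus reference. A nonzero average bias points to a consistent offset, and a nonzero slope points to bias that depends on the size measured. The data are made up, the function name is hypothetical, and this is not Minitab's procedure, which also reports significance tests that this sketch omits.

```python
from statistics import mean


def bias_and_linearity(reference, measured):
    """Least-squares fit of bias = intercept + slope * reference.

    Returns (average bias, slope, intercept).
    """
    bias = [m - r for r, m in zip(reference, measured)]
    ref_mean, bias_mean = mean(reference), mean(bias)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    sxy = sum((r - ref_mean) * (b - bias_mean)
              for r, b in zip(reference, bias))
    slope = sxy / sxx
    intercept = bias_mean - slope * ref_mean
    return bias_mean, slope, intercept


# Hypothetical data: the gage reads small parts high and large parts low.
reference = [8.0, 8.1, 8.2, 8.3, 8.4]
measured = [8.02, 8.11, 8.20, 8.29, 8.38]

avg_bias, slope, intercept = bias_and_linearity(reference, measured)
print(f"average bias: {avg_bias:+.4f}")
print(f"slope of bias vs. reference: {slope:+.4f}")
```

In this made-up example the average bias is essentially zero, yet the negative slope shows that the gage overstates small parts and understates large ones, which is exactly the kind of problem a GR&R study on its own would not reveal.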

Which sampling scheme to use?

In the beginning I suggested that a random scheme be used but then clearly illustrated that the “exact” method provides even better results. Using an exact method requires you to know the underlying distribution from having enough previous data (somewhat reasonable, although existing data include measurement error) as well as to be able to measure those parts accurately enough to ensure you’re pulling the right parts. (Not too feasible. If you know you can measure accurately, why are you doing a Gage R&R?) In other words, it isn’t very realistic.

So for the majority of cases, the best we can do is to sample randomly. But we can do a reality check after the fact by looking at the average measurement for each of the parts chosen and verifying that the distribution seems reasonable. If you have a process that typically shows normality and your sample shows unusually high skewness, there’s a chance you pulled an unusual sample and may want to pull some additional parts and supplement the original experiment.
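One way to run that reality check is sketched below: compute the sample skewness of the part averages, then see how often a random sample of the same size from a normal process would look at least that skewed. The part averages are made up, the function names are hypothetical, and the sketch deliberately leaves the question of what counts as "unusual" to your own judgment.

```python
import random
from statistics import mean, stdev


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def share_at_least_as_skewed(observed, n_parts, n_sims=5000, seed=1):
    """Fraction of random normal samples whose absolute skewness is at
    least as large as the observed value."""
    rng = random.Random(seed)
    hits = sum(
        abs(skewness([rng.gauss(0.0, 1.0) for _ in range(n_parts)]))
        >= abs(observed)
        for _ in range(n_sims)
    )
    return hits / n_sims


# Hypothetical average measurement for each of the 10 parts in a study.
part_averages = [8.21, 8.22, 8.22, 8.23, 8.24, 8.24, 8.25, 8.26, 8.27, 8.41]

observed = skewness(part_averages)
share = share_at_least_as_skewed(observed, len(part_averages))
print(f"skewness of part averages: {observed:.2f}")
print(f"share of normal samples at least this skewed: {share:.3f}")
```

A small share suggests the sample of parts is not typical of a normal process, which is the cue to pull some additional parts and supplement the original experiment.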


About The Author

Joel Smith

Joel Smith is a senior business development representative at Minitab LLC, developer of statistical software headquartered in Pennsylvania.