Published: Monday, May 1, 2023 - 12:03

Ever since 1935 people have been trying to fine-tune Walter Shewhart’s simple but sophisticated process behavior chart. One of these embellishments is the use of two-sigma “warning” limits. This column will consider the theoretical and practical consequences of using two-sigma warning limits.

British statistician Egon Sharpe Pearson wanted to use warning limits set at plus or minus 1.96 sigma on either side of the central line. Others simply round the 1.96 off to 2.00. Either way, these warning limits are analogous to the 95-percent confidence intervals encountered in introductory courses in statistics. However, the use of such warning limits fails to consider the difference between the objective of a confidence interval and the purpose of using a process behavior chart.

A confidence interval is a one-time analysis that seeks to describe the properties of a specific lot or batch. It’s all about how many or how much is present. A confidence interval describes the uncertainty in an estimate of some static quantity.

A process behavior chart is a sequential analysis that characterizes a process as operating either predictably or unpredictably. Every time we plot a new point on a process behavior chart, we’re carrying out an act of analysis. Here, the outcome isn’t a static quantity but a continuing interaction with the process. As long as the process is being operated predictably, it will operate up to its full potential with no interventions needed. Whenever the process shows evidence of unpredictable operation, we can be confident that some assignable cause is affecting our process. To eliminate the excess costs due to the assignable cause, and to get the process back to operating at full potential, we’ll need to identify and control the assignable cause. Thus, the objective of a process behavior chart is to know when to take action on our process and when to refrain from taking action on it.

Decision theory shows that optimum decision rules for a one-time analysis technique will tend to have about a 5-percent risk of a false alarm. Thus, 95-percent confidence intervals are reasonable estimates for the question of how many or how much. However, with sequential techniques, optimum decision rules will require less than a 1-percent risk of a false alarm for each act of analysis. This is why a process behavior chart uses three-sigma limits rather than two-sigma limits.
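The per-point risks behind this comparison are easy to check. Assuming in-control, normally distributed data (one of the usual assumptions listed at the end of this column), the chance that a single point falls outside symmetric limits placed *L* sigma from the central line is 2[1 - Φ(*L*)], where Φ is the standard normal cumulative distribution function. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of any published method.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def per_point_false_alarm(limit_sigmas: float) -> float:
    """Probability that one in-control, normally distributed point falls
    outside limits placed `limit_sigmas` sigma on either side of the
    central line (both tails counted)."""
    return 2.0 * (1.0 - normal_cdf(limit_sigmas))


for L in (1.96, 2.0, 3.0):
    print(f"+/-{L} sigma: {per_point_false_alarm(L):.4f}")
```

Under these assumptions the loop prints approximately 0.0500, 0.0455, and 0.0027: about a 5-percent risk per act of analysis for 1.96-sigma or two-sigma limits, versus less than 0.3 percent for three-sigma limits.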

Shewhart’s original three-sigma limits have been thoroughly proven in years of use for all kinds of applications and in all kinds of industries. Most of the time we don’t need to increase the sensitivity. On those rare occasions when increased sensitivity is desired, two-sigma warning limits aren’t the right way to proceed.

Points outside the limits call for action. However, we only want to take action when it’s economical to do so. So while we really want to know about the larger process changes, we don’t need to know about every little process hiccup. And history has shown that points outside the three-sigma limits tend to be economically interesting.

Theory tells us that the *a posteriori* probability of a point outside three-sigma limits actually representing a real process change is approximately 90 percent. However, when you have a point between a two-sigma warning limit and a three-sigma limit, the *a posteriori* probability that it represents a change in your process is only about 60 percent. So when you use two-sigma warning limits you should expect about 40 percent of your “signals” to be false alarms. (That’s only slightly better than tossing a coin.)
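How such an *a posteriori* probability is obtained can be sketched with Bayes' rule. The column does not state the prior probability of a change or the shift size behind the 90-percent and 60-percent figures, so the `PRIOR` and `SHIFT` values below are purely hypothetical. With them, the sketch only illustrates the mechanics (a point in the band between two and three sigma is weaker evidence of a change than a point beyond three sigma); it is not expected to reproduce those particular percentages. Normal, independent data are assumed, and all names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_in_band(lo: float, hi: float, shift: float) -> float:
    """P(lo < |X| < hi) for X ~ N(shift, 1); pass hi=math.inf for 'beyond lo'."""
    upper = normal_cdf(hi - shift) - normal_cdf(lo - shift)    # above the central line
    lower = normal_cdf(-lo - shift) - normal_cdf(-hi - shift)  # below the central line
    return upper + lower


def posterior_change(lo: float, hi: float, prior: float, shift: float) -> float:
    """Bayes' rule: P(real change | point in band), given a prior probability
    that a change of size `shift` (in sigma units) has occurred."""
    p_if_change = prob_in_band(lo, hi, shift)
    p_if_stable = prob_in_band(lo, hi, 0.0)
    numerator = prior * p_if_change
    return numerator / (numerator + (1.0 - prior) * p_if_stable)


PRIOR = 0.05  # HYPOTHETICAL prior probability that the process has changed
SHIFT = 2.0   # HYPOTHETICAL size of that change, in sigma units

print("beyond 3 sigma:", posterior_change(3.0, math.inf, PRIOR, SHIFT))
print("between 2 and 3 sigma:", posterior_change(2.0, 3.0, PRIOR, SHIFT))
```

Changing `PRIOR` or `SHIFT` moves both posteriors, which is why any quoted figure depends on the assumptions used to derive it.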

Since 1956 the recognized and accepted way to increase the sensitivity of a process behavior chart has been to use the Western Electric run-tests in addition to the primary detection rule of a point falling outside the three-sigma limits. For clarity’s sake, these detection rules are:

Detection Rule 1: A point outside the three-sigma limits is likely to signal a large process change.

Detection Rule 2: Two out of three successive values that are both on the same side of the average and are beyond one of the two-sigma lines are likely to signal a moderate process change.

Detection Rule 3: Four out of five successive values that are all on the same side of the average and are beyond one of the one-sigma lines are likely to signal a moderate, sustained shift in the process.

Detection Rule 4: Eight successive values on the same side of the average are likely to signal a small, sustained shift in the process.

Detection rules 2, 3, and 4 are the Western Electric run-tests. Collectively, all four rules are often referred to as the Western Electric zone tests. Because these run-tests look for smaller signals, they increase the sensitivity of a process behavior chart when they are used with Rule 1.
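The four rules translate directly into code. The sketch below is one possible implementation, assuming the central line and sigma (the standard error of the plotted values) are already known; the function name `western_electric_signals` is hypothetical. Each value is converted to sigma units, and each rule is checked on the window of successive values ending at that point, separately for each side of the central line.

```python
from typing import List, Sequence, Tuple


def western_electric_signals(
    values: Sequence[float], center: float, sigma: float
) -> List[Tuple[int, int]]:
    """Return (index, rule) pairs for every Western Electric signal found.

    Assumes `center` and `sigma` are known. A value counts as "beyond" a
    line only if it is strictly past it.
    """
    z = [(v - center) / sigma for v in values]  # values in sigma units
    signals: List[Tuple[int, int]] = []
    for i, zi in enumerate(z):
        # Rule 1: a point outside the three-sigma limits.
        if abs(zi) > 3:
            signals.append((i, 1))
        for side in (1, -1):  # +1 = above the central line, -1 = below
            # Rule 2: two of three successive values beyond the same two-sigma line.
            last3 = z[i - 2 : i + 1] if i >= 2 else []
            if sum(side * x > 2 for x in last3) >= 2:
                signals.append((i, 2))
            # Rule 3: four of five successive values beyond the same one-sigma line.
            last5 = z[i - 4 : i + 1] if i >= 4 else []
            if sum(side * x > 1 for x in last5) >= 4:
                signals.append((i, 3))
            # Rule 4: eight successive values on the same side of the central line.
            last8 = z[i - 7 : i + 1] if i >= 7 else []
            if last8 and all(side * x > 0 for x in last8):
                signals.append((i, 4))
    return signals


print(western_electric_signals([2.5, 0.0, 2.5], center=0.0, sigma=1.0))  # [(2, 2)]
```

In the example, two of the three values lie beyond the upper two-sigma line, so Rule 2 fires at the third value. Treating a point exactly on a line (or on the central line) as not beyond it is a design choice here; other conventions are possible.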

In the sections that follow, I’ll compare the use of these detection rules with the use of two-sigma limits as a means of increasing the sensitivity of a process behavior chart. This will be done in three ways. First, we’ll look at the power functions. Next, we’ll look at the average run length curves. Finally, we’ll look at the probabilities of a false alarm.

The power function for a statistical technique describes the probability of detecting a signal. Of course this probability will depend upon the size of the signal, the number of data available, and the technique itself. When the signal is large, useful techniques will have a 100-percent probability of detecting that signal. However, as the size of the signal gets smaller, the probability of detection will generally drop. Finally, in the limiting case where there’s no signal present, desirable techniques will have a small probability of a false alarm. If we plot the probability of detecting a signal on the vertical axis, and plot the size of the signal on the horizontal axis, then we would like to see a curve that starts near zero on the left and climbs rapidly up to 1.00 on the right. (I first published the formulas for the power functions for a process behavior chart 40 years ago. They may be found in my text *Advanced Topics in Statistical Process Control* [SPC Press, 2004] or downloaded in manuscript 321 on my website.) These formulas are for detecting a shift in location using either an *X*-chart or an average chart. To remove the effects of subgroup size, the shifts are expressed in standard error units. The curves shown here are the power functions for exactly *k* = 10 subgroups. Ten subgroups were used because Rule 4 can’t be used with fewer than eight subgroups.
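For Detection Rule 1 alone, the power function has a simple closed form under the usual assumptions (a step shift of δ standard errors, with independent, normally distributed subgroup values). A single point falls outside limits at ±*L* with probability p(δ) = Φ(-*L* - δ) + 1 - Φ(*L* - δ), and the chance of at least one such point among *k* shifted subgroups is 1 - [1 - p(δ)]^*k*. The Python sketch below evaluates that expression; it is an illustration under those assumptions, not a transcription of the formulas in the text cited above, and the names `point_outside`, `power_limit_rule`, and `limit` are hypothetical. Setting `limit=2.0` gives the corresponding curve for a chart that treats two-sigma lines as action limits.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def point_outside(shift: float, limit: float = 3.0) -> float:
    """P(one point falls outside +/-limit) when the mean has shifted by
    `shift` standard errors (normal, independent data assumed)."""
    return normal_cdf(-limit - shift) + (1.0 - normal_cdf(limit - shift))


def power_limit_rule(shift: float, k: int = 10, limit: float = 3.0) -> float:
    """P(at least one of k shifted subgroups falls outside the limits)."""
    return 1.0 - (1.0 - point_outside(shift, limit)) ** k


for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift,
          round(power_limit_rule(shift), 3),             # three-sigma limits
          round(power_limit_rule(shift, limit=2.0), 3))  # two-sigma limits
```

With δ = 0 and three-sigma limits the expression gives about 0.027, in line with the 2.7-percent false-alarm risk for 10 subgroups quoted for figure 1.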

Figure 1 shows the power function for using Detection Rule 1 alone. The left end-point of the power function defines the risk of a false alarm. Here we find that there is a 2.7-percent chance of a false alarm for an average chart using 10 subgroups. When a shift occurs, the probability of detecting that shift within 10 subgroups of when it actually occurs climbs as the size of the shift increases. An *X*-chart or average chart using Detection Rule 1 alone will have a 100-percent chance of detecting a 3.0 standard error shift in location within 10 subgroups of when that shift occurs. Since the objective is to find those shifts that are large enough to justify the expense of fixing the problem, this curve shows why Rule 1 is usually sufficient.

Figure 2 shows the power function for using Detection Rules 1 and 2. The increased sensitivity can be seen in the steeper power function curve. With Rules 1 and 2, you have a 100-percent chance of detecting a 2.5 standard error shift within 10 subgroups of when that shift occurred. However, as is always the case, using additional detection rules results in an increased risk of a false alarm. Here it’s 4.3 percent for an average chart with 10 subgroups.

Figure 3 shows the power function for using Rules 1, 2, and 3, and the power function for using all of the Western Electric zone tests. These curves are slightly steeper than the curve for Rules 1 and 2 combined. They both show a 100-percent probability of detecting a 2.0 standard error shift in location within 10 subgroups of when it occurred. The false alarm risks for these two curves for an average chart with 10 subgroups are approximately 6 percent and 8 percent. In fact, these last two power function curves show probabilities that differ by less than 0.10 for shifts smaller than 2.0 standard errors. Such small differences in power are hard to detect in practice. Using Rules 1, 2, and 3 will work about as well as using all of the detection rules. Rule 4 will only add some sensitivity to small and sustained shifts.
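The combined rules do not reduce to a single closed form as easily, but their power functions can be approximated by simulation. The sketch below swaps in a Monte Carlo method (it is not the analytical approach used for the figures): it generates `k` independent normal values after a step shift, checks whether any of the chosen rules fires within those `k` subgroups, and averages over many trials. The function names, trial count, and seed are arbitrary, hypothetical choices, and the estimates carry some simulation error.

```python
import random


def any_signal(z, rules) -> bool:
    """True if any requested Western Electric rule fires on the standardized
    values z (central line 0, sigma 1)."""
    for i in range(len(z)):
        if 1 in rules and abs(z[i]) > 3:
            return True
        for side in (1, -1):  # check above and below the central line
            if 2 in rules and i >= 2 and sum(side * x > 2 for x in z[i - 2 : i + 1]) >= 2:
                return True
            if 3 in rules and i >= 4 and sum(side * x > 1 for x in z[i - 4 : i + 1]) >= 4:
                return True
            if 4 in rules and i >= 7 and all(side * x > 0 for x in z[i - 7 : i + 1]):
                return True
    return False


def estimate_power(shift: float, rules, k: int = 10,
                   trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(signal within k subgroups) after a step
    shift of `shift` standard errors, assuming independent normal data."""
    rng = random.Random(seed)  # same seed -> same simulated data for every rule set
    hits = 0
    for _ in range(trials):
        z = [rng.gauss(shift, 1.0) for _ in range(k)]
        if any_signal(z, rules):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for rules in [(1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]:
        false_alarm = estimate_power(0.0, rules)  # no shift present
        power_at_2 = estimate_power(2.0, rules)   # 2.0 standard error shift
        print(rules, round(false_alarm, 3), round(power_at_2, 3))
```

Because each call reseeds the generator, every rule set is applied to the same simulated data, so adding rules can only raise an estimate. That makes the trade-off discussed above visible: the power at a 2.0 standard error shift and the false-alarm risk at zero shift climb together as rules are added.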

The curves in figure 3 are getting squeezed together because there’s a limit to the steepness of the power function, and these curves are approaching that limit. Once you hit this limit, the only way to raise the power function curve is by raising the left-hand endpoint of the curve. We essentially see this beginning to happen in the last two curves of figure 3.

Figure 4 shows the power function curve for using two-sigma warning limits. Here we see that with warning limits we’re 4 percent more likely to detect a 2.0 standard error shift than we would be using Detection Rules 1 and 2. For this very slight increase in sensitivity to changes of economic importance, these warning limits increase our risk of a false alarm nearly tenfold, from 4 percent to 38 percent!

A different perspective is provided by the average run length (*ARL*) curves. The average run length is the average number of subgroups between the occurrence of a signal and the detection of that signal. When these *ARL* values are plotted against the size of the signal, we end up with the curves in figure 5.

For shifts in excess of 2.0 standard errors, the use of two-sigma warning limits will have an average run length that is less than one subgroup smaller than that of all four detection rules. However, when there are no signals, the use of two-sigma warning limits will result in one false alarm every 22 subgroups on average. In contrast, using all four detection rules will result in one false alarm every 91 subgroups on average. So, by using two-sigma limits, you’re increasing your false alarm rate fourfold in return for a very slight advantage in detecting a signal that is large enough to be of any practical consequence.
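For a chart that signals only on points beyond ±*L*, the run length is geometric under the usual assumptions (independent, normally distributed values), so the *ARL* is simply 1/p, where p is the per-point probability of a signal. A minimal Python sketch, with a hypothetical function name:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def arl_limit_rule(shift: float, limit: float) -> float:
    """ARL for a chart that signals only on points outside +/-limit.
    Each point signals independently with probability p, so the run
    length is geometric and its mean is 1/p."""
    p = normal_cdf(-limit - shift) + (1.0 - normal_cdf(limit - shift))
    return 1.0 / p


print(round(arl_limit_rule(0.0, 2.0), 1))  # in-control ARL, two-sigma limits: 22.0
print(round(arl_limit_rule(0.0, 3.0), 1))  # in-control ARL, three-sigma limits: 370.4
```

The first call reproduces the one-false-alarm-every-22-subgroups figure for two-sigma limits. The in-control *ARL* of about 91 for all four detection rules has no equally simple formula, because the run-tests make successive decisions dependent; it is usually obtained from a Markov-chain calculation or by simulation.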

Figure 4 showed a comparison while holding the number of subgroups constant. Figure 5 showed the average number of subgroups between the signal and the detection of that signal. As noted earlier, the risk of a false alarm on each step of a sequential procedure isn’t the same as the overall risk of a false alarm across several steps. Here we’ll look at how the false alarm probability increases as the number of subgroups increases.

Figure 6 shows the probability of a false alarm on the vertical axis and the number of subgroups considered on the horizontal axis. Here we compare the probabilities of false alarms for the traditional process behavior charts and a chart using two-sigma limits.
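For a chart whose only rule is a point beyond ±*L* sigma, the growth of the false-alarm probability follows directly from the per-point risk p: under independence, the probability of at least one false alarm among *n* in-control subgroups is 1 - (1 - p)^*n*. The minimal sketch below tabulates that expression for three-sigma and two-sigma limits. It does not cover the run-tests, whose cumulative risk would need simulation; the function name is hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cumulative_false_alarm(n: int, limit: float) -> float:
    """P(at least one false alarm in n in-control subgroups) for a chart
    that signals only on points outside +/-limit sigma."""
    p = 2.0 * (1.0 - normal_cdf(limit))  # per-point false-alarm risk
    return 1.0 - (1.0 - p) ** n


for n in (1, 10, 25, 50, 100):
    print(n,
          round(cumulative_false_alarm(n, 3.0), 3),  # three-sigma limits
          round(cumulative_false_alarm(n, 2.0), 3))  # two-sigma limits
```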

The use of two-sigma warning limits will result in a dramatic increase in the number of false alarms compared to the other charts. This dramatic increase begins immediately, and just gets bigger as more data are collected. Since using the charts effectively requires you to investigate each and every out-of-limits point in search of an assignable cause, this excessive number of false alarms will inevitably undermine the credibility of both the chart and the person using it. In short, an excessive number of false alarms will kill the use of the charts.

In practice, most people who use process behavior charts effectively find that they have plenty of signals using Detection Rule 1. In fact, the problem is usually one of needing a procedure that is *less* sensitive, rather than more sensitive. However, for those situations where an increased sensitivity is desired, the addition of Detection Rules 2, 3, or 4 will suffice.

The only reason to collect data is to use them to take action. To use data to take action, you must have a properly balanced decision rule for interpreting those data. Otherwise, you’ll either err in the direction of missing signals or else you’ll err in the direction of taking action based on noise.

The purpose of a process behavior chart is to tell you when to take action and when to refrain from taking action; when to look for an assignable cause of exceptional variation and when not to do so. The idea behind the chart is as old as Aristotle, who taught us that the time to identify a cause is at that point where a change occurs. This is why we’re only concerned with changes that are large enough to be of economic consequence. And Shewhart’s three-sigma limits are sufficient to detect these changes virtually every time.

Tightening up limits on a process behavior chart will not improve things. You can’t squeeze the voice of the process. Tightening up the limits on a process behavior chart will only increase the false alarm rate. When you look for nonexistent assignable causes, you’ll be wasting time and effort while undermining the usefulness of the process behavior chart. Three-sigma limits strike a balance between the economic consequences of the dual mistakes of missing signals and getting false alarms.

Finally, in this column I’ve used the power functions, *ARL* curves, and false alarm probabilities computed in the usual way simply because that’s the only way to obtain valid comparisons between different techniques. These usual assumptions are that the shift in location can be modeled by a step function, that the measurements are continuous, that the measurements are independent of each other, and that the measurements are normally distributed. These assumptions are necessary to carry out the mathematics. In practice none of these assumptions are realistic. This is why theory only provides a starting place for practice. So, while theory suggests that three-sigma limits should work, almost 100 years of practice has proven beyond any doubt that they do work as expected. Make no changes. Accept no substitutes.

## Comments

## 2-Sigma Limits

I remember writing an article a few years ago about this same topic, though not nearly as sophisticated as Dr. Wheeler's. I ran into this problem many times in industry where someone in management felt the control limits were too wide. He just did not understand that the process determines the UCL and LCL, not the specifications or the desires of management. Long story short: These so-called "Action Limits" should be called "Tampering Limits"! "Tamper, tamper is the way. Off we go to the milky way."

## Is the process owner's intent

Is the process owner's intent to run a process that is in control or in compliance?

## Control or compliance

Both. However, if it's not in control, you cannot predict that it will be in compliance.

## Great reinforcement!

It costs, too...it's not as though you can use the two-sigma "warning" limits for free. Every false signal they pick up increases the chance that you will take action on a stable process. Those familiar with the Nelson Funnel experiment understand the dangers of that...tampering. It can increase the variation in a number of different ways, and will increase cost inevitably (you are paying someone to take action when no action is warranted or wanted).

I had a profound experience with this when I attended Dr. Wheeler's "Advanced Topics in Statistical Process Control" seminar back in the mid-90s. He used a quincunx to demonstrate a stable process, and derived 1-, 2-, and 3-sigma limits from the data in the quincunx. Then he ran a lot of beads to show that the process was stable, and that his empirical rule held up. Then, he covered the top section so you couldn't see what was happening after the beads dropped through the funnel and drew specification limits on the board. Then he started dropping beads, and when a couple of beads started landing near a spec limit, some of the engineers in the room suggested that he shift the funnel away from the spec limit. So he did, then started dropping beads again. Every time a bead fell close to the upper or lower spec limit, he would shift the funnel (at the request of the students). After he'd done this for a couple of minutes, he pulled the paper off the front of the board and showed them that the distribution was significantly wider. Then he asked, "Do any of you use p-controllers in your processes? This is just a p-controller..."

This was well before the age of ubiquitous cell phones, and at the next class break, several of these engineers raced to the bank of pay phones out in the lobby, and called back to their factories to tell their people "Shut down the p-controllers! They're killing us!"

## But Wait, There's More...

Thank you for another excellent deep dive into why control chart rules work so effectively for statistically stable processes.

I know you don't need to know this, but to me the other benefit of control charts over histograms is that a special cause not only constitutes an alarm that something has changed, but quite often also offers information about what caused the change. The rules keep the decision objective, the visual display provides potential insight into the process.

## Since you bring it up...

Since you mention histograms, it's probably worth noting that--in analytic studies, process studies--the histogram means nothing in the absence of statistical control.