Published: Wednesday, November 22, 2023 - 12:03

Data overload has become a common malady. Modern data collection technologies and low-cost database storage have motivated companies to collect data on almost everything. The result? Data overload. Unfortunately, few companies leverage the information hidden away in those terabytes of data.

There are inherent challenges in managing large volumes of data, not the least of which is how to extract meaningful information from those data or calculate returns on investment for those systems. Invariably, some well-meaning person will ask for a control chart using data that are gathered in millisecond increments. That presents a series of problems. Building on Part 2, this article is written to reveal practical ways of creating meaningful control charts and analyses from mountains of data.

High-speed data collection systems, such as historians (a set of time-series database applications that collects and stores industrial data, e.g., measurements), are wonderful systems. They provide traceability to all data values (e.g., speeds, feeds, temperatures) that drive manufacturing excellence.

Using a control chart to evaluate high-speed data typically results in a negative experience. Why? Because control charts were built to provide critical, insightful process control information from *sampled* data. That is, when working with individual measurements, control charts are used for analyzing data that have been gathered infrequently, albeit regularly and thoughtfully, from a production process. Control charts were not designed to analyze data gathered every second or so. Some may interpret this to mean control charts have a serious limitation, but we disagree, as discussed below.

The key to success is a thoughtful collection and use of the available data: What do you want to achieve, and what questions are you looking to answer?

Rational sampling plans are designed to allow regular, repeated gathering of small amounts of data plotted on control charts to indicate both normal and non-normal operating behaviors. Plot points might represent individual data values, averages, and ranges. We have seen sampling plans where 1, 2, 5, or more data values are gathered every few minutes, every 30 minutes, or even every hour. Sampling frequencies change based on process personalities and the sources of variation one wishes to uncover.

And that is the issue. Think about it: How much variation would you expect in an oven temperature from one half-second to the next? Probably not much at all, if any. It’s likely that the data values will be identical. Why? Well, how much change in temperature would you expect from an oven across a *second* in time? Probably none.

If one were to chart temperatures in one-second increments on a control chart, this is one of the likely outcomes:

The issues seen in the chart above stem from the fact that there is little or no variation in temperature from one plot point to the next. There is too little time between the points—just one second—for the process’s routine variation to show itself.

Figure 1’s plot, or running record, has a strange, unnatural appearance. The plot points look “chunky,” since several consecutive plot points have identical values, leading one to view the chart as having “clumps” or “chunks” of data. This characteristic also results in calculations of control limits that seem too “tight,” or close together.

Twelve of the 100 points in Figure 1 fall outside a control limit. As will be seen later, these 12 points are not true indications of process change warranting investigation. The bottom line is that using a control chart with data where sampling occurs too frequently results in far too many alarms—*false* alarms—simply because data values that are seconds apart don’t capture normal and abnormal process variations.
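The tightening effect can be reproduced with a few lines of code. The sketch below assumes the common convention for a chart of individual values (central line at the mean, limits at the mean plus or minus 2.66 times the average moving range); the article does not state which formula produced Figure 1, and the one-second readings are invented for illustration.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower limit, central line, upper limit) for an individuals chart.

    Assumed convention: limits = mean +/- 2.66 * average moving range.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Moving range: absolute difference between consecutive values
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def points_outside(values, lower, upper):
    """Indices of the values that fall outside the limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical one-second readings: long "chunks" of identical values
chunky = [350.0] * 10 + [350.1] * 10 + [350.2] * 10 + [350.3] * 10
lower, center, upper = xmr_limits(chunky)
flagged = points_outside(chunky, lower, upper)
```

Because most consecutive readings are identical, nearly all moving ranges are zero, the average moving range collapses, and the limits squeeze in around the central line. In this made-up series every point ends up outside the limits even though the total drift is only 0.3 degrees.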

Ultimately, historian and statistical process control (SPC) systems are complementary technologies. One shouldn’t be considered a replacement for the other. Organizations can enjoy the access to vast amounts of data that historians offer, and they can also leverage SPC systems to accurately control processes as well as improve them.

Strategy No. 1: Agree on a rational sampling plan

The word “rational” is used intentionally here. Process experts (operators, engineers, quality professionals) need to agree on the most reasonable, rational ways to sample data. Questions you will want to answer include:

• How much data should be sampled?

• How frequently should data be sampled?

• What should our sample size be (e.g., at each sampling collect two results, or three...)?

• How often can the control chart be reviewed?

• How important is the parameter being charted (e.g., critical to quality characteristic)?

Entire books have been written about sampling strategies and sample size determination. This article will not discuss the myriad ways of determining sampling plans. However, our simple advice is to gather just enough data, at just the right frequency. Not too much data, not too little. Not too frequently, and not too infrequently. We recognize that this advice is nonspecific. Yet that is exactly the point we wish to convey: Creating a sampling plan requires discussion, differing points of view, and, ultimately, rationality. A good sampling plan should be “just right.”

Once the sampling plan has been agreed upon, stakeholders can select data from a historian (or other sources) at the agreed upon time, then manually enter it into an SPC program. This strategy is decidedly low-tech, yet the benefit of a good sampling plan with powerful and insightful control charts cannot be overstated.

Lastly, sampling plans are meant to evolve. That is, just because a sampling plan has been agreed upon, don’t assume that it will not change. In fact, assume the opposite: Based on lessons from the first sampling plan, create a second one that leverages the information revealed by the first. Then iterate. By evolving sampling plans, organizations accelerate their knowledge regarding critical processes, how best to control them, and how to benefit the organization most.

Strategy No. 2: Sample data directly from your historian

Many SPC systems have the ability to automate data sampling directly from historian data streams. This is a fantastic feature, and it should be leveraged after creating a sampling plan as specified in Strategy No. 1 above.

Let’s consider those oven temperatures we discussed earlier. Experts speculate that the heating process gradually increases over time, say within two or three hours. Stakeholders don’t want temperatures to rise too much, and they wish to adjust thermostats at just the right time to ensure the most consistent temperatures.

To capture significant changes in temperature, the team agrees to gather a single temperature data value every 15 minutes from the historian data stream. The result might be something like the chart below:

Figure 2’s control chart indicates a gradual increase in temperature over several hours. The gradual increase is detected by the two points below the limits at the start of the record and the one above the limits at the end. Moreover, the last 13 points are all above the chart’s central line. The values are easy to understand and interpret, even though the chart uses only 22 data values from (potentially) thousands of temperatures gathered by the historian.
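The two kinds of signal described for Figure 2, points beyond a limit and a long run on one side of the central line, can also be checked programmatically. This is a minimal sketch: the function names are hypothetical, the run threshold of eight is a commonly used convention rather than something stated in this article, and the drifting values are invented.

```python
def points_beyond_limits(values, lower, upper):
    """Indices of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def longest_run_one_side(values, center):
    """Length of the longest run of consecutive points strictly on one side of center."""
    longest = current = 0
    previous_side = 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1      # run continues on the same side
        elif side != 0:
            current = 1       # run starts (or switches sides)
        else:
            current = 0       # a point on the line breaks the run
        previous_side = side
        longest = max(longest, current)
    return longest


RUN_THRESHOLD = 8  # assumed convention, not taken from the article

# Hypothetical 15-minute samples that drift steadily upward
drifting = [348.0 + i for i in range(12)]
center = sum(drifting) / len(drifting)
run_length = longest_run_one_side(drifting, center)
has_run_signal = run_length >= RUN_THRESHOLD
```

A run rule of this kind is what makes a slow drift visible even when no single point crosses a limit.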

This strategy helps ensure process control and cost efficiency. Note that few data values (proportionally) are gathered from the historian, and the sampling of the data occurs automatically. Simplicity and automation help ensure understanding. But remember that sampling plans should evolve over time. It’s possible that the process stakeholders may wish to increase or decrease sample frequency based on what is learned. Modifying data-sampling plans should be simple when using electronic connections between software systems.
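For teams prototyping such a plan outside their SPC software, the selection logic can be sketched in a few lines. The code assumes the historian data have been exported as `(timestamp, value)` pairs sorted by time; real historian and SPC products have their own interfaces, which are not shown here, and the readings are invented.

```python
from datetime import datetime, timedelta


def sample_every(readings, interval):
    """Keep the first reading at or after each scheduled sampling time.

    `readings` is an iterable of (timestamp, value) pairs sorted by timestamp.
    The schedule starts at the first reading and advances in fixed steps,
    so a gap in the stream does not shift later sampling times.
    """
    sampled = []
    next_due = None
    for timestamp, value in readings:
        if next_due is None:
            next_due = timestamp
        if timestamp >= next_due:
            sampled.append((timestamp, value))
            # Advance the schedule past the reading just taken
            while next_due <= timestamp:
                next_due += interval
    return sampled


# Hypothetical one-second stream covering one hour
start = datetime(2023, 1, 1, 8, 0, 0)
readings = [(start + timedelta(seconds=s), 350.0 + s / 3600) for s in range(3600)]
sampled = sample_every(readings, timedelta(minutes=15))
```

Changing the sampling plan then amounts to changing one argument, which fits the idea that plans should evolve as the team learns.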

Strategy No. 3: Calculate and save statistics for plotting on control charts

In Strategy No. 2, note that the initial sample size was one temperature value every 15 minutes. The result is four temperature data values every hour. It’s entirely possible that a manager or other stakeholder might claim, “The historian provides us so much data, we have to use more than just four temperatures every hour.” This statement, while perhaps unnecessary, isn’t totally unreasonable, especially if the sampling plan can be modified programmatically within the SPC software.

Say that the team agrees to gather a single temperature value every minute, resulting in 60 temperatures every hour. When using a control chart for individual values, this would result in 480 plot points for a single eight-hour shift. Aside from the risk of too many false alarms (as in Figure 1) and the ensuing waste of “chasing shadows,” that might be a bit overwhelming from an analysis standpoint.

Continuing with the temperature data, values were collected over approximately five and a half hours. Rather than plot every single individual data value on a control chart for individual values, consider calculating statistics from the gathered data, then plotting those statistics. Here’s how this strategy might be employed:

• Create a subgroup size of 15—one data value for each of 15 one-minute increments of time.

• Calculate the average temperature for each subgroup of 15 data values. Each subgroup average will represent the average temperature for one 15-minute timeframe.

• Plot each of these average values on a control chart for individual values, treating the average values as individual values on the control chart. This chart is based on average temperatures for each plot point and reveals the variation in the *average* temperatures from one 15-minute timeframe to the next.

• Calculate the sample standard deviation for each subgroup of 15 data values. This sample standard deviation will represent the variation in temperature values within one 15-minute timeframe.

• Plot each of the standard deviation values on a separate control chart for individual values. This chart will allow users to evaluate how the standard deviation changes through time. Each plot point on this chart indicates the amount of variation in temperature values for each 15-minute increment. Collectively, the chart tells us about *how the variation is varying* through time.

The result is *two* different control charts, each with 22 plot points over the five-and-a-half-hour period. One chart is used to evaluate the mean temperatures, and the second chart is used for evaluating the variation of the same temperatures (by using the standard deviation).
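The subgrouping steps above can be sketched as follows. The function name and the decision to drop a trailing partial subgroup are assumptions made for illustration, and the one-minute readings are invented.

```python
from statistics import mean, stdev


def subgroup_statistics(values, size=15):
    """Split values into consecutive subgroups; return (averages, standard deviations).

    A trailing partial subgroup is dropped so that every plot point
    rests on exactly `size` values.
    """
    if size < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    averages, deviations = [], []
    for first in range(0, len(values) - size + 1, size):
        group = values[first:first + size]
        averages.append(mean(group))
        deviations.append(stdev(group))  # sample standard deviation (n - 1)
    return averages, deviations


# Hypothetical one-minute readings over five and a half hours (330 minutes)
one_minute_values = [350.0 + 0.01 * minute for minute in range(330)]
averages, deviations = subgroup_statistics(one_minute_values, size=15)
```

With 330 one-minute values and a subgroup size of 15, each list holds 22 numbers. Each list is then plotted on its own chart for individual values, one for the averages and one for the standard deviations.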

Both charts contain signals, i.e., useful information from which to generate actionable insights:

• Chart of average values: Reveals the gradual increase in temperature that was anticipated

• Chart of standard deviation values: Reveals a greater variation in temperatures at the start of the run as well as a lower extent of variation in the temperatures during the second, third, and fourth hours

Some may ask, “Why not use the typical x-bar and range or x-bar and S chart for Strategy No. 3?” Given what the team knows about temperatures, it’s likely that the data values from one minute to the next may be too autocorrelated to result in proper calculation of accurate control limits. This assumption can be verified or discarded based on evidence from the collected data.
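One simple piece of evidence is the lag-1 autocorrelation of the one-minute values, which measures how closely each value tracks the one before it. The sketch below uses the standard sample estimate; the article gives no cutoff, so how large a value counts as "too autocorrelated" remains a judgment for the team. The example series are invented.

```python
from statistics import mean


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: near +1 means each value closely follows the last."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    center = mean(values)
    deviations = [v - center for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("values are constant; autocorrelation is undefined")
    # Sum of products of each deviation with the one that follows it
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator


# Hypothetical series: a steady drift versus a strictly alternating pattern
drifting_r1 = lag1_autocorrelation(list(range(100)))
alternating_r1 = lag1_autocorrelation([1, -1] * 50)
```

A slowly drifting temperature produces a coefficient close to +1, which is the situation where limits based on within-subgroup variation tend to come out too tight.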

Using high-speed data collection systems has become beneficial to manufacturing companies around the world. The same is true for SPC techniques. SPC and high-speed data collection are actually complementary technologies. But to get the most useful and relevant information from SPC, data should be sampled in a rational manner from your high-speed data collection system. Doing so will result in information that organizations can leverage to intelligently control their processes using SPC, and do so in a very efficient manner. The benefit? Better quality at lower cost—and maybe even a better year-end bonus.

*This topic—working with high-speed data—might be one of the more contentious topics in and around SPC. Do you like our recommendations? Do you disagree with us? Please share your thoughts in the comments below.*

## Comments

## Elaborations for strategy 1?

As you write, the advice for strategy 1 is somewhat nonspecific. Any further suggestions on how you could spot when a sampling plan is not "just right"?

## "Just right"

Hi Robert, good question…

First of all, be clear on what you want to achieve, or which questions you are looking to answer. As an example, you might want to learn if the process remains stable and predictable when you change a raw material batch or supplier, or change to a different upstream feed tank, or change production shift …? Be clear that your sampling plan convinces you the question/s of interest can be answered.

I learnt from Donald Wheeler many years ago that “so long as you have signals you have enough data” … if you have signals (on your control chart) you have detected process changes that are worthy of investigation – investigate, learn what you can, and take action.

If your signals come from a chart looking like Figure 1 above you are probably sampling too fast. Use your knowledge of the process to “best guess” how quickly your process can change … (ask the shop floor what they think and get them involved). If you think, for example, that meaningful process changes cannot occur at a faster rate than every 10 minutes then don’t sample much quicker than this.

As Doug writes above, challenge your current sampling plan and adapt it to correct the weak points in it that you identify. You may not find a weak point today, but you may tomorrow. When you find the weak point adapt things accordingly.

## SPC for High speed data

One may use CUSUM control on quality attributes for which data collection is periodic, usually lab parameters. It can supplement PID control.

## Another approach for charting temperature

Hi Doug,

I have been greatly enjoying your series of articles.

Another way to track the temperature data when 15 measurements form a subgroup is to use a 3-D X & MR chart. The standard deviation of the 15 readings is plotted on the bottom (S) chart, the average of the 15 on the top (X) chart, and the moving range of two consecutive averages (Xs) on the middle (MR) chart.

Take care and keep writing these interesting, and timely, articles!

Davis Bothe

## Three-Way Chart

Davis Bothe!

Your suggestion refers to the so-called Three-Way Charts (invented by Dr. Wheeler).