Davis Balestracci


90 Percent of DOE Is Half Planning

Don’t just teach statistics; teach how to solve problems

Published: Wednesday, May 18, 2016 - 14:14

I’ve mentioned that design of experiments (DOE) is one of the few things worth salvaging from typical statistical training, and I thought I’d talk a bit more about DOE in the next couple of columns. The needed discipline for a good design is similar when using rapid-cycle plan-do-study-act (PDSA).

Doing a search on the current state of DOE in improvement education, I observed that curricula haven’t changed much in the last 10 years and still seem to favor factorial designs or orthogonal arrays as a panacea.

The main topics for many basic courses remain:
• Full and fractional factorial designs
• Screening designs
• Residual analysis and normal probability plots
• Hypothesis testing and analysis of variance (ANOVA)

The main topics for advanced DOE courses usually include:
• Taguchi signal-to-noise ratio
• Taguchi approach to experimental design
• Response-surface designs
• Hill climbing
• Mixture designs

No doubt these are all very interesting. But what is the 20 percent of this material that will solve 80 percent of people’s problems? Some of the topics above are very specialized, rarely used, and can only be understood when people have a practical working knowledge of the other material after actually using it.

Many trainers also fall into the trap of thinking that hypothesis testing and ANOVA should be taught as separate topics. A well-respected statistical colleague says it so well [my emphasis]: “I get [questions about degrees of freedom] all the time (ANOVA tables in particular seem to terrorize people)... but I wish people were asking better questions about the problem they’re trying to understand/solve, the quality of the data they’re collecting/crunching, and what on Earth they’re actually going to do with the results and their conclusions. In a well-meaning attempt not to turn away any statistical questions, my own painful attempts to explain degrees of freedom have only served to distract the people who are asking from what they really should be thinking about.”

A basic knowledge of full and fractional factorial designs, screening designs, and their analysis and diagnostics is a good place to start. This knowledge, though necessary and useful when one is at a low state of knowledge about one’s process, is not sufficient. It usually needs to be supplemented by some basic, extremely useful designs from response surface methodology.

There is no finer reference for a process-oriented approach to DOE than Ronald Moen’s, Thomas Nolan’s, and Lloyd Provost’s Quality Improvement Through Planned Experimentation (McGraw-Hill Education, 2012). Response surface methodology, however, is not covered.

When possible, get a ‘road map’

In my industrial career, many of my clients found much more ultimate value in obtaining a process road map—called a contour plot—which is accomplished through a response surface methodology. In its basic form it is hardly an advanced technique, but it does go a bit beyond factorial designs. Many times a response surface methodology can even build on factorial designs in a nice, efficient sequential strategy as one evolves to a higher state of knowledge, which leads to much more effective optimization and process control.

A typical contour plot is shown in figure 1 (scenario explained shortly). It shows how the predictive model from the design analysis can be turned into a road map of the process studied. Temperature (x-axis) and an ingredient’s concentration (y-axis) were varied over the ranges on their respective axes. For any combination of those two variables, one can read the predicted value of the response being studied (the objective in this case is to minimize it).

Figure 1: A typical contour plot
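To make "reading the map" concrete, here is a minimal Python sketch of how a fitted model becomes a grid of predictions. The quadratic coefficients in `predicted_tar` are invented purely for illustration (they are not the model behind figure 1), and the function and variable names are hypothetical.

```python
# Hypothetical quadratic model in coded units (-1 to +1).
# The coefficients are made up for illustration; they are NOT the article's model.
def predicted_tar(temp_c, cuso4_pct):
    t = (temp_c - 60.0) / 5.0       # 55-65 C    -> -1..+1
    c = (cuso4_pct - 28.5) / 2.5    # 26-31 pct  -> -1..+1
    return 12.0 - 3.0 * t - 0.9 * c + 1.0 * t * t + 2.0 * c * c + 0.5 * t * c


def grid_minimum(steps=20):
    """Evaluate the model on an evenly spaced grid and return the lowest point."""
    best = None
    for i in range(steps + 1):
        temp = 55.0 + 10.0 * i / steps
        for j in range(steps + 1):
            conc = 26.0 + 5.0 * j / steps
            tar = predicted_tar(temp, conc)
            if best is None or tar < best[2]:
                best = (temp, conc, tar)
    return best


best_temp, best_conc, best_tar = grid_minimum()
print(f"Lowest predicted tar on the grid: {best_tar:.2f} "
      f"at {best_temp:.1f} C and {best_conc:.2f} percent CuSO4")
```

A contour plot is the same grid of predictions drawn as level curves rather than printed as a single minimum.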

However, this map is never fully known and can only be approximated. The question becomes: What is your best shot at doing this in as few experiments as possible? First, some background.

The contour plot in figure 1 maps a real production process where the desired product immediately decomposes into a pesky, tarry byproduct that is difficult and expensive to remove. The process is currently averaging approximately 15-percent tar, and each percentage point of reduction achieved equates to $1 million (in 1970 dollars) in annual savings.

Process history has determined three variables to be crucial for process control: temperature, copper sulfate concentration, and excess nitrite. Any combination of these three variables within the following ranges would represent a safe and economical operating condition: temperature, 55°–65°C; copper sulfate concentration, 26–31 percent; and excess nitrite, 0–12 percent. The current operating condition is the midpoint of these ranges.

For purposes of experimentation only, the equipment is capable of operating in the following ranges if necessary: temperature, 50°–70°C; copper sulfate concentration, 20–35 percent; and excess nitrite, 0–20 percent.
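As a rough sketch of how these ranges translate into candidate run conditions, the Python below maps coded levels (-1, 0, +1) onto the safe operating ranges and lists the eight corners of a two-level, three-variable factorial "cube" plus the center point. The dictionary keys and function names are hypothetical, and this is only one way to lay out conditions, not a recommended strategy.

```python
from itertools import product

# Safe operating ranges as stated in the text (low, high).
SAFE_RANGES = {
    "temperature_C": (55.0, 65.0),
    "cuso4_pct": (26.0, 31.0),
    "excess_nitrite_pct": (0.0, 12.0),
}


def decode(level, low, high):
    """Map a coded level in [-1, +1] to real units."""
    midpoint = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return midpoint + level * half_range


def two_level_factorial(ranges):
    """Return every low/high combination (the corners of the cube)."""
    names = list(ranges)
    runs = []
    for levels in product((-1, 1), repeat=len(names)):
        runs.append({n: decode(lv, *ranges[n]) for n, lv in zip(names, levels)})
    return runs


corners = two_level_factorial(SAFE_RANGES)
center = {n: decode(0, *r) for n, r in SAFE_RANGES.items()}

for run in corners:
    print(run)
print("center (current operating condition):", center)
```

The center point reproduces the current operating condition, since the text places it at the midpoint of the safe ranges.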

Suppose you had a budget of 25 (expensive) experiments that need to answer these questions:
• Where should the three variables be set to minimize tar production?
• What percent tar would be expected?
• What’s the best estimate of the process variation (i.e., tar ± x percent)?

This is the scenario I use to introduce my experimental design seminars. I divide the audience into groups of three to four people, and give each group a process simulator where they can enter any condition and get the resulting percent tar.

It almost never fails: I get as many different answers for optimum settings, resulting tar, and variation as there are groups in the room—and just as many strategies (and number of experiments run) for reaching their conclusions. Human variation!

I have each group present its results to me, and I act like the many mercilessly tough managers to whom I have made similar presentations.

General observations:
• Most try holding two of the variables constant while varying the third, and then try to further optimize by varying the other two around their best result.
• Each experiment seems to be run based only on the previous result.
• Some look at me smugly and run the cube of a three-variable factorial design (many times getting the worst answers in the room).
• Some run more than the allotted 25 experiments.
• Some go outside of the established variable safe ranges.
• Most find a good result, and then try a finer and finer grid to further optimize.
• There’s always one group that claims to have optimized in fewer than 10 experiments, and the group members (and everyone else) look at me like I’m nuts when I tell them they should repeat their alleged optimum, and that repeating any condition uses up an experiment.

I’m accused of horrible things when the repeated condition gets a different answer (sometimes differing by as much as 11–14 percentage points of tar). I simply ask, “If you run a process at the same conditions on two different days, do you get the same results?”

What usually happens as a result:
• I’m often told the “process is out of control,” so there’s no use experimenting.
• Most estimates of process variability are naively low.
• Groups have no idea how to present results in a way that would sell them to a tough manager.
• The suggested optimal excess nitrite settings are all over the range of 0–12 percent, even though this variable is modeled to have no effect and should be set to zero.

My simulator generates the true number from the actual process map (in figure 1) along with a random, normally distributed variation that has a standard deviation of four. (The actual process had a standard deviation of eight.) In looking at the contour plot, tar is minimized at 65°C and approximately 28.8 percent CuSO4, resulting in 6–8 percent tar ± ~8–10 for any production run.
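A simulator of this kind can be sketched in a few lines of Python. Only the noise model (normal, standard deviation of four) comes from the description above. The `true_tar` surface and its coefficients are invented stand-ins for the real process map, and all names are hypothetical.

```python
import math
import random
import statistics

NOISE_SD = 4.0  # standard deviation of the simulator's random variation (from the text)


def true_tar(temp_c, cuso4_pct, excess_nitrite_pct):
    """Hypothetical 'true' surface; coefficients are illustrative only.

    Excess nitrite is accepted but ignored, mirroring a variable with no effect.
    """
    t = (temp_c - 60.0) / 5.0
    c = (cuso4_pct - 28.5) / 2.5
    return 12.0 - 3.0 * t - 0.9 * c + t * t + 2.0 * c * c + 0.5 * t * c


def run_experiment(temp_c, cuso4_pct, excess_nitrite_pct, rng):
    """One simulated run: true value plus normally distributed noise."""
    return true_tar(temp_c, cuso4_pct, excess_nitrite_pct) + rng.gauss(0.0, NOISE_SD)


rng = random.Random(2016)  # fixed seed so the sketch is repeatable

# Repeat the same condition many times in pairs and look at the differences.
diffs = [
    run_experiment(65.0, 28.75, 0.0, rng) - run_experiment(65.0, 28.75, 0.0, rng)
    for _ in range(20000)
]
observed_sd = statistics.stdev(diffs)
expected_sd = math.sqrt(2.0) * NOISE_SD  # SD of a difference of two independent runs

print(f"SD of repeat-to-repeat differences: {observed_sd:.2f} "
      f"(theory: {expected_sd:.2f})")
```

Because the difference between two independent runs has a standard deviation of about 5.7 (the square root of two times four), two runs at identical settings that differ by 11–14 are roughly 2 to 2.5 standard deviations apart: uncommon, but well within what pure noise will produce over a classroom's worth of repeats.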

In 1983, I heard the wonderfully practical C.D. Hendrix say, “People tend to invest too many experiments in the wrong place!”

As it turns out, by the end of the class, human variation is minimized when every group independently agrees on the same 15-experiment strategy (a few choose an alternative, equally effective 20-experiment strategy). When they see quite different numerical results from each individual design, they are initially leery, but then pleasantly surprised when they all get pretty close to the real answer.
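The column does not say which 15-experiment strategy the groups converge on. Purely as an illustration of what a 15-run, three-variable response-surface design can look like, the Python sketch below builds a Box-Behnken design with three center points (12 edge midpoints plus 3 replicated centers) in coded units. Treating this as the design in question is an assumption, and the function name is hypothetical.

```python
from itertools import combinations, product


def box_behnken_3(center_points=3):
    """Three-factor Box-Behnken design in coded units (-1, 0, +1).

    For each pair of factors, run the four low/high combinations while the
    remaining factor sits at its midpoint, then add replicated center points.
    """
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0, 0, 0]
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0, 0, 0)] * center_points)
    return runs


design = box_behnken_3()
for run in design:
    print(run)
print("total runs:", len(design))
```

The replicated center points are what give a direct estimate of run-to-run variation, and three levels per variable are what allow curvature to be modeled. In practice the run order would be randomized before going to the plant.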

Reduced human variation = higher quality and more consistent results in only 15–20 experiments. They are now in “the right place,” and have 5–10 more experiments to refine their optimum.

More next time.


About The Author

Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.


More info please

Thank you, Davis. Quality Improvement Through Planned Experimentation is on its way to my door now. Could you go a little more in depth as to why the 15 experiments is the best approach? Obviously I would need to see the data to form more of an opinion, but you say you have 25 maximum, 3 variables, and 1 output. Please go into more detail as to how the 15 experiments would be the most efficient use of my time.

I'm currently using AIAG DOEs, similar to gage R&Rs, to test a bunch of gages on a short business trip to Japan; the purpose is machine and gage trials. Obviously time is my budget for this experiment. I have 42 days and many gages. I found that planning my experiments before I left has helped a lot, but the experiments are still tedious and time-consuming.

Otherwise, I completely agree that there should be more emphasis on DOE and specifically hypothesis testing in engineering statistics and problem solving. DOE has been my most powerful tool, and it shows as I am very successful and my company sends me to Japan to test out fun new machines.

Cheers, great article.

Maybe this will help

Hi, Ken,

Thank you so much for your kind comments.  I'm glad the article was useful for you.  It was definitely a "to be continued..." and the next two or so columns will continue using the tar scenario to expand on some very basic concepts of DOE.

BUT...here is another article I wrote for QD 10 years ago using the same scenario that goes more in depth and should answer your questions, especially about the 15-run design:


Please contact me if you have any more questions.

Thanks again for reading,