
Davis Balestracci

Six Sigma

Statistical Stratification With Count Data, Part 1

What is your area of opportunity?

Published: Thursday, September 11, 2014 - 11:51

My last column, “Dealing With Count Data and Variation,” showed how a matrix presentation of stratified count data could be quite effective as a common-cause strategy. I’ll use this column to review some key concepts of count data as well as to demonstrate the first of two common statistical techniques that can be useful for further analysis. Obtaining the counts themselves is only half of the job.

First, make sure the operational definition is clear: What’s the threshold whereby something goes from a “nonincident” (i.e., a value of 0) to an “incident” (i.e., a value of 1)? Would two or more people assessing the situation concur that the “incident” had occurred?

In addition, all count data have an implicit denominator that’s defined by the “area of opportunity” for that count. It depends on what's being counted, how it’s being counted, and what possible restrictions there might be upon the count.

A hot topic in healthcare improvement right now is how to eradicate hospital-acquired infections. The number of infections can obviously be counted, but what should the denominator be? The following possibility is quite simple: A hospital could, at discharge, count the number of patients who acquired an infection. This could be expressed as a percentage of patients discharged, and the hospital could obtain perfectly reasonable numbers.

However, even though a number can be obtained, hospital infection control personnel would say that defining the occurrence of infections that way is seriously flawed. For example, a common way for patients to acquire infections is through having a central line, or catheter, during their hospital stay. By looking at infections rather than patients, you can indeed count the number of infections, but, unlike the percentage tally, you can't count the number of noninfections because exposure to the possibility is continuous: The longer the line is left in, the greater the chance the patient has of acquiring an infection. Thus, in this case, the denominator must somehow express the time the patient was potentially exposed to infection. Rather than a percentage of patients, the result is now expressed as a rate of, say, infections per 1,000 central line days.
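As a quick sketch of the arithmetic, here is how a raw infection count becomes a rate per 1,000 central line days. The counts below are invented purely for illustration; they are not the article's data.

```python
# Hypothetical illustration: converting a raw infection count into a rate
# per 1,000 central line days. Both numbers are made up for the example.
infections = 9
central_line_days = 740

rate = infections / central_line_days * 1000
print(f"{rate:.1f} infections per 1,000 central line days")  # 12.2
```

Note that the same nine infections against 2,000 central line days would give a rate of 4.5, which is why the denominator, not the raw count, carries the interpretation.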

By using the raw accident counts, last column's data-collection process (and analysis) made an implicit assumption that the window of opportunity (labor hours, for example) was approximately the same for each department and for each month.

Let's use the previous accident data department totals as central line infection data totals for specific hospital units—only this time, each has a different window of opportunity, as shown below by the number of central line days for each unit.

The denominator is crucial for properly defining the situation and subsequently interpreting the differences in the resulting rates. If you don’t know the area of opportunity for a count, you don't know how to interpret that count.

Comparing rates: the u-chart

In the instance of rate data, an analysis of means (ANOM) using the statistical u-chart is appropriate to answer everyone's basic question:
Are the three above-average units (No. 1, No. 2, No. 5) truly above the six-unit overall average of 12.2?

ANOM assumes that each unit has the average rate of 12.2 per 1,000 central line days unless its individual rate indicates otherwise by falling outside its common cause limits. The limits are calculated to answer the question, “Given the number of central line days for this unit, is its observed result just (common cause) statistical variation from the average?”

The general formula for the common cause range of rates is:

ū ± 3 × √(ū / aᵢ)

where ū is the overall average rate and aᵢ is the area of opportunity for unit i. The “3” stands for “three standard deviations.” More about why another time.

For this situation, the formula is:

12.2 ± 3 × √(12.2 / dᵢ)

where dᵢ is each unit's number of central line days, expressed in thousands.

Note that the only difference for each unit is the number of central line days.
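A minimal sketch of this limit calculation follows. Only the overall rate of 12.2 comes from the article; the per-unit central line days are hypothetical placeholders.

```python
import math

# Sketch of the u-chart ANOM limit calculation described above.
# u_bar is the article's overall rate (12.2 per 1,000 central line days);
# the per-unit central line days below are hypothetical.
u_bar = 12.2
unit_days = {"Unit 1": 1400, "Unit 2": 800, "Unit 3": 1150}

for unit, days in unit_days.items():
    n = days / 1000                       # area of opportunity, in thousands of days
    half_width = 3 * math.sqrt(u_bar / n)
    lower = max(0.0, u_bar - half_width)  # a rate cannot be negative
    upper = u_bar + half_width
    print(f"{unit}: common-cause range {lower:.1f} to {upper:.1f}")
```

A unit whose observed rate falls inside its own range is indistinguishable from the system average; only rates outside the range signal a special cause.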

Who is truly “above” or “below” average?

This results in the chart below:

One unit is truly above average (Unit 2), and one unit is truly below average (Unit 6). The others, based on these data, are indistinguishable—from each other and 12.2.

As I’ve said again and again, always be aware of how the data were collected. The raw-count numbers might be useful and good enough when the windows of opportunity are approximately similar, but putting them in their more accurate statistical context will often present another critical view of a situation.

Until next time....


About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.


How to calculate average

See answer to question below

Variable areas of opportunity


This answers the question from my post under your previous column.

In other cases, the binomial distribution may be more appropriate, and we would then use the p-chart.

Thanks for your pragmatic insights.

Looking forward to the next!

Best regards


I may be missing something, however


I need (please and thank you) an explanation of the “u avg.” The reason I am asking is that when I try to average the infection rates per 1,000 days (via calculator, Excel, or Minitab), the average I keep arriving at is 13.1, not 12.2.

Thank you and have an awesome weekend!

Michael Wittke,

Fort Worth, TX

Careful how you calculate the average

Hi, Michael,

Thanks for your comment.

I think you took the average of the six individual calculated rates. That gives each of the individual rates equal weight, which is NOT correct because each is based on an unequal number of central line days in the denominator.

The 12.2 is the result of summing ALL of the infections and dividing by the sum of ALL of the individual central line days, i.e., 77 / 6.311 (the numbers in the TOTAL row of the table). That is the correct overall average to use because it is based on SYSTEM performance.
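The distinction can be sketched in a few lines. The per-unit splits below are hypothetical, chosen only so that the totals match the 77 infections and 6,311 central line days discussed here; the article's actual per-unit values may differ.

```python
# Demonstrates why averaging the individual rates differs from the
# system-level rate. Per-unit numbers are hypothetical; only the totals
# (77 infections over 6.311 thousand central line days) match the article.
infections = [20, 18, 12, 11, 9, 7]           # hypothetical per-unit counts
line_days = [1.2, 0.9, 1.1, 1.3, 0.8, 1.011]  # thousands of central line days

rates = [i / d for i, d in zip(infections, line_days)]

naive_avg = sum(rates) / len(rates)            # weights every unit equally
system_avg = sum(infections) / sum(line_days)  # weights by area of opportunity

print(f"naive average of rates: {naive_avg:.1f}")
print(f"system average (u-bar): {system_avg:.1f}")  # 12.2
```

The naive average moves whenever a small-denominator unit has an extreme rate; the system average does not, which is why it is the right center line for the u-chart.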

I hope this clarified it for you.

Kind regards,


calculate the average

Thank you, that makes sense!