



Published: 06/14/2017
My last column mentioned how doctors and hospitals are currently being victimized by draconian reactions to rankings, either interpreted literally or filtered through the results of some type of statistical analysis. Besides the potentially serious financial consequences of using rankings in the current craze of “pay for performance,” many hard-working people are inappropriately stigmatized through what has been called a “public blaming and shaming” strategy. Is it any wonder that many physicians are so angry these days?
Rankings, with allegedly helpful feedback to those allegedly needing it, are also used as a cost-cutting measure to identify and motivate alleged poor performers. Many are analyzed and interpreted using a method that, based on courses people have taken, intuitively feels appropriate but should actually be avoided at all costs.
In an effort to reduce unnecessary expensive prescriptions, a pharmacy administrator developed a proposal to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class. Data were obtained for each of a peer group of 51 physicians; specifically, the total number of prescriptions written and, of that number, how many were for the targeted drug.
Someone was kind enough to send me this proposal, which included the data—while begging me not to be identified as the source. I quote it verbatim (adding my emphases).
Given the 51 physician results:
“1. Data will be tested for the normal distribution.
“2. If distribution is normal—physicians whose prescribing deviates greater than one or two standard deviations from the mean are identified as outliers.
“3. If distribution is not normal—examine distribution of data and establish an arbitrary cutoff point above which physicians should receive feedback (this cutoff point is subjective and variable based on the distribution of ratio data).”
For my own amusement, I tested the data for normality, and it “passed” (p-value of 0.277, which is > 0.05). Yes, I said “for my own amusement” because this test is moot and inappropriate for percentage data like these: The denominators (total prescriptions) ranged from 30 to 217, so each physician’s percentage carries a different amount of sampling variation, and lumping them into a single normality test is meaningless. The computer will do anything you want.
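To make the proposal concrete, here is a minimal sketch of steps 1 and 2 in Python. The 51 actual rates were not published, so the data below are placeholders drawn from a normal distribution matching the summary figures quoted later in this column; the Shapiro-Wilk test stands in for whatever normality test the administrator had in mind.

    import numpy as np
    from scipy import stats

    # Placeholder data: 51 prescribing rates (percent of each physician's
    # prescriptions written for the targeted drug). The real rates were not
    # published; these are simulated from the mean and SD quoted below.
    rng = np.random.default_rng(0)
    rates = np.clip(rng.normal(15.85, 10.7, size=51), 0, 100)

    # Step 1 of the protocol: test the rates for normality
    # (moot for percentage data, as argued above).
    statistic, p_value = stats.shapiro(rates)
    print(f"normality p-value: {p_value:.3f}")  # "passes" if > 0.05

    # Step 2: flag "outliers" beyond one or two standard deviations.
    mean, sd = rates.mean(), rates.std(ddof=1)
    for k in (1, 2):
        flagged = int((np.abs(rates - mean) > k * sd).sum())
        print(f"beyond {k} SD: {flagged} physicians flagged")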
The scary issue here is the proposed ensuing analysis that will result from whether the data are deemed normally distributed or not. If data are normally distributed, doesn’t that mean there are no outliers? But suppose outliers are present—doesn’t this mean they’re atypical? In fact, wouldn’t their presence tend to inflate the traditional calculation of standard deviation? But wait, the data passed the normality test. It’s all so confusing!
Yet that doesn’t seem to stop our quality police from lowering the “Gotcha!” threshold to two or even one standard deviation to find outliers. I am shocked at the extent to which this has become common practice.
Returning to the protocol, even scarier is what’s proposed if the distribution is deemed not normal: Establish an arbitrary cutoff point reflecting either what the administrator feels performance should be, or whatever value will expose a predetermined arbitrary percentage (ending in “0” or “5,” of course) of alleged bad performers and/or reward a similar arbitrary percentage of good performers.
I’ll play his game. Because the data pass the normality test, the graph below shows the suggested analysis with one, two, and three standard deviation lines drawn in around the mean.
The standard deviation of the 51 numbers was 10.7.
Depending on the analyst’s mood and the standard deviation criterion subjectively selected, he or she could claim to statistically find one—or 10—“high utilizers” who would receive helpful feedback. Just curious: How does the analyst intend to deal with the 10 performances below the one standard deviation limit of 5.15 percent—and the three zeroes?
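From the two figures quoted (a standard deviation of 10.7 and a lower one-standard-deviation limit of 5.15 percent), the implied mean is about 15.85 percent. A quick calculation of the remaining bands shows how strained the normal model already is: The lower two- and three-standard-deviation limits are negative, which is impossible for a percentage.

    # Bands implied by the quoted figures: SD = 10.7, and a lower
    # one-SD limit of 5.15 percent, so the mean must be about 15.85.
    mean, sd = 15.85, 10.7

    for k in (1, 2, 3):
        print(f"{k} SD band: {mean - k * sd:6.2f}% to {mean + k * sd:6.2f}%")

    # 1 SD band:   5.15% to 26.55%
    # 2 SD band:  -5.55% to 37.25%   (negative: impossible for a percentage)
    # 3 SD band: -16.25% to 47.95%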
He or she could just as easily have decided that “less than 15 percent” should be a standard, resulting in 27 physician high utilizers who would receive feedback.
There is also the common alternative arbitrary strategy: Let’s go after... oops, I mean give feedback to... the—pick one—top quartile, top 10 percent, top 15 percent, top 20 percent.
Another option would be to set a tough stretch goal of “less than 10 percent,” with the following choices (a sketch contrasting these and the earlier cutoffs follows this list):
• Financially reward the 16 physicians below 10 percent?
• Perhaps offer a bonus for those below the one standard deviation threshold of 5.15 percent?
• You could reward the bottom quartile (or 10 or 15 percent), which, along with the previous scheme, would no doubt cause displeasure among the doctors below 10 percent who didn’t get rewarded.
• Should everyone above 10 percent receive feedback? Should there be a financial penalty for a certain level above 10 percent?
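Again with placeholder rates (the real 51 values were not published), a sketch like the following makes the arbitrariness plain: Each rule flags a different set of physicians, and nothing but the analyst’s taste decides among them.

    import numpy as np

    rng = np.random.default_rng(0)
    rates = np.clip(rng.normal(15.85, 10.7, size=51), 0, 100)  # placeholders

    # Each entry is one of the arbitrary flagging rules discussed above.
    rules = {
        "above 15 percent":   rates > 15,
        "above 10 percent":   rates > 10,
        "top quartile":       rates >= np.percentile(rates, 75),
        "top 10 percent":     rates >= np.percentile(rates, 90),
        "beyond mean + 1 SD": rates > rates.mean() + rates.std(ddof=1),
    }
    for name, mask in rules.items():
        print(f"{name}: {int(mask.sum())} of 51 physicians flagged")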
The high-utilizer feedback was a thick packet of professional journal articles considered the gold standard of evidence-based practices and rationale.
When I present this example and its proposed actions to a roomful of doctors, they erupt in laughter. When I ask what they do with such feedback, without fail, I see a beautifully synchronized collective pantomime of throwing things into the garbage.
For those of you in education, government, manufacturing, or administration, is this scenario similar to many conversations you routinely experience in any meetings you attend? How much waste in time, money, and morale do analyses and resulting meetings like this cost you? “Unknown or unknowable?” (Does it matter?)
Much of this results from teaching people what Donald Wheeler calls “superstitious nonsense” in the guise of statistics (especially that relating to the normal distribution). Most such material is pretty much useless when it comes to application in a real-world, everyday environment and causes far more confusion and problems than it solves.
Is it possible to change those conversations to make them more productive? More about that next time when I revisit these data.
Links:
[1] My last column: https://www.qualitydigest.com/inside/statistics-column/understanding-variation-042717.html