Don’t teach people statistics—teach them to solve problems

There are four statements regarding control charts that are myths and, in my experience, just refuse to die. The next time you’re sitting in a seminar and someone tries to teach you how to transform data to make them normally distributed, or at any point during the seminar says “normal distribution” twice within 30 seconds, leave. You’ve got better things to do with your time.

The four myths

Are you “taught” these four things about control charts?

1. Data must be normally distributed before they can be placed on a control chart.

2. Control charts work because of the central limit theorem.

3. Data must be in control before you can plot them on a control chart.

4. Three standard deviation limits are too conservative.

April fool!

If you happen to be in a seminar where someone tries to teach this nonsense, invite them to click this link. It is a complimentary brain scan that will assess what “color” belt a person’s knowledge warrants. (Take it yourself; it’s a hoot.)

And then, just to make sure they’re certified as claimed, give them the “Certification Activity Book” obtainable through the tab in the left margin of that page.

I’ll leave it to your judgment whether you say “April fool” or not.

OK, time to get serious for a few minutes

As I like to say, the I-chart (a control chart for individual values, i.e., not subgrouped) is the Swiss Army knife of control charts. During the early 1990s, statistical process control methods were the favored tools for medical quality improvement. When I used them, I came up against a lot of resistance from the entrenched “randomized double-blind clinical trial” cultural mindset that had been the norm in medicine and that provided the perfect smoke screen for not changing.

Let’s consider these four myths in greater detail. (The first three are courtesy of Donald J. Wheeler, Ph.D., in a column 15 years ago.)

**Myth No. 1:** Data must be normally distributed before they can be placed on a control chart.

**Reality:** Although the control chart constants were created under the assumption of normally distributed data, the control chart technique is essentially insensitive to this assumption. The normality of the data is neither a prerequisite nor a consequence of statistical control.
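To see that normality never enters the arithmetic, here is a minimal Python sketch of the usual I-chart calculation. It assumes the common convention of estimating sigma as the average moving range (the mean absolute difference between consecutive values) divided by the bias-correction constant 1.128, then placing limits at the average plus or minus three of those sigmas. The function name `i_chart_limits` and the sample values are hypothetical, for illustration only.

```python
from statistics import mean

D2 = 1.128  # bias-correction constant for moving ranges of two consecutive values


def i_chart_limits(values):
    """Return (center line, lower limit, upper limit) for an I-chart.

    Sigma comes from the average moving range, not from the standard
    deviation of all the values lumped together. No step checks for,
    or assumes, a normal distribution.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    sigma_hat = mean(moving_ranges) / D2
    return center, center - 3 * sigma_hat, center + 3 * sigma_hat


# Hypothetical values, one per time period
center, lower, upper = i_chart_limits([12, 15, 11, 14, 13, 16, 12, 14])
print(f"center {center:.2f}, limits {lower:.2f} to {upper:.2f}")
```

Three divided by 1.128 is where the familiar shortcut of 2.66 times the average moving range comes from.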

**Myth No. 2:** Control charts work because of the central limit theorem.

**Reality:** The central limit theorem does indeed apply to subgroup averages. Because many statistical techniques use the central limit theorem, it’s only natural to assume that it’s the basis of the control chart. Ready for a shocker? I don’t even teach it.

It does have some justification in the case of X-bar-R and X-bar-S charts, but, especially in manufacturing, people usually miss the point and superimpose specification limits on the chart. (*Wrong!*) As I’ve said, these are rarely used in medicine because one generally does not have the luxury of subgrouping.

Actually, the central limit theorem is pretty much irrelevant to the I-chart. This myth has been one of the greatest barriers to the effective use of the I-chart with management and service-industry data, where data obtained one value per time period are the norm.

Believing this myth to be true and having no doubt endured a lengthy lecture or demonstration of the central limit theorem, people feel compelled to average something to make use of it. As Wheeler says, “The rationality of the data analysis will be sacrificed to superstition.” As a decision criterion, an I-chart with three standard deviation limits—calculated correctly—is very robust to almost any data distribution.
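A quick simulation illustrates the robustness claim. This is a sketch under stated assumptions: each series is drawn independently from one fixed distribution (a stable process with no special causes), the four distributions and the random seed are arbitrary choices for illustration, and the limits use the standard average-moving-range I-chart calculation. The helper names are hypothetical. It reports the fraction of points that land inside the three standard deviation limits.

```python
import random
from statistics import mean

D2 = 1.128  # bias-correction constant for moving ranges of size two


def i_chart_limits(values):
    """Three-sigma I-chart limits, with sigma from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    sigma_hat = mean(moving_ranges) / D2
    return center - 3 * sigma_hat, center + 3 * sigma_hat


def fraction_within_limits(values):
    """Share of points falling on or inside the I-chart limits."""
    lower, upper = i_chart_limits(values)
    return sum(lower <= v <= upper for v in values) / len(values)


rng = random.Random(2009)  # fixed seed so the run is repeatable
n = 20_000
samples = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "uniform": [rng.uniform(0, 1) for _ in range(n)],
    "exponential": [rng.expovariate(1.0) for _ in range(n)],  # strongly skewed
    "lognormal": [rng.lognormvariate(0, 1) for _ in range(n)],  # strongly skewed
}

results = {name: fraction_within_limits(vals) for name, vals in samples.items()}
for name, frac in results.items():
    print(f"{name:12s} {frac:.2%} of points inside the three-sigma limits")
```

Expect the normal and uniform series to sit inside the limits nearly all the time, and the heavily skewed exponential and lognormal series to put a few percent of points above the upper limit. Even then, a point outside the limits remains a rare event worth investigating, with no transformation or averaging required.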

**Myth No. 3:** Data must be in control before you can plot them on a control chart.

**Reality:** I find that people generally reach this conclusion only because they have computed the limits incorrectly. Among the blunders committed in the name of this myth are removing “obvious” outliers before charting the data and using limits that aren’t three standard deviations (see myth No. 4).

The purpose of the chart is to detect lack of control. It’s a very, very valuable initial diagnostic tool for a process. So tell me: If a control chart can’t detect lack of control, why use it?
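The point that the chart itself is the tool for finding lack of control can be sketched in a few lines of Python. The series below is hypothetical: a stable process with one obvious special cause left in rather than deleted beforehand. The limits are computed from all of the data, assuming the usual I-chart convention (sigma estimated as the average moving range divided by 1.128); `points_beyond_limits` is an illustrative name.

```python
from statistics import mean

D2 = 1.128  # bias-correction constant for moving ranges of size two


def points_beyond_limits(values):
    """Return the indices of values outside three-sigma I-chart limits.

    The limits are computed from all of the data, special causes included.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    sigma_hat = mean(moving_ranges) / D2
    lower, upper = center - 3 * sigma_hat, center + 3 * sigma_hat
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical data: stable around 10, with one special cause (22) at index 14
data = [10, 11, 9, 10, 11, 10, 9, 10, 11, 9,
        10, 11, 9, 10, 22, 10, 9, 10, 11, 9]
print(points_beyond_limits(data))  # → [14]
```

Even though the special cause inflates two of the moving ranges, the chart still flags it. Deleting it first would have left nothing to detect, which defeats the purpose of the chart.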

**Myth No. 4:** Three standard deviation limits are too conservative.

**Reality:** Walter Shewhart, the originator of the control chart, deliberately chose three standard deviation limits. He wanted limits wide enough so that people wouldn’t waste time interpreting noise as signals (a Type I error). He also wanted limits narrow enough to detect an important signal that people shouldn’t miss (avoiding a Type II error). In years of practice he found, empirically, that three standard deviation limits provided a satisfactory balance between these two mistakes. My experience has borne this out as well.
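The trade-off Shewhart balanced can be put into rough numbers. This sketch assumes normally distributed data with a known standard deviation, purely for illustration: the chart itself doesn’t need that assumption (see myth No. 1), and Shewhart’s choice was empirical, not derived from these formulas. The hypothetical `false_alarm_rate` gives the per-point chance of a Type I error; `detection_rate` gives the per-point chance of catching a mean shift of a given size, so one minus it is the per-point Type II error.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def false_alarm_rate(k):
    """Per-point chance that a stable (normal) process plots beyond +/- k sigma."""
    return 2 * (1 - STD_NORMAL.cdf(k))


def detection_rate(k, shift):
    """Per-point chance of plotting beyond +/- k sigma after the mean moves by `shift` sigma."""
    return (1 - STD_NORMAL.cdf(k - shift)) + STD_NORMAL.cdf(-k - shift)


for k in (2, 3):
    print(f"{k}-sigma limits: false alarm {false_alarm_rate(k):.4f} per point, "
          f"catches a 2-sigma shift {detection_rate(k, 2):.3f} per point")
```

Narrower limits catch a shift on any single point more often, but under these assumptions the per-point false alarm rate rises roughly seventeenfold (about 4.6 percent versus 0.27 percent), which is exactly the cost of interpreting noise as signal.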

I’ve seen two standard deviation limits commonly used because people, especially in medicine, are obsessed that they might “miss something.” There are two major reasons people do this:

1. The “two standard deviations” criterion for (alleged) significance has been drummed into people’s heads as the gold standard for decision making. This reasoning is based on the central limit theorem and on making only *one* decision. (See my newsletter, “Why Three Standard Deviations?”)

2. They have performed an incorrect calculation of the standard deviation that has (unknowingly) resulted in an inflated estimate.
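The phrase “only *one* decision” is the crux. A control chart makes a decision at every plotted point, so the chance of at least one false alarm compounds across the chart. This sketch assumes independent, normally distributed points from a stable process and a hypothetical 25-point chart; `chance_of_any_false_alarm` is an illustrative name.

```python
from statistics import NormalDist


def chance_of_any_false_alarm(k, n_points):
    """Probability that at least one of n_points stable points plots beyond +/- k sigma.

    Assumes independent, normally distributed points.
    """
    per_point = 2 * (1 - NormalDist().cdf(k))
    return 1 - (1 - per_point) ** n_points


for k in (2, 3):
    print(f"{k}-sigma limits, 25 points: {chance_of_any_false_alarm(k, 25):.0%} "
          "chance of at least one false signal")
```

Under these assumptions, two standard deviation limits give better-than-even odds of at least one false signal on a perfectly stable 25-point chart, while three standard deviation limits keep that chance under 10 percent.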

Novices continually think that they know better and invent shortcuts that are wrong. I once had a chart where my three standard deviation limits, calculated correctly, were equivalent to 1 1/2 standard deviations of the proposed analysis (needless to say, calculated incorrectly).

You almost never use the calculation of standard deviation taught in your “basic” statistics class, which, unfortunately, is so readily available in most spreadsheet programs. If the very special causes you are trying to detect are present, they will seriously inflate the estimate. Not knowing this, people will even try to use *one* standard deviation as an outlier criterion.
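Here is a sketch of how badly a special cause can inflate the textbook estimate. The data are hypothetical: a process that shifts upward halfway through. `statistics.stdev` is Python’s sample standard deviation, the same n − 1 formula behind typical spreadsheet functions such as STDEV; the moving-range estimate (average moving range divided by 1.128) is the usual I-chart convention, assumed here. `count_beyond` is an illustrative helper.

```python
from statistics import mean, stdev

D2 = 1.128  # bias-correction constant for moving ranges of size two

# Hypothetical data: a stable process that shifts up by about 8 units halfway through
data = [10, 11, 9, 10, 11, 10, 9, 10, 11, 9, 10, 10,
        18, 19, 17, 18, 19, 18, 17, 18, 19, 17, 18, 18]

center = mean(data)
sigma_overall = stdev(data)  # "basic statistics class" formula: the shift inflates it
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
sigma_mr = mean(moving_ranges) / D2  # point-to-point variation: only one range sees the shift


def count_beyond(sigma):
    """Count points outside center +/- 3 * sigma."""
    lower, upper = center - 3 * sigma, center + 3 * sigma
    return sum(v < lower or v > upper for v in data)


print(f"overall standard deviation: {sigma_overall:.2f}")
print(f"moving-range estimate:      {sigma_mr:.2f}")
print(f"points beyond limits (overall sd):   {count_beyond(sigma_overall)}")
print(f"points beyond limits (moving range): {count_beyond(sigma_mr)}")
```

On this series the overall standard deviation is more than three times the moving-range estimate, so limits built on it flag nothing, while the moving-range limits flag the shift on both sides of the center line.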

So, in the spirit of baseball season starting, let’s tap into the wisdom of “The Ol’ Perfesser.” If you ever ask a question in a statistical seminar, and the answer in any way resembles the following, leave:

“Well, I will tell you I got a little concerned yesterday in the first three innings when I saw the three players I had gotten rid of, and I said when I lost nine what am I going to do, and when I had a couple of my players I thought so great of that did not do so good up to the sixth inning, I was more confused but I finally had to go and call on a young man in Baltimore that we don’t own and the Yankees don’t own him, and he is doing pretty good, and I would actually have to tell you that I think we are more like a Greta Garbo-type now from success.”

This is how legendary baseball manager Casey “The Ol’ Perfesser” Stengel testified before the Senate Subcommittee on Antitrust and Monopoly on July 9, 1958. The subcommittee was studying monopoly power as it applied to baseball’s antitrust exemption, and Stengel was asked whether his team would keep on winning. (This is just a fraction of Stengel’s 45-minute discourse, the rest of which is just as priceless, along with Mickey Mantle’s follow-up.)

In summary

Trying to teach fancy theory does no one any good. W. Edwards Deming emphasized a basic understanding of variation and taught few techniques in his seminars. To do the type of work required to improve everyday culture, only 1–2 percent of people need advanced statistical knowledge.

Deming is probably rolling over in his grave at the subculture of “hacks” (his term) that has been created in the name of quality. Will the 80/20 rule inevitably apply to quality professionals? I answered that question here.

As quality professionals, we must be careful not to perpetuate deeply embedded stereotypes of “sadistics” by making seminars nothing short of legalized torture and keeping our roles self-serving. Take it as a given: The people whom we teach will never like statistics as much as we do. So don’t teach people statistics—*teach them how to solve their problems*.

I will close with some more baseball wisdom. Like the Ol’ Perfesser, Yogi Berra is another beloved baseball icon who tends to unintentionally misspeak. Given the economy, many of us might unexpectedly face a career crossroads during the next few years. In fact, Yogi warns us, “It gets late early out there,” so I’m sure he would advise: “When you come to the fork in the road, take it.” Because (and I paraphrase) if people ain’t gonna go to statistics classes, how we gonna stop ’em?