Back to basics

I discovered a wonderful unpublished paper by David and Sarah Kerridge several years ago. Its influence on my thinking has been nothing short of profound. As statistical methods get more and more embedded in everyday organizational quality improvements, I feel that now is the time to get us “back to basics”—but a set of basics that is woefully misunderstood, if taught at all. Professor Kerridge is an academic at the University of Aberdeen in Scotland, and I consider him one of the leading Deming thinkers in the world today.

Deming distinguished between two types of statistical study, which he called “enumerative” and “analytic.” For quality improvement, the key point is how statistics relates to reality; this distinction lays the foundation for a theory of *using* statistics.

Because everyday processes are usually not static “populations,” the question becomes, “What other knowledge, beyond probability theory, is needed to form a basis for action in the real world?” Virtually all college courses are taught from a population-based perspective, which invalidates many of their techniques in a work environment, as opposed to a strictly research environment.

To translate to medicine, there are three kinds of statistics:

**Descriptive.** What can I say about this specific *patient*?

**Enumerative.** What can I say about this specific *group* of patients?

**Analytic.** What can I say about the *process* that produced this specific group of patients and its results?

Let’s suppose there is a claim that, as a result of a new infection-control policy, acquired-MRSA (methicillin-resistant *Staphylococcus aureus*, a strain of staph that is resistant to the broad-spectrum antibiotics commonly used to treat infections) in a particular hospital has been reduced by 27 percent—a result that would be extremely desirable if that kind of reduction could be produced in other hospitals, or in public health communities, by using the same methods. However, there are a great many questions to ask before we can act, even if the action is to design an experiment to find out more.

Counting the number of infections in different years is an enumerative problem (defining “acquired infection” and counting the cases for this specific hospital). Interpreting the change is an analytic problem.

Could the 27-percent reduction be due to chance? If we imagine a set of constant conditions that would lead, on average, to 100 infections, then on the simplest mathematical model (Poisson counts, which have a standard deviation of 10 when the average is 100) we can expect the number we actually see to be anything between 70 and 130, that is, within three standard deviations of the average. If there were 130 infections one year and 70 infections the next, people would think that there had been a great improvement, but this could be just chance. And chance is the least of our problems.
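The arithmetic behind that range can be checked with a short sketch. It assumes, as an interpretation rather than something the Kerridges state, that “between 70 and 130” means the average plus or minus three standard deviations of a Poisson count. The function names are illustrative only.

```python
import math


def poisson_pmf(k, mean):
    """Probability of seeing exactly k events when the long-run average is `mean`."""
    # Computed on the log scale to avoid overflow for large k.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def three_sigma_range(mean):
    """Mean plus or minus three standard deviations (Poisson sd = sqrt(mean))."""
    sigma = math.sqrt(mean)
    return mean - 3 * sigma, mean + 3 * sigma


def prob_between(low, high, mean):
    """Probability that a Poisson count falls in low..high, inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))


mean = 100  # assumed constant conditions: 100 infections per year on average
low, high = three_sigma_range(mean)
coverage = prob_between(math.ceil(low), math.floor(high), mean)

print(f"Range from chance alone: {low:.0f} to {high:.0f}")
print(f"Share of years expected inside that range: {coverage:.4f}")
```

Under these assumptions, almost every year would land between 70 and 130 with no change at all in the underlying process, which is why a drop from 130 to 70 proves nothing by itself.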

Some of the infections may be related, as in a temporary outbreak or pandemic. If so, the model is wrong, because it assumes that infections are independent. Or the methods of counting might have changed from one year to the next. (Are you counting all suspicious infections, or only confirmed cases?) Without knowing about such things, we cannot predict from these figures what will happen next year. So if we want to conclude that the 27-percent reduction is a “real” one, that is, one that will continue in the future, we must use knowledge about the problem that is not given by those figures alone.

Even less can we predict accurately what would happen in a different hospital, or a different country. The causes of infection, or the effect of a change in infection control methods, may be completely different.

So this is the context of the distinction between enumerative and analytic uses of statistics. Some things can be determined by calculation alone; others require the use of judgment or knowledge of the subject; still others are almost unknowable. Luckily, your actions to get more information inherently improve the situation, because when you understand the sources of uncertainty, you understand how to reduce it.

Most mathematical statisticians state statistical problems in terms of repeated sampling from the same population. This leads to a very simple mathematical theory, but does not relate to the real needs of the statistical user. You cannot take repeated samples from the exact same population, except in rare cases. It’s a different kind of problem—sampling from an imaginary population.

In every application of statistics we have to decide how far we can trust results obtained at one time, and under one set of circumstances, as a guide to what will happen at some other time, and under new circumstances. Statistical theory, as it is stated in most textbooks, simply analyzes what would happen if we took repeated, strictly random samples, from the same population, under circumstances in which nothing changes with time.

This does tell us something. It tells us what would happen under the most favorable imaginable circumstances. In almost all applications, we do not want a description of the past but a prediction of the future. For this we must rely on theoretical knowledge of the subject, at least as much as on the theory of probability.

So, get your head around these concepts, and in my next column I’ll give you more of Kerridge’s wisdom as it relates to everyday work.

As you see, this is totally different from the clinical trial mindset in which most physicians have been trained. There is an additional problem: Because you shouldn’t have acquired infections (or medical errors, or pressure ulcers), epidemiologists have a tendency to treat every infection as a special cause and want to determine its root cause. This would be helpful in an outbreak, but in terms of everyday work, one usually has to take the view that your process is perfectly designed to have the infections it has. You must at least consider the possibility of using a common cause strategy, and “plotting the dots” will tell you which strategy applies.
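As a rough illustration of “plotting the dots,” here is a minimal sketch that puts three-standard-deviation limits around a series of monthly counts. It uses a c-chart, which is one common choice for count data and is an assumption here, not necessarily the chart used in the book. The monthly figures are made up for the example and the function names are hypothetical.

```python
import math


def c_chart_limits(counts):
    """Center line and three-sigma limits for a c-chart of event counts."""
    center = sum(counts) / len(counts)
    sigma = math.sqrt(center)  # Poisson assumption: variance equals the mean
    lower = max(0.0, center - 3 * sigma)  # a count cannot fall below zero
    upper = center + 3 * sigma
    return lower, center, upper


def special_cause_points(counts):
    """Indexes of points that fall outside the three-sigma limits."""
    lower, _, upper = c_chart_limits(counts)
    return [i for i, c in enumerate(counts) if c < lower or c > upper]


# Hypothetical monthly infection counts, invented purely for illustration.
monthly_infections = [9, 7, 11, 8, 10, 6, 12, 9, 8, 10, 7, 11]

lower, center, upper = c_chart_limits(monthly_infections)
print(f"Center line: {center:.1f}, limits: {lower:.1f} to {upper:.1f}")
print("Points signalling a special cause:", special_cause_points(monthly_infections))
```

When no point falls outside the limits, as with these invented numbers, hunting for the root cause of each individual infection treats common cause variation as if it were special. A fuller analysis would also look for runs and trends, which this sketch leaves out.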

For more information on using these strategies, see Chapter 8 of my book, *Data Sanity: A Quantum Leap to Unprecedented Results* (Medical Group Management Association, 2009).

A review of this book is available here: www.qualitydigest.com/inside/quality-insider-news/books-data-sanity-statistics-are-doable.html.

You can order *Data Sanity* at www5.mgma.com/ecom/Default.aspx?action=INVProductDetails&args=3785&tabid=138.