Chapter 5: Observational Studies



Randomized controlled trials are the gold standard for efficacy, but they are not the gold standard for effectiveness or safety. In addition, randomized controlled trials may not be feasible or ethical. They are nearly always very expensive and time-consuming. Therefore, there is a great need for other types of analytic studies to complement and prepare for randomized controlled trials. These types of studies are often called observational studies because the investigators observe the assignment of study and control groups rather than intervene to create these groups. The importance of observational studies has been emphasized in recent years, and their reporting has become more standardized with the publication of the STROBE statement (STrengthening the Reporting of OBservational studies in Epidemiology), which parallels the CONSORT statement for randomized controlled trials [(1)].

Population comparisons, case-control, and cohort studies are the three basic types of observational studies that are used as part of our efforts to establish contributory cause and efficacy. As we will see, observational studies are also very useful for investigating effectiveness and safety in practice. Let us take a look at each of these types of investigation and examine the relationships between them.

Population Comparisons [(2)], [(3)]

Population comparisons of rates of disease or other outcomes can be used for a number of purposes including the following:

  • Studies of etiology often begin with a hypothesis derived from observing a difference or change in rates of disease. For instance,

Mini-Study 5.1

An investigator hypothesized that countries with a high consumption of olive oil have a lower rate of death resulting from coronary artery disease compared with countries with a low consumption of olive oil. On the basis of the results of the investigation, he recommended an investigation to determine whether an association at the individual level exists between consumption of olive oil and a lower chance of death resulting from coronary artery disease.

  • Screening and diagnostic testing rely on comparing rates to estimate the pretest probability before knowing the patient’s symptoms. For instance,

Mini-Study 5.2

An investigator found that the rate of developing coronary artery disease increases with age among men and women, with the rate among women trailing men by approximately 10 years. He used this as the starting point for estimating the risk of coronary artery disease in a 65-year-old man and a 23-year-old woman.

  • Prediction of the future often rests on comparing rates of development of disease and the subsequent rates of death or disability. For instance,

Mini-Study 5.3

Among those with a previous myocardial infarction, the rate of death fell steadily from 50 per 1,000 per year to 20 per 1,000 per year between 1982 and 2012. The investigators predicted that the rate would be approximately 10 per 1,000 per year by 2022.

  • Efficacy may be suggested by looking at rates before and after an intervention when other data establishing efficacy are not available. For instance,

Mini-Study 5.4

The rate of developing Reye syndrome was 1.0 per 100,000 children younger than 12 years per year during the 1960s and 1970s when aspirin was promoted for use by children. The rate fell to 0.1 per 100,000 children younger than 12 years per year after aspirin was widely considered contraindicated for young children.

  • Effectiveness can be evaluated using rates once efficacy has been determined by randomized controlled trials. For instance,

Mini-Study 5.5

A vaccine for a common childhood disease was recently approved in the United States after two well-conducted randomized controlled trials demonstrated its efficacy. Its effectiveness was evaluated by collecting data on the number of cases of the disease before and after approval and widespread use of the vaccine. The investigators reported a dramatic decline in the rates of new cases of the disease.

Common to all these examples is the fact that the investigators are looking at rates of disease or rates of outcomes of disease in one population compared with another. Therefore, to understand the population comparisons, we need to take a look at what we mean by rates and how we use them, as discussed in Learn More 5.1.

Population comparisons compare the rates in two or more populations or in the same population over time. They cannot directly relate the data they collect to individuals. Therefore, we say that they aim to establish group relationships or group associations or differences.

The results of population comparisons are often presented as a ratio of rates, or rate ratio. Rate ratios may look like relative risks or odds ratios, but since they are derived from population comparisons, they do not ensure that an association exists at the individual level.
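To make the idea of a rate ratio concrete, the following minimal sketch divides the rate after aspirin was considered contraindicated by the rate before, using the figures from Mini-Study 5.4. The function name `rate_ratio` is our own illustrative choice, and the result describes the two time periods as groups, not any individual child.

```python
def rate_ratio(rate_of_interest, comparison_rate):
    """Return the ratio of two rates expressed in the same units."""
    if comparison_rate <= 0:
        raise ValueError("comparison_rate must be positive")
    return rate_of_interest / comparison_rate


# Rates from Mini-Study 5.4, per 100,000 children younger than 12 per year.
rate_before = 1.0  # 1960s and 1970s
rate_after = 0.1   # after aspirin was considered contraindicated

ratio = rate_ratio(rate_after, rate_before)
print(f"Rate ratio (after versus before): {ratio:.1f}")
```

A rate ratio of 0.1 means the later rate was one-tenth of the earlier rate. It does not by itself show that the children who avoided aspirin were the ones who avoided Reye syndrome.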

Real versus Artifactual Changes or Differences

Population comparisons may examine changes in the same population over time or differences between populations. Regardless of the uses of population comparisons or whether we are interested in changes or differences, it is important to try to determine whether the observed changes or differences in rates are real or artifactual. Artifactual changes or differences may also be referred to as spurious or false changes or differences.

Learn More 5.1: Rates and Their Uses

The term rates is a generic term that incorporates a range of measures of the frequency of occurrence of disease or the outcomes of disease. In classifying rates, an important distinction is between proportions and true rates. A proportion is a fraction in which the numerator is derived from the denominator. That is, the numerator is a subset of the denominator, as illustrated in the following example:

Mini-Study 5.6

An investigator measured the number of cases of lupus erythematosus in a community of 1,000,000 people and found 1,000 cases. She calculated the number of cases of lupus per 100,000 people and concluded that there are 100 cases of lupus per 100,000 people.

This proportion is known as prevalence. Prevalence measures the probability that a disease is present at a particular point in time. That is, a prevalence of 100 per 100,000 represents a probability of 1 per 1,000, or 0.001, or 0.1%.
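The arithmetic in Mini-Study 5.6 can be sketched as follows. The function names are illustrative assumptions of ours; the numbers are those of the mini-study.

```python
def prevalence(existing_cases, population):
    """Proportion of the population that has the disease at one point in time."""
    if existing_cases > population:
        raise ValueError("the numerator must be a subset of the denominator")
    return existing_cases / population


def per_100k(existing_cases, population):
    """The same proportion expressed per 100,000 people."""
    return existing_cases * 100_000 / population


# Mini-Study 5.6: 1,000 cases of lupus in a community of 1,000,000 people.
lupus_prevalence = prevalence(1_000, 1_000_000)
lupus_per_100k = per_100k(1_000, 1_000_000)

print(f"Prevalence as a probability: {lupus_prevalence:.3f}")
print(f"Prevalence per 100,000: {lupus_per_100k:.0f}")
```

The check that the cases do not exceed the population reflects the defining feature of a proportion: the numerator is a subset of the denominator.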

Another important proportion that is also a probability is known as case fatality. Case fatality is a measure of prognosis or the probability of adverse outcomes once a disease has been diagnosed. Case fatality indicates the probability of dying from the disease once the diagnosis is made. Thus, the numerator contains the number of deaths, whereas the denominator contains the number of cases diagnosed. The case fatality is not relevant to conditions that do not result in mortality. In this situation, other adverse outcomes such as blindness or paralysis may be substituted for mortality and can be used as a measure of prognosis.

Strictly speaking, a rate, or what we will call a true rate, not only satisfies the conditions of a proportion but also includes a period of time. That is, in a true rate, the numerator is a subset of the denominator and also measures the occurrence of events over a period of time, often a 1-year period, as illustrated in the next example:

Mini-Study 5.7

The lupus erythematosus investigator now identifies all new cases that develop in the community during 2011. She finds 20 new cases per 100,000 people in 2011 and concludes that the rate is 20 per 100,000 per year.

This measurement is known as an incidence rate. It measures the probability of the occurrence of an event such as the diagnosis of lupus over the period of a year. Like prevalence, the incidence rate has a numerator that comes from the denominator. Unlike prevalence, which measures the situation at one point in time, incidence rates measure the occurrence of events over time. Another important true rate is the mortality rate. The mortality rate is a type of incidence rate that measures the incidence of death over a year per 100,000 people alive in the population at the start of the year.

Incidence rates, prevalence, and case fatality aim to describe three distinct points in the course of a disease. Together, they provide a description of the clinical course of a disease or other condition as follows:

  • Incidence rate measures the true rate of development of the disease over a period of 1 year.
  • Prevalence measures the probability of having the disease at one point in time.
  • Case fatality measures the probability of dying or having another adverse outcome once the disease has developed.
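The incidence rate and case fatality can be sketched in the same way as prevalence. In this minimal example, the 200 new cases correspond to Mini-Study 5.7 if we assume the same community of 1,000,000 people. The number of deaths used for case fatality is a hypothetical figure chosen only for illustration, since the text does not report one.

```python
def incidence_rate_per_100k(new_cases, population_at_start):
    """New cases over the period (here 1 year) per 100,000 people."""
    return new_cases * 100_000 / population_at_start


def case_fatality(deaths_from_disease, diagnosed_cases):
    """Probability of dying from the disease once it has been diagnosed."""
    if deaths_from_disease > diagnosed_cases:
        raise ValueError("deaths cannot exceed the number of diagnosed cases")
    return deaths_from_disease / diagnosed_cases


# Mini-Study 5.7, assuming the community of 1,000,000 people:
# 20 per 100,000 per year corresponds to 200 new cases in 2011.
lupus_incidence = incidence_rate_per_100k(200, 1_000_000)

# Hypothetical: 50 deaths among 1,000 diagnosed cases (not from the text).
lupus_case_fatality = case_fatality(50, 1_000)

print(f"Incidence rate: {lupus_incidence:.0f} per 100,000 per year")
print(f"Case fatality: {lupus_case_fatality:.2%}")
```

Note that only the incidence rate carries a unit of time. Case fatality, like prevalence, is a proportion.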

When using rates to compare two or more populations or a single population over time, it is important to clarify which type of rate is being used (see note 5.1).

Changes or differences in rates may be the result of real changes in the incidence, prevalence, or case fatality. Alternatively, changes or differences may reflect changes in the method by which the particular disease is measured. Artifactual changes or differences imply that, despite the fact that a change or difference was observed, it does not reflect changes in the disease but merely in the way the disease is measured, sought, or defined.

Artifactual changes or differences result from the following three basic sources:

  1. Changes or differences in the ability to recognize the disease. These represent changes in the measurement of the disease.
  2. Changes or differences in the efforts to recognize or report the disease. These may represent efforts to recognize the disease at an earlier stage, changes or differences in reporting requirements, or new incentives to search for the disease.
  3. Changes or differences in the definition of the disease. These represent changes or differences in the criteria used to define the disease.

The following example illustrates the first type of artifactual change, the effect of a change or difference in the ability to recognize a disease:

Mini-Study 5.8

Because of an improvement in technology, a study of the prevalence of mitral valve prolapse was performed. A complete survey of the charts at a major university cardiac clinic found that in 1977 only 1 per 1,000 patients had a diagnosis of mitral valve prolapse, whereas in 2012, 60 per 1,000 patients had mitral valve prolapse included in their diagnoses. The authors concluded that the prevalence of the condition was rapidly increasing.

Between 1977 and 2012, the use of echocardiography greatly increased the ability to document mitral valve prolapse. In addition, the growing recognition of the frequency of this condition led to a much better understanding of how to suspect it by physical examination. It is not surprising, then, that a much larger proportion of cardiac clinic patients were known to have mitral valve prolapse in 2012 compared with 1977. It is possible that if equal understanding and equal technology were available in 1977, the prevalence would have been nearly identical. This example demonstrates that artifactual changes may explain large differences in the prevalence of a disease.
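One way to see how an artifactual change arises is to treat the recorded prevalence as the true prevalence multiplied by the probability that an existing case is recognized. The sketch below rests entirely on assumptions of ours: a constant true prevalence of 60 per 1,000 and detection probabilities of 1 in 60 for 1977 and 1.0 for 2012. These values are hypothetical and were chosen only so that the output matches the figures in Mini-Study 5.8.

```python
def observed_prevalence(true_prevalence_per_1000, detection_probability):
    """Recorded prevalence when only a fraction of existing cases is recognized."""
    if not 0.0 <= detection_probability <= 1.0:
        raise ValueError("detection_probability must lie between 0 and 1")
    return true_prevalence_per_1000 * detection_probability


# Hypothetical assumptions, not data from the study:
TRUE_PREVALENCE = 60.0   # per 1,000 patients, assumed identical in both years
DETECTION_1977 = 1 / 60  # before widespread echocardiography
DETECTION_2012 = 1.0     # assumed complete recognition

observed_1977 = observed_prevalence(TRUE_PREVALENCE, DETECTION_1977)
observed_2012 = observed_prevalence(TRUE_PREVALENCE, DETECTION_2012)

print(f"Recorded in 1977: {observed_1977:.0f} per 1,000")
print(f"Recorded in 2012: {observed_2012:.0f} per 1,000")
```

Under these assumptions, the recorded prevalence rises 60-fold although the true prevalence never changes.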

Changes in the efforts to recognize a disease may occur when the available treatment improves, as illustrated in the following example:

Mini-Study 5.9

A new treatment for migraine headache is approved for use and is widely advertised in the medical journals and in major newspapers. The number of patients presenting for care with migraine headaches doubles in the year after approval of the new drug. These patients meet all the criteria for a diagnosis of migraine.

This apparent doubling of the prevalence of migraine is most likely due to the increased proportion of individuals with migraine headache who present for care after becoming aware of the new treatment. A high proportion of individuals with many self-limited or nonprogressive diseases do not seek health care. Changes in the types of patients who seek care can produce dramatic but artifactual changes in the rates. It is important to recognize that at times the increased ability to diagnose a disease, as with mitral valve prolapse, or the increased interest in its diagnosis, as with migraine headache, may lead to real changes or improvements in the outcomes of the disease.

Finally, artifactual changes or differences in rates may result merely from changes in the definition used to define the disease. The following example illustrates how the definition of a disease may change over time and thus produce an artifactual difference in the apparent rate:

Mini-Study 5.10

The incidence rate of acquired immunodeficiency syndrome (AIDS) increased every year between 1981 and 1990. In 1 year during the early 1990s, there was a sudden, dramatic increase in the reported rate. One investigator interpreted this sudden increase as a sign that the epidemic had suddenly entered a new phase. It was later recognized that no sudden change had occurred.

The dramatic increase may have been due to a change in the Centers for Disease Control and Prevention’s definition of AIDS, which meant that more individuals with human immunodeficiency virus infection fell within the definition of AIDS. When sudden changes in the incidence rate of a disease occur, one must suspect artifactual differences, such as changes in the definition of a disease. In this case, one suspects that an artifactual change was superimposed on long-term changes. Long-term changes in rates are referred to as a temporal or secular trend.
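A simple screen for a sudden change superimposed on a secular trend is to compare each year-to-year change with the typical change in the series. The sketch below is one possible heuristic, not a standard surveillance method. The function name, the threshold factor of 3, and the series of rates are all hypothetical and are not actual AIDS surveillance data.

```python
from statistics import median


def flag_sudden_changes(rates_by_year, factor=3.0):
    """Return years whose change from the prior year exceeds
    `factor` times the median absolute year-to-year change."""
    years = sorted(rates_by_year)
    changes = [
        (later, rates_by_year[later] - rates_by_year[earlier])
        for earlier, later in zip(years, years[1:])
    ]
    typical = median(abs(change) for _, change in changes)
    if typical == 0:
        return []
    return [year for year, change in changes if abs(change) > factor * typical]


# Hypothetical rates per 100,000 per year: a steady rise with one abrupt jump.
hypothetical_rates = {1988: 10, 1989: 12, 1990: 14, 1991: 16, 1992: 30, 1993: 32}

flagged_years = flag_sudden_changes(hypothetical_rates)
print("Years to examine for artifactual change:", flagged_years)
```

A flagged year is a prompt to ask whether the ability to recognize the disease, the effort to recognize it, or its definition changed that year. It is not evidence that the change is artifactual.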

Even if we conclude that the changes or differences in rates are real, we need to be aware that different populations may have differing demographic characteristics that can affect the outcome being investigated. For populations, the most common demographic characteristic for which data are available is age. Age is a strong predictor of many diseases, so differences in age distribution of the populations being compared are important to recognize and to address.

Issues of age distribution may arise even when comparing the same population with itself, especially over extended periods. Demographic changes can produce a substantial increase in the average age of a population over a period of a few decades. Let us look at the next scenario, which illustrates the importance of recognizing differences in age distribution of the populations being compared:

Mini-Study 5.11

The incidence rate of pancreatic cancer in the United States per 100,000 population was compared with the incidence rate in Mexico. The rate in the United States was found to be three times as high as the rate in Mexico per 100,000 population per year. The authors concluded that U.S. residents have a risk of pancreatic cancer three times as high as the risk among Mexicans.

The risk of pancreatic cancer may or may not be higher in the United States. Pancreatic cancer is a disease that increases with age. Therefore, the higher incidence rate may be due to the fact that the United States has an older age distribution than Mexico. Thus, a comparison of pancreatic cancer in Mexico and the United States requires taking into account the age distribution of the two populations. This can be done by calculating the incidence rate per 100,000 population that would have occurred in Mexico if it had the same age distribution of the population as the United States (see note 5.2).
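The calculation just described can be sketched as a weighted average of age-specific rates. In the minimal example below, the age groups, the age-specific rates, and the two age distributions are hypothetical values of ours, not data for Mexico or the United States. To isolate the effect of age, the sketch assumes that both populations share identical age-specific rates.

```python
def overall_rate(age_specific_rates, age_distribution):
    """Weighted average of age-specific rates, weighted by each age group's
    share of the population."""
    if abs(sum(age_distribution.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(
        age_specific_rates[group] * age_distribution[group]
        for group in age_specific_rates
    )


# Hypothetical age-specific incidence rates per 100,000 per year,
# assumed identical in both populations.
rates = {"0-39": 1.0, "40-64": 10.0, "65+": 50.0}

# Hypothetical age distributions (shares of the total population).
younger_population = {"0-39": 0.70, "40-64": 0.25, "65+": 0.05}
older_population = {"0-39": 0.50, "40-64": 0.30, "65+": 0.20}

crude_younger = overall_rate(rates, younger_population)
crude_older = overall_rate(rates, older_population)

# Rate the younger population would have with the older age distribution.
adjusted_younger = overall_rate(rates, older_population)

print(f"Crude rate, younger population: {crude_younger:.1f} per 100,000")
print(f"Crude rate, older population: {crude_older:.1f} per 100,000")
print(f"Age-adjusted rate, younger population: {adjusted_younger:.1f} per 100,000")
```

Here the crude rates differ more than twofold even though the risk at every age is the same. After adjustment the difference disappears, which is the possibility that the authors of Mini-Study 5.11 failed to consider.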

5.1 Rates may utilize different lengths of follow-up time for different individuals. If individuals are followed for differing lengths of time, a measure known as a person-year is often used. A person-year is one individual followed for 1 year. At times, the term rate is used as a generic term to indicate any fraction or ratio with a numerator and a denominator. A ratio may consist of a numerator that measures one phenomenon and a denominator that measures a different phenomenon. For example, in perinatal mortality rates, the numerator consists of the number of stillbirths in a population and the denominator is the number of live births during the same time period. This special type of ratio can be confusing because the numerator is not a subset of the denominator. This type of ratio does not have any predefined limits. In other words, theoretically, it can vary from zero to infinity since the numerator and the denominator do not depend on each other.
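The use of person-years as a denominator can be sketched as follows. The follow-up times and the number of events are hypothetical, and the function name is an illustrative choice of ours.

```python
def rate_per_person_years(events, follow_up_years, per=1_000):
    """Events per `per` person-years, where each entry of `follow_up_years`
    is the number of years one individual was followed."""
    total_person_years = sum(follow_up_years)
    if total_person_years <= 0:
        raise ValueError("total follow-up time must be positive")
    return events * per / total_person_years


# Hypothetical: five individuals followed for differing lengths of time.
follow_up = [1, 2, 2, 5, 10]  # years; 20 person-years in total
events_observed = 2

person_year_rate = rate_per_person_years(events_observed, follow_up)
print(f"Rate: {person_year_rate:.0f} per 1,000 person-years")
```

One individual followed for 10 years contributes the same amount to the denominator as ten individuals each followed for 1 year.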

5.2 Alternatively, the incidence rate for the United States could have been calculated assuming that it had the same age distribution as Mexico. Either calculation is an example of the direct method of age standardization. There are two basic forms of age adjustment or age standardization, known as direct and indirect age standardization. Direct age standardization takes the age-specific rates from the population of interest (population B) and applies them to the number of individuals in the corresponding age groups of the comparison population (population A). This allows one to ask the question: How many deaths would have occurred in population B if it had the same age distribution as population A? Direct age standardization thus allows a comparison of the number of deaths that did occur in population B with the number that would have been expected to occur if population B had the same age distribution as population A. Indirect standardization, as opposed to direct standardization, does not require knowledge of the death rates in each age group in the population of interest. Indirect standardization uses an external population, such as the U.S. population in 2000, in which the age-specific death rates are known. The age-specific death rates in the U.S. population in 2000 are then applied to the number of individuals at each age in the population of interest. This allows calculation of an expected number of deaths. The observed or actual number of deaths in the population of interest can then be compared with this expected number of deaths. This ratio of observed to expected number of deaths is called the standardized mortality ratio. Standardization can be misleading when the rates for one age group are increasing while the rates for another are decreasing. In addition, the choice of standard population can affect the results, especially if the population distribution is changing.
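Indirect standardization and the standardized mortality ratio can be sketched in a few lines. All numbers below are hypothetical: the age-specific death rates stand in for those of an external standard population, and the population counts and observed deaths stand in for the population of interest.

```python
def expected_deaths(standard_rates_per_100k, population_counts):
    """Deaths expected if the population of interest experienced the
    standard population's age-specific death rates."""
    return sum(
        standard_rates_per_100k[group] * population_counts[group] / 100_000
        for group in population_counts
    )


def standardized_mortality_ratio(observed, expected):
    """Ratio of observed to expected number of deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Hypothetical age-specific death rates in the standard population,
# per 100,000 per year.
standard_rates = {"0-39": 100, "40-64": 500, "65+": 4_000}

# Hypothetical population of interest: people in each age group.
population_of_interest = {"0-39": 50_000, "40-64": 30_000, "65+": 20_000}
observed_deaths = 1_200  # hypothetical total; no age breakdown is needed

expected = expected_deaths(standard_rates, population_of_interest)
smr = standardized_mortality_ratio(observed_deaths, expected)

print(f"Expected deaths: {expected:.0f}")
print(f"Standardized mortality ratio: {smr:.2f}")
```

A ratio above 1 indicates more deaths than expected given the age structure of the population of interest. Only the total number of observed deaths is required, which is why indirect standardization can be used when age-specific death rates in the population of interest are unknown.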