Chapter 5: Risk: Exposure to Disease


Studies of Risk

This chapter describes how investigators obtain estimates of risk by observing the relationship between exposure to possible risk factors and the subsequent incidence of disease. It describes methods used to determine risk by following groups into the future and also discusses several ways of comparing risks as they affect individuals and populations. Chapter 6 describes methods of studying risk by looking backward in time.

The most powerful way to determine whether exposure to a potential risk factor results in an increased risk of disease is to conduct an experiment in which the researcher determines who is exposed. People currently without disease are divided into groups of equal susceptibility to the disease in question. One group is exposed to the purported risk factor and the other is not, but the groups otherwise are treated the same. Later, any difference in observed rates of disease in the groups can be attributed to the risk factor. Experiments are discussed in Chapter 9.

When Experiments Are Not Possible or Ethical

The effects of most risk factors in humans cannot be studied with experimental studies. Consider some of the risk questions that concern us today: Are inactive people at increased risk for cardiovascular disease, everything else being equal? Do cellular phones cause brain cancer? Does obesity increase the risk of cancer? For such questions, it is usually not possible to conduct an experiment. First, it would be unethical to impose possible risk factors on a group of healthy people for the purposes of scientific research. Second, most people would balk at having their diets and behaviors constrained by others for long periods of time. Finally, the experiment would have to go on for many years, which is difficult and expensive. As a result, it is usually necessary to study risk in less obtrusive ways.

Clinical studies in which the researcher gathers data by simply observing events as they happen, without playing an active part in what takes place, are called observational studies. Most studies of risk are observational studies and are either cohort studies, described in the rest of this chapter, or case-control studies, described in Chapter 6.


As defined in Chapter 2, the term cohort is used to describe a group of people who have something in common when they are first assembled and who are then observed for a period of time to see what happens to them. Table 5.1 lists some of the ways in which cohorts are used in clinical research. Whatever members of a cohort have in common, observations of them should fulfill three criteria if the observations are to provide sound information about risk of disease.

Table 5.1. Cohorts and Their Purposes

Characteristic in Common | To Assess Effect of | Example
Age | Age | Life expectancy for people age 70 (regardless of birth date)
Date of birth | Calendar time | Tuberculosis rates for people born in 1930
Exposure | Risk factor | Lung cancer in people who smoke
Disease | Prognosis | Survival rate for patients with brain cancer
Therapeutic intervention | Treatment | Improvement in survival for patients with Hodgkin lymphoma given combination chemotherapy
Preventive intervention | Prevention | Reduction in incidence of pneumonia after pneumococcal vaccination

  1. They do not have the disease (or outcome) in question at the time they are assembled.
  2. They should be observed over a meaningful period of time in the natural history of the disease in question so that there will be sufficient time for the risk to be expressed. For example, if one wanted to learn whether neck irradiation during childhood results in thyroid neoplasms, a 5-year follow-up would not be a fair test of this hypothesis, because the usual time period between radiation exposure and the onset of disease is considerably longer.
  3. All members of the cohort should be observed over the full period of follow-up or methods must be used to account for dropouts. To the extent that people drop out of the study and their reasons for dropping out are related in some way to the outcome, the information provided by an incomplete cohort can misrepresent the true state of affairs.
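One common way to account for unequal follow-up, including dropouts, is to count person-time at risk rather than people. The sketch below is illustrative only: the records are invented, and the function name `incidence_rate` is an assumption, not something taken from the studies cited in this chapter.

```python
# Each record: (years of follow-up actually observed, developed the outcome?)
# Hypothetical cohort members, invented for illustration.
follow_up = [
    (10.0, False),
    (4.5, True),    # outcome occurred after 4.5 years
    (2.0, False),   # dropped out after 2 years; contributes only 2 person-years
    (10.0, False),
    (7.5, True),    # outcome occurred after 7.5 years
]


def incidence_rate(records):
    """Return new outcome events divided by total person-years at risk."""
    events = sum(1 for _, had_outcome in records if had_outcome)
    person_years = sum(years for years, _ in records)
    return events / person_years


rate = incidence_rate(follow_up)
print(f"{rate * 1000:.1f} events per 1,000 person-years")
```

Person-time credits each member only for the time actually observed. It does not remove the bias described in the third criterion: if people drop out for reasons related to the outcome, the rate can still misrepresent the true risk.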

Cohort Studies

The basic design of a cohort study is illustrated in Figure 5.1. A group of people (a cohort) is assembled, none of whom has experienced the outcome of interest, but all of whom could experience it. (For example, in a study of risk factors for endometrial cancer, each member of the cohort should have an intact uterus.) Upon entry into the study, people in the cohort are classified according to those characteristics (possible risk factors) that might be related to outcome.

For each possible risk factor, members of the cohort are classified either as exposed (i.e., possessing the factor in question, such as hypertension) or unexposed. All the members of the cohort are then observed over time to see which of them experience the outcome, say, cardiovascular disease, and the rates of the outcome events are compared in the exposed and unexposed groups. It is then possible to see whether potential risk factors are related to subsequent outcome events.
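The comparison of outcome rates in exposed and unexposed groups can be sketched in a few lines. The counts below are hypothetical, and the names `Group` and `risk` are chosen for illustration only. The risk difference and risk ratio shown are two common ways of expressing the comparison.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Outcome counts for one exposure group of a cohort."""
    n_at_risk: int    # members free of the outcome at entry
    n_outcomes: int   # members who developed the outcome during follow-up

    def risk(self) -> float:
        """Cumulative incidence: new outcome events / people at risk."""
        return self.n_outcomes / self.n_at_risk


# Hypothetical counts, invented for illustration only.
exposed = Group(n_at_risk=1000, n_outcomes=90)
unexposed = Group(n_at_risk=2000, n_outcomes=60)

risk_difference = exposed.risk() - unexposed.risk()
risk_ratio = exposed.risk() / unexposed.risk()

print(f"Risk in exposed:   {exposed.risk():.3f}")
print(f"Risk in unexposed: {unexposed.risk():.3f}")
print(f"Risk difference:   {risk_difference:.3f}")
print(f"Risk ratio:        {risk_ratio:.2f}")
```

A risk ratio above 1 (or a risk difference above 0) indicates that the outcome was more frequent in the exposed group. Whether the exposure caused the excess is a separate question.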

Other names for cohort studies are longitudinal studies, which emphasize that patients are followed over time; prospective studies, which imply the forward direction in which the patients are pursued; and incidence studies, which call attention to the basic measure of new disease events over time.

Figure 5.1. Design of a cohort study of risk. Persons without disease are divided into two groups: those exposed to a risk factor and those not exposed. Both groups are followed over time to determine what proportion of each group develops disease.

The following is a description of a classic cohort study that has made important contributions to our understanding of cardiovascular disease risk factors and to modern methods of conducting cohort studies.


The Framingham Study (1) was begun in 1949 to identify factors associated with an increased risk of coronary heart disease (CHD). A representative sample of 5,209 men and women, aged 30 to 59 years, was selected from approximately 10,000 persons of that age living in Framingham, a small town near Boston. Of these, 5,127 were free of CHD when first examined and, therefore, were at risk of developing CHD. These people were re-examined biennially for evidence of coronary disease.

The study ran for 30 years and now continues as the Framingham Offspring Study (2). It demonstrated that the risk of developing CHD is associated with elevated blood pressure, high serum cholesterol, cigarette smoking, glucose intolerance, and left ventricular hypertrophy. There was a large difference in risk of CHD between those with none and those with all of these risk factors. Combining the risk factors identified in this study gave rise to one of the most often used risk prediction tools in clinical medicine, the Framingham Risk Score for cardiovascular disease.

Prospective and Historical Cohort Studies

Cohort studies can be conducted in two ways (Fig. 5.2). The cohort can be assembled in the present and followed into the future (a prospective cohort study), or it can be identified from past records and followed forward from that time up to the present (a retrospective cohort study or a historical cohort study). The Framingham Study is an example of a prospective cohort study. Useful retrospective cohort studies are appearing increasingly in the medical literature because of the availability of large computerized medical databases.

Figure 5.2. Retrospective and prospective cohort studies. Prospective cohorts are assembled in the present and followed forward into the future. In contrast, retrospective cohorts are made by going back into the past and assembling the cohort, for example, from medical records, then following the group forward to the present.

Prospective Cohort Studies

Prospective cohort studies can assess purported risk factors not usually captured in medical records, including many health behaviors, educational level, and socioeconomic status, which have been found to have important health effects. When the study is planned before data are collected, researchers can be sure to collect information about possible confounders. Finally, all the information in a prospective cohort study can be collected in a standardized manner that decreases measurement bias.


How much leisure-time physical activity is needed to achieve health benefits? Several guidelines suggest a minimum of 30 minutes a day for 5 days a week, but most people do not follow the recommendation. Can less physical activity achieve health benefits? A prospective cohort study was undertaken among more than 415,000 adults who answered a standard questionnaire about their physical activity and were followed for an average of 8 years (3).

Results showed that, compared with people who reported no activity, those reporting increasing amounts of leisure activity had lower all-cause mortality and longer life expectancy. As little as 15 minutes of activity per day correlated with a 14% lower mortality and a 3-year longer life expectancy, even when accounting for education level, physical labor at work, and other health conditions.

Historical Cohort Studies Using Medical Databases

Historical cohort studies can take advantage of computerized medical databases and population registries that are used primarily for patient care or to track population health. The major advantages of historical cohort studies over classical prospective cohort studies are that they take less time, are less expensive, and are much easier to do. However, they cannot undertake studies of factors not recorded in computerized databases, so patients’ lifestyle, social standing, education, and other important health determinants usually cannot be included in the studies.

Also, information in many databases, especially medical care information, is not collected in a standardized manner, leading to the possibility of bias in results. Large computerized databases are particularly useful for studying possible risk factors and health outcomes that are likely to be recorded in medical databases in somewhat standard ways, such as diagnoses and treatments.


The incidence of autism increased sharply in the 1990s, coinciding with increasing vaccination of young children for measles, mumps, and rubella (MMR). A report linking MMR vaccination and autism in several children caused widespread alarm that vaccination (or the vaccine preservative, thimerosal) was responsible for the increasing incidence of autism. In some countries, MMR vaccination rates among young children dropped, resulting in new outbreaks and even deaths from measles.

Because of the seriousness of the situation, several studies were undertaken to evaluate MMR vaccine as a possible risk factor. In Denmark, a retrospective cohort study included all children (537,303) born from January 1991 through December 1998 (4). The investigators reviewed the children’s countrywide health records and determined that 82% received the MMR vaccine (physicians must report vaccinations to the government in order to receive payment); 316 children were diagnosed with autism, and another 422 with autistic-spectrum disorders.

The frequency of autism among children who had been vaccinated was similar to (in fact, slightly lower than) that among children not receiving MMR vaccine. This, along with other studies, provided strong evidence against the suggestion that MMR vaccine causes autism. Subsequently, the original report that led to the alarm was investigated for fraud and conflict of interest and was retracted by The Lancet in 2010 (5).

Case-Cohort Studies

Another method using computerized medical databases in cohort studies is the case-cohort design. Conceptually, it is a modification of the retrospective cohort design that takes advantage of the ability to determine the frequency of a given medical condition in a large group of people. In a case-cohort study, all exposed people in a cohort but only a small random sample of unexposed people are included in the study and followed for some outcome of interest.

For efficiency, the group of unexposed people is “enriched” with all those who subsequently suffer the outcome of interest (i.e., become cases). The results are then adjusted to reflect the sampling fractions used to obtain the sample. This efficient approach to a cohort study requires that frequencies of outcomes be determined in the entire group of unexposed people; thus, the need for a large, computerized, medical database.
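The adjustment for sampling fractions can be sketched as a simple reweighting: each sampled person stands in for 1 / (sampling fraction) people in the full group. The example below is a minimal illustration under assumed numbers. The counts, the 1% sampling fraction, and the function name `weighted_risk` are all hypothetical, and published analyses typically use more elaborate methods.

```python
def weighted_risk(cases, noncases, case_fraction, noncase_fraction):
    """Estimate risk in the full group from a sample.

    Each sampled person is weighted by 1 / (their sampling fraction),
    which scales the sample back up to the size of the full group.
    """
    est_cases = cases / case_fraction
    est_noncases = noncases / noncase_fraction
    return est_cases / (est_cases + est_noncases)


# Hypothetical exposed group: everyone is included (fractions of 1.0).
exposed_risk = weighted_risk(
    cases=10, noncases=490, case_fraction=1.0, noncase_fraction=1.0
)

# Hypothetical unexposed group: all 400 cases are included ("enrichment"),
# but only a 1% random sample of non-cases (996 people).
unexposed_risk = weighted_risk(
    cases=400, noncases=996, case_fraction=1.0, noncase_fraction=0.01
)

# Ignoring the sampling fractions badly overstates risk in the unexposed.
naive_unexposed_risk = 400 / (400 + 996)

print(f"Exposed risk:               {exposed_risk:.4f}")
print(f"Unexposed risk (weighted):  {unexposed_risk:.4f}")
print(f"Unexposed risk (naive):     {naive_unexposed_risk:.4f}")
```

The naive figure shows why the adjustment matters: because cases are deliberately over-represented in the sample, the unweighted proportion says little about risk in the full unexposed group.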


Does prophylactic mastectomy protect women who are at increased risk for breast cancer? A case-cohort study was done to examine this question in six health maintenance organizations, all of which had computerized databases of diagnoses and surgical procedures on their members. Investigators identified all 276 women who underwent bilateral prophylactic mastectomy over a number of years and followed them forward over time to determine if they developed breast cancer.

For the comparison group, the investigators randomly sampled a similar group of women not undergoing the procedure and enriched the sample with women who subsequently developed breast cancer. “Enrichment” was accomplished by identifying, through examination of the computerized database, who among 666,800 eligible women developed breast cancer. The investigators randomly sampled about 1% of comparison women of a certain age who developed breast cancer,† but only about 0.01% of women who did not.

Adjustments for the sampling fractions were then made during the analysis. The results showed that bilateral prophylactic mastectomy was associated with a 99% reduction in breast cancer among women at higher risk (6).

† Strictly speaking, the study was a modification of a standard case-cohort design that would have included all cases, not just 1%, of 26,800 breast cancers that developed in the comparison group of women not undergoing prophylactic mastectomy. However, because breast cancer occurs commonly, a random sample of the group sufficed.