The Function of Tests
Diagnostic testing is an information-gathering task that differs from the processes discussed in the previous chapter only in respect to the risks and costs that tests incur. Testing is used in the process of hypothesis refinement to help formulate a working diagnostic hypothesis, defined previously as one that is sufficiently unambiguous to set the stage for making decisions about further invasive testing, treatment, or judgments about prognosis.
Because diagnostic tests elicit new information, they usually reduce diagnostic uncertainty and are often used selectively to distinguish among competing hypotheses. Tests virtually devoid of risk (e.g., those obtained by collection of blood and urine) and those low in cost are not different in their information-processing function from the direct questions asked of the patient or from the findings gleaned from the physical examination.
Quantifying Testing Decisions
(Case 20, Case 23, Case 26, Case 27, Case 29, Case 30–Case 31)
Physicians order diagnostic tests and process the data from these tests implicitly, but we have little data on the cognitive basis of the decisions to carry out the tests and their interpretations of the results. We do have extensive experience, however, with the prescriptive, quantitative approaches alluded to in the preceding section.
Elaboration of these quantitative approaches yields valuable principles of diagnostic testing. In fact, because many test results are expressed numerically, such data are particularly amenable to quantitative interpretation. However, data from tests are not the only information that can readily be expressed in probabilistic terms. The frequency of clinical symptoms, findings, complications of tests, favorable and morbid outcomes, and the efficacy and risks of therapies all can be expressed in probabilistic terms.
Before describing techniques for combining probabilistic information, some attention must be paid to the concept of probability as it applies to medical diagnosis. A probability is an expression of likelihood, that is, an opinion of the relative frequency with which an event is likely to occur. In medical practice, a probability is a belief about some aspect of a patient’s state of health; it can never be known with certainty and can only be estimated. Ideally, the belief would rest on objective data drawn from large collections of similar cases, but such collections are usually unavailable or not readily at hand, so the usual source of these estimates is subjective opinion based on personal experience with like cases.
Because probabilities have their basis in different data sources, not all probabilities are alike. Some probability assessments can be accepted with considerable confidence and some with little confidence. Our confidence in a probability assessment is couched in terms of ambiguity: The greater our uncertainty about the validity of a given probability assessment, the greater is the ambiguity. Ambiguity in probability assessments increases when available information is scanty, when data are unreliable, and when the test results, facts, or opinions of putative experts are conflicting.
The least ambiguous probability assessments are those solidly grounded in large bodies of data. Unfortunately, such data are not always available, and in some instances, the physician must accept considerable ambiguity in his or her probability assessments.
Probabilistic interpretation of the results of diagnostic tests is invaluable in the process of discriminating among diagnostic hypotheses because the approach combines both the physician’s diagnostic hypothesis before testing and the test result itself.
These concepts are effectively understood in terms of certain kinds of probabilities. A prior probability is a belief about the likelihood of a diagnostic hypothesis—for example, the prevalence of a disease such as acute myocardial infarction among patients presenting with chest pain. This pretest probability may be modified by all information collected up to that point, including symptoms and signs.
A posterior probability represents the revised belief in the likelihood of the diagnosis (myocardial infarction) after interpreting the test result (e.g., one or more creatine kinase or troponin levels). Test characteristics are defined as conditional probabilities, that is, as probabilities specific to certain (disease) conditions. Conditional probabilities describe the frequency with which a given result (e.g., an elevated creatine kinase [CK]) occurs in a given disease and in all other diagnoses of potential interest. In a patient suspected of having an acute myocardial infarction, for example, alternative hypotheses of potential interest might include angina pectoris, acute pericarditis, esophageal spasm, and anxiety.
Conditional probabilities for an elevated CK would describe the frequency of high CK values in each of these alternate hypotheses. Combining the prior probabilities of acute myocardial infarction and its diagnostic competitors with the conditional probabilities of the CK results in each of the diagnostic hypotheses yields posterior probabilities (revised probabilities after testing) of all diagnostic possibilities under consideration. These probabilistic data can be combined implicitly without formal calculations, but experience shows that many physicians fail to combine such data accurately when interpretation is carried out in an implicit fashion. For this reason, carrying out an actual calculation of posterior probabilities has special advantages.
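The combination of prior and conditional probabilities described above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: the function name `posterior_probabilities` is ours, and every number in the example is a hypothetical placeholder chosen only to show the arithmetic, not a published prevalence or CK frequency.

```python
def posterior_probabilities(priors, likelihoods):
    """Revise prior probabilities given one observed test result.

    priors:      {hypothesis: P(hypothesis)} before testing
    likelihoods: {hypothesis: P(observed result | hypothesis)}
    """
    if set(priors) != set(likelihoods):
        raise ValueError("priors and likelihoods must cover the same hypotheses")
    # Joint probability of each hypothesis together with the observed result.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # overall probability of the observed result
    if total == 0:
        raise ValueError("the result is impossible under every hypothesis")
    return {h: p / total for h, p in joint.items()}


# HYPOTHETICAL values for illustration only (not clinical data).
priors = {
    "myocardial infarction": 0.30,
    "angina pectoris": 0.30,
    "acute pericarditis": 0.05,
    "esophageal spasm": 0.15,
    "anxiety": 0.20,
}
p_elevated_ck = {
    "myocardial infarction": 0.90,
    "angina pectoris": 0.10,
    "acute pericarditis": 0.20,
    "esophageal spasm": 0.05,
    "anxiety": 0.05,
}

posteriors = posterior_probabilities(priors, p_elevated_ck)
for hypothesis, probability in posteriors.items():
    print(f"{hypothesis:22s} {priors[hypothesis]:.2f} -> {probability:.3f}")
```

With these placeholder inputs, the elevated CK shifts most of the probability toward myocardial infarction, and the posterior probabilities still sum to 1 across the competing hypotheses.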
Sensitivity and Specificity
(Case 20, Case 23, Case 26)
When considering only the presence or absence of one disease, the conditional probabilities of test results can be described as the sensitivity and specificity of a test (Fig. 4.1). The sensitivity of a test applies to patients known by some independent criterion to have a given disease. It is defined as the true-positive rate or equivalently the probability of a positive test result in patients known to have the disease (a mnemonic is PID for “positive in disease”). Unfortunately, few tests are exclusively positive in patients with a given disease (pathognomonic) and exclusively negative in those who do not have the disease (sine qua non). Overlaps are virtually the rule. Negative test results in patients known to have the disease are described as false negatives.
The specificity of a test applies to patients known by some independent criterion to be free of the disease (a mnemonic is NIH for “negative in health”). It is therefore the true-negative rate or equivalently the probability of a negative test result in patients known not to have the disease. Positive test results in patients who do not have the disease are considered to be false positives. Given the nearly universal overlap between test results in patients who have and who do not have the disease, it is necessary to define a positivity criterion or cutoff point above which the test is considered positive and below which it is considered negative. If the cutoff point is made stricter (i.e., raised), then the number of false-negative results increases (or, equivalently, sensitivity decreases), whereas the number of false-positive results decreases (or, equivalently, specificity increases); the reverse holds if the cutoff is lowered (Fig. 4.2).
FIGURE 4.1. Outcomes of a test with a binary result (either positive or negative) in a population of patients who either have or do not have a given disease.
As shown, patients with the disease may have a positive test (true positive) or a negative test (false negative); patients who do not have the disease may have a negative test (true negative) or a positive test (false positive). The probability of a true-positive result in patients with the disease is the sensitivity of the test, and the probability of a negative result in patients who do not have the disease is the specificity of the test.
FIGURE 4.2. Interpretation of a test, the results of which are in the form of a continuous function.
Individuals who do not have the disease have low test values and are distributed under the shorter curve on the left. Patients with the disease have high test values and are distributed under the taller curve on the right. However, test values in normal and in diseased individuals overlap. The vertical lines represent different cutoff points or positivity criteria: for each of the three segments of the figure, any value of the test to the right of the cutoff point is defined as a positive test and any value to the left of the cutoff point is defined as a negative test. Segment B, in the middle of the figure, defines a cutoff point with equal sensitivity and specificity. With this criterion as the cutoff, the true positives (90% of those with the disease) are to the right of the cutoff, and the true negatives (90% of those who do not have the disease) are to the left of the cutoff. As the criterion for a positive test is made stricter (segment C, bottom), the specificity increases but the sensitivity is reduced. As the criterion for a positive test is made more lax (segment A, top), the sensitivity increases, but the specificity falls. FN, false-negative result; FP, false-positive result; SENS., sensitivity; SPEC., specificity; TN, true-negative result; TP, true-positive result.
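The tradeoff described in this legend can be reproduced numerically. The sketch below assumes, purely for illustration, that test values in both groups follow normal distributions with equal spread; the figure does not specify the shapes of its curves, and the names `no_disease`, `disease`, and the one-unit shifts of the cutoff are our own choices. The means are placed so that the middle cutoff (segment B) gives 90% sensitivity and 90% specificity, as in the legend.

```python
from statistics import NormalDist

# z-value below which 90% of a standard normal distribution lies.
z90 = NormalDist().inv_cdf(0.90)

# ASSUMED distributions (not taken from the figure): unit spread in both groups,
# with the diseased group shifted so that a cutoff at z90 is balanced.
no_disease = NormalDist(mu=0.0, sigma=1.0)
disease = NormalDist(mu=2 * z90, sigma=1.0)


def sensitivity(cutoff):
    """Fraction of diseased patients whose value lies above the cutoff."""
    return 1.0 - disease.cdf(cutoff)


def specificity(cutoff):
    """Fraction of non-diseased patients whose value lies below the cutoff."""
    return no_disease.cdf(cutoff)


cutoffs = {
    "A (lax)": z90 - 1.0,
    "B (balanced)": z90,
    "C (strict)": z90 + 1.0,
}
for label, cutoff in cutoffs.items():
    print(f"{label:13s} sens={sensitivity(cutoff):.2f} spec={specificity(cutoff):.2f}")
```

Moving the cutoff to the right (segment C) trades sensitivity for specificity, and moving it to the left (segment A) does the opposite; no position of the cutoff improves both at once.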
(Case 20, Case 23, Case 30, Case 51)
We present a specific example of calculations with Bayes’ rule when both sensitivity and specificity are known. Although this “prostate cancer screening test” example is simplistic, it illustrates the relevant principles. Surveillance, Epidemiology, and End Results (SEER) data suggest that the prevalence of prostate cancer is 108 of 1,000 men aged 60 to 64 years. Of note, if the patient had previously been screened with a highly sensitive test, the incidence of disease since that screening should replace the prevalence estimate as the pretest likelihood of disease. In this case, assuming a screening test 1 year ago, the annual incidence of prostate cancer would be between 2 and 9 of 1,000, depending on race.
Based on a published study,[43] 71% of patients known to have prostate cancer have a positive test (sensitivity) and 51% of patients known to be free of cancer (benign prostatic hyperplasia [BPH]) have a negative test (specificity); the data are summarized in Table 4.1. In the population described, what is the significance of a positive test? How likely is it that a person with a positive test has cancer? Calculations are shown in the accompanying figures. Three different approaches to the calculations are illustrated: a “tree” or flow diagram approach (Fig. 4.3), a tabular approach (Fig. 4.4), and the use of Bayes’ formula (Fig. 4.5). More detailed examples of the actual use of Bayes’ rule, or Bayesian analysis, are given in Part II (see Case 23 and Case 30).
TABLE 4.1. Data for the Prostate Cancer Screening Test: Prostate-Specific Antigen (PSA)
|Prior probability (equivalent here to disease prevalence)|0.108|
|True-positive rate (sensitivity)|0.71|
|False-negative rate (1 – sensitivity)|0.29|
|True-negative rate (specificity)|0.51|
|False-positive rate (1 – specificity)|0.49|
FIGURE 4.3. A “tree” or flow diagram approach to the prostate-specific antigen (PSA) “cancer test” using Bayes’ rule.
This illustrates one solution to the PSA prostate cancer test described in the text. Starting with a population of 100,000 individuals, of whom 108 of 1,000 are expected to have cancer, we add the positive tests in those with cancer (true positives) to those who do not have cancer (false positives) and determine the fraction of patients with a positive test who actually have the disease (true positives divided by the sum of true positives and false positives). The origin of the data in the figure is shown in Table 4.1. With the relatively low specificity of the test at 0.51, more than 85% of positive tests are found in patients who do not have cancer. The low prevalence and the high false-positive rate of the test (0.49) account for this result.
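The flow-diagram arithmetic can be checked with a short script. This is a sketch using only the values in Table 4.1 and the population of 100,000 mentioned in the legend; the function name `tree_counts` and the dictionary keys are our own labels.

```python
def tree_counts(population, prevalence, sensitivity, specificity):
    """Split a population along the branches of the test 'tree'."""
    with_disease = population * prevalence
    without_disease = population - with_disease
    true_positives = with_disease * sensitivity
    false_negatives = with_disease - true_positives
    true_negatives = without_disease * specificity
    false_positives = without_disease - true_negatives
    all_positives = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "true_negatives": true_negatives,
        "false_positives": false_positives,
        # Fraction of positive tests that occur in patients with the disease.
        "p_disease_given_positive": true_positives / all_positives,
    }


# Values from Table 4.1 applied to 100,000 men.
result = tree_counts(100_000, prevalence=0.108, sensitivity=0.71, specificity=0.51)
for name, value in result.items():
    print(f"{name:26s} {value:,.3f}")
```

Of the 10,800 men with cancer, 7,668 test positive; of the 89,200 without cancer, 43,708 test positive. The fraction of positive tests that are true positives is therefore 7,668 of 51,376, or about 0.149, which leaves roughly 85% of positive tests as false positives.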
FIGURE 4.4. A tabular solution to the prostate-specific antigen (PSA) “cancer test” using Bayes’ rule.
The prior probability of each condition (cancer or no cancer) is multiplied by the conditional probability (in this case the probability of a positive test, given each condition). The products are summed, and the fraction of positive tests in each condition is calculated. Note the similarity between this calculation and that shown in Figure 4.3. For interpretation, see legend for Figure 4.3.
FIGURE 4.5. Solution to the prostate-specific antigen (PSA) “cancer test” using Bayes’ formula.
Note that the calculation is identical to that shown in Figures 4.3 and 4.4.
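For reference, Bayes’ formula for a positive test can be written out with the values from Table 4.1. The notation here is ours (T+ denotes a positive test) and may differ from the notation used in the figure itself.

```latex
\[
P(\text{cancer} \mid T^{+})
  = \frac{P(T^{+} \mid \text{cancer})\, P(\text{cancer})}
         {P(T^{+} \mid \text{cancer})\, P(\text{cancer})
          + P(T^{+} \mid \text{no cancer})\, P(\text{no cancer})}
  = \frac{0.71 \times 0.108}{0.71 \times 0.108 + 0.49 \times 0.892}
  = \frac{0.0767}{0.5138}
  \approx 0.149
\]
```

The numerator is the true-positive fraction of the whole population, and the denominator is the fraction of the whole population with a positive test, true or false.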
(Case 20, Case 22, Case 23, Case 29)
Bayes’ rule combines data on sensitivity and specificity of tests with prior probabilities, yielding a probabilistic view of various diagnoses that incorporates the test results. The application of Bayes’ rule to diagnostic testing yields important testing principles: The specificity of a test is critical for case finding, especially when screening asymptomatic patients, because the higher the specificity, the lower is the false-positive rate. In populations in which disease prevalence is low, most positive tests will be false positives unless a test is exceptionally specific so that almost all patients without disease have a negative test.
Indeed, if the disease prevalence is extremely low, a test (if it is the only one available) should not be done unless it is nearly perfectly specific. Thus, when a test is highly specific, a positive test result helps “rule in” a disease (a mnemonic is Positive SpIn for “positive test with high specificity rules in the disease”). Tests that are not highly specific are most useful for screening if they are applied in populations with a high disease prevalence. When other confirmatory tests are available, a test with only a moderately high specificity may be worth using (assuming no cost and no risk) as an initial screening test if it has high sensitivity.
For example, screening for HIV typically involves an enzyme immunoassay (EIA) followed, if the EIA is positive, by Western blot testing: a very sensitive test followed by a more specific one, for a disease in which accurate diagnosis has a high expected utility or benefit. Thus, when a test is highly sensitive, a negative test result helps “rule out” a disease (a mnemonic is Negative SnOut for “negative test with a high sensitivity rules out disease”).
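These principles can be made concrete by computing, for a given prevalence, the probability of disease after a positive test and the probability of no disease after a negative test. The sketch below uses the sensitivity (0.71) and specificity (0.51) from Table 4.1; the prevalence of 0.001, the specificity of 0.999, and the sensitivity of 0.99 are hypothetical values chosen only to illustrate the “SpIn” and “SnOut” mnemonics. The function name `predictive_values` is ours.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (P(disease | positive test), P(no disease | negative test)).

    Assumes 0 < prevalence < 1 and a test that yields both positive
    and negative results, so neither denominator is zero.
    """
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Low-prevalence screening (HYPOTHETICAL prevalence of 1 in 1,000).
rule_in_poor, _ = predictive_values(0.001, sensitivity=0.71, specificity=0.51)
rule_in_good, _ = predictive_values(0.001, sensitivity=0.71, specificity=0.999)

# Prevalence from Table 4.1 with a HYPOTHETICAL highly sensitive test.
_, rule_out_good = predictive_values(0.108, sensitivity=0.99, specificity=0.51)

print(f"P(disease | positive), specificity 0.51:  {rule_in_poor:.4f}")
print(f"P(disease | positive), specificity 0.999: {rule_in_good:.4f}")
print(f"P(no disease | negative), sensitivity 0.99: {rule_out_good:.4f}")
```

At a prevalence of 1 in 1,000, a positive result from the moderately specific test leaves the probability of disease well under 1%, whereas the nearly perfectly specific test raises it above 40%. Conversely, a negative result from the highly sensitive test leaves a probability of disease of less than 1%.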
Bayesian Revision for Multiple Results
The previous example of prostate-specific antigen (PSA) screening involved the simplest model of Bayesian revision (disease either present or absent; test either positive or negative). A more refined estimate of prostate cancer can be based on knowing the actual PSA result, or “how positive it was.” To do so, results that are reported as continuous variables or patterns (e.g., serum enzymes, serum electrolytes, electrocardiographic stress tests, or in this case PSA results) usually must be broken into discrete intervals or discrete categories so that they can be used in calculations. Instead of simply positive or negative, test results describe several levels of positivity.
Table 4.2 summarizes the likelihood of different PSA levels for prostate cancer and for BPH.[43] Figure 4.6 illustrates the likelihood of prostate cancer if the PSA is 12 (falling in the range of 10 or above). Figure 4.7 illustrates the results for a PSA of 7.0 (falling in the 6.0–9.9 range). In these cases, the interpretation of a test result no longer depends on the result simply being positive by falling above a cutoff value. Thus, the previous conditional probabilities, sensitivity and false-positive rate (1 – specificity), cannot be applied. Rather, the conditional probabilities become the likelihood of a result of 10 or greater, or of a PSA falling between 6.0 and 9.9, among patients with and without prostate cancer.
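The same revision can be sketched for interval-specific results. In the code below, the interval boundaries at 4.0, 6.0, and 10 come from the text, but the 4.0–5.9 interval is our inference, and the conditional probabilities are made-up placeholders rather than the values of Table 4.2. They are constrained only so that the intervals at or above 4.0 add up to the binary sensitivity (0.71) and false-positive rate (0.49), and the resulting posterior probabilities are not expected to match those in Figures 4.6 and 4.7.

```python
from bisect import bisect_right

# Lower edges of the PSA intervals (ng/mL); the 4.0-5.9 interval is ASSUMED.
LOWER_EDGES = [0.0, 4.0, 6.0, 10.0]
LABELS = ["<4.0", "4.0-5.9", "6.0-9.9", ">=10"]

# PLACEHOLDER conditional probabilities, NOT the values of Table 4.2.
P_GIVEN_CANCER = [0.29, 0.15, 0.20, 0.36]     # P(interval | cancer)
P_GIVEN_NO_CANCER = [0.51, 0.20, 0.17, 0.12]  # P(interval | no cancer)


def interval_index(psa):
    """Map a continuous PSA value to the index of its discrete interval."""
    if psa < 0:
        raise ValueError("PSA cannot be negative")
    return bisect_right(LOWER_EDGES, psa) - 1


def posterior_cancer(psa, prior=0.108):
    """Posterior probability of cancer given the interval containing psa."""
    i = interval_index(psa)
    cancer = prior * P_GIVEN_CANCER[i]
    no_cancer = (1 - prior) * P_GIVEN_NO_CANCER[i]
    return cancer / (cancer + no_cancer)


for value in (3.0, 7.0, 12.0):
    label = LABELS[interval_index(value)]
    print(f"PSA {value:4.1f} ({label:7s}) -> P(cancer) = {posterior_cancer(value):.3f}")
```

The calculation has the same form as the binary case; only the conditional probabilities change, from "positive or negative" to "falls in this interval". Substituting the actual entries of Table 4.2 for the placeholders would be the natural next step.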
TABLE 4.2. Data for the Prostate Cancer Screening Test: Prostate-Specific Antigen (PSA)
|PSA Level (ng/mL)|Prostate Cancer|No Cancer (Benign Prostatic Hyperplasia)|
FIGURE 4.6. Solution to the prostate-specific antigen (PSA) “cancer test” for a specific test range.
This figure demonstrates the benefit of knowing the exact PSA result (Table 4.2). Sensitivity and specificity are typically defined in terms of test values falling above or below a “cutoff” value or positivity criterion. However, in a given patient, the positive or negative result may be close to or far from this cutoff. For a test result of 12 (exceeding 10), which is far from the 4.0 positivity criterion, the likelihood of cancer is higher at 0.246 than the 0.149 for a positive test in Figures 4.3 to 4.5. Among the entire group of patients with positive test results, some have results close to 4 and others have values that are much higher, greater than 10. Patients with benign prostatic hyperplasia, however, are much less likely to have results exceeding 10, so the likelihood of cancer in this subset with high PSA (greater than 10) is consequently higher because false positives drop. Note that if 10 were used as a positivity criterion cutoff, many patients with cancer would have negative tests and be missed, so choosing a cutoff is a tradeoff between false-positive and false-negative results, balancing the benefit of treating true positives against the harm of treating false positives.
FIGURE 4.7. Solution to the prostate-specific antigen (PSA) “cancer test” for a specific test range.
As in Figure 4.6, this figure demonstrates the effect of knowing the exact PSA result. For a test result of 7.0 (between 6.0 and 9.9), which falls closer to the 4.0 positivity criterion cutoff, the likelihood of cancer is a bit lower at 0.134 than the 0.149 for a positive test in Figures 4.3 to 4.5 and the 0.246 for a test result of 12 in Figure 4.6.