TESTS: "FALSE POSITIVE" [false alarm] AND "FALSE NEGATIVE" TEST
to this file/page, you have been directed to & are now on Dr. Shaw's personal
The practice of medicine is (1) an art based on (2) observation, (3) questioning and listening, and (4) scientific information. It is made hugely more difficult by (a) the reliability complications of statistics, (b) the psychology of the "placebo effect" in compiling scientific studies AND treatment effect on patients, and (c) the inability of so many patients to understand both placebo effect and the concept of false positive and false negative diagnoses.
CALCULATOR: I long for calculators that help patients predict outcomes. I like the following one. For back problems, the SPORT calculator now exists to help a patient look at the odds of surgery or non-surgical care making things better, no different, or worse.
First, let's talk statistics. Results stimulate various levels of response. (1) A positive result may demand immediate treatment, whether it is a false or true positive. (2) Or a positive result may trigger the need for another test or study to add clarity among the array of diagnostic possibilities that a positive initial test result presents to the doctor (sort of like a foul ball in baseball). (3) Or a positive result may simply say that the patient now needs to be watched more carefully. (4) Or a positive result may be a false positive...a HUGE problem when screening for disease in populations at low risk (and, therefore, with low disease prevalence). And the false positive category includes a result that is correct for a sought-after diagnosis but negative for any obvious adverse clinical impact on the patient (as in many cases of ACA of the childbirth placenta and other initially disturbing placental diagnoses such as HEV or chronic villitis, VUE type...to use just a few examples in only one human organ). Using this approach, there is an attempt, through the use of statistics, to "bet" on (a) a correct diagnosis, (b) the future outcome (prognosis) of a treatment plan, and (c) the riskiness of various procedures and treatments.
MAIN POINT OF THIS PAGE: A whole lot more exactitude is implied (and/or inferred) than actually exists in the field of medicine!
Firstly, the PLACEBO EFFECT:
In studying what "works" and what does not "work", it is hugely important to be clear that, just because outcome A follows treatment B, it does not necessarily mean that treatment B caused outcome A. From an on-line article: "The magic of medical success has always been, and to a large extent still is, the remarkable power of the placebo effect—getting better as a result of  a tincture of time and  the magical power of hopeful expectations. Many popular treatments are popular more because of placebo effect than because of any physiologic changes.
Placebo is the best medicine ever invented.
The placebo effect is endlessly fascinating: we know that 2 pills produce a greater placebo effect than 1, that brand-named pills work better than a non-branded; that expensive pills are more powerful; and that placebos work even when patients know they are placebos. And placebo injections work even better than placebo pills.
The study of placebo effect in surgery has lagged far behind its role in medicine because doing placebo surgery is harder than giving a placebo pill. And there is every reason to believe that surgery is especially prone to placebo effects. The more dramatic the procedure, the more likely it raises hope of cure.
“Quick, operate before the patient gets better” is [said to be] one of those jokes orthopedic surgeons tell among themselves, barely covering a hard truth: that a lot of elective surgery might be unnecessary or even harmful."
- See article HERE. Finally, one of the most shocking examples of placebo gullibility is the case of the "goat gland doctor", John Brinkley, who made millions "curing" men of impotence by surgically implanting goat testicles into them (and there was MORE that he did). The Travel Channel "Mysteries at the Museum" episode S03E12 YouTube video is HERE.
Now, the STATISTICAL EFFECT:
The PSYCHOLOGY (the patient's emotions) of prognosis & risk perception (especially risk perception) is a huge factor, especially in decisions related to this latter (c) situation [see discussion reference #4, linked below]. "It is intrinsic to the human animal that our perception of risk will be subjective" [see the early part of the YouTube video in reference #4]. That video discusses the "perception gap" between what risk data analysis says and what we laypersons perceive and believe. Take worries about cancer, for example. Non-medical people are unaware that there are 1000s of cancer types and varieties (all types of malignancies). Let's put them all into three brackets: (1) Killer: Nearly 100% of the time, wolves in the wild will kill you if you don't kill them first. (2) Dangerous: German shepherds and pit bulls can be dangerous and do kill, but not often. (3) Harmless: Lap dogs can't directly kill you, though an infection from their bite can. With all of the big words and scientific talk with medical providers, most patients become confused about which zone or bracket they are in, especially when the doctor has a hard time explaining because of the threats from our legal system: if he thought your cancer was relatively harmless but it ends up killing you, the family could come back with a mental pain and suffering malpractice lawsuit.
Prognosis (b) estimations are population based and then roughly applied to the individual. Degrees of exactitude are often quoted which I feel are not justified but which are well received by patients who feel more emotionally comfortable with the appearance of exactitude. The problem is that the patient is a single individual...he/she is in a population composed of just one person (there is no 85%, 50%, or 2%). The statistical studies on which prognosis estimates are based are, however, composed of outcomes data on numerous patients.
Riskiness estimates (c), whether of the likelihood of getting a disease or of complications or adverse reactions from procedures and treatments, are also population based. Yet the particular patient represents a statistical universe of one. The best that we can actually do is to tell a patient and loved ones that there is low, medium, or high risk. Testing for a correct diagnosis looks for the truth here and now and often with real urgency.
Prognosis and risk estimates are about the future...often the very near future with respect to risk of complications and adverse events. These are filled with emotional aspects that revolve around different patients having highly variable fears.
TESTING FOR THE CORRECT DIAGNOSIS:
As to (a) testing for a correct diagnosis, the determination of normal ranges for each of the 100s (1000s) of tests is an approximation and not an absolute certainty. The reliability of the statistical gambling with tests is strongest when the prevalence of the diagnosis being considered (the "concern" of the moment) is high within the population being checked (one such population might be females coming to a hospital emergency department with belly pain). Unfortunately, the USA medical system has been (1) conned by researchers and academicians into a "scientific" mindset that expects an exactitude of diagnosis that is not justified by common sense or the rules of statistics. This state of mind has been generated by (2) government research incentives and (3) regulations, along with the intense culture within the medical field to self-protect (CYA) against medical malpractice threats. Most patients are unable to deal emotionally with words which seem uncertain, such as "most likely" and "probably".
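One way to see why "normal ranges" are approximations: a common convention (an assumption here, not something stated on this page) is to set the range so that it covers the central 95% of results from healthy people. About 1 healthy person in 20 then falls outside the range on any single test. The rough sketch below, which also assumes that the tests are independent of each other, estimates how often a perfectly healthy person would show at least one flagged result on a panel of tests. The function name and the 95% figure are illustrative only.

```python
def prob_any_out_of_range(num_tests: int, coverage: float = 0.95) -> float:
    """Chance that a healthy person has at least one result outside the
    "normal range" on a panel of independent tests.

    coverage is the assumed fraction of healthy people inside each range.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    # Probability that every test is in range is coverage ** num_tests,
    # so the complement is the chance of at least one flagged result.
    return 1.0 - coverage ** num_tests


for n in (1, 5, 12, 20):
    chance = prob_any_out_of_range(n)
    print(f"{n:2d} tests: {chance:.0%} chance of at least one flagged result")
```

Under these assumptions, a 20-test panel flags something in roughly 64% of healthy people, which is why an isolated, slightly out-of-range value is so often not a "mistake" at all.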
When medical people tell you that you have had a false-positive test or a false-negative test, it does not automatically mean that a mistake was made. In fact, such a statement does not always even mean that the result was actually false. The result might be a simple aberration of the way that "normal ranges" are set. And the significance of the situation varies with what kind of test situation it is:
- a "screening test situation" (a test in search of a disease in a person who does not appear to have it...example: PSA test for prostate cancer),
- a "diagnostic test situation" (a test for a specific, particular disease...example: lung cancer...in a person who has a more general finding, such as "a spot on the lung", typical of the specific disease we are concerned about),
- a "treatment decision test situation" (example: a patient had a stroke which seems to be because a clot traveled to the brain. Did it come through a patent foramen ovale (PFO)?),
- or a "monitoring test situation" (a test which helps doctors "keep track" of how you are doing with a known disease...example: hemoglobin A1c in a diabetic).
A false-positive test is (1) a test result (such as blood [serum] PSA) or (2) a finding which suggests the presence of a disease which turns out to apparently not be there. But another disorder may be found that explains the result. Example:
We have a recent case example of a close friend
with a 22 gram prostate gland coming to our attention due to PSA going from 1.3 in 2003 to
5.9 in 2007 for an alarming PSA velocity of 1.16 ng/mL/year and doubling time of 1.82 years
and density quite elevated at 0.268 ng/mL/cc of gland. Twelve patterned biopsies found active
periglandular lymphocytic chronic prostatitis. This was "positive" for a diagnosis explaining
the elevated PSA parameters but "negative" for prostate cancer. So, it was "false positive"
for cancer because the test was being used as a cancer screening test. [By
2011, as with many men his age, he did finally develop prostate cancer.]
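For readers who want to see where PSA numbers like these come from, here is a minimal sketch of the usual two-point formulas. It assumes a whole 4-year interval (2003 to 2007), since the exact test dates are not given here, and it treats the 22 gram gland as about 22 cc. The doubling time formula assumes steady exponential growth between the two measurements. The function names are made up for illustration.

```python
import math


def psa_velocity(psa_start: float, psa_end: float, years: float) -> float:
    """Average PSA change per year (ng/mL/year) between two measurements."""
    return (psa_end - psa_start) / years


def psa_doubling_time(psa_start: float, psa_end: float, years: float) -> float:
    """Years for PSA to double, assuming exponential growth between
    the two measurements (only meaningful when PSA is rising)."""
    if psa_end <= psa_start:
        raise ValueError("doubling time needs a rising PSA")
    return years * math.log(2) / math.log(psa_end / psa_start)


def psa_density(psa: float, gland_volume_cc: float) -> float:
    """PSA per unit of gland volume (ng/mL/cc)."""
    return psa / gland_volume_cc


# Values from the case above; the 4.0-year interval is an assumption.
velocity = psa_velocity(1.3, 5.9, 4.0)
doubling = psa_doubling_time(1.3, 5.9, 4.0)
density = psa_density(5.9, 22.0)
print(f"velocity {velocity:.2f} ng/mL/year")
print(f"doubling time {doubling:.2f} years")
print(f"density {density:.3f} ng/mL/cc")
```

With a whole 4-year interval this gives about 1.15 ng/mL/year, about 1.83 years, and about 0.268 ng/mL/cc. The small differences from the 1.16 and 1.82 quoted above probably reflect the exact test dates.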
Another example would be a positive test for "flu" when tested in a non-flu time of the year or on the very early or very late margins of the flu season. A "positive" test at those times can only be presumptively positive (rather than definitively positive) until further details prove it. If subsequent developments prove it to be "flu", then the test result becomes definitively positive. If it was a false positive, it is incorrect to say that the original test result was "wrong". It was simply a positive test that did not pan out to be a definitively positive test. Imagine a medical practitioner trying to explain such a thing to an upset family who begins to wonder if anyone really knows what is going on!
A false-negative test is a result or finding which suggests that the dreaded disease is not there when, on further investigation, the disease is (or was), indeed, found to be present. False positive and false negative results cause either unwarranted concern or unwarranted relief, and they can lead to additional expense...as do true positive & true negative tests. Testing of any type almost always leads to more expense [3]!
A borderline test is one that gives a result, but a result that may not clearly answer our question. It might be "not negative", yet the "positive" finding does not fulfill the criteria needed to define a positive test at a level high enough (positive enough) to trigger the beginning of a treatment (see the "false positive" example above).
A 62 year old female smoker presents to the ER with headache. CT scan of the head shows lesions in the distribution of the right middle cerebral artery (MCA). CT angiogram indicates that the cause might be atherosclerosis, arteritis, or recanalizing thromboembolic clot. Trans-esophageal echocardiogram (TEE) is negative for atrial lesions. Doppler is negative for flow through any PFO, but there is occasional positivity for bubbles passing right to left (in 1 of 5 Valsalva coughs). The TEE is not negative; but the "positivity" (bubbles) is not straightforwardly positive as an indicator for PFO.
Prevalence & Test-for-Diagnosis Performance:
Lab test result interpretations are based on statistics. Lab testing for the influenza virus early in the "flu season" produces poor test-result reliability statistics because the actual disease is rare in patients tested at that time. The prevalence of flu is very low at that time. So, the statistical odds are also low that a positive test is a true, correct, "positive" test, and it is more likely that it is a false positive...a false alarm.
Therefore, test performance statistics depend heavily upon how widespread the disease tested for is in that patient's population (what percentage of that population truly has that disease at that time)...the prevalence of that disease...the probability that the disease actually exists in that person. Test performance dramatically improves when the sought-for disease has a high percentage chance of actually existing in members of the case population. That is, a test for lung cancer on ALL persons will perform poorly as compared to a test on a person who is a (1) male (2) long-time cigarette smoker (3) who has a spot (lesion) on the lung (4) which does not have any visible calcification and (5) has a stellate shape by imaging studies. Each of those 5 factors accumulates to increase the prevalence of the sought-after disease in the population tested.
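The dependence on prevalence described above can be put into numbers with Bayes' rule. The sketch below computes the chance that a positive result is a true positive (the "positive predictive value") for the same test at different prevalences. The 90% sensitivity and 95% specificity are hypothetical figures picked only for illustration, not the performance of any real flu or lung cancer test.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a person with a positive test truly has the disease."""
    true_pos = prevalence * sensitivity                   # diseased and positive
    false_pos = (1.0 - prevalence) * (1.0 - specificity)  # healthy but positive
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical test performance (assumed, not from any real assay).
SENSITIVITY = 0.90
SPECIFICITY = 0.95

for prevalence in (0.001, 0.01, 0.10, 0.30, 0.60):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:6.1%} -> positive result is correct {ppv:5.1%} of the time")
```

With these assumed figures, a positive result is right only about 15% of the time when 1 person in 100 has the disease, but about 89% of the time when 30 in 100 do. The test itself never changed; only the population did.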
Furthermore, in the case of a screening test, the hope is that the test re-positions the "positive patients" into a new population in which the sought-for disease is more prevalent. At that moment, a positive patient "might" have the disease. Then, more definitive testing and questioning is done in THAT group so as to corroborate or refute what is becoming a presumptive diagnosis.
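This "re-positioning" can be sketched with likelihood ratios: the probability of disease after a positive screening test becomes the starting probability for the next, more definitive test. All of the numbers below (1% starting prevalence, and the sensitivity and specificity of both tests) are hypothetical, and the sketch assumes that the two tests err independently of each other, which is not always true in real practice.

```python
def update_probability(pretest_prob: float,
                       sensitivity: float,
                       specificity: float,
                       positive: bool = True) -> float:
    """Post-test probability of disease, using odds times likelihood ratio."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    if positive:
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical figures for illustration only.
start = 0.01                                              # prevalence before any testing
after_screen = update_probability(start, 0.90, 0.95)      # positive screening test
after_confirm = update_probability(after_screen, 0.95, 0.98)  # positive confirmatory test

print(f"before testing:            {start:.1%}")
print(f"after positive screen:     {after_screen:.1%}")
print(f"after positive follow-up:  {after_confirm:.1%}")
```

Under these assumptions, the positive screen moves the patient from a 1% group into a roughly 15% group, and a positive follow-up test in THAT group raises the probability to roughly 90%.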
Tests: Don't Throw Baby Out with the Bath
Some tests have awful rates as to being falsely
positive for a disease; for those, a positive result is likely reported by our lab as
"indeterminate". Yet the same test may have a great (extremely low) false negative rate, such
that "negativity" truly means negative and rules out the presence of the disease in
that patient. The very rapid SUDS test for HIV was hugely reliable when negative (the blood-covered emergency patient did NOT have HIV) but poorly reliable when positive (it went off the market about 2005, unfortunately). So, positive results were reported by our lab as "indeterminate"; and the patient was tested with another test which took 24 additional hours to get a confident result as to "positivity".
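A test like that, trustworthy when negative but shaky when positive, can be pictured with an expected 2x2 table. The sketch below uses made-up numbers (10,000 patients, 0.5% prevalence, 99.9% sensitivity, 95% specificity); these are NOT the actual performance figures of the SUDS test or any other real assay, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Expected counts for a test applied to a population."""
    true_pos: float
    false_pos: float
    false_neg: float
    true_neg: float

    @property
    def ppv(self) -> float:
        """Fraction of positive results that are correct."""
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        """Fraction of negative results that are correct."""
        return self.true_neg / (self.true_neg + self.false_neg)


def expected_table(population: float, prevalence: float,
                   sensitivity: float, specificity: float) -> TwoByTwo:
    """Build the expected 2x2 table from prevalence and test performance."""
    diseased = population * prevalence
    healthy = population - diseased
    return TwoByTwo(
        true_pos=diseased * sensitivity,
        false_pos=healthy * (1.0 - specificity),
        false_neg=diseased * (1.0 - sensitivity),
        true_neg=healthy * specificity,
    )


# Hypothetical figures, chosen only to show the asymmetry.
table = expected_table(10_000, 0.005, 0.999, 0.95)
print(f"negative result correct: {table.npv:.3%}")
print(f"positive result correct: {table.ppv:.1%}")
```

With these assumed numbers, a negative result is correct well over 99.9% of the time, while a positive result is correct only about 9% of the time. That is the kind of test where "negative" can be reported confidently and "positive" is better reported as "indeterminate" until a second test is done.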
Abuse of Medical Tests:
Sadly, there can be economic gain by shrewdly
playing these statistics. A medical test for an infection which is falsely positive will
likely lead to the buying of medicine to treat the infection (which really does not exist).
The patient and the prescribing doctor will never know the difference...the pharmaceutical
company which markets both (1) the test and (2) the drug to treat the infection gains (incidentally
or deliberately) by a test which has a significant false positive rate. We, the people, demand all of this "testing" in our zeal to "know RIGHT NOW!"
Research: (1) Some researchers ignorantly, or shrewdly & intentionally, design "test-performance studies" of rates of false positivity and false negativity in order to get a grant ($)...without being SURE whether the findings apply to real-world practice situations. The HPV test controversy in 2005 is an example [1]. (2) Researchers, hoping to push their product to improve health, also often switch back and forth between relative risk comparisons and absolute risk comparisons in order to most favorably portray their product or position. Relative risk vs. absolute risk and your health, HERE.
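To see how the same result can be made to sound big or small, here is a small sketch comparing relative and absolute risk. The risks used (2 in 1,000 untreated versus 1 in 1,000 treated) are invented for illustration. The "number needed to treat" is a standard companion figure that I have added; it is not discussed above.

```python
import math


def risk_comparison(control_risk: float, treated_risk: float) -> dict:
    """Summarize one treatment effect in relative and absolute terms."""
    if not 0.0 < control_risk <= 1.0 or not 0.0 <= treated_risk <= 1.0:
        raise ValueError("risks must be probabilities, with control_risk above 0")
    absolute_reduction = control_risk - treated_risk
    relative_reduction = absolute_reduction / control_risk
    # People who must be treated for one of them to benefit.
    needed_to_treat = 1.0 / absolute_reduction if absolute_reduction > 0 else math.inf
    return {
        "relative_risk_reduction": relative_reduction,
        "absolute_risk_reduction": absolute_reduction,
        "number_needed_to_treat": needed_to_treat,
    }


# Hypothetical example: risk falls from 2 in 1,000 to 1 in 1,000.
result = risk_comparison(0.002, 0.001)
print(f"relative risk reduction: {result['relative_risk_reduction']:.0%}")
print(f"absolute risk reduction: {result['absolute_risk_reduction']:.1%}")
print(f"number needed to treat:  {result['number_needed_to_treat']:.0f}")
```

"Cuts your risk in half" (50% relative reduction) and "helps 1 person out of every 1,000 treated" (0.1% absolute reduction) describe exactly the same invented data.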
But, as with the following examples... THAT'S
Some more examples of the concept:
- In the medical arena: a mammogram shows suspicious findings (a "positive" breast cancer screening test by imaging), but biopsy results are negative (pathology studies were, however, concordant with the abnormal imaging and showed a known benign breast condition that explained the mammogram findings)...a "false-positive" situation...you were scared to death but are now relieved. This is an example of a false positive cancer screening test. The test was truly & accurately positive in screening...it found an abnormality. But it was false in the sense that the abnormality was not cancer.
- In the automobile repair arena: your car won't start and is towed to a garage. The mechanic tests the battery and says it is weak (a "positive" battery-status diagnostic test) and replaces the battery. Twenty-four hours later you are stranded, and further work shows that the battery-grounding cable had actually been loose and was the real problem...an example of a false-positive diagnostic test. The test was not wrong (it was not false and should not have actually been negative) at all; the interpretation of the result was.
- In the arena of spousal relations: a wife notices lipstick on her husband's shirt collar (a "positive" test) and worries about marital infidelity. But she later verifies that her husband actually encountered his Aunt Jane at a lunch break at a local fast food outlet. Old Aunt Jane partly missed his neck with her lipstick-kiss and hit his collar...a case of a false-positive finding (which was positive for female kiss but negative for infidelity...wife "walked the line" of possibilities of the significance [her interpretation] of the finding and got to the truth of it).
- In the legal arena: a lawyer accepts the complaint of a client and...after brief investigation...agrees to file a lawsuit on behalf of the client and names 16 suspect people in the suit (suggesting that the lawyer has "positive" reasons for naming them...a positive screening/diagnostic test); 12 months later, 15 people are dropped from the lawsuit because it became abundantly clear that those 15 were not at all liable...possibly only "named" so that they could be forced to endure questions under oath (depositions) and worry for a year or so as the plaintiff lawyer "fishes" for helpful evidence.
- In the political arena: an individual desires to run for office and feels certain from his investigation (a "positive" test of judgment) that the majority of voters favor a continuation of video poker gambling and are not on the side of his non-gambling opponent. But he is out-voted by his opponent's voters. His judgment was wrong, based on faulty tests of voter sentiment...the false-positive rate of sentiment indicators.
- In the marketing arena: an entity (say, a hospital) has an annual incidence of 250 breast cancer patients treated in its Breast Cancer program. Let's say that about 245 of those patients and their families are anywhere from neutral to highly positive about their experience with the program. Most live and return to happiness in the community, building up a sizeable prevalence of program positivity in the community. But what about those five per year who are tremendously displeased with (or focus negative blame on) the program? Depending on how broad their connections are within the community, this low-prevalence but highly toxic negative cluster can cause serious damage by swaying new cases to go to some other program. That minority prevalence can be like "one bad apple can spoil the bunch".
References:
1. Welch, H. Gilbert, M.D., MPH, Should I Be Tested for Cancer?, 2004.
2. Galen RS, Gambino SR, Beyond Normality: The Predictive Value and Efficiency of Medical Diagnosis. New York, NY: John Wiley and Sons; 1975.
3. The Vassar Stats on-line calculator and info website, HERE.
4. Wikipedia discussion of "risk perception", HERE. David Ropeik and emotional distortions of risk perception (7 minute YouTube video, HERE). He is author of How Risky Is It, Really? Why Our Fears Don't Always Match the Facts, published in March 2010 by McGraw Hill.
(Posted 21 July 1998; latest addition 29 September