How’s this for a provocative title: Why Most Published Research Findings Are False:
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
The article is relatively short and readable, and makes me wonder why I haven’t heard about it before — I guess because it isn’t in scientists’ or journalists’ interest for the public to know this. Here’s something to ponder as you read the latest research finding:
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics.
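For concreteness, here is a small sketch of the positive predictive value (PPV) arithmetic behind that corollary. The single-study formula PPV = (1 − β)R / (R − βR + α) is from the paper (with its bias term dropped); the many-teams version is my reading of the corollary’s logic, in which a finding counts as “claimed” as soon as any one of n independent, equally powered teams reaches significance, so treat it as an illustration rather than the paper’s exact table. The values of α, power, and the pre-study odds R are my own illustrative choices.

```python
# Sketch of the PPV formula from the Ioannidis paper (ignoring the bias term)
# and of the many-teams situation behind Corollary 6. alpha, power, and the
# pre-study odds R below are illustrative choices, not the paper's.

def ppv_single(R, alpha=0.05, power=0.80):
    """PPV of one significant study: (1 - beta) * R / (R - beta * R + alpha)."""
    beta = 1 - power
    return (power * R) / (R - beta * R + alpha)

def ppv_any_of_n_teams(R, n, alpha=0.05, power=0.80):
    """PPV when a claim counts as 'found' as soon as at least one of n
    independent, equally powered teams gets a significant result."""
    beta = 1 - power
    p_claim_if_true = 1 - beta ** n          # some team detects a real effect
    p_claim_if_false = 1 - (1 - alpha) ** n  # some team gets a false positive
    return (R * p_claim_if_true) / (R * p_claim_if_true + p_claim_if_false)

R = 0.1  # pre-study odds: 1 true relationship per 10 probed (illustrative)
print(f"single study : PPV = {ppv_single(R):.2f}")  # ~0.62
for n in (5, 20):
    print(f"{n:>2} teams     : PPV = {ppv_any_of_n_teams(R, n):.2f}")
```

With these numbers, the PPV drops from about 0.62 for a single team to roughly 0.31 with 5 teams and 0.13 with 20: the hotter the field, the less an isolated positive finding is worth.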
Make sure you read the part “Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias” – and wonder how much of what’s out there is a null field, then ponder that the next time someone tells you about “the scientific consensus” in a particular field.
There is science, and then there is science – like the time my mother claimed a study showed the worst weather was on Saturdays and the best was on Tuesdays. Does such a study have any meaning? The weekly cycle is a human invention that has no basis in meteorology, but statistically you can always pick out a “best” and a “worst” day based on some definition of weather quality (rain, snow, departure from mean temperature, or whatever).
But even within real science, there is research and then there is research. Problem number one is studies that are simply too small to pick out the effect they are looking for. When examining probabilistic effects, sample size matters. How much does smoking increase heart disease? It’s not a simple case of smoke and get heart disease, don’t smoke and don’t. It’s more like 50% of non-smokers get heart disease and 75% of smokers do (in made-up numbers). Teasing that kind of information out of an assemblage of non-identical people requires lots of people. I’d be willing to bet most health studies simply lack enough participants out of the gate to be reliable. Yet they still happen, the results are still reported breathlessly, and some other equally unreliable study will be equally breathlessly reported when it contradicts the first – or worse, the study that comports with accepted ideas will be given far more play than the one that doesn’t.
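To make the sample-size point concrete, here is a rough back-of-the-envelope calculation (mine, not from any of the papers) using the made-up 50% vs. 75% numbers above: how many people per group would a study need to reliably detect that difference, under the standard normal-approximation formula for comparing two proportions at the usual 5% significance level and 80% power?

```python
# Rough sample size per group needed to detect a difference between two
# proportions (e.g. 50% of non-smokers vs. 75% of smokers developing heart
# disease, the made-up numbers above), via the usual normal approximation.
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_b = norm.ppf(power)           # power requirement
    p_bar = (p1 + p2) / 2
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

print(round(n_per_group(0.50, 0.75)))  # ~58 per group for a 25-point difference
print(round(n_per_group(0.50, 0.55)))  # ~1565 per group for a 5-point difference
```

A 25-point difference is detectable with a few dozen people per group; shrink the effect to 5 points and you need well over a thousand per group, because the required sample grows roughly with the inverse square of the difference. That is why small studies chasing small effects are so unreliable.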
Common sense has a part in science:
Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values — the pre-study odds — where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test.
Sadly, common sense isn’t common enough.
I suppose this is one of the things turn-of-the-century physicist types disliked about quantum mechanics – probabilistic vs. deterministic results. The great thing about all the classic physics experiments is that they are deterministic — the speed of light in a vacuum is a constant that can be measured; either there is an ether or there isn’t (as Michelson and Morley showed); the charge on an electron is constant and can be measured exactly (for which Robert Millikan won the Nobel Prize). But I digress.
There have been a couple of recent responses to Dr. Ioannidis. First is
Most Published Research Findings Are False — But a Little Replication Goes a Long Way:
In a recent article in PLoS Medicine, John Ioannidis quantified the theoretical basis for lack of replication by deriving the positive predictive value (PPV) of the truth of a research finding on the basis of a combination of factors. He showed elegantly that most claimed research findings are false [6]. One of his findings was that the more scientific teams involved in studying the subject, the less likely the research findings from individual studies are to be true. The rapid early succession of contradictory conclusions is called the “Proteus phenomenon” [7]. For several independent studies of equal power, Ioannidis showed that the probability of a research finding being true when one or more studies find statistically significant results declines with increasing number of studies. As part of the scientific enterprise, we know that replication — the performance of another study statistically confirming the same hypothesis — is the cornerstone of science and replication of findings is very important before any causal inference can be drawn. While the importance of replication is also acknowledged by Ioannidis, he does not show how PPVs of research findings increase when more studies have statistically significant results. In this essay, we demonstrate the value of replication by extending Ioannidis’ analyses to calculation of the PPV when multiple studies show statistically significant results.
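To see roughly why replication helps, here is a small illustration (mine, not the authors’ exact calculation) of how the PPV climbs if we demand that k independent, equally powered studies all come up statistically significant, again with α = 0.05, power 0.8, pre-study odds R = 0.1, and no bias term.

```python
# Illustration of how requiring k independent significant replications raises
# the PPV. Not the Moonesinghe et al. calculation itself, just the same idea
# under the simplest assumptions (equal power, independence, no bias).

def ppv_all_k_significant(R, k, alpha=0.05, power=0.80):
    p_true = power ** k    # all k studies detect a real effect
    p_false = alpha ** k   # all k studies are false positives
    return (R * p_true) / (R * p_true + p_false)

for k in (1, 2, 3):
    print(f"k = {k} significant studies: PPV = {ppv_all_k_significant(0.1, k):.3f}")
```

With these toy numbers, a single significant study has a PPV of about 0.62, two concordant studies push it to about 0.96, and three to over 0.99. That is the quantitative version of the advice below: don’t trust a result until it’s been replicated.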
Sorry, Virginia: don’t trust a result until it’s been replicated more than once. But how will you know, since you’ll never read about even a second study replicating the first in general publications? Now you’re starting to see the problem, I hope.
The other response is When Should Potentially False Research Findings Be Considered Acceptable?:
As society pours more resources into medical research, it will increasingly realize that the research “payback” always represents a mixture of false and true findings. This tradeoff is similar to the tradeoff seen with other societal investments — for example, economic development can lead to environmental harms while measures to increase national security can erode civil liberties. In most of the enterprises that define modern society, we are willing to accept these tradeoffs. In other words, there is a threshold (or likelihood) at which a particular policy becomes socially acceptable. In the case of medical research, we can similarly try to define a threshold by asking: “When should potentially false research findings become acceptable to society?” In other words, at what probability are research findings determined to be sufficiently true and when should we be willing to accept the results of this research?
Here’s the basic conundrum: If you don’t do any research, you won’t discover anything. If you do do research, you will discover all kinds of stuff that isn’t so — and you won’t be able to tell the accurate from the spurious without even more research. And you will do things that, while intended to help, will in fact cause harm. Of course, the same thing will happen without doing any research.
The conclusion:
In the final analysis, the answer to the question posed in the title of this paper, “When should potentially false research findings be considered acceptable?” has much to do with our beliefs about what constitutes knowledge itself [24]. The answer depends on the question of how much we are willing to tolerate the research results being wrong. Equation 3 shows an important result: if we are not willing to accept any possibility that our decision to accept a research finding could be wrong (r = 0), that would mean that we can operate only at absolute certainty in the “truth” of a research hypothesis (i.e., PPV = 100%). This is clearly not an attainable goal [1]. Therefore, our acceptability of “truth” depends on how much we care about being wrong. In our attempts to balance these tradeoffs, the value that we place on benefits, harms, and degrees of errors that we can tolerate becomes crucial.
…
We conclude that since obtaining the absolute “truth” in research is impossible, society has to decide when less-than-perfect results may become acceptable. The approach presented here, advocating that the research hypothesis should be accepted when it is coherent with beliefs “upon which a man is prepared to act” [27], may facilitate decision making in scientific research.
So why do research? Because you will have less imperfect information on which to act.