I wrote a rather lengthy, badly phrased post here asking about alternatives to the current peer review process: Do alternatives exist? Are they better, worse or just different to current practices? Should we promote a discussion about how to best review and judge our colleagues’ work for the betterment of science (and humanity)?
Rereading the above paragraph, I think I’ve just rewritten that 1st post in a much better way: more succinctly, and without all the whining, born-of-frustration claptrap that so obfuscated my main points in the original post.
However, thanks to the good people at EcoLog, my attention was drawn to a similar discussion being held elsewhere, in the morally and financially bankrupt hallowed pages of the Economist. They highlight an article in PLoS Medicine1 by Folk Rock Legend Neal S. Young, John Ioannidis & Omar Al-Ubaydli, on “Why Current Publication Practices May Distort Science”. Do selection processes in high profile journals weight certain aspects of submitted papers more highly than their scientific rigour?
Earlier, one of these authors investigated the most “significant” research (i.e., cited > 1000 times) in the academic journals with the highest impact factors. They conclude that this research is surprisingly likely to turn out to be wrong! They showed that around 1/3 of a sample of 49 papers was later refuted by other studies2.
Now, I’ve only scanned through the most recent article1, but they acknowledge the “paucity of empirical observations on the process of scientific publication”, they don’t attempt to disguise the assumptions they borrow from economics (the “Winner’s Curse”) to underpin their thesis, and they highlight the relevance of uncertainty in the scientific process. But they don’t seem to carry out a comparison of refutation within lower impact factor journals.
The discussion in PLoS makes for interesting reading – with voices for and against some of the arguments, and some interesting alternatives to the current peer-review system3.
Technological advances are changing the way we publish science. How long before we harness them to change the way we edit and judge it before publication?
I’m now going to enjoy some people breaking the blues with John Mayall before my weekend starts in earnest. Have a good one everybody!
You might also find the recent discussion at our blog interesting, see Peer Review V
Peer Review And Innovation In Science
A. “The new face of peer review”, in “Funding Opportunities and Advice” forum, at
http://www.the-scientist.com/community/posts/list/298.page
refers to “changes to the peer review process”.
B. However, “peer review process” is the least disturbing aspect of “peer review” in science
Samples of factual observations of other negative aspects of peer review in science:
- http://www.digibio.com/archive/SomethingRotten.htm
“A U.S. Supreme Court decision and an analysis of the peer review system substantiate complaints about this fundamental aspect of scientific research. Far from filtering out junk science, peer review may be blocking the flow of innovation, and corrupting public support of science.”
- “Peer review stifles innovation, perpetuates the status quo, and rewards the prominent. Peer review tends to block work that is either innovative or contrary to the reviewers’ perspective.”
C. “Peer Review” is, factually, a tool of a “Subversive Activities Control Board”
The most revolting corrupt aspect of peer review in science is its exploitation by the Science Establishment to tightly clamp its political and financial omni-everything rule and control, including stifling of any shred of scientific innovation.
D. The corruption is not inherent in the tool, but in the nature of the Science Establishment
“Implications Of Science And Technology Evolution”
http://blog.360.yahoo.com/blog-P81pQcU1dLBbHgtjQjxG_Q—?cq=1&p=419
The peer review process is but a tool of the Establishment. The corruption is not inherent in the tool, but in the nature of the Science Establishment.
As long as Science and Technology are considered and handled, conceptually and administratively, as one realm and one faculty, this corruption cannot and will not be overcome. This conception and attitude is THE CORRUPTION OF SCIENCE BY THE 21st CENTURY TECHNOLOGY CULTURE.
Dov Henis
(A DH Comment From The 22nd Century)
http://blog.360.yahoo.com/blog-P81pQcU1dLBbHgtjQjxG_Q—?cq=1
Thanks for the links, Sabine and Dov.
Sabine, I agree with many of the ideas in your Peer Review V post (and others on Backreaction). To suggest that peer review is completely “broken” is probably an oversimplification. If there are repeating problems with poor reviews, reviewers or editors, then something probably needs fixing though (I’m excluding poor science deliberately here – that will always get done). We need to filter these things more carefully somehow, to waste less time at each stage of the review/editorial process.
To ensure that the scientific method is as robust as possible (which has to be an aim of science), we must also strive to only publish the most interesting, sound ideas. While interest is clearly a subjective, shifting measure, soundness should be more objective.
Better training? More feedback?
There’s going to be variation anyway, but at least dealing with the worst performances should be possible. I suspect another problem is that typically only two or three referees are asked to view a paper: it’s more difficult to sieve out the subjectivity that way. But asking (say) 10 isn’t feasible. Oh dear, what to do?
Well, discussing experiences more openly is a start. I watched Kinsey last night (with Liam Neeson). A rather fun film, which also raises the serious point, how can we hope to understand what normal behaviour is, unless we collect a lot of data, examine and analyse it, then compare our own experiences to those of a suitably large sample? I think some large scale studies of the effectiveness of the peer review system could be a useful exercise.
Well done, Mike. I disagree with Dov’s point C in particular, but I suppose I might be seen to now represent the “establishment” albeit in a lowly, junior manner. I do agree with Sabine’s points, though.
She wrote:
“My largest frustation is that people don’t take it seriously. It has happened in many instances that I wrote a long report on a flawed paper and recommend rejection, only to see later that the paper got published in a different journal in exactly the same version.”
and I confirm her experience. I personally very much appreciate a presumably impartial, well-written, thoughtful review and know how much time it takes, and how much commitment to improving the scientific enterprise overall.
Peer review is a civic gesture designed to improve the whole community’s production. Reducing pressure to publish would possibly help improve review quality, but as long as it is easier for some than others because of the variability among the perceived value of disciplines and journals, I suspect there will always be those who value quantity over a difficult to appreciate quality.
Heather – I don’t know whether to comment on your blog here or there… As you’ve just scratched my back here I suppose I should haul ass over there to repay the favour :)
The point you & Sabine mention here (authors ignoring feedback and resubmitting elsewhere) relates to stochasticity in the amount of shared knowledge in a field. If you are in a field with a lot of researchers, it’s likely that resubmission elsewhere will lead to a new set of reviewers, with different knowledge/ability.
If you’re in a small field, it’s much more likely your (re)submission to a new journal goes back to the same person.
There’s pros & cons in both cases – the more objective reviewers & authors are, however, the less of a problem this should be.
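To put rough numbers on the small-field versus large-field point, here is a minimal sketch. It assumes (purely for illustration) that each journal draws the same number of referees uniformly at random from one shared pool of qualified people; real editors do nothing so tidy, and the function name and the pool sizes are made up.

```python
from math import comb


def p_shared_reviewer(pool_size, reviewers_per_paper):
    """Chance that a second, independently drawn panel of referees
    shares at least one person with the first panel.

    Assumption: both panels are drawn uniformly at random, without
    replacement, from the same pool of qualified referees.
    """
    n, k = pool_size, reviewers_per_paper
    if k < 1 or k > n:
        raise ValueError("need 1 <= reviewers_per_paper <= pool_size")
    # comb(n - k, k) counts second panels that avoid every member of the first.
    return 1 - comb(n - k, k) / comb(n, k)


if __name__ == "__main__":
    for pool in (5, 10, 50, 200):  # hypothetical field sizes
        print(pool, round(p_shared_reviewer(pool, 3), 3))
```

Under those assumptions the overlap probability falls quickly as the pool grows, which is the stochasticity described above: in a tiny field a resubmission almost certainly meets a familiar referee, in a big one it rarely does.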
No problem – it’s your forum! The issue is more about not being taken seriously in a set of critiques into which, generally, a fair bit of thought has been placed. If overall they appear too difficult for the authors to undertake, rather than making any attempt to improve their work, they hope that they will hit on someone else who will make different or less exigent comments. Then you, as a reviewer, feel like all your time has gone to waste.
The discussion of linearising the review process (discussed/linked to somewhere in here) helps to avoid these problems. By keeping an article within a journal/publishing house, the reviews can follow the MS, reducing the extra number of reviews required and avoiding ignorance of constructive reviewer comments.
That only works if articles can be streamlined in this way though. If Nature rejects but suggests another NPG journal, but the author doesn’t want to resubmit to Nature PubGossip instead, they can ignore reviewer comments and go straight to Science, hoping for different reviewers.
“If overall they appear too difficult for the authors to undertake, rather than making any attempt to improve their work, they hope that they will hit on someone else who will make different or less exigent comments.”
It seems to me that you’re being a little unfair with the authors here. It may happen, for example, that one of the reviewers simply dislikes the conceptual framework you used for a given paper, and writes a very thorough comment explaining why. However, you remain unconvinced and still think that your manuscript has intrinsic merits as it is. In that case, rather than engaging in a long battle of wills with the reviewer and the editor, it just makes sense to submit it elsewhere, doesn’t it?
Granted, this is very difficult to distinguish from those who are being lazy and, rather than addressing the shortcomings of their work, resubmit serially until it sticks somewhere. But that’s precisely why this is such a difficult problem.
Cristian, I think you’ve hit on a very interesting point with respect to peer-review here, and that is whether a reviewer’s point-of-view has more weight than an author’s in a methodological/conceptual discussion.
In the case you describe, the reviewer and author may be in the same general field, but if either/both of them insist upon using a certain method to the exclusion of others, are they really “peers”?
I completely agree that rejection of a MS on conceptual/methodological grounds must be supported by a thorough explanation of why it should be rejected on those grounds, but I fear (and have experienced) that this is not always done. Reviewers can ask for supporting reasons for a certain methodology if none are given in the MS, but authors don’t always get the chance to ask for the same info from reviewers to back up their criticism.
Cristian – true about the unfairness if you are talking entire conceptual frameworks. But when there are interpretational flaws, or shoddy statistics, and you see them unchanged elsewhere, it’s a bit frustrating.
I’ve been on both ends of the stick. As an author, I think that if I only ever see a comment made once in the rounds, it’s more a perspective issue from that one reviewer. If more than one person brings up the problem, it might well be me.
One of the more difficult papers I had to submit was regularly described as being “too descriptive”. It was, from their point of view. However, this almost flip comment betrays a bias of the reviewers as to the worth of careful observation, which is all the paper purported to be. After offering many examples of predictions that could be tested by looking at these observations, the paper was accepted. I (and apparently the editor) thought it was improved by its trial by reviewer.
This comment has been picked up in the Nature Network summary today (9 Dec): while Mike Fowler notes that the findings of one in three top papers are later refuted by other authors, according to PLoS Medicine.
I’ve read your post above, and I’ve read (quickly) the PLoS Medicine essay. I don’t see that information. “one in three top papers are later refuted by other authors” – does this mean that the earlier papers are wrong and are formally retracted? Or does it mean that later authors disagree with some of the interpretation? (i.e., that the papers are not “refuted” but “argued about”, which is very different).
Which are the “top papers”?
Attention-grabbing statements, but what do they mean, really? How robust and thoughtful are they? (I am asking the person who wrote that Nature Network summary as well as you.)
The PLoS Medicine article cites another by Ioannidis in JAMA, stating “An empirical evaluation of the 49 most-cited papers on the effectiveness of medical interventions, published in highly visible journals in 1990–2004, showed that a quarter of the randomised trials and five of six non-randomised studies had already been contradicted or found to have been exaggerated by 2005.”
That study seems sound at first glance; I can quote more from it if there is restricted access, but the summary of it seems just, given that its author was one of the et alia in Young et al. Ioannidis used “the 3 general medical journals with the current highest impact factor (New England Journal of Medicine, JAMA, Lancet) or in medical specialty journals with impact factor exceeding 7.0 (according to the Journal Citation Reports 2003) that are likely to publish clinical research”.
I had not read the Young et al. essay in detail either, but I found many of the statements rather depressing. (But that’s my general reaction to economics.) Be that as it may, I think there are actually sufficient outlets for good work – as I tell my personnel, it’s easy to get published, but it’s more difficult to attract attention. So when we had been going the rounds with our “descriptive” paper, we had aimed for journals which would be read by the peers whom we would like to read the report. That is why review by those same peers is so valuable. Otherwise, you know that your paper will get pulled up sooner or later by really interested people doing keyword searches in PubMed.
I think the real commodity is not the print publication or the journal’s impact factor but the real impact, which is only imperfectly measured by those surrogates. Young et al. say that one reason that authors choose a journal is for branding purposes, and I would tend to agree with that much.
Just thought I would highlight some editorial changes at the EMBO journal, which they hope will enhance the transparency of the editorial process. One of the changes they have made will be to publish author and referee comments online. This will be implemented in 2009.
It will be very interesting to see what happens as a result.
Heather – I am aware of the earlier Ioannidis study, which has been discussed online quite a bit – and that it concludes this about clinical trials.
But that is a long way from “Mike Fowler notes that the findings of one in three top papers are later refuted by other authors, according to PLoS Medicine.” I think that people making statements such as these should support them better. It is so easy to write this kind of thing, less so to provide proper substantiation. I have seen no evidence anywhere that “one in three top papers have been refuted” – criticised, corrected, added-to maybe, but “refuted” means that they have been found to be wholly wrong. I do not believe this and would like to see the support for that statement (preferably from those people who made it).
The design of clinical trials is a topic in itself, but scientific journals do not publish clinical trials – you cannot extrapolate the two. Even if you believe the Ioannidis view on clinical trials (which some people like to do), this is not in the slightest bit relevant to the scientific research literature.
By the way, this is probably clear from Mike’s post and other discussion, but it is the article(s) by Ioannidis et al. that forms the view of the (very weak, in my opinion) Economist article, an article which chose to ignore in toto the answers to questions and other points made by at least one scientific journal that was approached in advance while the piece was being written.
So there is some circularity to the discussion both within this conversation thread, and between this conversation thread and previous online rehashes of the Economist and PLOS (and JAMA) articles.
“I think that people making statements such as these should support them better. It is so easy to write this kind of thing, less so to provide proper substantiation.”
Maxine, you are quite right, not least because the type of report needs to be specified (a “paper” is not necessarily comparable to another “paper”), but also because there is clearly a market for scandal and splash that is being served by only selective inclusion of comments from journal(s) that quite rightfully can be included among the “top” publishers of scientific (if not clinical) work.
This is not quite the standard that one would hope to see represented by a “top” publisher of economic news.
Hi folks, thanks for continuing the discussion in my absence (an unplanned hospital visit wrecked my timetable last week).
Maxine, I’ve finally found the offending article that cites my post. Heather has linked to the correct, original paper, which was not linked in my post (thanks Heather, I was being a bit lazy, sorry). Please remember though, these posts are blogs – brief, scurrilous summaries of interesting scientific ideas, not robust peer reviewed research articles! Interested readers can follow links to investigate things more deeply.
Here’s another Editorial point of view published as an essay recently in Ecology Letters that relates to this discussion1. Hochberg et al. (all Editors themselves) highlight two pressures that may lead to a “Tragedy of the Commons” in academic publishing.
I completely agree that any review that points out errors of fact… must lead to corrections in a manuscript, regardless of where it will next be submitted. However, I think the above quote confuses fact with opinion a little. Editors should be able to discern between these and arbitrate in any disagreement between reviewer and author.
@ Maxine (10.12.08, 08:22) -
this sounds like you have more personal insight into this Economist article than most of the rest of us ;) Do share!
1 Hochberg et al (2009) The tragedy of the reviewer commons. Ecology Letters 12: 2-4
doi: 10.1111/j.1461-0248.2008.01276.x
I’ll also stress this article by Ray Hilborn (2006)1, where he highlights 4 cases of major errors slipping through the peer review/editorial process in fisheries science. It’s a really short essay, pointing out how this research (and the errors within) sometimes made it to national media outlets, and is well worth a look. Hilborn also links to this NY Times spoof.
1 Faith-based Fisheries, Hilborn, R. (2006) Fisheries 31: 554-555
Of course errors, large and small, slip through the peer review process. To err is human, etc. But journals correct errors when they are pointed out. This is very different from stating that a substantial number of papers in “top journals” are “wrong”.
Mike, if you define a (this?) blog as “a brief scurrilous summary” etc, then I think I’ll leave the discussion at this point.
I note that whoever wrote that summary of your post has not popped over here to defend it, though!
I have a little difficulty getting the concept of a paper that is just plainly “wrong”. Some of the examples that Hilborn gives seem to me to be more like a matter of interpretation. If we are going to count every article in which we disagree with the conclusions derived by the authors (either because we think they are overstating their case, or they don’t have enough data to support them, or they are interpreting their data the wrong way, or whatever) as “wrong”, then these kinds of meta-studies are going to be highly variable, depending on who writes them, don’t you think? After all, we all have at least one particular article published in a high profile journal that we simply hate and think did not deserve to get there…
Maxine, I was simply trying to make the point that I use my blog as a non-technical, sometimes light hearted (sometimes less) sounding board for interesting ideas that come to my attention. Things I’d like to discuss and hear other views about with a different community than my immediate colleagues, friends or family.
If I (or you) wanted to write a robust, technical work in support or contradiction of another published work, I’m not sure NN would be the best outlet for it.
Errors do slip through peer review. It’s not clear how often, yet, but the only work I know that tries to quantify it comes up with figures of 16% of clinical study results being contradicted, and 16% finding stronger effects than originally reported (see the Ioannidis JAMA article for more details). If you know of other analyses that investigate these issues and find contrasting results, please do let me know.
Deep breath, insert fuzzy animal picture here.
blleeeeeeaaaaaahhhhh!
As mentioned above, Mike, Nature does not publish clinical trials, but basic science.
Of course a blog post is not the same as a scientific paper. But to say in a blog post that a substantial number of papers in “top journals” are “wrong” is quite a statement and I believe requires proper substantiation, or to be qualified. I have seen no evidence that justifies this sweeping statement. To the contrary, the evidence is that the body of scientific literature is built upon and developed. Incidentally, various studies that pick holes in the peer-reviewed literature and then extrapolate way beyond their data to conclude that peer-review does not work (a conclusion that I don’t support), are not comparing the right things, as they need to compare the un-peer-reviewed (submitted) manuscript with the published (peer-reviewed) manuscript.
I confess to having published a “wrong” paper. Embarrassingly it is number 2 on my citation score. It still racks up citations every year and this is despite me pointing out the error in a later paper, which is not cited very often. In this case the error is in a mathematical derivation and I slipped up in an integration. It was published when I was a post-doc working with a very, very eminent FRS (he will not like being described as such!) who was the other author and did not pick up the error. In fact the error was only picked up some years later when I received a letter (see it was before e-mail was prevalent, that will date me) from Australia from someone who could not repeat my calculation.
What are the morals from this?
1) Referees probably do not check every step of a calculation, especially when it is in a field that had not seen a lot of modelling work up to then.
2) If you find a mistake and the author (or co-author) is an FRS maybe you believe it is you who is in error.
3) Nobody reads corrections.
On another note, when I was newly appointed to the faculty and refereeing was a novel experience, I received a paper that was wrong. I could show it was wrong because a stress could not be resolved in the direction it was supposed to be acting. I checked my analysis with an FRS down the corridor and another senior member of staff, who both agreed with me. I sent my review to the editor and 4 months later I saw the paper published, unchanged.
Moral? Possibly junior referees do not carry a lot of weight with senior editors.
Maxine, I promise I’m not just having a go at Nature, or any particular journal here. Honest! But I will repeat what I said above, that the only scientific study I know of that tries to assess the accuracy/quality/robustness of results in published literature, shows that a reasonably high proportion of results in the sampled data cannot be replicated. The wording in my original post is clearly too sensational for some, but it helps illustrate an interesting situation.
Of course, we should be asking the authors of the original work why their results can’t be replicated – they should be allowed to defend or correct their work where possible. However, Ioannidis’ results clearly highlight some of the difficulties inherent in the current peer-review process. I’ve just noticed his JAMA paper has been cited 107 times! Further reading is required here!
I discussed these ideas with colleagues over coffee yesterday, and we thought it should be reasonably easy to come up with an evolutionary model that shows that it doesn’t really pay to put a lot of effort into reviewing/refereeing papers. This is pure speculation at the moment – I’m a bit too busy to sit down and do this in practice, but cheating often arises in evolutionary models as a stable strategy for some proportion of the population. Policing does as well, if that’s included in the model, leading us to a virtual game of co-evolutionary cops & robbers.
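For anyone who wants to play with the idea, here is a minimal sketch of the kind of model I mean. To be clear about the assumptions: this is a textbook two-strategy “snowdrift” game run through replicator dynamics, not a model anyone has fitted to real refereeing data. The strategies (diligent vs. lazy reviewers), the payoff structure and the numbers b and c are all hypothetical, chosen only to show how a stable mix of careful and careless reviewing can arise. Policing is not included.

```python
def payoff_gap(x, b, c):
    """Payoff of a diligent reviewer minus that of a lazy one,
    when a fraction x of the community reviews diligently.

    Hypothetical snowdrift payoffs: a paper that gets at least one
    careful review yields benefit b to both referees; the effort cost c
    is split if both are diligent, and two lazy referees gain nothing.
    """
    diligent = x * (b - c / 2) + (1 - x) * (b - c)
    lazy = x * b
    return diligent - lazy


def simulate(x0, b=2.0, c=1.0, dt=0.01, steps=20000):
    """Euler integration of the replicator equation dx/dt = x(1-x)*gap."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1 - x) * payoff_gap(x, b, c)
    return x


def predicted_mix(b, c):
    """Interior equilibrium, where diligent and lazy payoffs are equal."""
    return (b - c) / (b - c / 2)


if __name__ == "__main__":
    for start in (0.1, 0.9):
        print(start, round(simulate(start), 4), round(predicted_mix(2.0, 1.0), 4))
```

With these made-up payoffs the population settles at the same mix whether it starts mostly lazy or mostly diligent, so some proportion of low-effort reviewing persists. Adding a policing strategy would need a third equation and more assumptions, which is where the cops & robbers part would come in.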
Brian and many others (including me) have anecdotal evidence both of errors in their own work and of papers that have ignored errors highlighted in the review process. Once a paper is published, it is all too easy for others to cite it in support of arguments that are false or incorrect, even if a correction has been published. This leads to a lot of wasted work!
It really would be interesting and useful to quantify how often errors of fact or interpretation occur in the literature, before we can correctly judge how effective peer-review (as it is generally done now) is.
Oh, and I forgot to pick up on this:
That all depends on whether your question is about the effectiveness of the peer-review process in improving submitted manuscripts, or about the effectiveness of the peer-review process on the accuracy/quality of published literature. Two different questions.
To quote you, Mike, “The only scientific study I know of”….etc, is a study of clinical trial literature. The scientific literature that stands accused as a result of this paper does not publish clinical trials – so perhaps you should redefine the discussion here as being about the clinical literature.
There are massive grey areas between what is defined as an error and what as a technical disagreement. It would be very hard to do a study along the lines you suggest to assess errors because people’s definitions of errors are not the same. There are many nuances, which is why some journals publish technical comments and other debate on papers as well as formal corrections. For example, someone submits a technical comment to Nature saying a paper is incorrect in some way. The authors provide a substantiated reply saying that it wasn’t. Referees offer an opinion on the exchange. Most of the time, it boils down to a matter of perspective, which is publishable and of interest to some people, though it does not mean the paper was “wrong” in any sense. (If it is, the journal publishes a formal correction or retraction – and these could certainly be counted up if anyone wanted to do that.)
In many ways, these types of study remind me of “impact measures”, where scientists repeatedly make the point that the metric used to define “impact” is flawed – and they give many good, and different, reasons as to why. One of them is that the scientific literature is not homogeneous. This is a similar issue in attempting “peer review studies” or studies of “errors in the literature”. There is no clean definition of the parameters. It is social science, not basic science.
Brian – what lessons you have drawn! We discussed some of them in the earlier comments on this very post as well as here… but your perspective on trying to get your correction noticed in the days before easy hypertext links was an eye-opener!
Brian, your experience could be explained by the authors of the original article showing why your “correction” of their work was wrong. If this was true, then it would be nice of the Editor to at least explain this to you. It’s frustrating (and not rigorous) to put in the time and effort of checking someone else’s work, then having your input ignored.
Did the work appear in the journal you reviewed it for, or another one? This was a topic brought up in the Hochberg et al. essay I linked to above.
I suppose the silver lining is that it gives you a chance to publish a paper showing why the original work is flawed…
That’s a very interesting and thought-provoking essay. I couldn’t help noticing this particular fragment, though:
“Many authors seem to view anonymous peer review as a stochastic process: if the outcome was not successful with one journal, try again elsewhere and perhaps a new reviewer will have a different reaction. The reality is usually quite different. Different reviewers frequently focus on the same persistent set of criticisms of a manuscript.”
Seems to me that if that were indeed the case, people would eventually drop this practice of serial submission. The fact that they don’t indicates that the outcome is sometimes entirely different: a particular set of reviewers may give your many-times-rejected manuscript a free pass, simply because they are overburdened with other reviews or fighting to submit their own.
That’s a good point, Cristian. Again, these are all anecdotal examples; Brian even provides a great example of this above! As Maxine suggests, it’s likely to be rather complicated, but it must be possible to design a study that can quantify some of these questions. It’s a problem that requires careful choice of the question though.
Mike – It is such a long time ago that I cannot remember the full details of the paper, but it is always possible that an editor who is not an expert in the sub-field may be swayed by argument rather than proof.