Who wants to be eating dirt? Musings on search, self-archiving, citations and "findability"
Hilary Spencer
Wednesday, 30 April 2008 22:54 UTC
A lively (and long) discussion follows Jennifer Rohn’s post, “In which I get into a little muddle about archiving”. I’d like to respond to something said about two-thirds of the way down the comment thread, but I’m starting a new post as I’m not sure how many people will take the time to read all 109 comments (go Jennifer!).
In comment 50 or 60-something (OK, I didn’t count), Henry Gee poses the following hypothetical example:
Group X discovers something which they write down with a time stamp. Group Y makes the same discovery – by the time Group X gets wind of this and has mobilized its army of intellectual property droids, Group Y has published it and has gotten the credit. In the eyes of the world, Group Y has made the discovery and Group X is eating dirt.
Let’s modify this a bit. Say Group X posts a preprint of their manuscript on a preprint server like ArXiv, time-stamping it but, more importantly, making it easily available via Google. Group Y publishes first, but in a closed-access journal to which few universities subscribe (or, at worst, in a journal that doesn’t make its articles available online at all: it’s print only). Now, whenever someone searches Google for articles on topic Z, they tend to find Group X’s paper on ArXiv, but not the journal article. Group X’s article on ArXiv is cited a couple of times, while Group Y’s languishes behind a subscription wall or, at worst, in the university library’s stacks… So is “getting credit” a matter of publishing first, or of having one’s work consistently cited and used as a reference point?
Now, most journals have been moving towards making their content available online, and many universities have extensive journal subscriptions, so this might be an extreme example. Given the choice between a preprint and a peer-reviewed article, most people will cite the peer-reviewed article. But people will almost always cite articles that they can find and read over those that they can’t. (Isn’t it a breach of ethics to cite documents one hasn’t read?) Most publishers realize this and are working to increase the “findability” of their articles, but some could do better. Sometimes sites like Google also rank papers on ArXiv higher than those on journal sites (though the role of page rank in literature reviews is a question best left for another day).
One might also argue that no one uses Google to search for research, though trends suggest that this is changing, especially among the current crop of undergraduates [1, 2].
If the published version is difficult to find, but one is able to read the preprint and finds it useful, then I suspect that one is more likely to ferret out a copy of the published version (even in the university stacks) for citation purposes. One is more likely to spend the time and effort getting hold of an article that one already knows will be useful than one that may or may not be. (This might explain why papers with posted preprints tend to get more citations than those without [3, 4], and why open access articles tend to have higher citation rates [5].)
There are many stories of disputes over claims to inventions (the modern computer, photography, the radio, the telephone, the steam engine…). A recent article in the New York Times’s Week in Review discusses Thomas Edison’s invention of the phonograph, noting that 17 years before Edison’s patent, a Parisian inventor had already created a device that made visual recordings of songs. Who was he? Who knows? The article goes on to note that Edison was perhaps not the first to invent the lightbulb, and was credited with doing so only when the Supreme Court ruled that the prior inventor’s patent was too broad. Did you know this? I didn’t.
The Times article suggests that credit for inventions is often correlated with who is able to make theirs accessible to the public, and not necessarily with who was first, even in the filing of patent documents. The author also notes the importance of timing: “Great ideas, while perhaps not novel, are delivered to us…just as we’re hungry for them.” Perhaps being the first to publish isn’t always the key to receiving credit, just as being the first to patent doesn’t guarantee that you won’t be eating dirt later.
[1] Student Searching Behavior and the Web: Use of Academic Resources and Google
[2] Information Illiterate or Lazy: How College Students Use the Web for Research (Disclosure: I didn’t read this article because I don’t have access, so I’m citing the abstract.)
[3] The Citation Impact of Digital Preprint Archives for Solar Physics Papers (preprint version available)
[4] E-prints and Journal Articles in Astronomy: a Productive Co-existence (preprint version available)
Updated 01 May 2008 19:59 UTC
-
Replies
-
David, I accidentally discovered once that to make strike-throughs you type a dash before and after the words to be so struck. It works.
-
I am coming to this lively discussion a little late, possibly because all the examples are in biology, and a dumb physical scientist like me gets put off by long words. What is interesting in my field (materials science) is that in the good old days, when the internet was not even science fiction (and, I hasten to add, before my time), there were many examples of duplicate discovery. These now have double-barrelled names: the Frank-Read source, Nabarro-Herring creep, the Hall-Petch relation. Of course, citation metrics had not been thought of either in those halcyon, long-gone days.
More seriously, you are all correct: priority is everything these days. There is a real temptation to publish early, before your work is complete, to make sure you get in there first. I am living in nervous horror at the moment. I am working with a student on a new model for nanomechanical deformation that no-one else has seen, even though it is blindingly obvious. I nervously scan preprint servers for my nemesis: a frisson of horror when I see the title, a quick read of the abstract, and we are safe. When can we publish? But our argument and model must be complete, or at least complete enough, before we dare publish.
-
Hi Brian – Would you post your work on a preprint server when you feel it is complete? If you wouldn’t, why do you think your nemesis would? And finally, if you did find that your nemesis had posted something, what would that mean for your research? (I’m assuming here that you scan ArXiv… but if you use another preprint server, please let us know.)
-
Hilary, the implications of others beating you to it depend on your position. If you are a tenured academic (such as me), you curse and move on. However, for the student it will appear a much more serious issue, possibly even the end of the world.
Of course, things aren’t really that simple: it is highly unlikely that someone else will have exactly the same model as the one you are proposing. You have a good idea from the literature of how others see the problem, and you can be confident that they won’t spring any surprises. Of course, I may be wrong in my work, and that is why no-one else is doing it!
I don’t really like preprint servers other than as a way of setting out something you are already submitting.
-
I think Brian’s point really goes to the heart of the argument. He says ‘the implications…[of being scooped]…for the student will appear much more serious, possibly the end of the world’.
Yet our response to this is not to make a clear claim (‘we have done this, we think this’) that would protect the student, but to delay, actually risking precisely the thing we are trying to avoid. The potential consequences are devastating, yet we constantly play this game of brinksmanship. And this is despite the fact that the work would probably go faster and be more complete and comprehensive with others’ input.
My view is this: if you make a clear, easily findable, and unambiguous claim to a piece of data or an idea, whether or not that claim is peer reviewed, then for someone else to describe that data or idea in a peer-reviewed publication without attribution is unethical behaviour of the sort that should get people tarred, feathered, and run out of town.
The problem, in my view, is building a system that makes these claims easily and effectively findable. Currently it is possible for someone to publish a paper and say later ‘it wasn’t in PubMed/WOK/whatever so I didn’t see it’. If we can change that, then I think we can make the whole endeavour more productive.
Science is not a zero-sum game. We all win when more people make a contribution (paraphrased from Deepak Singh).
-
I’ve discovered a new forum on Nature Network called Citation in Science. I think it is great to have a focus to discuss the issues in Allan’s topic list (at the link). Please join if you are interested in continuing the conversation there.
-
I think the decline in the impact factor of Cell Press journals has a lot to do with their findability, and the ease of finding NPG articles certainly hasn’t hurt them.