Prepublication data sharing: the Toronto statement
Sarah Tomlin
Wednesday, 02 September 2009 20:39 UTC
Sharing knowledge is key to the progress of science, but researchers do not always release data and research materials, even after publication. Earlier this year, two research communities held meetings with a broad range of stakeholders to discuss ways to promote data sharing in biology, and they report their recommendations in two Opinion articles in this week’s Nature. See special issue on data sharing here.
Data producers and users met at a workshop in Toronto in May to discuss the benefits and best practices of rapid data release prior to publication. Ewan Birney, Tom Hudson and colleagues report the main conclusions of these discussions in a community statement, free to access here.
The Toronto group propose that the principles for early release of genomics data should be extended to other large datasets in biology and medicine. A grace period should be allowed, if requested, to enable data producers to analyse and publish their dataset, but this should be limited to one year. The authors also suggest a set of best practices for funding agencies, scientists and journal editors.
The recommendations are intended to spark community discussion on this subject. Ewan Birney, Tom Hudson and others will be responding to reader comments in this forum. Be sure to have your say.
Updated 09 September 2009 17:06 UTC
-
Replies
-
I’m all for this. I am trying hard to convince some collaborators that we’ve been sitting on some transcriptomics data so long that keeping it to ourselves is no use to us or to anyone else, and we may as well release it, even if we haven’t yet submitted the paper(s) to go with it. That is, we’ve well passed the one-year point.
But it’s an uphill battle. Fear of being scooped is deeply ingrained. I say, if someone else can datamine and see something we didn’t, so much the better for science!
-
There is always this interplay about not necessarily being “scooped” but being embarrassed that someone else can do something with the data that you missed – it’s an understandable fear. I’ve found that once one has gotten used to regularly releasing data, actually the number of times this happens is pretty low, and as you say, surely it’s a good thing if someone else finds some use for it.
These days, with journals like BMC Genomics, PLoS ONE etc., getting a very technical paper published about the data is far easier (you don’t have to meet some “sexy” criteria), so you might want to think about a technical publication describing the data, so that at least there is something published and citeable if anyone does use it.
-
Sharing knowledge is key to the progress of science; that is basic, I know.
So, I have a database of NIR spectral signatures of herbal drugs (Brazilian).
The questions are:
Where could I place it?
Where are the international resources for sharing scientific research data?
For how long can such a database be maintained?
For now, to me, these are only thoughts, though very good ideas. Greetings!
-
Ah, good to see the outcomes of this workshop published (I was not in attendance but Steve Scherer and Lars Feuk from our group were – I’ll point them in this direction in case they want to weigh in).
-
Perhaps a bigger (and more difficult to admit) issue than the fear of being scooped is the lack of confidence in your own work, and the fear that someone will find a flaw in it?
-
I do think the fear of being “shown up” as missing something important does play a factor in this – certainly I’ve had these concerns myself on occasion, in particular when you don’t feel completely confident in the data that you are processing/analysing.
There are two things in my experience that help frame this. Firstly, you can often achieve what you set out to do – in some sense, if a dataset provides you with the result that you want, then you can be happy that you at least achieved your own goals. The second is to take a genuinely “I want to learn” view of watching how other people use your dataset.
Of course, in practice, not that many people jump on datasets as soon as they are made public, and usually people want to contact you directly to understand some aspect better. In other words, this really becomes just something internal.
I just think you have to get used to releasing data early. Once one does this regularly, it becomes second nature. At the start, being forced by a funding agency helps, but I now always expect my data to come out as soon as it is sensible to do so.
-
Dear members of this network, think about … data availability or utility needs visibility and scientific discussion, for example about data acquisition methods and/or pre-treatment, pre-processing, etc. That is many times more significant than when some groups only use the data.
Perhaps the Nature group could host such databases? Maybe we need to establish something like free-software terms for scientific data sharing!?
-
In biomolecular data, there are well-established data collections, many of them run by the EBI (http://www.ebi.ac.uk/) or by NCBI (http://ncbi.nlm.nih.gov/) – often as joint projects between them – that provide a place to archive, retrieve and search/manipulate the data. Often there is an established set of guidelines for metadata, and often journals (such as Nature, but many others) require submission of data to these databases. Although many of the detailed aspects here could be improved – and this is my day job, so I am all too aware of those – I think overall biology has a good conceptual way of handling this.
One long-standing headache in this area is that although many funding agencies both explicitly and implicitly acknowledge the need for these data archives, creating the right sort of funding streams, matched to the demands put on these databases, is complex. This is partly because many funding agencies are understandably focused primarily on research, and not on the “infrastructure” for research, and partly because, as this is a very inter-linked worldwide activity, there is a tendency to let another funding agency handle the load. This headache is now being addressed, for example by projects such as ELIXIR (http://www.elixir-europe.org/) in Europe, which is part of the ESFRI process, but it is still unclear precisely how this will work – we are all very positive about these developments, but we still face the complexity of handling the technical issues around funding this sort of “on-going” scientific infrastructure. In the US, the NIH, via its intramural institution, NCBI, has been the clear leader in the provision of biomolecular databases, but there are interesting questions of scale (as biology becomes more data intensive), and the scope of biological data goes beyond the medical focus of NIH.
These however I think are complex issues that need to be worked through to provide a vibrant and effective “public goods” arena of biological data, free for all scientists to use in every country for the good of everyone.
-
It would be great to be able to carry out research through knowledge sharing.
As always in science, there are people who are really creative, and other scientists who only collect and handle data very well (they operate scientific databases well but are poor at creating ideas). The union of these qualities is what makes science progress!
-
Of course, Mark.
The other important factor is honesty and a mutual commitment to cooperate, in order to develop and deliver the research you want to do.
-