STRUCTURAL BIOLOGY forum: topic

This is a public forum

posting raw data of peer reviewed experiments in lab websites

kamesh narasimhan

Thursday, 19 Jun 2008 10:11 UTC


I have been thinking about this for quite some time. It is a given that the volume of peer-reviewed publications churned out in any field of research will continue to increase. But being peer reviewed does not mean that published research will not be scrutinized in the future.

I see that one of the better ways for the scientific community to benefit from a peer-reviewed work is for the raw data of the different experiments performed to be put online, for example on a lab website. Though it may sound at first like hectic and complicated work, I think it has lots of benefits, considering the volume of research output in the next few decades. The format for the raw data need not be very rigorous; it could be sufficient if it is understandable with a reasonable degree of explanation. Though it is firmly up to the individual labs to decide whether raw data can be posted online, it would be encouraging if this became a routine way of doing research (research that does not involve sensitive technologies, of course) in future decades.

Apart from being of immense help to the scientific community in being able to reconstruct the pieces of experiments done by a lab, the biggest beneficiaries would be graduate students, who can learn many subtler aspects of publication-quality data processing and manipulation.

Putting raw data online can also sound stupid, in that the downside is probably the fear that the raw data could easily be misused, whether by wannabes or by researchers with competing interests.

So what do you think is the way for future research to go? Is there a middle ground that balances both the positive and negative aspects of posting the raw data online?

Do send in your suggestions.

  • Replies

    • It sounds like a good idea, Kamesh, but as you say, I think labs would be worried about the data being misused. We find at Nature that some authors are reluctant to deposit data in public databases, and it is completely standard for them to hold release of, say, structural coordinates until the date of publication. Therefore I think it would be unlikely that people would want to post raw data before the work related to it is published. After publication is a different matter, though not necessarily a straightforward one.

    • Yes Maxine,

      Can misuse of raw data be prevented by something like a “Data transfer agreement”? I can envision a “Data transfer agreement” being arranged between the interested parties, with regulations that would discourage misuse of data (of peer-reviewed work).

      Yes Maxine, I do realize that posting raw data is very idealistic, but let’s see if any more interesting ideas pop up that can make posting raw data a routine and fool-proof way of doing research in the future.

    • Hi Kamesh. Great to hear someone suggesting this! There are a small number of people doing exactly this in different ways. The main one is Jean-Claude Bradley’s group, who started off ‘Open Notebook Science’ by publishing all their results as they happen. You can see all of their data at the UsefulChem Wiki.

      My group’s work is available on Chemtools – try ‘Investigations in Neutral Drift’ or ‘Sortase Cloning’ and there are others trying a range of different approaches. The point is, it may be a bit tough but it can be done.

      One point to make is that just putting up raw data isn’t really helpful. It has to be well described, and findable to be useful to other people. In the future many people may be looking for particular types of data without connections to any paper so they have to be able to find the data, and to understand what they’ve found.

      Finally, your suggestion for a data transfer agreement is understandable, but there are real problems with making it work. Data by their nature are in the public domain. You have the choice of making them public for anyone to do what they like with, or keeping them secret. I’ve blogged elsewhere that there is no middle ground, and you should look at John Wilbanks’ posts here on Nature Network for a wonderfully clear explanation of why licenses are ultimately not very helpful.

      Some of us believe that it is better to simply make things immediately available and hang the consequences. For those who are a bit more worried there is still a lot of value in making the data available after publication and we would certainly advocate that as a good step – this is not so different to making pdb files only available once a paper is published.

      Good luck with your ideas!

    • Of course, one has to keep in mind that the mere fact of making your data as public as possible, be it before or after peer-reviewed publication (duly noted, Jean-Claude!), takes work. It takes time away from other experiments. There does need to be some incentive.

      What’s appealing about open notebook science is that we scientists are supposed to keep lab notebooks anyhow, so it shouldn’t take a lot more time than that already does, at least if you are the type of person to keep up a reasonable notebook.

      Thinking, though, about my recent generation of tons of data, we have posted SAGE banks with a release date post-publication at the Gene Expression Omnibus. Yet of course we were manipulating lots of that data before publication. The beginning transfer to open lab notebook took place afterwards, but I doubt seriously we would have wanted to put the raw sequencing data out there immediately for a number of reasons having to do with my group’s viability in the interim. I also don’t write the name of whatever candidate gene I am testing while it is happening, because others could certainly do it faster than me, and I would like to finish the test of my hypothesis myself. It’s human nature.

      What is the incentive then for making negative data available after a peer-reviewed publication? For example, if we tested a couple of candidate genes in a cohort of patients, found an undescribed variant, sequenced it in 100 patients and found it in five unaffected people, we would not say that the variant is associated with the disease. But it is still good to submit to the dbSNP database. Yet we, and many colleagues, don’t – this too takes time away from experiments with potentially positive results. The benefit seems to be entirely to others, unless we realize that we too have benefited from the data earlier or later. So you are relying on people’s better nature, and that doesn’t always work as efficiently as one would like.

    • Heather – you are dead right, there need to be rewards for making these observations available. At the end of the day I think the most effective way to do this is a simple one – pay people to do it. If you have a nice little income stream that comes from getting the data online then you will do it – you will be popular in your department for bringing in money, so it will be regarded as ‘an important community activity’ and will raise the profile and importance of negative results as a valid contribution. The question is where the money comes from. One answer is for funders to actually withhold final payments (or some proportion of them) until results are deposited. This, though, comes close to compulsion, so may not be palatable.

      Nonetheless it is clear that data deposition costs money and that has to come from somewhere, both to provide the infrastructure to do it, and to make it appealing for scientists to buy in. I wrote a bit on possible ways of making this work here.

    • Hi.

      First, a quote from OpenSource: the university have a mission to advance human knowledge

      Second, a quote from myself: My taxes, my knowledge

      Third, a link to the end of science

      From my point of view, we are supporting scientific investigation because we were told to do so. And scientists got used to playing with their toys without shame.

      Science started in 1800, maybe in 1600, and died in 2008. Opening the data means that all the scientists who are just playing will not have jobs in the near future. It means that people can make decisions about the kind of research that is financed by their country. It means that the poiesis of science is no different from the rest of the productions of the world.

      It’s time for science to learn from open source, and time for governments to insist on obtaining data in return for the money that they put into science.

      For me it is clear: Nietzsche killed god, Duchamp killed art, Cage killed music and Google is killing science.
