One offshoot of this data licensing discussion is the realization that we really are throwing data online without a lot of thought. Someone’s going to need to fund this stuff, and it’s not going to be cheap. I have started using the quip “open data is free as in a puppy” in my talks (hat tip to Anna Gold, in whose talk I first heard the phrase).
Cameron Neylon has a must-read post on this aspect of the data commons:
So I started with the notion of paying researchers to make data available, originally phrased as ‘pay the journals to buy papers’. What I really meant was paying people to put research results somewhere useful. So let us imagine we can pay people to deposit data (we’ll figure out how later). We don’t want to be swamped with rubbish so the data has to be well structured, tagged up and machine readable. If we’re paying for it, we set the standards. We also want to encourage re-use of data, perhaps by paying a premium for the deposition of data that re-uses other data. And in turn, perhaps pay a premium to those whose data is re-used.
Funders are putting hundreds of millions into data centres that no-one is too sure what to do with. Maybe that money could be used more effectively to drive data deposition quality. Some funders may also see this as a good model for direct funding. Putting money in to drive the generation of specific data sets. Channel funding through the foundation to pay groups to deposit the results rather than pay them to do the research. For small foundations or charitable concerns this may be a much more effective means of driving the outcomes they want.