• john wilbanks' blog by john wilbanks

    Agitating for innovation through open licensing and good technology.

    • On a data commons

      Sunday, 11 May 2008 - 15:14 UTC

      One offshoot of this data licensing discussion is that we really are throwing data online without a lot of thought. Someone’s going to need to fund this stuff and it’s not going to be cheap. I have started using the quip “open data is free as in a puppy” in my talks (hat tip to Anna Gold, in whose talk I first heard the phrase).

      Cameron Neylon has a must-read post on this aspect of the data commons:

      So I started with the notion of paying researchers to make data available, originally phrased as ‘pay the journals to buy papers’. What I really meant was paying people to put research results somewhere useful. So let us imagine we can pay people to deposit data (we’ll figure out how later). We don’t want to be swamped with rubbish so the data has to be well structured, tagged up and machine readable. If we’re paying for it, we set the standards. We also want to encourage re-use of data, perhaps by paying a premium for the deposition of data that re-uses other data. And in turn, perhaps pay a premium to those whose data is re-used.

      Funders are putting hundreds of millions into data centres that no-one is too sure what to do with. Maybe that money could be used more effectively to drive data deposition quality. Some funders may also see this as a good model for direct funding. Putting money in to drive the generation of specific data set. Channel funding through the foundation to pay groups to deposit the results rather than pay them to do the research. For small foundations or charitable concerns this may be a much more effective means of driving the outcomes they want.

      Last updated: Sunday, 11 May 2008 - 15:14 UTC

      • Comments

        • Date:
          Monday, 09 Jun 2008 - 23:06 UTC
          sebastian gonzalez said:

          This reminds me of an interview I did with a CS about open access to data a week ago. It is in Spanish, but I will quote and translate some ideas here:

          *The problem:

          1.- Scientific research is a circus. Everyone measures whatever they want.

          2.- Most of the time it is done with government money.

          3.- Every time somebody wants to test a hypothesis, they must run their own experiments and lose time and money doing so.

          Proposal:

          Create a model of the principal measurements (in EEG, genomics, etc.) that allows the data to be exchanged. This data must be public because it is produced with government money.

          1.- XML-based model

          2.- Must be a necessary condition for publication.

          4.- Every XML schema must have a header with: author, date, GPS, observations and ID.

          5.- It must define obligatory fields, even if they are not used in the experiment, to avoid doubts about the validity of the data.

          6.- When someone wants to use the data, it is enough to indicate the ID.

          7.- All the data must be stored in open access databases, and there must be web interfaces that enable any user to run queries on the data. At first the queries will be written in a query language; in the future, better interfaces will let people write queries in natural language.*
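
A minimal sketch of what such a record could look like, using Python's standard `xml.etree.ElementTree`. The element names (`record`, `header`, `data`, `measurement`), the helper functions and the sample values are all hypothetical. The proposal only names the header fields (author, date, GPS, observations and ID) and does not specify a schema. The sketch also assumes that "obligatory" means the element must be present, even if it is left empty.

```python
import xml.etree.ElementTree as ET

# Header fields named in the proposal; the lowercase tag names are an assumption.
REQUIRED_HEADER_FIELDS = ("id", "author", "date", "gps", "observations")


def build_record(header, measurements):
    """Build a <record> element with a <header> and a <data> section."""
    record = ET.Element("record")
    header_el = ET.SubElement(record, "header")
    for name in REQUIRED_HEADER_FIELDS:
        # Every obligatory field is written, even when the experiment left it empty.
        field = ET.SubElement(header_el, name)
        field.text = header.get(name, "")
    data_el = ET.SubElement(record, "data")
    for name, value in measurements.items():
        measurement = ET.SubElement(data_el, "measurement", {"name": name})
        measurement.text = str(value)
    return record


def missing_fields(record):
    """Return the obligatory header fields whose elements are absent."""
    header_el = record.find("header")
    if header_el is None:
        return list(REQUIRED_HEADER_FIELDS)
    return [name for name in REQUIRED_HEADER_FIELDS if header_el.find(name) is None]


def find_by_id(records, record_id):
    """Look up a record by the ID stored in its header."""
    for record in records:
        if record.findtext("header/id") == record_id:
            return record
    return None


# Placeholder values for illustration only; not real data.
example = build_record(
    {"id": "rec-001", "author": "A. Researcher", "date": "2008-06-09",
     "observations": "placeholder"},
    {"alpha_power": 1.5},
)
print(ET.tostring(example, encoding="unicode"))
```

Here `missing_fields` stands in for the "necessary condition for publication" check, and `find_by_id` for reusing data by indicating its ID. A real repository would more likely validate against a published XML schema and serve lookups from a database.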

          Here is the complete interview.

          As I like to say: my taxes, my knowledge.

          Best,
          Sebastian

