  • john wilbanks' blog by john wilbanks

    Agitating for innovation through open licensing and good technology.

    • Open Access Data: Boring, but Important

      Monday, 17 Dec 2007 - 19:27 UTC

      (Home Team / Science Commons post warning)

      ((Life trumps blogging warning: I won’t be back posting here until after the new year – whatever you celebrate, I hope you have fun celebrating it))

      Over at the work blog yesterday, I announced the release of the Science Commons Open Access Data Protocol – as the title indicates, it’s not a license, but a guide to how to make data and databases available under terms inspired by Open Access.

      The best blog response I’ve seen so far is from Glyn Moody, who said that “Something calling itself a ‘Protocol for Implementing Open Access Data’ sounds about as exciting as a list of ingredients for paint.”

      (he does go on to praise the protocol after that lede, thankfully)

      My official position on all of this is of course the position on the SC blog. But I thought I’d reveal a little of the personal story that led us to the position that we reached. It’s long, and a little self-indulgent, but hey – it’s a blog!

      More after the jump…

      This whole project started with a meeting SC hosted in October of 2006. It was a small, invite-only affair, and the National Academies were gracious enough to host it for us. We had about 50 people there, a crosscut of science from the genome world to chemistry to anthropology to astrophysics to social sciences, as well as representatives of various government and funding agencies.

      It was a lot of fun for me. My dad was the chair (see proud son and father picture) – he’s where I get my science tendencies – and that alone was great. But the meeting itself was a lot of fun, because it was sort of a summit of people who care about Open Science getting together and talking. It also marked Kaitlin Thaney’s debut at SC.

      We spent a couple of days hashing through what it would really mean to have a Commons in science, and we kept returning to the fundamental role of data. This is on the surface obvious. Of course data is fundamental to science. But what hit me over and over again was that no one understood the legal aspects of data very clearly, no one could figure out an algorithm for when copyright applied and when it didn’t, and everyone wanted a solution.

      Thus began the 14 months that led us to the Protocol.

      We had started with a FAQ, written largely by then-Creative Commons General Counsel Mia Garlick (a lovely piece of drafting, I might add – one of several Mia did for SC in addition to her day job at CC before we hired Thinh) on CC licenses and databases.

      At the end of the meeting, we moved away from recommending CC licenses on databases generally, and started to focus in on just the Attribution license. It had become clear at the meeting that we didn’t really know how something like Share Alike or Non Commercial or No Derivatives would fit in a data world, much less in a world where you took polar data and mashed it up with tribal anthropology data. Or what privacy rights did in this context. So, we figured, Attribution can’t hurt, let’s look into that.

      We did. We plodded along, taking measurements and temperatures. I tested these ideas in conferences all over the place, notably the CODATA anniversary meeting in Beijing in late 2006, an SC-cosponsored Cyberinfrastructure conference in early 2007, OAI 5 at CERN in late spring, and more CODATA meetings in Brazil.

      One key step was to hammer out some misalignments in the CC licenses with the EU Database Directive. We worked that out at the iSummit in Dubrovnik in July (the “Dubrovnik Agreement” – which sounds like something out of a spy novel, but anyway…) thanks to Catharina Maracke and Paul Keller among many others.

      By this point, we were hearing a drumbeat about the need to push something out, fast. But haste and good law don’t spend a lot of time hanging out.

      We were already planning to co-organize an event with CODATA in the fall, and we decided to zoom in on the data issue. We gathered a bunch of people in Paris in September to further examine the issues at hand – biodiversity and geospatial were the big scientific groups there, and we brought in a bunch of the international CC community that was already serving as an informal advisory group to us. Paul Keller took my favorite picture, of Harlan Onsrud in contemplative mode (see his set here).

      That meeting led to a fair amount of consensus that facts are indeed free, but that it was also pretty hard to have a single tool that accomplished our goals. It was also clear that there was a ton of data that already met our desired conditions.

      The question I struggled with on the plane back was how to harmonize these issues. On top of that, I kept hearing from various people in the search business that they were really, really afraid of cascading Attribution requirements over the 70-year lifetime of copyrights.

      Could we really expose a scientist to an infringement lawsuit in 2025? It seemed we had to think about taking all these social goals and moving them outside the legal world, and into the world that scientists controlled – norms. When the Bermuda Rules got written, they weren’t a contract. But they worked. That was the core inspiration for the protocol in many ways, to genericize the Bermuda Rules process and allow it to use the CC methods and materials, human-readable deeds and metadata.

      We knew that the legal stuff was going to take longer, and that we needed a working group to do it. That’s all coming in 2008.

      But even as we sat in Paris, Jordan Hatcher published his new draft of an open data license. Every week I got 5-10 emails about open data. Peter Murray-Rust – a delight! – started making a steady, brilliant case for open data and asking what it meant.

      The answer hit me as I stood at the printer at MIT one day, watching an RFC spool out for someone else in the Project MAC space. A protocol is a great answer. It allows lots of implementations. It lets SC continue legal work while feeding back some of the results of a frankly long and difficult research project. And it gives us a chance to try and harmonize the existing regime with Jordan’s work and the Open Knowledge Foundation.

      From that draft, we had a six-week sprint to work with Jordan and OKF to get this all done. It was a lot of fun, and a lot of stress. And the bulk of the work remains to be done, legally and technically but even more so socially.

      We quote Lincoln in the protocol, so I’ll close here by alluding to Churchill. This is not the end of the open access data discussion. It might not be the beginning of the end. But hopefully, by planting a flag in the territory, it’s at least the end of the beginning. From now on, let’s focus on getting data online and integrated into the Research Web and not on the law.

      Last updated: Monday, 17 Dec 2007 - 19:27 UTC
