• john wilbanks' blog

    Agitating for innovation through open licensing and good technology.

    • On a data commons

      Sunday, 11 May 2008

      One offshoot of this data licensing discussion is that we really are throwing data online without a lot of thought. Someone’s going to need to fund this stuff and it’s not going to be cheap. I have started using the quip “open data is free as in a puppy” in my talks (hat tip to Anna Gold, in whose talk I first heard the phrase).

      Cameron Neylon has a must-read post on this aspect of the data commons:

      So I started with the notion of paying researchers to make data available, originally phrased as ‘pay the journals to buy papers’. What I really meant was paying people to put research results somewhere useful. So let us imagine we can pay people to deposit data (we’ll figure out how later). We don’t want to be swamped with rubbish so the data has to be well structured, tagged up and machine readable. If we’re paying for it, we set the standards. We also want to encourage re-use of data, perhaps by paying a premium for the deposition of data that re-uses other data. And in turn, perhaps pay a premium to those whose data is re-used.

      Funders are putting hundreds of millions into data centres that no-one is too sure what to do with. Maybe that money could be used more effectively to drive data deposition quality. Some funders may also see this as a good model for direct funding. Putting money in to drive the generation of specific data set. Channel funding through the foundation to pay groups to deposit the results rather than pay them to do the research. For small foundations or charitable concerns this may be a much more effective means of driving the outcomes they want.

    • On the Erosion of the Public Domain

      Saturday, 10 May 2008

      This ChemSpider licensing brouhaha is generating some needed discussions around open data, and something I keep hearing about is that it is GPL v. BSD all over again (link is to Egon Willighagen, who has two nice posts on this linked in the comments on my last post).

      I need to come back to this point, and I’m actually going to sit down with some lawyers and prepare an “appellate brief” on this topic (thanks Jordan Hatcher for the suggestion). But I want to get this into the discussion right now.

      The public domain is not an “unlicensed commons”. The public domain does not equal the BSD. It is not a licensing option.

      It is the natural legal state of data.

      It is a damn shame that we no longer think of the public domain as an option that is attractive. It’s a sign of the victory of the content holders that the free licensing movements work against that something without a license – something that is truly free, not just free “as in” – is somehow thought to be worse. We’ve bought into their games if we allow the public domain to be defined as the BSD. The idea of the public domain has been subjected to continuous erosion thanks to both the big content companies and our own movements, to the point where we think freedom only comes in a contract.

      The public domain is not contractually constructed. It just is. It cannot be made more free, only less free. And if we start a culture of licensing and enclosing the public domain (stuff that is actually already free, like the human genome) in the name of “freedom” we’re playing a dangerous game.

      There’s a lot more to get at here. For one thing, queries to a set of databases create a data product – which means that naive web users have to keep track of licensing of their Google results – unlike code, where as a naive user I don’t worry about GPL v. BSD. For another, the EU database directive and the UK tradition of “sweat of the brow” can make the public domain harder to achieve. Third, no one knows how moral rights relate to this – those are rights that emerge from the classical idea of an author’s rights, but how can someone claim moral rights on a fact of nature? Fourth, what’s the difference between fundamental data (GPS coordinates or species genome sequences) and “state change data” (like when you probe the genome with a drug, which might well trigger a right akin to moral rights)? Fifth, not all data can even be in the public domain or virally licensed due to privacy rights (anything collected from a person can’t be made free by any magic contract, privacy trumps contract like paper covers rock).

      Speaking as someone who got into this two years ago convinced SA was the way for data, this stuff is complicated.

      I don’t know the answers. But I do know that if we start to frame “public domain data v. viral data” as “BSD v. GPL”, we’ve already lost the debate, because we’ll have bought into the erosion of the public domain that led to the need for commons licensing in the first place.

    • This is a comment I posted on the ChemSpider blog, one of two I tried to post. I’m cross posting here to make sure it’s public. Make sure to click through to the blog, it’s on the topic of using CC licenses on data. I sent an email to a list that got blogged, before I could get a chance to reconcile everything and contact the ChemSpider guys. I think they should get complimented for their intentions and that they deserve tea and sympathy, because this licensing stuff is really complicated, and all they wanted to do was share.

      In short, it’s a demonstration of how confusing data licenses make the position of data providers essentially untenable. From my perspective, the answer is either go public domain, or don’t. If you don’t, please make the metadata public domain. Anything else is simply too confusing to figure out, and it’s going to get worse.

      Part of the problem is that we have created a cargo cult around licenses. A contract will come from the heavens and make us free! But in data we’ve got the public domain right there to teach us. All we have to do is look up from the lawyer’s desk and follow the yellow brick road…er, the NCBI’s lead.

      jtw

      >>>>>>>>>>>>>>>

      I tried to post a comment but don’t know if it got through.

      I did not intend for my comments to become public – that was a post to an advisory board list, intended to highlight precisely how this issue demonstrates the difficulty providers have in understanding licensing of data.

      Creative Commons licenses were built for cultural works, like this blog or a website or music. They weren’t built for data. Data has different qualities and characteristics and thus requires different licensing approaches.

      I would recommend you read the official CC position on this, which is the Science Commons Open Access Data Protocol (http://sciencecommons.org/projects/publishing/open-access-data-protocol/) and that you look at the best available legal tool to achieve the protocol (http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/). These are regimes that facilitate data integration, unlike the CC BY SA license.

      Please know that I salute your intent here and don’t want to slander you – you’re trying to share, and you’re confused on how to do so. I do believe that in our conversations I did indeed recommend to you the idea of releasing an RDF dump of your database in the public domain, using only the NCBI approach listed on this very blog. That’s essentially what we recommend at CC, as you’d see in the protocol.

      Again, it was not my intent for this to go public before I could reach you, and I’m very sorry for that. It is never fun to make a decision and get pummeled for it, and from my perspective you don’t deserve the pummeling.

      I’ll cross post this to my blog to make sure it gets online.

    • Kneeling at the altar of patents

      Monday, 28 Apr 2008

      Jamie Boyle, one of my Board members and longtime mentors in the law, does his usual brilliant job in deconstructing the relationship of patents and drug discovery.

      I agree fully with Jamie’s writing here. He does a very elegant job in noting that the patent system is itself not the cause of the lack of research into diseases of the global poor, but that the drug companies that depend on patents are also blocking the creation of new incentive systems that might address market failures in global disease research.

      His key point for me is that patents are like religion. Boy howdy, is that ever my experience.

      I’ve had the temerity to suggest on occasion that patents aren’t the big problem in global disease research – that our research at Science Commons indicates the problems are much more to do with fundamental knowledge gaps, process failures, and non-patent-related transaction costs. That isn’t popular in many places.

      I’ve also had the temerity to suggest that prize models need to be accompanied by systematic changes in the fundamentals of drug discovery: knowledge sharing as a first principle, investment in biobanks to move materials around the world smoothly, investments in web infrastructure to make a scientist in Brazil as powerful as a scientist in Pfizer’s Groton offices. Also not popular in some places (actually not popular either in the prize advocate community or the pharma community, oddly).

      I don’t believe prize models are going to scale as long as the fundamental uncertainty of drug discovery persists – we have to change the game at the same time we change the rewards granted to the winners. Remember how little we know about the body: we don’t know how Tylenol works, much less how to predict what a new drug is going to do.

      In all the costs of drug discovery there’s one big one, clinical trials, and it comes from knowledge gaps. We don’t know if Magic Drug X is going to kill people or not until we give it to humans and watch to see how many die. That knowledge gap costs money – lots and lots and lots of money. We have to eliminate that knowledge gap and lower the cost so that more can play, and more can fund – and in turn, so that we can have process competition at the funding level like prize models without an $80B pot of gold needing to be located. Otherwise, only rich people will play, and they’ll play by market rules. That’s what markets do, as Jamie points out. The question is how to manage the places where the markets fail, and we can do that only if we lower the chance of failure by bridging the knowledge gaps, and doing so in a public and open access manner.

      We’ll know we’ve been successful not when a prize model replaces the pharma dependence on patents, but when it sits alongside and outcompetes the patent system. I am personally looking forward to that day. To get there we’ve got a lot of roads to walk, and we need to get away from patents as religion – whether we’re pro or con on the topic – and we need to focus on filling the knowledge gaps that drive up costs and steal precious research cycles.

    • Preprint: The Control Fallacy

      Saturday, 26 Apr 2008

      So, I have been spending my copious amounts of plane time lately writing. I’ve got a series of articles under draft, including one on datuments, one on the dangers of naively porting licensing concepts from code to data, and another one on integrating data and databases. But I’ve got one pretty well ready to go. It’s a paper on the relationship between OA and innovation.

      The problem is, it sprawled from the original 1000 words to, well, a lot more. I’m going to hack it way down and it’ll be a totally different article. But I thought this one deserved its own URI as well. As such, I’m eating my own preprint dogfood. It’s online in Nature Precedings under CC BY 3.0 as The Control Fallacy: Why OA Out-Innovates the Alternative and I welcome comments either to me directly, here at this blog, or on the Precedings page for the paper.

    • If you’ve got an interest in next-generation publishing in science, and you’ve always wanted to live in Paris…I’ve got a job opening you might be interested in after the jump. Please forward this far and wide. It’s a great project. If I were younger – and had a doctorate so that I could be a postdoc – I’d be all over this one.

      continue reading this post
    • Voices from the future of Science

      Wednesday, 02 Apr 2008

      Hi everyone – Science Commons is going to be building a public aggregator and spotlighting some of the open science debates – and debaters. Donna Wentworth, our community blogger, has just issued a call for links and suggested people to profile. Please head on over to SC’s blog and let her know your opinions…

    • More Data = WIN

      Tuesday, 01 Apr 2008

      I saw this on Slashdot and wanted to make sure I posted it here…it’s a blogger who teaches data mining at Stanford (oh, and what do YOU do?) weighing in with the results of his work on whether or not better algorithms trump more data. Interesting analysis especially in re: Google.

      Money quote: if you have limited resources, add more data rather than fine-tuning the weights on your fancy machine-learning algorithm.

      Harrumph. With apologies to George Clinton, Free Your Data ...And Your Searchers Will Follow.

      (edited post to add in link to slashdot thread)

    • Creative works, copyrights, and publishing...

      Wednesday, 26 Mar 2008

      This is a reply to a post over at Plausible Accuracy, asking some questions about my talk at MIT (online here) from last fall…

      The author is a scientist and asks a good question about one of the points I make in the talk, regarding the relationship between copyright and creative works and scholarly publishing. It’s a point I have actually removed from my talks recently because I was finding it misconstrued – it’s a little subtle and hard to grok sometimes, and it’s an example of how hard it is for the lawyers and the scientists to understand each other.

      But it’s a perfect conversation for the blogworld, and I’ll be pinging the author at PA to engage in a conversation. I’m still figuring out how to engage the community and this seems a good place to start :-)

      More after the jump…

      continue reading this post
    • Cyberinfrastructure, University Policy, Innovation

      Thursday, 21 Feb 2008

      This is a rough version of my comments at today’s conference on Cyberinfrastructure, University Policy and Innovation. Text after the jump.

      continue reading this post
