This Chemspider licensing brouhaha is generating some needed discussions around open data, and something I keep hearing about is that it is GPL v. BSD all over again (link is to Egon Willighagen, who has two nice posts on this linked in the comments on my last post).
I need to come back to this point, and I’m actually going to sit down with some lawyers and prepare an “appellate brief” on this topic (thanks Jordan Hatcher for the suggestion). But I want to get this into the discussion right now.
The public domain is not an “unlicensed commons”. The public domain does not equal the BSD. It is not a licensing option.
It is the natural legal state of data.
It is a damn shame that we no longer think of the public domain as an option that is attractive. It’s a sign of the victory of the content holders that the free licensing movements work against that something without a license – something that is truly free, not just free “as in” – is somehow thought to be worse. We’ve bought into their games if we allow the public domain to be defined as the BSD. The idea of the public domain has been subjected to continuous erosion thanks to both the big content companies and our own movements, to the point where we think freedom only comes in a contract.
The public domain is not contractually constructed. It just is. It cannot be made more free, only less free. And if we start a culture of licensing and enclosing the public domain (stuff that is actually already free, like the human genome) in the name of “freedom” we’re playing a dangerous game.
There’s a lot more to get at here. For one thing, queries to a set of databases create a data product – which means that naive web users have to keep track of licensing of their Google results – unlike code, where as a naive user I don’t worry about GPL v. BSD. For another, the EU database directive and the UK tradition of “sweat of the brow” can make the public domain harder to achieve. Third, no one knows how moral rights relate to this – those are rights that emerge from the classical idea of an author’s rights, but how can someone claim moral rights on a fact of nature? Fourth, what’s the difference between fundamental data (GPS coordinates or species genome sequences) and “state change data” (like when you probe the genome with a drug, which might well trigger a right akin to moral rights)? Fifth, not all data can even be in the public domain or virally licensed due to privacy rights (anything collected from a person can’t be made free by any magic contract, privacy trumps contract like paper covers rock).
Speaking as someone who got into this two years ago convinced SA was the way for data, this stuff is complicated.
I don’t know the answers. But I do know that if we start to frame “public domain data v. viral data” as “BSD v. GPL”, we’ve already lost the debate, because we’ll have bought into the erosion of the public domain that led to the need for commons licensing in the first place.
Manual pingback to further thoughts at my blog.
Interesting, John – I agree that the true potential of the Commons is much more than Sharealike, although this is a step in the right direction. So is Public Domain, and I think we could be doing much more to encourage people to make their work Public Domain. Why not have this as the first option that comes up with Creative Commons, for example? The option at the top is likely the one people think of as the default.
GPL is Gnu Public License, right? What is BSD?
John, wonderfully written. I will add a pingback as well.
The thing is we’ve been doing public domain for the longest time. I do think that there is a lot of confusion around the distinction between content (Creative Commons) and data (which is different). The data commons needs a different set of rules, starting with a clear understanding of what Public Domain means and why it is a good thing.
But things are complicated.
For example, some might argue that protein interaction content belongs out in the public domain, but as long as there is a ton of curation required (and there is value in that), content companies will not make these data public and one can’t fault them. We need to be able to educate people on what constitutes Public Domain, what their options are, etc. As I wrote in my post, we’ve ended up making things very complicated and confusing.
John, I think an analogy you might find useful is the status of copyright for writings of U.S. Federal Government employees. Copyright law doesn’t apply to such writings. (Under 17 USC § 105, copyright protection “is not available for any work of the United States Government.”) There is no license for anyone to use these works. Rather, copyright simply does not exist. This provides the maximum possible freedom, and is a case where “less is more.” Had the government chosen to explicitly define the terms of a license for such work, no matter how liberal, the public’s rights would have been diminished.