• john wilbanks' blog

    Agitating for innovation through open licensing and good technology.

    • Seeding the Social Web for Science

      Tuesday, 27 Nov 2007 - 21:21 GMT

      John from Caltech posted an interesting comment on my last post.

      To paraphrase slightly: if all 60,000 neuroscientists took up arms and began a round of tagging, collaborative filtering, and more, would that be enough to mark up the literature?

      John and many others call for a mass Social Web because they believe the crowds are big enough to be wise as crowds qua crowds. I call for the Research Web because I believe the task is so incredibly large (far more complex than Wikipedia, even) that an army of neuroscientists could not map the information in a meaningful way, much less 60,000 of them.

      Sadly, I think my larger point about the interdependence of the Research and Social Webs tends to get lost in the details of the argument. I think neither one is sufficient, and I think both are necessary. Together, the two are far greater than the sum of their parts.

      As for scooping, which drew another interesting set of comments, I’ll try to address that in an upcoming post.

      More on Seeding the Social Web after the jump.

      I don’t think it is effective to port the ideas of culture one-for-one into science. The nature of the information is different, the people are different, and the goals are different. And my argument here is different. The point of Research Web isn’t to denigrate Social Web; it is simply to let Social Web actually function in science.

      Let’s think about the task of Wikipedia. In addition to being a shining exemplar of Social Web, it is an encyclopedia. I have an oldish edition of the Britannica at home, and it takes up most of a shelf running the length of a room. Lots of text. The Britannica is an encyclopedia too: a compendium of things we know.

      Let me be clear: this isn’t to denigrate Wikipedia, which I love and use daily. But the complexity of an encyclopedia, or even of the sum of human knowledge, is nothing compared to the complexity of the operations of even a simple bacterium, much less the human genome. All the king’s knowledge cannot create a unicellular synthetic organism with a nucleus. It isn’t a matter of willpower or money. We simply don’t know how to do it.

      IBM built the world’s largest computer to try to model a single second of a protein folding. That’s what we’re talking about here: unbounded complexity. We don’t know what happens in there well enough to build basic predictive models. We test our drugs in the only model we have: human bodies. And we evaluate them based on whether or not those bodies die in response. Again, unbounded complexity.

      My point is that we need to properly contemplate the scope of the problem we’re dealing with here. Figuring out how the human body works, from genome to phenotype, and then intervening in that function in a reliably predictable way, is something we are still a good way from accomplishing.

      That’s the bad news. The good news is that one of the reasons we are so far from that goal is that, to date, we have published our knowledge in formats designed for a different world: one idea, one lab, one gene, one protein, one paper, one database. If we can mark up that knowledge better, we can do a lot better without a gee-whiz theoretical breakthrough, simply by making better use of what we already know.

      The question of Social Web v. Research Web is itself part of the problem. The answer is that we need both.

      I happen to believe that John is right when he says 60,000 neuroscientists are enough to start seeing positive network externalities with Social Web. I happen to believe he’s wrong when he thinks those 60,000 will just show up and start tagging. I think we need to give them a reason to show up.

      Those positive externalities are fundamentally baked into my conception of Research Web, in a way that I’m still learning to communicate. I see Research Web intertwined with Social Web, where Research Web represents the established consensus and Social Web is the rowdy bazaar of ideas.

      If this all works the way I hope, we’ll be able to watch the evolution of theories in close to real time, and to test the philosophy of science, from Kuhn and Polanyi to Latour and beyond, against reality. I digress…

      Here’s the other thing. I think science, once we can jumpstart the effort, is uniquely suited not only to Research Web (because of the hierarchical structure of consensus information in science) but also to Social Web. What, in the end, is the typical paper if not a minor edit to the knowledge base? What is more wiki than the publication system, in theory if not in practice?

      The catch is that we have to provide enough bait for those 60,000 to see a reason to participate. There’s little worse than curating an empty graph and never seeing a network effect. Trust me, I know this one from experience.

      And the right metaphor isn’t Wikipedia. It’s a linguistic community of 60,000 speakers trying to translate Wikipedia into their own language, except that instead of Wikipedia the corpus is tens of millions of scientific articles and a thousand databases.

      That’s where Research Web can jumpstart Social Web. Just as Wikipedia, for a small linguistic population, relies on machine translation to seed articles and on people to edit them, we need to look to similar solutions. Text mining is part of it, but the heavy lifting right now is in getting databases to work in concert via the web. That’s not a task for which Social Web is well suited.

      But the edits to that machine-seeded knowledge must come from Social Web, and Social Web is where the most interesting battles over the future of science will take place. As those battles are settled, the resulting knowledge will be canonicalized into Research Web, just as no one debates DNA’s double-helix structure or the location of organelles anymore.

      I know I’m not explaining this as well as I need to explain for the long term. Part of the reason I’m blogging is to figure out how to explain it. And maybe along the way we can bridge some of that divide between those who believe in AJAX-based platforms and those who worry about OWL conformance.

      Because if we’re going to reach the end goal, my gut tells me we’re going to need both types.

      Last updated: Tuesday, 27 Nov 2007 - 21:21 GMT

      • Comments

        • Date:
          Saturday, 01 Dec 2007 - 12:42 GMT
          Deanne Taylor said:

          Hi, John—

          Could a research web app/db/etc be jumpstarted by a small group of people dedicated to building it?

          Could those people further jumpstart it by creating a research web out of PubMed abstracts, web pages, and scientific society indexes to start?

        • Date:
          Thursday, 06 Dec 2007 - 03:39 GMT
          john wilbanks said:

          Deanne, I think it’s possible to build an app or a DB through a small group. And indeed, PubMed abstracts and open web pages are the only place to jumpstart (everything else is closed, sadly) if you’re thinking of the traditional web. But I think the key is more to index the web of databases. Much, if not most, of that content has been carefully extracted from the literature and curated, and it has a high knowledge content per byte compared to a PDF.

          I believe that we need a community of people dedicated to building a Research Web, aided by machine translation of the canon from the PDF and SQL formats into more common formats. The community’s time would be spent curating the outputs of the machine translation – editing, improving, weeding the garden so to speak…

