• john wilbanks' blog by john wilbanks

    Agitating for innovation through open licensing and good technology.

    • The Research Web

      Monday, 12 Nov 2007 - 17:12 UTC

      I was at a FasterCures event this past week that got me thinking about the disconnect between the Web 2.0 / Social Web and what gets called the Semantic Web. Lots of discussion centered on the use of Second Life to bring patients together, how Social Web can get patients to help doctors and each other, how new incentive mechanisms can drive innovation, and more.

      I think I was the only one in the room focused on research (everybody else was patient-oriented, which is great – but the drugs do have to come from somewhere). I talked about the Semantic Web. It went ok, but the questions and the general response got me thinking about phrasing and messaging.

      We need to start talking about the Research Web, which is the reality of what we’re building here. Just as the Social Web uses, but is more than, AJAX, the Research Web will use the Semantic Web, RDF and other technologies and strategies to accomplish something we deeply care about: making medical research more effective so that we can cure and relieve the suffering of patients.

      We need the Research Web because the existing Web doesn’t work for research. Here’s what I mean: Googling a phrase like “signal transduction genes in pyramidal neurons” doesn’t get you a list of genes. It should get you a list of genes. No amount of collaborative filtering makes it easy to read 188,000 papers – and this is stuff where you tend to want experts more so than the “wisdom of crowds” – advice from someone who doesn’t understand signal transduction tends to be less reliable than advice from someone who does.

      The Research Web is about integrating lots of stuff that wasn’t designed to be integrated with anything. It’s about getting precise answers to complicated questions instead of a mess of Web pages. It’s about the move to industrialize the way scientists annotate data. The Research Web is about making the Web work in a complex data environment, where machines make and transmit terabytes of content that humans have to interpret.

      There are a lot of phrases out there for this. Network Science. Cyberinfrastructure. E-Science. I like Research Web a lot more, because it ties into the ideas of the Web and what it means to us day to day. Research Web means that search engines work for research, like they do for pizza. Research Web means that biological materials can be ordered in a setting closer to one-click than pre-industrial artisanal fulfillment. It means open source tools, pipes that run on data like they run on music, and much more.

      This isn’t about grids or ontologies (though it definitely builds on them). It’s about making the user experience for research on the Web more like…well, the Web for other stuff.

      It doesn’t require anything deeply novel to make a Research Web. The “and then a miracle happens” problem doesn’t surface. The Research Web doesn’t tell you anything that isn’t yet known. We’re just talking about re-designing information that is already digital into a format that works better for research – a format with more context and more structure.

      It’s hard and annoying to do this – it takes discipline to use the right URIs and to harmonize existing ontologies, and reading the stuff natively is misery perfected. See the World Wide Web Consortium’s Health Care and Life Sciences group work on this and you’ll see what I mean – it’s difficult work, requiring the cooperation of lots of people. But at least no one has to have a magical insight to pull this off, which makes this a lot easier than many vexing problems in life sciences.

      This is where Research is different than Culture. Culture doesn’t need that structure, and people can provide context through distributed networks, blogs, filters, etc. The Social Web works out of the box for culture. But science isn’t like that. Scientific knowledge – the stuff that needs re-formatting – is significantly more hierarchical than cultural knowledge. It’s worth going through the pain and annoyance because the payoff is so significant: the ability to really thoroughly use all the databases and literature on the Web.

      In this, the Research Web is fundamentally different from the Social Web. Using the Social Web is by definition easy. I can mash up two songs in my browser with no knowledge of music. I can start a blog without blinking an eye, share a link with Digg with a few clicks, post my pictures to Flickr, and so on and so on. It’s easy to do. That’s the point.

      But there is frequently an assumption that the Social Web renders the Research Web unnecessary, that the wisdom of crowds will sweep away journals, that folksonomy renders ontology irrelevant. (I am not ascribing this to anyone at the FasterCures event – this is not-quite-a-strawman, inspired by dozens of conversations on the topic.)

      And it isn’t going to happen, at least, not any time soon. The Research Web needs to be based on accurate and precise knowledge, in this case, knowledge of what we know about biology and human bodies and how drugs work in those two cases. And that knowledge is highly structured, highly hierarchical. Accurate representation is more important than anything else if you’re going to run models.

      That means boring, disciplined work like re-using the same names for things instead of creating new ones. Work like minting stable names for new things, names that pass strict Web standards. It means designing information for interoperability, for collaboration at the level of queryable machines, not the human level. The Social Web doesn’t give us that.

      I am curious to see where the two converge. I think there is a natural Venn intersection between the two in the area of using these boring existing names for things like genes and proteins as tags. That can simply link blog posts, papers, wiki pages, and more into the Research Web without a lot of effort or technical skill asked of the users. I’d love to see journal club reviews – a Social Web kind of thing if there ever was one – rendered as wikis and accessible to the world.
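      The tagging idea above can be sketched in a few lines. This is only an illustration, not anything the Neurocommons ships: the document identifiers and gene names below are made up, and a real system would use stable URIs as the tags. The point is that once a blog post, a paper and a wiki page share the same name for a gene, linking them is a trivial lookup.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with shared, stable names for genes.
# All identifiers here are invented for illustration.
DOCS = {
    "blog:post-1": {"gene:A"},
    "paper:1": {"gene:A", "gene:B"},
    "wiki:page-1": {"gene:B"},
}


def build_index(docs):
    """Map each tag to the set of documents that carry it."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


index = build_index(DOCS)

# Everything that talks about gene:A, regardless of where it was published.
linked = sorted(index["gene:A"])
print(linked)
```

      If the blog post had invented its own name for the gene instead of re-using the shared one, it would simply not show up in the lookup, which is the cost of undisciplined naming.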

      And of course, both the Social Web and the Research Web require sanity in licensing. The Research Web totally fails when a thousand terms of use bloom. When papers are locked behind firewalls and databases can’t be integrated, the whole idea behind the Research Web seems almost unachievable. Research Web is built on Open Access to information, content, tools, software, and data. Period.

      By the way, if you’d like to get an answer to that question about signal transduction and pyramidal neurons, head on over to the Neurocommons and try this query, making sure you completely replace the text in the console with the code in the grey box on the linked screen. You can also edit the source of the query by using the MESH Browser and GO Browser to change the terms away from signal transduction (replace with alternate GO process) and pyramidal neurons (replace with alternate MESH term). Make sure to replace both GO process mentions in the query, as this one finds children of your selected GO process as well.
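      For readers wondering what “finds children of your selected GO process as well” means in practice, here is a rough sketch. It is not the Neurocommons query itself; the term names, gene names and the single-parent hierarchy are all assumptions made for illustration (the real Gene Ontology lets a term have several parents). The idea is that a gene annotated to any descendant of the chosen process counts as a match.

```python
# Hypothetical hierarchy: child term -> parent term (invented identifiers).
PARENT = {
    "process:child_1": "process:signal_transduction",
    "process:grandchild_1": "process:child_1",
    "process:unrelated": "process:root",
}

# Hypothetical annotations: gene -> the process it is annotated to.
GENE_PROCESS = {
    "gene:A": "process:signal_transduction",
    "gene:B": "process:grandchild_1",
    "gene:C": "process:unrelated",
}


def is_a(term, ancestor, parent):
    """True if term is the ancestor itself or sits anywhere below it."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent.get(term)  # walk one step up; None at the root
    return False


def genes_for(process, gene_process, parent):
    """Genes annotated to the process or to any of its descendants."""
    return sorted(g for g, p in gene_process.items() if is_a(p, process, parent))


matches = genes_for("process:signal_transduction", GENE_PROCESS, PARENT)
print(matches)
```

      A plain keyword search would only find gene:A here; walking the hierarchy is what picks up gene:B, which is annotated two levels down.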

      It’s ugly – we’re talking HTML in 1994. But that’s not the point. It works. It works much, much better for getting research answers than anything else does that’s in the public commons. It’s a glimmer of the Research Web’s real potential – imagine what it’s going to be like when it’s easy to use.

      Last updated: Monday, 12 Nov 2007 - 17:12 UTC

      • Comments

        • Date:
          Monday, 12 Nov 2007 - 23:01 UTC
          Ben Toth said:

          “Googling a phrase like signal transduction genes in pyramidal neurons doesn’t get you a list of genes.” But it could do if someone builds a collaborative search engine inside Google. And I’m sure it would be possible to incorporate Neurocommons into Google. It’s not the semantic web but it might be a quicker time to value (à la HTML vs SGML?)

        • Date:
          Tuesday, 13 Nov 2007 - 01:11 UTC
          john wilbanks said:

          Ben – one reason we’re building the Neurocommons is to let people build search engines. The way scientific information is designed right now, Google and other link-based engines don’t work so well in the absence of links (and scientific papers aren’t exactly chock full of links). Google Scholar finds papers about genes, not lists of genes. Since RDF is nothing more than a bunch of typed links, providing RDF about entities and papers is one of the fastest ways to get existing link-based search engines a better view of scientific information. Metaweb is a good bet here for example (and you’ll note their open licensing terms).
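          To make “a bunch of typed links” concrete, here is a minimal sketch. It uses plain Python tuples rather than a real RDF library, and every identifier and link type in it is made up for illustration. Each statement is a link from a subject to an object, labelled with what kind of link it is; that label is what lets a machine ask for genes rather than for pages.

```python
from collections import namedtuple

# A typed link: subject --predicate--> object. All identifiers are invented.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

TRIPLES = [
    Triple("paper:1", "mentions", "gene:A"),
    Triple("paper:1", "mentions", "gene:B"),
    Triple("paper:2", "mentions", "gene:B"),
    Triple("gene:A", "participates_in", "process:signal_transduction"),
    Triple("gene:B", "participates_in", "process:cell_adhesion"),
]


def subjects(triples, predicate, obj):
    """Return every subject that has a link of the given type to obj."""
    return sorted(
        {t.subject for t in triples if t.predicate == predicate and t.obj == obj}
    )


# Ask for a list of genes, not a list of papers about genes.
genes = subjects(TRIPLES, "participates_in", "process:signal_transduction")

# The same links also answer the reverse question: which papers mention gene:B?
papers = subjects(TRIPLES, "mentions", "gene:B")
print(genes, papers)
```

          An untyped hyperlink from paper:1 to gene:A would say only that the two are connected; the predicate is what distinguishes “mentions” from “participates in”.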

          As for “collaborative engines inside google” – Google tends to be open about everything in which they are not number one. They’ve got an open phone alliance, but not an open search alliance, as Fake Steve Jobs noted.

          I remain pretty well convinced that reformatting the information into RDF is actually not that long of a process. We did this in less than a year, with two full time employees working hand in hand with the health care and life sciences community, and that work doesn’t have to get re-done. The HCLS group is actively building out more databases to their list, and SC is working with the neuro community to add more to the Neurocommons. It’s a lot faster than waiting on the science community to embrace Social Web, I’m afraid (I’ll post on the slow uptake of 2.0 in science soon).

