Science Blogging forum

Taming the blogosphere: A manifesto

Chris Taylor

Monday, 30 Mar 2009 14:43 UTC

Hi.

I’ve been thinking about how on earth to cope with the zillion tiny information streams that the all-new web has borne, mostly since the Web2.0 Talk@BL thing a while back. After reading some Asimov last week (http://en.wikipedia.org/wiki/Sucker_Bait), and chatting to people at the SESL meeting in Hinxton today (http://www.ebi.ac.uk/Rebholz-srv/SESL/sesl.html), I thought it was about time I asked the hive mind for a Saturnalia gift.

I want a mechanism that can read blogs in a para-intelligent fashion, rather than just clustering obvious concepts and people’s own annotations, so that I can search intelligently and have my preferences intuited. This needs:

1. Mature, interrelatable ontologies to support the use of compounded high-level search terms (e.g., ‘separation science equipment’, ‘mammalian evolution’, ‘science policy’). [Coming soon, more or less.]

2. Beefed-up WordNet-like things (relating near but non-synonymous concepts) to support topic-hopping ontology-style inferences (“You like mammals and evolution, so you might also be interested in dog breeding, or memetics.”). [Matter of time, hopefully.]

3. Lots of lists of typos, por speling and malapropisms. (Apparently the NLM have a good list of common errors in science, for example; I hear Apple may know how to spot keyboard ‘misses’ by layout (presumably QWERTY); and there are of course spell checkers etc.) [So, here, now.]

4. An automated tagger that isn’t completely dumb and that cruises the blogosphere relentlessly, marking up any and all content (and reporting back with work samples to humans who can help it get better). [On the way, to a point (lots of use of context and corpora these days, especially around the scientific literature, but limited by 1-3 above).]

5. A tool/service that pulls all of the above together, learns about me as I go more intelligently than Amazon’s thingy does, and can extract ‘ideal’ portions of whole documents/posts (and for free lol).
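To make item 3 a little more concrete, here is a minimal sketch of spotting keyboard ‘misses’ by layout. It assumes a plain, unstaggered QWERTY grid and a tiny hand-made vocabulary; the names (`build_neighbours`, `fat_finger_corrections`) are hypothetical, and this is not how Apple, the NLM or any real spell checker actually does it.

```python
# Hypothetical sketch: flag likely keyboard "misses" (one key struck instead of
# a physically adjacent key), assuming an idealised, unstaggered QWERTY grid.

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbours(rows):
    """Map each key to the keys within one row and one column of it on the grid."""
    pos = {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}
    neighbours = {ch: set() for ch in pos}
    for a, (ra, ca) in pos.items():
        for b, (rb, cb) in pos.items():
            if a != b and abs(ra - rb) <= 1 and abs(ca - cb) <= 1:
                neighbours[a].add(b)
    return neighbours


NEIGHBOURS = build_neighbours(QWERTY_ROWS)


def fat_finger_corrections(word, vocabulary):
    """Return vocabulary words reachable by replacing one letter with an adjacent key."""
    word = word.lower()
    hits = set()
    for i, ch in enumerate(word):
        for alt in NEIGHBOURS.get(ch, ()):
            candidate = word[:i] + alt + word[i + 1:]
            if candidate in vocabulary:
                hits.add(candidate)
    return sorted(hits)


# Hypothetical mini-vocabulary standing in for a real domain word list.
vocab = {"gene", "genome", "protein", "evolution"}
print(fat_finger_corrections("gwne", vocab))  # prints ['gene']
```

A real keyboard is staggered, so this grid only approximates true adjacency; it would also need to be combined with the curated error lists and ordinary spell checking mentioned in item 3.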

This would allow me to both search for and be alerted about blogs in a very broad yet equally focused manner. All the news I (may not know I even want or) can use. In a nutshell, what I want is to try to prove that our Isaac may have been premature in suggesting that the Mnemonic Service can only work if it consists of super-free-associator humans (the cheat being that human knowledge be encoded in various ways for computers, acting as agents and assistants). All helping me reap even 1% of the potential benefit that we’re all basically sold on (ish). I still hate the word ‘blog’ though…

Footnote on tagging: I just watched a presentation about the Word/OBO add-in, which raises the issue of ‘annotation at source’. While some people (especially when authoring papers) may take the time to work with such a tool (and there are others, such as OLS, which is used in both authoring and query interfaces), I think the thing about (even non-trivial) blogging is that it is frequently done in from-the-hip mode. So while author annotation will help, post hoc annotation is still crucial (and, as the speaker observed, there are many ways to screw up such manual annotation: too general, wrong sense, etc.).

Cheers, Chris.

Updated 31 Mar 2009 11:45 UTC

  • Replies

    • Sounds ideal. Maybe we could retrain a few bankers as super-free-associator humans. On second thoughts, perhaps not: they would probably associate the wrong things and then we’d be in a real mess. Seriously, I would love to be able to “aggregate the blogosphere” in order to really focus it on what I want to know about, and on things I would be interested in if I knew they were there. Of course one can do this to some extent now, manually and with the aid of non-Asimov-like tools, but as you point out, this seems to be the tip of a large iceberg.

    • Hi Maxine.

      Thanks for wading through my semi-structured opinion dump (for a start).
      Here comes the second wave (and honestly, twenty minutes ago this was a short reply)…

      A quick reference first (to be revisited in a sec): http://www.bbsrc.ac.uk/media/releases/2009/090402_robot_scientist.html (this was something Doug Kell (BBSRC chief; good blog, his, btw) was actually part of, along with the Aberystwyth posse mentioned in the article, one of whom is now playing a part in OBI, one of the likely components of my little dream tool).

      Anyway, back to the plot: the Asimov thing is really interesting to me (he was a clever bugger). What the Mnemonic Service did in the story was simply report correlations and parallels, never trying to deduce or assert anything. This is crucial, because one of the things that irked (and sparked) me at the SESL meeting was the assumption that scientists’ views of their work (assertions about it, deductions from it) are either trustworthy or sufficient. Not to slag them off, but neither assumption is appropriate (though it is pragmatism in the face of a lack of resources that drives the blind eye being turned, so I’m not being fair, but what the hell).

      The re-uses of well-annotated data will be wider than the original purpose of the work, so annotating a data set as ‘for’ something limits its (er, cough) ‘discoverability’. A more general way of tagging (by an automaton, whether carbon-based or not) is needed both for shared data sets in public repositories and (to stay near the point) for blog entries. That is, a blog is ‘about’ whatever I find it to be about: a scientist’s progress report could feed into my analysis of rates of progress in differently resourced labs, for example. Of course the elephantine corollary is that such an approach could easily tag everything as somehow ultimately relevant to everything, diluting or destroying its value. Robot hand-holding by curators (the last great polymaths) would be vital to prevent such annotation creep.

      What would be very cool indeed, though, is to have the intent-agnostic blog-harvesting thingy feed the robot scientist, which is all about a kind of brute-force deductive/inductive reasoning (pharma could be doing this for lead discovery, for example [which btw would not just be nicking ideas from individual lab blogs]). That is: raid blogs for bits of info the way an indexing bot does for a search engine (with excellent error tolerance); then have it cross-correlate based on the codified human ‘text book’ knowledge in things like OBI and WordNet (and also by tracking human behaviour in a manner similar to ‘people who looked at X also looked at Y’ on Amazon); and then come up with experimental designs to fill in gaps or confirm the assertions the robot scientist can make (which it already does, but only within a tightly constrained parameter set). I’d be talking myself out of a job if I ever did anything but talk these days.
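      The behavioural part of that cross-correlation step (‘people who looked at X also looked at Y’) can be sketched as simple co-occurrence counting over reading sessions. This is a minimal illustration under stated assumptions: the session data and post names are invented, the function names are hypothetical, and it is not Amazon’s actual algorithm, just the plainest version of the idea.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sessions):
    """Count how often each (alphabetically ordered) pair of items appears in the same session."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def also_viewed(item, counts, top_n=3):
    """Items most often co-viewed with `item`; ties are broken alphabetically."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical reading sessions: each list is the set of posts one reader opened.
sessions = [
    ["mammal-evolution", "dog-breeding", "memetics"],
    ["mammal-evolution", "dog-breeding"],
    ["mammal-evolution", "science-policy"],
    ["science-policy", "memetics"],
]
counts = co_occurrence(sessions)
print(also_viewed("mammal-evolution", counts))
# prints ['dog-breeding', 'memetics', 'science-policy']
```

      Raw counts like these favour whatever is already popular; a real engine would presumably normalise them and combine them with the ontology-driven correlations (OBI, WordNet) rather than rely on behaviour alone.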

      To change tack, an aspect of the BBSRC release that is very important for me (with my work head on) is that robots really don’t mind giving LOTS of detail when they ‘write up’. Robot support for human scientists would therefore be fab, though as with much of this dream, it’s way off, not for technical but for economic reasons. We need some moon-shot spirit.

      A very NPG-y aspect of this would be the ability to track the sources of information fed into such a correlation engine. NPG-y because it would (1) enable a form of micro-accreditation that would (2) encourage further and richer blogging. No longer would you be blogging into the void: the robots would always be listening, beep beep, like a secular god.

      Of course, as the recent hoo-ha over Google’s copyright fraud prevention thing shows (many of those filing complaints were actually competing companies trying to restrict each other’s trade, not genuine complainants at all: http://www.theregister.co.uk/2009/03/25/google_takedown_process_flawed/), there would definitely be ‘negative blogging’. Presumably this could be handled as long as there was always this traceability keeping people honest (now there’s one for the b/*ankers, harumph).

      Ho hum. I suspect I’ll be waiting for this series of prosaic miracles for a while yet (especially the joiny-uppy meta-miracle). In the meantime I’ll just have to rely on serendipity and the goodwill of others to get value from blogs, along with everyone else.

      If you’re still reading, your logorrhoea endurance medal is in the post.

      Cheers, Chris.

    • And Postgenomic of course (http://www.postgenomic.com/).
