Bioinformatics forum: topic

Towards a cyberinfrastructure for the biological sciences

Euan Adie

Tuesday, 19 Aug 2008 13:52 UTC

Lincoln Stein has a new article in Nature Reviews Genetics this month:

Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges

Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A ‘cyberinfrastructure’ is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.

It’s free to read for a month for logged-in nature.com users (you can use your Nature Network credentials).

  • Replies

    • Hi. I’m a little concerned by a passage (which I’ve copied from the online summary) concerning the required skill sets of future biologists:

      “In order to become active and effective contributors to the cyberinfrastructure, biological researchers will need to become familiar with the basics of computer science, learn to use ontologies to describe their data and protocols unambiguously, and have the skills to put this information in a form that can be readily adapted and re-used by others in the community. This will require changes in the way biology is taught at the undergraduate and graduate levels.”

      While I of course agree fully with this sentiment, I just don’t see a huge enthusiasm amongst many biologists to become ‘more sophisticated’ users (for want of a better phrase) of computing and information services. And I’m not clear that the correct response is to present the Computer Age as a fait accompli and thereby ‘require’ a reskilling of people whose primary interest isn’t, surprisingly, data entry…

      Obviously this is the computer age, and obviously searching, shopping, doing email, etc. are increasingly part of the fabric, but the reason they have been seamlessly woven into our lives (ish) is that these are tools that did not demand that the user learn very much at all. ‘Popular’ computing revolves around some very simple principles and skills.

      So, and again obviously, science is more complex than shopping (if not, you are doing one of them wrong). But, to finally get to the point, the lesson I take is that developers must strain every sinew to meet the user as close to where they are as possible (in terms of the skills they already possess or can straightforwardly acquire). And more than that, the objective should be to routinely hide as much as possible/sensible. What user will care which format is being used as long as the plumbing is apparently sound? Resource integration is of course crucial here, as is the support of, for example, instrument vendors when considering things like automated capture and annotation of instrument metadata by management software using common ontologies. I know there are dangers to hiding stuff (automatically interpreting/annotating data is riddled with pitfalls), but there has to be a sensible midpoint (we’re looking for least effort, not no effort, after all). Some kind of user acceptance testing of existing resources would go a long way to highlight what is good (in the usability sense); and going forward, we need development that takes account of users, viewed as valuable customers, not irritants.

      And lastly (though a bit off-topic), funders could do way more (some are doing some things) to encourage developers of new tools and resources to: (1) ‘fit in’ with what exists (without crippling healthy competition and invention); (2) ensure there is a solid long-term plan for resource maintenance, or it doesn’t get built; and, linked to that, (3) stop treating all infrastructure provision and development as research (though a portion is, of course), as this hinders good service provision, inhibits further development (including the much sought-after integration) and occasionally kills perfectly good projects, putting an unnecessary brake on scientific progress.

      Cheers, Chris Taylor (mostly working for MIBBI though I stress that all the above is very much my own personal opinion).

    • Chris, I totally agree with your last point. If infrastructure is deemed so important, then people with the skills to provide it must also be deemed so. In other words: more permanent, better-paid jobs and recognition of people with those skills. I know many such people in academia who feel that “the system” simply doesn’t know how to employ them effectively.

    • So, as an ex-funder (MRC) and an advocate for better, wider and easier discovery and re-use of data/information, I would have thought that ‘infrastructure’ was not just about kit, but equally applied to people. In other words, a skilled workforce of recognised value to research. There have been various ‘capacity building’ drives across the UK bioscience research funders in the past, and several that have focussed specifically on bioinformatics. Of course, this sort of ‘pump priming’ is not the same as a sustainable career path that motivates and retains skilled researchers. But of course, this applies to many other areas of science…

    • Chris, you raise some very good issues, views which I think we need to hear more of. I hope you can come to Allan, Sarah and colleagues’ British Library evening (24 Sept) on Scientific Researchers and Web 2.0, Social Notworking?; it would be great to continue the discussion there.

    • To amplify what both Neil and Allan have said, there is a problem with a lot of these capacity-building and interdisciplinary training programmes: they train scientists for whom there are no jobs.

      This has actually got a lot better recently in UK academia (although its impact on the undergraduate teaching would be an interesting thing to discuss) as departments have grasped the need to appoint people ‘out of area’ but there is still a big problem in providing the person infrastructure that will be needed to make ‘cyberinfrastructure’ work.

      What Neil says is absolutely the case. Infrastructure must include the people that make it work. And currently we don’t value them highly enough.

    • Hi all.

      I’d like to support the point about infrastructure meaning people+stuff. Maybe the EBI has polluted me to the point where I just see people as somewhat non-standard plugins for the computers, our gods :)

      Anyway, yes, we need people both to develop and maintain tools and resources, and (imho most crucially) to offer support (I mean technical, but maybe hugs too).

      The NERC Environmental Bioinformatics Centre (Oxford) gets data out of NERC fundees by, ultimately, once their other support models have failed, sending crack bioinformaticians to literally suck the data from a site (NERC are very big on data sharing, predating many, due to the nature of much environmental science). Obviously this kind of hand-holding approach doesn’t scale so well, but both as an example of how hard it can be to get data (in the right form, or at all) and as an exemplar of a proven solution, their experience is very valuable. Both aspects, for me, underline this critical need for software (webby or not) that is simple, solid and well-supported.

    • This article is accompanied by a nice looking little wiki – however, there doesn’t seem to be the ability to add references to the wiki text. Is there any plan to include this functionality?

    • We have recently been involved in the Open Source Drug Discovery Initiative (www.osdd.net) and have been working on a wiki-based open research and sharing platform, which is accessible at http://sysborgtb.osdd.net (registration and verification are needed to access most of it). At the moment it features an Open Project Space, an Open Lab Notebook and a data wiki on Mycobacterium tuberculosis research. The idea is to have open peer review and open synthesis of knowledge.

      We will soon be integrating a workflow system, which should make most bioinformatics research online, reproducible and shareable across the community.

      More information on the initiative is available at OSDD.NET and SysBorg.

    • “In order to become active and effective contributors to the cyberinfrastructure, biological researchers will need to become familiar with the basics of computer science, learn to use ontologies to describe their data and protocols unambiguously…”

      I think all it really means is knowing the basics of computing (most people do) and learning to use a specific tool, e.g. AutoCAD or Microsoft Excel; that’s not too hard an adjustment!

    • Hiya.

      I agree completely — the basic skills for surfing, mail, documents and spreadsheets are entry-level requirements for science these days (and to some extent, presentational/graphical ability, if only in PowerPoint).

      However, my point is that to rely on there being much more than that is probably counter-productive. The skills gained to carry out the above activities should form the ‘limited vocab’ to which (interface) developers are restricted, unless there is a very good reason to go beyond.

      As an example, the EBI used to have a search box on the front page that did some, but not all, jobs, and a kind of massive dendrogram on another page that did a lovely job of displaying all the fab things going on at EBI, but was a pig to navigate. It was fine if you knew what you were doing, but if not… We’ve now got something called EB-eye (www.ebi.ac.uk, at the top of the page), which is basically a Google-style “I’m gonna type it in this box, whatever it is, and press go” approach (plus one small drop-down filter to specify databases / our web pages / the literature). The results pages are coarsely structured around the kinds of info (a search might turn up several kinds of data) to assist the eye, but basically we’ve moved from something that took a (little) bit of getting used to, to something more (if not completely) familiar. It went down pretty well, and it works too. For example, searching for ‘uuu’ in databases gives http://tinyurl.com/6fmasd

      Btw that EBI example isn’t supposed to be the epitome of user-friendly design; just what sprang instantly to mind.

      So that’s all I’m getting at. Don’t seek to ‘educate’ (i.e., make you into what I want); seek to leverage what exists. N.B. my control of my gag reflex is these days equal to that of any sword swallower, so I can use words like that :)

      Footnote: ‘Training’ forms part of the business model of too many companies (naming no names, a fresh installation of one particular common commercial database is generally worse than useless till an experienced/certified engineer tunes it up). This is pernicious stuff, though we have no control over it. Of course that model should not appeal to non-profit projects, especially given the usual lack of resources.
