Bioinformatics forum: topic

This is a public forum

Commissioning Bioinformatics Protocols: Where do I begin?

Bronwen Dekker

Tuesday, 16 Oct 2007 15:33 UTC

As most of you know, I work for Nature Protocols and my research background, while quite varied, does not include Bioinformatics. As I am learning about it, I am overwhelmed by the extent of this subject and the number of open-access computer programs available.

We are interested in commissioning more bioinformatics protocols. At the moment my plan is to work through the methods sections of the “Genomics, Proteomics and Bioinformatics” papers in, for example, the Journal of Biological Chemistry to generate a list of resources that people are using.

*This of course might not be the best approach, so I was wondering if any of you had ideas:

(1) Of how I should go about finding methods that people are going to want protocols for.

(2) Regarding specific methods that should be covered in our content.*

……….
Nature Protocols Website (Volume 2 is free at the moment)

A google document with lists of our content and ideas (in the process of being edited; absolutely don’t feel that you have to read it to answer the questions!!)

  • Replies

    • I think you’ve set yourself a difficult task!

      Bioinformatics doesn’t really have a concept of protocols in the same sense as, say, molecular biology. You have an idea of what you want to achieve (output) and an idea of how to obtain a dataset (input), but what happens in between (process) is up to you. It depends on what software is available (or what you need to write if there is none) and how you glue all the parts together. This is a creative process that different people will achieve in different ways, depending on their own preferences and skillset.

      That said, people are working to standardise some of the common procedures as workflows. Some of the Taverna team are members of this group; you should probably talk to them. Focusing on workflows would give some structure to your thoughts about protocols. Otherwise, as you say, it just becomes a very generalised overview that says “wow, what a lot of software there is”.

      Another approach is to start with particular types of experimental data. For instance: microbial genome sequences, high-throughput peptide identification by mass spectrometry, metabolic flux data – these are all distinct areas with their own set of “post-processing” computational techniques. Again, it might be a way to bring order to the mass of methods.

      Just a comment on what I personally don’t find useful: simple how-tos, such as “how to run a BLAST search”. This is not a protocol: individual users just need to go there, explore the website and see what it can do. There are also more effective ways to convey this information to novice users, such as screencasts.

    • Thanks for the useful tips, Neil. I had not come across Taverna before. It looks very interesting, though I was not able to download the program; I will try again on my home PC.

      The idea of starting the thought process with the different types of experimental data is a good one. What else should we add to the list?
      So far we have:
      microbial genome sequences
      high-throughput peptide identification by mass spectrometry
      peptide sequences
      metabolic flux data

      I agree that screencasts are often the way to go and think that Bioscreencast is an excellent site. It would be great if everyone who found a really neat trick for doing something quickly made a screencast and shared it. While it is quite easy to make a screencast, I know from experience that it is difficult to be motivated to stop in mid-workflow to do this. The advantage of publishing a protocol is that we could collect this experience into one “place”, though we would need to work out a way of ensuring that it wasn’t too simple and didn’t explain something that was self-explanatory.

    • For microarray data:

      http://www.bioconductor.org/

      Bioconductor is a set of R libraries (http://www.r-project.org/) used to analyze microarray expression data.

    • We keep the following in our lab:

      Current Protocols in Bioinformatics
      (http://mrw.interscience.wiley.com/emrw/9780471250951/home)

      For the most part it is an extensive “how-to” list for using the many many public bioinformatics tools available.

    • Adding to what Neil said earlier: in the pharma industry, it is standard practice to use workflow/pipelining apps like Pipeline Pilot, InforSense or Taverna to distribute standardized protocols.

      I would argue that bioinformatics techniques that have been used for a while have fairly well understood protocols (homology modeling comes to mind).

      How do you go about commissioning them? Reach out to scientists who publish a lot on a particular subject and are known as methods people. A good start would be people doing microarray analysis, especially gene expression, where the protocols have been fairly standardized over the years.

    • I would agree with Deepak that for many common bioinformatic tasks there is a set of well-established ways to perform them. There are many variations in some of the details of the implementations, but the general procedures are the same. Homology modeling is a great example, but others could be predicting functional association between proteins from diverse sets of -omics data, or regulatory network reconstruction from, say, gene expression plus ChIP-on-chip. One that could be very timely would be something related to the analysis of polymorphisms, to rank polymorphisms associated with a phenotype.
      I would suggest looking into review articles on the topics. There are also many articles in journals like Bioinformatics that do meta-analyses, where they use the same benchmarks to test the best procedures (i.e., the best protocol) to obtain the desired output from the input. Here is a recent example of this type of meta-analysis comparing homology modeling “protocols”.

      In more general terms, I am not sure how these protocols will benefit the readers. One big difference between lab work and bioinformatic work is that there is already a large pool of information to tap online whenever we are trying to learn one of these procedures. Most (good) tools have decent manuals and installation notes, and most “protocols” require that we put together a group of tools in a pipeline that transforms the input into the desired output. So, in most cases we can find the information in manuals etc.

      I guess having everything neatly explained in one manuscript would make it more straightforward and, more importantly, it would provide credit to whoever goes to the trouble of carefully explaining how to set up the procedure.

      A very good plus would be for the authors of these protocols to also provide code that can glue together the different tools in the pipeline, or a workflow for a tool like Taverna.
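
To give a rough idea of what such glue code might look like, here is a minimal sketch in plain Python. It is only an illustration, not anyone's published protocol: the step names (`parse_fasta`, `drop_short`, `gc_content`) and the `run_pipeline` helper are hypothetical, and in a real pipeline each step would more likely wrap an established tool or library than be written by hand.

```python
from typing import Any, Callable, Dict, Iterable

Records = Dict[str, str]  # sequence id -> sequence


def parse_fasta(text: str) -> Records:
    """Input step: turn FASTA-formatted text into an id -> sequence mapping."""
    records: Records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the id (hypothetical convention).
            words = line[1:].split()
            current_id = words[0] if words else "unnamed"
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def drop_short(min_length: int) -> Callable[[Records], Records]:
    """Build a filtering step that keeps sequences of at least min_length."""
    def step(records: Records) -> Records:
        return {k: v for k, v in records.items() if len(v) >= min_length}
    return step


def gc_content(records: Records) -> Dict[str, float]:
    """Output step: fraction of G and C in each non-empty sequence."""
    return {
        k: (v.count("G") + v.count("C")) / len(v)
        for k, v in records.items()
        if v
    }


def run_pipeline(data: Any, steps: Iterable[Callable[[Any], Any]]) -> Any:
    """Glue: feed the output of each step into the next one, in order."""
    for step in steps:
        data = step(data)
    return data


EXAMPLE_FASTA = ">seq1 demo\nGGCC\nAATT\n>seq2\nAT\n"
result = run_pipeline(EXAMPLE_FASTA, [parse_fasta, drop_short(4), gc_content])
print(result)  # {'seq1': 0.5}
```

The point of a structure like this is that the list of steps is the "protocol": a reader can see the order of operations at a glance and swap one step for another tool without touching the rest.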

    • For complete bioinformatics solutions:
      http://www.clcbio.com/
      It is an interesting idea to try to include bioinformatics in this paper.
      CLC bio is one of the most complete solutions that I have found and used so far, although I am not an expert, as I am just trying to enter a Ph.D. program in this area.

    • Thanks for all the helpful suggestions! I am going to spend a bit of time looking into these and will let you know how I get on. There will probably be more questions! :)

    • Hi Bronwen,

      uuh a very very very difficult task :)

      But there are some ‘bio’ sites for programming which support a lot of modules for bioinformatics tasks and corresponding description. Maybe you will be able to find some methods within their wiki or documentation.

      bioperl = http://www.bioperl.org/wiki/Main_Page
      biopython = http://biopython.org/wiki/Main_Page
      biosql = http://www.biosql.org/wiki/Main_Page
      biojava = http://biojava.org/wiki/Main_Page
      bioruby = http://bioruby.org/
      etc…

      Markus

    • Hi, Bronwen:

      Others here have made some good suggestions (Bioconductor, Biopython, Taverna, etc.)

      A lot of protocols, as far as applications go, are less interesting to me (as a scientist) than the questions where the protocols are perhaps better established (or at least understood) but are coming fresh into the bioinformatics field, such as the consideration of population and statistical sciences as related to bioinformatics. For instance, how do biostatisticians usually approach large-scale population studies, and how does bioinformatics overlap in the genomics studies? Studies are increasing in size, and the methods of typical biostatistics applications and of bioinformatics are converging rapidly. There is a lot of gray area in that region.

      There is also interest in information management in biology and where it ties into research and inference, such as how to home-grow one’s own ontology, or how to use ontologies to accelerate classification.

      A lot of the more established method collections have plenty of protocols in books and papers and community help efforts…I personally would love to see some protocol discussions or papers that cross over the line between disciplines in important areas.
