• corpasfoo by Manuel Corpas

    Broadly speaking, this blog is about the latest trends in Bioinformatics, Computational Biology, the Human Genome and the Internet. When possible I will discuss the social and ethical issues derived from the use of such technologies.

    • The credit crunch has pervaded all spheres of the economy and the biotechnology sector is no exception. That is not to say that all of biotechnology is in crisis. Although venture capitalists and investors are unwilling to provide much-needed funds for start-up companies, there is a brighter side to the picture.

      A lot of the research carried out in biotechnology is funded by governmental and non-profit foundations such as the Wellcome Trust (1) or the BBSRC (2); I will call them development agencies. Funds here are allocated for several years and they are secured before they are given. This means that those lucky scientists working with funds from such sources are likely to be the least affected by the current financial turmoil. However, if the current downhill trend continues, the next budgetary allocations will be affected and scientists should expect differences in the way the [little] money [left] is spent. Nevertheless, this is unlikely to happen for at least several years, by which point (fingers crossed) the current financial crisis might have eased.

      Orders for new-generation sequencing machines, microarrays and diagnostic kits should not decrease from labs funded by development agencies, especially in Europe. While NIH (3) funding has shrunk in real terms, research budgets in Europe have continued to increase steadily, at least for now. Of course, there have been some exceptions, as in Italy, where the government has not followed through on its promised plans to bring new faculty into Italian universities, leaving many Italian scientists in trouble. Nevertheless, it seems that at least in Europe as a whole, the commitment from politicians to support research continues.

      No industry is safe from the current credit crunch, but in my opinion the view that the biotechnology sector is in crisis is not completely true, and it does not apply to every industry in this sector. Compared to other sectors of the economy, such as banking or construction, the outlook looks rather sweet.

      (1) http://www.wellcome.ac.uk/
      (2) http://www.bbsrc.ac.uk/
      (3) http://www.nih.gov/

    • Cloud computing: a new standard platform?

      Sunday, 08 Feb 2009

      Cloud computing is becoming a technology mature enough for use in genome research experiments. Their large datasets, highly demanding algorithms and sudden need for computational resources make large-scale sequencing experiments an attractive test case for cloud computing. So far I have seen cloud computing demonstrated using R (1). However, I have yet to see a rigorous comparison of its performance on a BLAST (2) search, or of its ability to cope with ever-increasing databases and open source frameworks such as BioPerl (3) or Bioconductor (4).

      Cloud computing claims to be a resource where IT power is delivered over the Internet as you need it, rather than drawn from a desktop computer (5), in a fashion seemingly similar to having your own virtual servers available over the Internet (6). Some of the most important aspects of cloud computing are:

      • Software as a Service (SaaS): where you buy a software license for a fixed period of time.
      • Utility Computing: storage and virtual servers that IT can access on demand.
      • Web Services.

      My first exposure to cloud computing came from an email from Matt Wood (7), a newly established group leader at the Sanger Institute (8), announcing the Cloud Computing Group (9) in Cambridge, UK. At that point I had no idea what it meant. When I attended the meeting at Cambridge University’s Centre for Mathematical Sciences (10), to my surprise I found a very select audience, including the director of IT at Sanger, Phil Butcher (11), one of the Ensembl (12) software coordinators, Glenn Proctor (13), and quite a few local start-up companies.

      Among the presenters we had Simone Brunozzi, from Amazon’s Cloud Computing (14). I think he had an interesting story to tell: how Amazon, a well-known company, is now in the business of selling cloud computing. Apparently, the technology they sell was developed for Amazon’s own business. One of their main challenges was coping with the capricious shopping habits of customers, with orders peaking around Christmas and staying quite flat the rest of the year. These trends required computational resources that could adapt rapidly. The idea of cloud computing fitted well with their e-commerce business model: you don’t need to care about where your computation is done; the only thing you care about is that you have the resources you need and do not have to pay for them when you don’t need them. One of the things that struck me about Amazon’s presentation was that they would not tell us the number of processors they had at their disposal.
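
      The pay-only-when-you-need-it argument can be put into rough numbers. The sketch below is purely illustrative: the demand figures, the prices and the function names are all made up for the example and have nothing to do with Amazon’s actual pricing. It compares owning enough servers to cover the busiest period all year round against renting servers per period at a higher unit price.

```python
# Illustrative only: all demand figures and prices are invented for this
# example and are NOT Amazon's real numbers.

def fixed_cost(demand, price_per_server_period):
    """Cost of keeping enough servers for the busiest period, all the time."""
    peak = max(demand)
    return peak * len(demand) * price_per_server_period


def on_demand_cost(demand, price_per_server_period):
    """Cost of paying only for the servers actually used in each period."""
    return sum(demand) * price_per_server_period


# Hypothetical year: 10 servers needed for 11 months, 100 for the peak month
peaky_demand = [10] * 11 + [100]

# Hypothetical workload that needs 100 servers every single month
flat_demand = [100] * 12

OWNED_PRICE = 10   # assumed cost per server per month when you own it
RENTED_PRICE = 20  # assumed (higher) cost per server per month when rented

for name, demand in [("peaky", peaky_demand), ("flat", flat_demand)]:
    print(name,
          "owned:", fixed_cost(demand, OWNED_PRICE),
          "rented:", on_demand_cost(demand, RENTED_PRICE))
```

      With these made-up figures, renting costs 4200 against 12000 for owning when demand is peaky, but 24000 against 12000 when demand is flat. In other words, on-demand resources pay off when usage comes in bursts, and can become the more expensive option for a sustained, constant load.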

      When it comes to using cloud computing for genomics research, costs may become quite high once they add up. The bioinformatics field, greatly influenced by the open-source movement, is not likely to rush to join Amazon’s cloud. Private efforts to make money out of human genome technology have remained rather unsuccessful to date: think of Celera Genomics or Lion Bioscience. I am skeptical of the bioinformatics community adopting cloud computing unless open source ideals are embraced: i) allowing people to develop and contribute to the technology if and when they want to, ii) allowing total openness about its achievements and pitfalls and iii) making it free to use for everyone. I do not think that making it free means there is no margin for profit. Think of the profitability of free-to-use technologies such as Java (15) or MySQL (16), both components of Sun Microsystems’ (17) business.

      Despite the promise of potential benefits for the bioinformatics community, the way the cloud is being portrayed does not conform to the ideals of free access and openness. Unless these ideals are implemented to some extent, I find it difficult to see the cloud taking root in the bioinformatics field and becoming a new standard platform for genome research.

      References

      1. http://www.r-project.org/
      2. http://blast.ncbi.nlm.nih.gov/Blast.cgi
      3. http://www.bioperl.org/wiki/Main_Page
      4. http://www.bioconductor.org/
      5. http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman
      6. http://www.infoworld.com/article/08/04/07/15FE-cloud-computing-reality_1.html
      7. http://www.sanger.ac.uk/Users/mw4/
      8. http://www.sanger.ac.uk/
      9. http://cloudcamb.org/
      10. http://www.cms.cam.ac.uk/site/
      11. http://www.yourgenome.org/people/phil_butcher.shtml
      12. http://www.ensembl.org/index.html
      13. http://www.ebi.ac.uk/Information/Staff/person_maintx.php?s_person_id=299
      14. http://aws.amazon.com/ec2/
      15. http://www.java.com/en/
      16. http://www.mysql.com/
      17. http://www.sun.com/

    • 10 Sarcastic Rules on How to Be a Bioinformatician

      Wednesday, 28 Jan 2009

      Manuel Corpas, Segun Fatumo, Reinhard Schneider

      The advent of fast 3D gaming PCs, the Internet and massive sequencing efforts has attracted hackers and failed wet-lab biologists to the bioinformatics field. What follows is a compendium of 10 “sarcastic” rules that illustrate how a few months at the computer can save a few hours in the library (or on Google) (1).

      1. Stay low level at every level. Develop your code by anecdote: avoid planning phases, requirement analysis exercises or any structure to your code. Stay away from object oriented programming. Build up your own little myriad of helper scripts. Do not document either inside or outside your code. Your coding style should only be understood by you. Make sure your software does not scale. Refuse to model or abstract and always choose the quick and dirty fix.
      2. Be open source without being open. Error messages should never be provided. If error messages are provided, they should be utterly cryptic so as to convey as little information as possible to the end user (2). If you create the application, make it difficult to build. Have plenty of hidden dependencies and bizarre variables. Don’t bother to debug or provide backwards compatibility. Ensure that your code is not portable and only works on outdated operating systems, and assume that only you will use your application. Everyone will understand it.
      3. Make tools that make no sense to biologists. The less they resemble any intelligible scientific question the better. If you provide a help document, bombard scientists with abbreviations and provide as much unnecessary technical information as possible. The typical biologist hates mathematics, so use mathematical formulas extensively throughout the documentation. Integrate your workflows with as many irrelevant services as possible, so you’ll have a greater chance of a dead link.
      4. Do not provide a graphical user interface; the command line is good. Force your end-users to use the command line. It helps if the parameter name does not relate to the intended action. For example, never use -o for specifying an output file; a “k” or “B” creates a much better impression. If you provide a graphical user interface, make sure there is no logic behind it, that it is not intuitive to the user and that it supports as few formats as you can, preferably html or text only. Forget HTTP-XML or SOAP. To make sure that the user experience is a nightmare, here are some guiding principles: 1) Provide thousands of menu options and pop-up windows that make no sense. 2) Ask the user to make decisions she can’t make. 3) Change your interface/format whenever you feel like it, despite the fact that many users might depend on it.
      5. Make sure the output of your application is unreadable, unparseable and does not comply with any known standards. Just use plain ASCII text, or better still, provide your own format. Do not use ontologies, XML, or any other interchangeable format. If you use XML, make your data file impossible to validate and do not follow the XML schema. You can also invent a new name for your gene if it doesn’t fit your schema.
      6. Be unreachable and isolated. Configure your contact email to either bounce back or permanently set it to vacation. Miss meetings or seminars where others are presenting their results. Reinvent the wheel. Do not keep up with the literature on current methods of research.
      7. Never maintain your databases, web services or any information that you may provide at any time. Provide unstable data, unstable models and unstable services. Your ultimate goal in data curation should be to propagate as many errors as possible from one database to another, while still making sure they sound realistic. Your curated data should only partially reflect the science of the papers you don’t read. When curating your data, create a new category for every exception you find to your classifications. Forget about the biology and stay well away from convention.
      8. Blindly believe in any predictions, P-values or statistics you are given. Select instances for your training set that you know will give you the answer you want. Produce arbitrary cut-offs on rank-ordered result lists. Absolute truth above, absolute falsehood below. Do not ever change the parameters of BLAST. If you get a list of hits, only look at the first one (1). Do not believe in “rubbish in, rubbish out”; you just have to make sure that your rubbish data doesn’t smell.
      9. Never share your results and do not reuse. Never discuss your results before your submission has been accepted in the proceedings of a lost conference. Learning from what others have done is a waste of time. Ignore what your colleagues have developed in the last two decades.
      10. Make your algorithm or analysis method irreproducible. The less testing you carry out in your experiments, the more revolutionary results you’ll get. When testing your algorithm, compare it against methods developed in the past decade: your performance levels will look much better. Include irrelevant variables in your equations and make them unnecessarily complex, so your reviewers will be very impressed by the complexity and the astonishing predictions you get.
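
      Read in reverse, rules 2 and 4 describe what a considerate command-line tool looks like: an output option that is actually called -o, and error messages that say what went wrong. The following is a minimal sketch of that idea using Python’s standard argparse module. The tool itself (a FASTA record counter) and the names build_parser, count_sequences and main are hypothetical, chosen only for illustration.

```python
import argparse
import sys


def build_parser():
    """Command-line interface whose option names match what they do."""
    parser = argparse.ArgumentParser(
        description="Count the records in a FASTA file (hypothetical example tool)."
    )
    parser.add_argument("fasta", help="path to the input FASTA file")
    parser.add_argument(
        "-o", "--output", default="-",
        help="file to write the count to (default: standard output)",
    )
    return parser


def count_sequences(lines):
    """Count FASTA records; each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv=None):
    """Run the tool; returns 0 on success and 1 if the input cannot be read."""
    args = build_parser().parse_args(argv)
    try:
        with open(args.fasta) as handle:
            n_records = count_sequences(handle)
    except OSError as err:
        # State which file failed and why, instead of a cryptic code
        print(f"error: cannot read FASTA file '{args.fasta}': {err.strerror}",
              file=sys.stderr)
        return 1

    report = f"{args.fasta}\t{n_records}\n"
    if args.output == "-":
        sys.stdout.write(report)
    else:
        with open(args.output, "w") as out:
            out.write(report)
    return 0
```

      Calling main(["reads.fa", "-o", "counts.tsv"]) would write a tab-separated line with the file name and record count to counts.tsv, and a missing input file produces a one-line explanation on standard error rather than a stack trace. The file names here are placeholders.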

      Acknowledgements
      The authors would like to thank Carole Goble for inspiration, Nils Gehlenborg for useful comments and Duncan Hull for pointing to useful material.

      References
      1 Carole Goble. The Seven Deadly Sins of Bioinformatics
      2 Andy Law. Law’s Laws.

