Cloud computing is becoming mature enough for use in genome research experiments. Large datasets, highly demanding algorithms and the need for sudden bursts of computational resources make large-scale sequencing experiments an attractive test case for cloud computing. So far I have seen cloud computing demonstrated using R (1). However, a rigorous comparison of its performance on a BLAST (2) search has yet to be seen, as has its ability to cope with ever-increasing databases and open source frameworks such as BioPerl (3) or Bioconductor (4).
Cloud computing is billed as a resource where IT power is delivered over the Internet as you need it, rather than drawn from a desktop computer (5), in a fashion seemingly similar to having your own virtual servers available over the Internet (6). Some of the most important aspects of cloud computing are:
- Software as a Service (SaaS): where you buy a software license for a fixed period of time.
- Utility Computing: storage and virtual servers that IT can access on demand.
- Web Services.
My first exposure to cloud computing came from an email from Matt Wood (7), a newly established group leader at the Sanger Institute (8), announcing the Cloud Computing Group (9) in Cambridge, UK. At that point I had no idea what it meant. When I attended the meeting at Cambridge University’s Centre for Mathematical Sciences (10), I was surprised to find a very select audience, including the director of IT at Sanger, Phil Butcher (11), one of the Ensembl (12) software coordinators, Glenn Proctor (13), and quite a few local start-up companies.
Among the presenters was Simone Brunozzi, from Amazon’s cloud computing service (14). I think he had an interesting story to tell: how Amazon, a well-known company, is now in the business of selling cloud computing. Apparently, the technology they sell was originally developed for Amazon’s own business. One of their main challenges was coping with the capricious shopping habits of customers, with orders peaking around Christmas and staying fairly flat the rest of the year. These trends required computational resources that could adapt rapidly. The idea of cloud computing fitted well with their e-commerce business model: you don’t need to care about where your computation is done; the only thing you care about is that you have the resources you need and do not have to pay for them when you don’t need them. One of the things that struck me about Amazon’s presentation was that they would not tell us the number of processors they had at their disposal.
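To make that pay-only-when-you-need-it argument concrete, here is a minimal sketch in Python. Every number in it is made up for illustration: the demand profile, the two prices and the function names are my own assumptions, not figures from Amazon's presentation or real Amazon prices. It simply compares provisioning for the Christmas peak all year round with paying month by month for the capacity actually used.

```python
# Hypothetical monthly demand in server-units:
# flat for eleven months, with a peak in December.
monthly_demand = [10] * 11 + [40]

# Assumed, illustrative prices per server-unit per month (not real prices).
FIXED_PRICE = 100      # capacity you own and keep running all year
ON_DEMAND_PRICE = 150  # the same capacity rented only when it is used


def fixed_cost(demand, price):
    """Cost of provisioning enough capacity for the peak month, all year."""
    return max(demand) * price * len(demand)


def on_demand_cost(demand, price):
    """Cost of paying each month only for the capacity used that month."""
    return sum(demand) * price


fixed = fixed_cost(monthly_demand, FIXED_PRICE)
elastic = on_demand_cost(monthly_demand, ON_DEMAND_PRICE)
print(f"fixed: {fixed}, on-demand: {elastic}")
```

With these invented numbers the on-demand option comes out cheaper even at a higher unit price, because the peak capacity sits idle for eleven months. With a flatter demand profile the comparison can easily go the other way.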
When it comes to using cloud computing for genomics research, costs may become considerable as they add up. The bioinformatics field, greatly influenced by the open-source movement, is not likely to rush to join Amazon’s cloud. Private efforts to make money out of human genome technology have remained rather unsuccessful to date: think of Celera Genomics or Lion Bioscience. I am skeptical that the bioinformatics community will adopt cloud computing unless open source ideals are embraced: i) allowing people to develop and contribute to the technology if and when they want to, ii) allowing total openness about its achievements and pitfalls and iii) making it free to use for everyone. I do not think that making it free means there is no margin for profit. Think of the profitability of free-to-use technologies such as Java (15) or MySQL (16), both components of Sun Microsystems’ (17) business.
Despite the promise of potential benefits for the bioinformatics community, the way the cloud is being portrayed does not conform to the ideals of free access and openness. Unless these ideals are implemented to some extent, I find it difficult to see the cloud taking root in the bioinformatics field and becoming a new standard platform for genome research.
References
1. http://www.r-project.org/
2. http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. http://www.bioperl.org/wiki/Main_Page
4. http://www.bioconductor.org/
5. http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman
6. http://www.infoworld.com/article/08/04/07/15FE-cloud-computing-reality_1.html
7. http://www.sanger.ac.uk/Users/mw4/
8. http://www.sanger.ac.uk/
9. http://cloudcamb.org/
10. http://www.cms.cam.ac.uk/site/
11. http://www.yourgenome.org/people/phil_butcher.shtml
12. http://www.ensembl.org/index.html
13. http://www.ebi.ac.uk/Information/Staff/person_maintx.php?s_person_id=299
14. http://aws.amazon.com/ec2/
15. http://www.java.com/en/
16. http://www.mysql.com/
17. http://www.sun.com/
Cloud computing is not a software package, and talking about the ideal of open source in the context of cloud computing does not make a lot of sense. At its heart, cloud computing offers computing cycles as an abstraction, so the user doesn’t need to think of the underlying hardware. Scientists buy plenty of hardware from for-profit companies and I don’t see a lot of them clamoring for ‘open source ideals’ in their desire to hack their computing hardware. The biosciences will adopt cloud computing out of necessity and cost effectiveness, and they will pay for it just like they pay for their computers today. But it will be cheaper and they will never need to worry about another forgotten backup again.
And as far as the open source movement goes, in some areas it’s effective and useful. These tend to be areas of computing with very broad applicability, such as languages. Narrowly focused open source projects wither and die after a few years (at best). The tools used in the biosciences are relatively narrow and currently very fragmented, which makes getting stuff done hard. This will be solved, and it will be solved with for-profit closed software, and it will be so much better than trying to cobble together some hacked C programs in a bash script somewhere that most will adopt it. Don’t confuse the human genome with software: they are entirely different types of data and the latter (despite many trying to say otherwise) has no moral high ground to take. It’s just bits and bytes, here today and gone tomorrow.
(footnote 1. You are comparing two fundamentally different technologies – Java and MySQL are software tools that can be duplicated at almost no cost. Cloud computing is nothing like that and relies fundamentally on massive hardware and power investment.)
(footnote 2. Who gives a damn how many processors Amazon has? It changes every day, and really that’s the whole point: you don’t need to care.)
(footnote 3. Perhaps the US gov could run the cloud? Then we’d all have ‘free’ cycles.)