Data for the masses
Maxine Clarke
Tuesday, 13 January 2009 15:49 UTC
Initiatives for digital research infrastructure should focus more on making standardized data openly available, and less on developing new portals, according to an Editorial in Nature on 8 January (457, 129; 2009).
The flood of digital research data that scientists are generating through genomics, sensor and other technologies has made it imperative to create an infrastructure to use, repurpose and preserve those data. Some such efforts are already under way (see Big Data special). But how should the responsibilities be divided between governments and the private sector?
A series of events in December highlights the complexities of this issue. One was a pioneering move by Amazon to host large scientific data sets for free, starting with GenBank and other widely used sequence and chemical-structure databases.
Amazon’s move is not altogether altruistic. Although researchers will be able to download the data to their own computers, the company is betting that many will instead use its ‘cloud computing’ technology, which makes the company’s vast server infrastructure available to process the data sets on a pay-as-you-go basis.
Such services could offer immense benefits to research. By giving scientific data a permanent home online, Amazon could help to ease the long-running problem of databases that are abandoned when, for example, funding dries up at the end of a research grant. Its cloud-computing approach could liberate smaller labs from the cost of running data centres of their own. And it should facilitate the sharing of data and analysis tools between widely dispersed research teams.
Also in December, Google announced it was abandoning its plans to host large scientific data sets for free, apparently because of the economic downturn. In November, the European Union, in collaboration with research organizations, libraries and museums, launched the Europeana online digital library as its much-touted alternative to Google Books. Europeana has scanned valuable historical collections, but its computing infrastructure crashed within hours, not clunking back into service until more than a month later. A similar fate met Géoportail, a high-profile service created by the French government as a competitor to Google Earth.
In creating such public offerings, governments address valid concerns that private companies will exercise monopolies over significant cultural and scientific heritage. But their focus on creating their own digital libraries and databases too often means that other, perhaps more important, ways to address such concerns are neglected. Making standardized data openly available to both commercial and not-for-profit organizations, for instance, could spur innovation of superior information services. And to avoid embarrassing crashes, public efforts might well consider partnerships with private firms to tap into the economies of scale and expertise of the Googles and Amazons of the world.
Updated 13 January 2009 15:58 UTC
Replies
The involvement of Google and Amazon in data storage and analysis (via their cloud computing) is great. However, so many obstacles, both legal and social, still surround researchers' ability to make their data available that it is difficult to see this going forward in a meaningful manner any time soon. Hopefully, though, we can look to the implementation of the caBIG project as a reference for getting past some of the existing issues.