Central portal for glycomics data processing
William York
Wednesday, 10 September 2008 13:09 UTC
Advances in glycobiology are currently limited by our inability to efficiently process glycomics and glycoproteomics data (e.g., MS and MS^n data). Several research groups are working on methods for the automatic or computer-assisted interpretation of mass spectral data for glycomics. However, development of workflows that take advantage of these computational methods is inhibited by the fact that glycomics analysis is a “moving target”: the specific experimental methods (e.g., stable isotope labeling protocols) and overall experimental approaches are changing rapidly. Even so, many of the data processing steps (e.g., conversion to a common format such as mzXML) are not affected by these protocol changes and can be reused from older workflows in new ones.
This state of affairs has led several groups (e.g., at the CCRC and DKFZ) to suggest that a modular data processing system should be developed. In such a system, individual laboratories could take modules from a central repository and string them together to form data processing workflows. This approach has been successfully applied to proteomic analysis (see, for example, the Trans-Proteomic Pipeline (TPP) at http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP). In fact, many of the data processing modules available via TPP and other initiatives can be applied to glycomics and glycoproteomics analysis. However, modules that are specifically aimed at processing glycan and glycoconjugate analytical data are still needed.
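To make the idea of "stringing modules together" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Module` class, the `run_workflow` function, the dictionary-based record, and the two toy modules are hypothetical and do not correspond to TPP or to any existing glycomics software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type: a plain dictionary. A real system would pass a
# community-agreed exchange format (e.g., mzXML files) between modules.
Record = Dict[str, object]


@dataclass
class Module:
    """One reusable processing step with a uniform interface (illustrative)."""
    name: str
    run: Callable[[Record], Record]


def run_workflow(modules: List[Module], record: Record) -> Record:
    """Apply each module in order, feeding its output to the next one."""
    for module in modules:
        record = module.run(record)
    return record


def convert_to_common_format(record: Record) -> Record:
    # Toy stand-in for a format converter; it only relabels the record.
    out = dict(record)
    out["format"] = "mzXML"
    return out


def pick_peaks(record: Record) -> Record:
    # Toy peak picker: keep (m/z, intensity) pairs above an assumed threshold.
    threshold = 100.0
    out = dict(record)
    out["peaks"] = [(mz, i) for mz, i in record["raw_peaks"] if i >= threshold]
    return out


workflow = [
    Module("convert", convert_to_common_format),
    Module("pick_peaks", pick_peaks),
]
result = run_workflow(
    workflow,
    {"format": "vendor", "raw_peaks": [(500.2, 50.0), (662.3, 340.0)]},
)
```

Because every module takes and returns the same kind of record, a laboratory could swap in a new labeling-specific step without touching the format conversion step that precedes it.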
I suggest that the glycobiology community would benefit from the creation of a central web portal with the following content and goals.
1. A searchable collection of data processing modules for glycomics and glycoproteomics analytical data.
2. Clearly described guidelines for development of these modules.
3. Clearly described data-exchange standards to implement the modules.
4. A discussion forum to arrive at community consensus regarding the digital format and content of parameter lists and configuration settings that are necessary for robust data processing modules.
5. Interactive features (e.g., a “wish list”) aimed at soliciting module development from research groups that specialize in particular data processing techniques.
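As a rough illustration of items 1 and 4, a searchable module entry with an explicit parameter list might look something like the sketch below. The `ModuleDescriptor` fields, the `search` and `resolve_parameters` helpers, and the example module and parameter names are all hypothetical; the actual format and content would be exactly what the community discussion needs to settle.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModuleDescriptor:
    """Hypothetical catalog entry for one data processing module."""
    name: str
    description: str
    input_format: str
    output_format: str
    # Parameter name -> default value (assumed numeric for simplicity).
    parameters: Dict[str, float] = field(default_factory=dict)


def search(catalog: List[ModuleDescriptor], keyword: str) -> List[ModuleDescriptor]:
    """Return entries whose name or description contains the keyword."""
    kw = keyword.lower()
    return [
        m for m in catalog
        if kw in m.name.lower() or kw in m.description.lower()
    ]


def resolve_parameters(module: ModuleDescriptor, overrides: Dict[str, float]) -> Dict[str, float]:
    """Merge user settings over defaults, rejecting undeclared parameters."""
    unknown = set(overrides) - set(module.parameters)
    if unknown:
        raise ValueError(f"Unknown parameters for {module.name}: {sorted(unknown)}")
    return {**module.parameters, **overrides}


catalog = [
    ModuleDescriptor(
        name="peak_picker",
        description="Picks peaks from MS^n spectra",
        input_format="mzXML",
        output_format="peak list",
        parameters={"min_intensity": 100.0, "mz_tolerance": 0.5},
    ),
    ModuleDescriptor(
        name="format_converter",
        description="Converts vendor files to a common format",
        input_format="vendor",
        output_format="mzXML",
    ),
]

hits = search(catalog, "spectra")
settings = resolve_parameters(hits[0], {"min_intensity": 250.0})
```

Declaring parameters and their defaults up front is one way to let a portal validate a workflow configuration before any data are processed.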
In my opinion, such a portal should encourage an open-source software development approach so that improvements of and extensions to the modules can be implemented by interested scientists.
Many of the individual steps in glycomics data processing could be efficiently implemented as Web services (see http://en.wikipedia.org/wiki/Web_services). However, Web services are impractical for processing very large data sets in a single call. Therefore, efficient data processing workflows may require a combination of downloadable and on-line modules. This is one of many issues that require comments and suggestions from the glycobiology community.
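One way to picture the combination of downloadable and on-line modules is a simple dispatch rule that sends small inputs to a Web service and large ones to a locally installed copy of the same module. The sketch below is only an assumption about how such a rule could work; the function name `choose_execution` and the 50 MB cutoff are made up for illustration and would need to be set from real experience.

```python
def choose_execution(
    size_bytes: int,
    local_available: bool,
    remote_available: bool,
    max_remote_bytes: int = 50 * 1024 * 1024,  # assumed cutoff, not a measured value
) -> str:
    """Decide where a processing step should run for a given input size."""
    # Prefer the Web service when the payload is small enough for one call.
    if remote_available and size_bytes <= max_remote_bytes:
        return "web_service"
    # Otherwise fall back to a downloaded copy of the module.
    if local_available:
        return "local"
    raise RuntimeError("No suitable way to run this module for the given input")


small_job = choose_execution(2 * 1024 * 1024, local_available=True, remote_available=True)
large_job = choose_execution(4 * 1024**3, local_available=True, remote_available=True)
```

Under these assumptions the 2 MB input goes to the Web service and the 4 GB input is processed locally.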
Replies