Regarding clusters of viral proteins (viral orthologous groups)

Amitabh Gupta

Thursday, 10 Apr 2008 06:28 UTC

Hello friends,

For my ongoing project, I need the viral orthologous groups (VOG) dataset. If anyone knows of an FTP link for VOG or a related dataset, please let me know.

Regards,
Amitabh

  • Replies

    • I am not familiar with bioinformatics work on viruses, but here is one database with orthogroups: http://www.vbrc.org/orthologs.asp

    • I assume that you want a set of orthologous viral proteins, extracted from a set of all known viral proteins?

      So here’s one way to do that:

      1. Go to the NCBI, select “Protein” from the search options and enter “txid10239[Organism:exp]” in the search box – 10239 is the NCBI ID for viruses
      2. That gives me 780 103 records – rather a lot!
      3. Click the tab that says “RefSeq” to show around 65 577 records
      4. Choose “FASTA” under Display…then go back one page…then choose “File” under Send To; say “yes” to downloading ~ 65 577 sequences, go for a coffee or three…
      5. Get yourself a copy of CD-HIT, then compile and install it
      6. Run e.g. “cd-hit -i myfile -o myfile90 -c 0.9 -n 5 -d 0” on your saved sequences (myfile) to get a new set clustered at a 90% identity cutoff

      There are probably viral datasets on the web that I don’t know about, allowing you to ignore all of this. However, making your own dataset when one is not available is a good bioinformatics skill.
