Regarding clusters of viral proteins (viral orthologous groups)
Amitabh Gupta
Thursday, 10 April 2008 06:28 UTC
Hello friends,
In my ongoing project, I need a dataset of viral orthologous groups (VOGs). If anyone knows of an FTP link for VOG or a related dataset, please let me know.
Regards,
Amitabh
-
Replies
-
I am not familiar with bioinformatics work on viruses, but here is one database with orthogroups: http://www.vbrc.org/orthologs.asp
-
I assume that you want a set of orthologous viral proteins, extracted from a set of all known viral proteins?
So here’s one way to do that:
- Go to the NCBI, select “Protein” from the search options and enter “txid10239[Organism:exp]” in the search box – 10239 is the NCBI ID for viruses
- That gives me 780 103 records – rather a lot!
- Click the tab that says “RefSeq” to show around 65 577 records
- Choose “FASTA” under Display…then go back one page…then choose “File” under Send To; say “yes” to downloading ~ 65 577 sequences, go for a coffee or three…
- Get yourself a copy of CD-HIT, then compile and install it
- Run e.g. “cd-hit -i myfile -o myfile90 -c 0.9 -n 5 -d 0” on your saved sequences (myfile) to get a new set clustered at a 90% identity cutoff
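If you would rather script the download than click through the web pages, the search-and-save steps above can be sketched with the NCBI E-utilities. This is only a rough sketch, not a tested pipeline: the function names, the batch size and the pause are my own choices, and I am assuming that the “RefSeq” tab corresponds to adding the filter srcdb_refseq[PROP] to the search term.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# All viruses (NCBI taxonomy ID 10239); the RefSeq restriction is an
# assumption about what the "RefSeq" tab on the web page does.
VIRAL_REFSEQ_TERM = "txid10239[Organism:exp] AND srcdb_refseq[PROP]"


def esearch_url(term=VIRAL_REFSEQ_TERM):
    """URL that runs the search and keeps the hits on the NCBI history server."""
    params = {"db": "protein", "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(web_env, query_key, start, batch_size=500):
    """URL that downloads one batch of the stored hits in FASTA format."""
    params = {
        "db": "protein",
        "WebEnv": web_env,
        "query_key": query_key,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": start,
        "retmax": batch_size,
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def download_fasta(out_path, batch_size=500, pause=0.5):
    """Run the search once, then append the FASTA batches to out_path."""
    with urllib.request.urlopen(esearch_url()) as resp:
        root = ET.parse(resp).getroot()
    count = int(root.findtext("Count"))
    web_env = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")

    with open(out_path, "wb") as out:
        for start in range(0, count, batch_size):
            url = efetch_url(web_env, query_key, start, batch_size)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read())
            time.sleep(pause)  # stay polite to the NCBI servers


# Example (needs network access, and is still coffee-break slow):
# download_fasta("myfile")
```

The resulting file can then be given to cd-hit as “myfile”. The record count you get will differ from the numbers quoted above, since the database keeps growing.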
There are probably viral datasets on the web that I don’t know about, allowing you to ignore all of this. However, making your own dataset when one is not available is a good bioinformatics skill.
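Once CD-HIT has finished, the groups themselves are in the “.clstr” file written next to the output (myfile90.clstr for the command above). Here is a minimal sketch of reading that file into Python; the function name and the returned structure are just my own choices, and the accessions in the usage example are made up. Keep in mind that clusters at 90% identity are groups of very similar sequences, which is only a rough stand-in for orthologous groups.

```python
import re

# A member line looks like: "1<TAB>248aa, >SEQ_ID... at 95.16%"
# and the representative of a cluster ends with "*" instead.
_MEMBER = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Map cluster number -> {"representative": id, "members": [ids]}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = {"representative": None, "members": []}
            continue
        match = _MEMBER.search(line)
        if match is None or current is None:
            continue  # skip anything that does not look like a member line
        seq_id = match.group(1)
        clusters[current]["members"].append(seq_id)
        if line.endswith("*"):
            clusters[current]["representative"] = seq_id
    return clusters


# Usage with a few hypothetical lines in .clstr format:
example = [
    ">Cluster 0",
    "0\t250aa, >YP_000001.1... *",
    "1\t248aa, >YP_000002.1... at 95.16%",
    ">Cluster 1",
    "0\t120aa, >NP_000003.1... *",
]
for number, cluster in parse_clstr(example).items():
    print(number, cluster["representative"], len(cluster["members"]))

# For a real run: with open("myfile90.clstr") as fh: clusters = parse_clstr(fh)
```

Because the command uses “-d 0”, each ID in the .clstr file is the FASTA header up to the first space, so it can be matched back to the original sequences.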
-