I'd like to build the evolutionary history of a protein, given its sequence. Namely, given a FASTA entry, how can I build an evolutionary tree? Here is the 5WXY protein as an example:

>5WXY:A|PDBID|CHAIN|SEQUENCE
MGHHHHHHMMKTKLPILGVLGGMGPVVTAEFLKSIYEYNPFIDKEQESPNVIVFSFPSAPDRTGSIDSGKEREFIDFIQV
NLEHLNKLADCIVIGSCTAHYALPQIPENLKDKLISLIKIADQELQEYNEPTLLLASTGTYQKKLFQEGCTTADLIISLS
ESDQKLIHEMIYKVLKRGHDPLSILRDIEALLEKYNTRSYISGSTEFHLLTKSLKLKGIDSIKAIDPLSTIAQNFSQLII
KQAQVDLVTDCHQPSNPKSP

Do I need to run some queries against Entrez in order to build the evolutionary tree?

I saw this paper, but I am not sure how I can use it with a single mammal's protein primary sequence.


Question: given a protein name, is it possible to look up its evolutionary history, or build an evolutionary tree, using other databases besides PDB on top of bioinformatics algorithms?

M__
0x90
  • 2
    You can’t build an evolutionary tree given a single data point (e.g. a single protein sequence). A tree relates multiple data points. Could you therefore please clarify what your input and desired output are? – Konrad Rudolph Jan 24 '18 at 15:08
  • @KonradRudolph, I would like to take a protein from the PDB and find/generate its evolutionary tree. – 0x90 Jan 24 '18 at 15:12
  • Now you’ve just restated the question. But as I tried to explain, this doesn’t make sense. – Konrad Rudolph Jan 24 '18 at 15:16
  • @KonradRudolph I see what you are saying. But how come other people create such trees (for example, in this paper: http://www.pnas.org/content/108/20/8329.full)? – 0x90 Jan 24 '18 at 15:19
  • 2
    The trees in that paper relate many different E. coli strains, which is exactly my point: rather than having a single data point, they are comparing several different ones. – Konrad Rudolph Jan 24 '18 at 18:29
  • @KonradRudolph so is there a database that associates a protein to its family etc.? I am not saying it's possible to get the whole tree from a single node. Of course it's not possible. My question is how can I do it using other resources given a PDB name. – 0x90 Jan 24 '18 at 18:30
  • 1
    Given a PDB identifier and sequence, one can use other databases, like the NCBI nucleotide database, to find related sequences (using PSI-BLAST) and build a tree from them. All the resources you linked had a multiple comparison/alignment in their workflow, so you need other resources besides PDB. – llrs Jan 25 '18 at 08:08
  • @Llopis is there a database for amino acid sequences? Or an example of how to use what you have suggested (using NCBI/PSI-BLAST)? – 0x90 Jan 25 '18 at 12:14
  • There's UniProt, and you can use BLAST from protein to DNA. That is far away from your question. I'm just trying to help you redirect your question so that you get an answer. What have you tried/searched? I can spend some time helping people, but if people don't spend time helping themselves I won't. – llrs Jan 25 '18 at 12:28
  • @Llopis what's blast? At the moment I try to understand what I need to know in order to find the evolutionary tree for a PDB entry. – 0x90 Jan 25 '18 at 12:39
  • Did you search for it? "blast bioinformatics" will lead you on the path. – llrs Jan 25 '18 at 12:40
  • @Llopis, yes I ran it. What did you have in mind to do with it and http://www.uniprot.org? The results: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=6KNTUB6D016 – 0x90 Jan 25 '18 at 13:31
  • 1
    @0x90 What do you think was meant? You need to put a modicum of effort into things. – Devon Ryan Jan 25 '18 at 13:34
  • @DevonRyan I am asking for a general direction. UniProt offers a number of tools and I am not sure which of them I should use. I want to understand the idea of how to do it first, rather than run a bunch of tools without understanding them. – 0x90 Jan 25 '18 at 13:37
  • @DevonRyan one of my problems is why it would even be possible to do evolution in reverse. I would expect evolution to be an irreversible process. – 0x90 Jan 25 '18 at 13:40
  • @0x90 If you are not sure what to do, search for it first; if you still can't solve your problem, then ask, but not the other way round, please. If you need more training you can take courses; there are some very interesting online courses. – llrs Jan 25 '18 at 14:09
  • @Llopis I think what you guys gave me so far should give me a general idea to start with. If you have a good idea for a course/reference on evolution in bioinformatics please let me know. – 0x90 Jan 25 '18 at 14:37
  • I am not informed of the online courses, but I'm sure you will find some, you can check back for opinions in the [chat] (if you want some). – llrs Jan 25 '18 at 14:54

2 Answers

The general procedure is:

  1. Find the sequence of the same gene, or the most similar one you can find (see the mentions of BLAST in the comments above), in other species. You can use UniProt or any other large sequence database for this.
  2. You now have a large set of sequences related to yours, which you'll need to compare (typically by aligning them and then inferring a tree from the alignment). Search the methods section of your favorite paper to see how its authors reconstructed the likely evolutionary history from such a set; there are a number of packages out there.
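
To make step 2 concrete, here is a minimal sketch of one of the simplest distance-based approaches: p-distances followed by UPGMA clustering. It is not what any particular paper or package does. The toy alignment, the sequence names, and the function names `p_distance` and `upgma` are all made up for illustration, and the sequences are assumed to be already aligned to equal length.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions shared by the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(aligned):
    """Cluster aligned sequences with UPGMA; return the topology as Newick."""
    # Each cluster is keyed by its Newick text and stores its number of leaves.
    sizes = {name: 1 for name in aligned}
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Merge the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        # Distance to every other cluster is the size-weighted average.
        for other in sizes:
            if other in pair:
                continue
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            dist[frozenset((merged, other))] = (
                d_a * sizes[a] + d_b * sizes[b]
            ) / new_size
        # Drop every distance that involves one of the merged clusters.
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"

# Hypothetical, hand-made alignment (not real homologs of 5WXY).
toy_alignment = {
    "query":    "MKTKLPILGV",
    "homologA": "MKTKLPVLGV",
    "homologB": "MKSRLPVIGA",
    "outgroup": "MRAHIPAMAS",
}

newick = upgma(toy_alignment)
print(newick)
```

For real data you would first collect homologs, align them with a dedicated multiple sequence aligner, and most likely prefer neighbor joining or a likelihood-based method from an established package over this toy clustering.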
Devon Ryan

[Image: no description provided]

There you go, it's Microcystis aeruginosa aspartate/glutamate racemase. Easy does it. If you want to know how you can do this in 15 seconds, let me know.


It's one of the new features of NCBI's BLAST:

  1. Go to BLAST here: https://blast.ncbi.nlm.nih.gov/Blast.cgi
  2. Enter your sequence into the box (it doesn't accept PDB codes alone)
  3. Choose the protein database. When I first did this calculation I used SwissProt, thinking there would be a lot of sequences; I then used "nr"
  4. Under "Algorithm parameters", set "Max target sequences" to 50 (the default is too many)
  5. Hit "BLAST"
  6. Once the search is complete, the following hyperlinks appear at the top of the page: "Other reports: Search Summary [Taxonomy reports] [Distance tree of results]"
  7. Click on "Distance tree of results"
  8. The following page loads automatically, aligning your sequences and producing a distance-based tree (Fast Minimum Evolution by default); a neighbor-joining (NJ) tree is also available and is recommended. Example: https://www.ncbi.nlm.nih.gov/blast/treeview/treeView.cgi?request=page&blastRID=5A1ZBRU1014&queryID=lcl|Query_210080&entrezLim=&ex=&exl=&exh=&ns=50&screenWidth=1280&screenHeight=800
  9. Click "Tool", "Download", "PDF" ...
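
If you would rather script steps 2 to 5 than click through the form, the sketch below prepares the same search as a request body for NCBI's BLAST URL API. It only builds the parameters and does not contact NCBI. `CMD`, `PROGRAM`, `DATABASE`, `QUERY` and `HITLIST_SIZE` are parameter names from that API; `HITLIST_SIZE=50` is my assumption for the equivalent of the limit of 50 in step 4. The helper names `read_fasta_sequence` and `blast_put_params` are made up for this example. The "Distance tree of results" view in steps 6 to 9 is a feature of the web page and is not reproduced here.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

FASTA_5WXY = """>5WXY:A|PDBID|CHAIN|SEQUENCE
MGHHHHHHMMKTKLPILGVLGGMGPVVTAEFLKSIYEYNPFIDKEQESPNVIVFSFPSAPDRTGSIDSGKEREFIDFIQV
NLEHLNKLADCIVIGSCTAHYALPQIPENLKDKLISLIKIADQELQEYNEPTLLLASTGTYQKKLFQEGCTTADLIISLS
ESDQKLIHEMIYKVLKRGHDPLSILRDIEALLEKYNTRSYISGSTEFHLLTKSLKLKGIDSIKAIDPLSTIAQNFSQLII
KQAQVDLVTDCHQPSNPKSP
"""

def read_fasta_sequence(fasta_text):
    """Return the sequence of the first FASTA record as a single string."""
    lines = [ln.strip() for ln in fasta_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    parts = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record, if any
            break
        parts.append(ln)
    return "".join(parts)

def blast_put_params(sequence, database="nr", max_hits=50):
    """Build the form fields for submitting a protein BLAST search."""
    return {
        "CMD": "Put",               # submit a new search
        "PROGRAM": "blastp",        # protein query against a protein database
        "DATABASE": database,       # "nr", as in step 3
        "QUERY": sequence,
        "HITLIST_SIZE": str(max_hits),  # assumed equivalent of step 4
    }

sequence = read_fasta_sequence(FASTA_5WXY)
params = blast_put_params(sequence)
request_body = urlencode(params)

# The body would be POSTed to BLAST_URL; sending it is left out on purpose.
print(BLAST_URL)
print(request_body[:60] + "...")
```

A submitted search returns a request ID (RID) that you poll until the results are ready; check NCBI's usage guidelines before automating queries.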
Devon Ryan
M__