We believe that if, after running BLAST, the global identity between a hit and our query is at least 30%, we can call the two sequences homologs. What is the difference between local identity and global identity, and how can we calculate them? The following is the global alignment of one of my PSI-BLAST hits against the query:
# Aligned_sequences: 2
# 1: NP_418577.1
# 2: WP_036312822.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 264
# Identity: 85/264 (32.2%)
# Similarity: 125/264 (47.3%)
# Gaps: 29/264 (11.0%)
# Score: 376.5
#
#
#=======================================
NP_418577.1 1 -----------MAEMKNLKIE-------VVRYNPEVDTAPHSAFYEVPYD 32
..|..:..|: :.|::||||..|....|:|...
WP_036312822. 1 MTIVDSGAPADTQEANDSGIQSYLVTFIIRRFDPEVDAEPRWVDYDVEMY 50
NP_418577.1 33 ATTSLLDALGYIKDNLAPDLSYRWSCRMAICGSCGMMVNNVPKLACKTFL 82
.|..:||||..||.::...||:|.||...||||..|.:|...:|||||.:
WP_036312822. 51 PTDRVLDALHRIKWDVDGTLSFRRSCAHGICGSDAMRINGRNRLACKTLI 100
NP_418577.1 83 R--DYTDGMKVEALANFPIERDLVVDMTHFIESLEAIKPYIIGNSRTADQ 130
: |.:..:.|||:...|:|:||:|||..|.||...::|::...|.....
WP_036312822. 101 KDLDISKPIYVEAIKGLPLEKDLIVDMDPFFESFRDVQPFLQPKSAPEPG 150
NP_418577.1 131 GTNIQTPAQMAKYHQFSGCINCGLCYAACPQFGLNPEFIGPAAITLAHRY 180
....|:....|.|...:.||.|..|.::||.|..:.::.|||||..|||:
WP_036312822. 151 KERFQSIKDRAVYDDTTKCILCAACTSSCPVFWTDGQYFGPAAIVNAHRF 200
NP_418577.1 181 NEDSRDHGKKERMAQLNSQNGVWSCTFVGYCSEVCPKHVDPAAAIQQGKV 230
..||||.....|:..||.:.|||.|.....|:|.||:.::...||.:.|.
WP_036312822. 201 IFDSRDDAADVRLDILNDKEGVWRCRTTFNCTEACPRGIEITKAIAEVKQ 250
NP_418577.1 231 ESSKDFLIATLKPR 244
...:.
WP_036312822. 251 AVLRG--------- 255
#---------------------------------------
#---------------------------------------
The following is the local alignment of that hit against the query:
# Aligned_sequences: 2
# 1: NP_418577.1
# 2: WP_036312822.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 219
# Identity: 83/219 (37.9%)
# Similarity: 119/219 (54.3%)
# Gaps: 2/219 ( 0.9%)
# Score: 390.5
#
#
#=======================================
NP_418577.1 13 RYNPEVDTAPHSAFYEVPYDATTSLLDALGYIKDNLAPDLSYRWSCRMAI 62
|::||||..|....|:|....|..:||||..||.::...||:|.||...|
WP_036312822. 31 RFDPEVDAEPRWVDYDVEMYPTDRVLDALHRIKWDVDGTLSFRRSCAHGI 80
NP_418577.1 63 CGSCGMMVNNVPKLACKTFLR--DYTDGMKVEALANFPIERDLVVDMTHF 110
|||..|.:|...:|||||.:: |.:..:.|||:...|:|:||:|||..|
WP_036312822. 81 CGSDAMRINGRNRLACKTLIKDLDISKPIYVEAIKGLPLEKDLIVDMDPF 130
NP_418577.1 111 IESLEAIKPYIIGNSRTADQGTNIQTPAQMAKYHQFSGCINCGLCYAACP 160
.||...::|::...|.........|:....|.|...:.||.|..|.::||
WP_036312822. 131 FESFRDVQPFLQPKSAPEPGKERFQSIKDRAVYDDTTKCILCAACTSSCP 180
NP_418577.1 161 QFGLNPEFIGPAAITLAHRYNEDSRDHGKKERMAQLNSQNGVWSCTFVGY 210
.|..:.::.|||||..|||:..||||.....|:..||.:.|||.|.....
WP_036312822. 181 VFWTDGQYFGPAAIVNAHRFIFDSRDDAADVRLDILNDKEGVWRCRTTFN 230
NP_418577.1 211 CSEVCPKHVDPAAAIQQGK 229
|:|.||:.::...||.:.|
WP_036312822. 231 CTEACPRGIEITKAIAEVK 249
#---------------------------------------
#---------------------------------------
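For reference, the Identity line in both headers above is the same ratio: the number of columns in which the two residues are identical, divided by the total alignment length, gap columns included (85/264 for the global alignment, 83/219 for the local one). Only the aligned region differs. The global alignment spans both sequences end to end, while the local alignment covers only the best-scoring sub-region, which is why its denominator is smaller. Below is a minimal Python sketch of that calculation, assuming the two gapped rows have already been extracted from the report. The function names `alignment_identity` and `percent` are made up for illustration and are not part of EMBOSS or BLAST.

```python
def alignment_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical_columns, gap_columns, alignment_length)
    for two rows of a pairwise alignment of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gaps += 1          # column contains a gap in either row
        elif a == b:
            identical += 1     # same residue in both rows
    return identical, gaps, len(aligned_a)


def percent(numerator, denominator):
    """Percentage rounded to one decimal, as in the report headers."""
    return round(100.0 * numerator / denominator, 1)


# Last block of the local alignment above (query 211-229, hit 231-249).
query_row = "CSEVCPKHVDPAAAIQQGK"
hit_row = "CTEACPRGIEITKAIAEVK"
identical, gaps, length = alignment_identity(query_row, hit_row)
print(f"Identity: {identical}/{length} ({percent(identical, length)}%)")
print(f"Gaps: {gaps}/{length} ({percent(gaps, length)}%)")
```

Running the same function over the full concatenated rows of each alignment should give the totals reported in the headers. Note that other tools may use a different denominator (for example the length of the shorter sequence), so identity percentages are only comparable when computed the same way.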
Furthermore, "homolog" has a very specific definition. Homologs are genes that are both descended from the same ancestral sequence. It does not, as is often mistakenly assumed, mean "similar". 30% sequence identity does not imply homology by itself. You need more information.
– Joe Healey Jun 21 '17 at 08:49
If you really want to know orthology/homology, I'd suggest using a dedicated orthology finding program like OrthAgogue, OrthoMCL, or better yet ROARY if your sequences are bacterial.
– Joe Healey Jun 21 '17 at 09:04