Too Many Results from align() Function
Why Are There So Many Results from the align() Function?
✍: FYIcenter.com
If you are using the default score settings,
you may get a very large number of possible alignments.
Here is an example using the first and the third sequences from the
PF05371_seed.faa file.
fyicenter$ python
>>> from Bio import Align
>>> aligner = Align.PairwiseAligner()
>>> target = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"
>>> query = "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFSSKA"
>>> alignments = aligner.align(target, query)
>>> print(len(alignments))
13792680
>>> print(alignments[0])
target            0 -----AEPNAATN-YATEAMD-SLKTQAI-DLIS-QTWPVVTT-V-VVAGLV-IRLFKKFSSKA 52
                  0 -----|-----|--||||||--||||||--|||--|||||||--|-|-|||--||||||||||| 64
query             0 DGTSTA-----T-SYATEAM-NSLKTQA-TDLI-DQTWPVVT-SVAV-AGL-AIRLFKKFSSKA 52
>>> print(alignments[99999])
target            0 AE-PNAATN-YATEAMDSLKTQA-IDLI-SQTWPVVTT-V-VVAGLV-IRLFKKFSSKA 52
                  0 ..-..|-|--||||||.||||||--|||--|||||||--|-|-|||--||||||||||| 59
query             0 DGTSTA-T-SYATEAMNSLKTQAT-DLID-QTWPVVT-SVAV-AGL-AIRLFKKFSSKA 52
>>> print(alignments.score)
40.0
>>> print(len(target))
52
As you can see, the align() function generated 13,792,680 possible alignments, all with the same best score of 40.0. This is because gaps and mismatches have the same default score of 0.0, so a mismatch can be replaced by a pair of gaps at no cost. If we lower the gap score setting to -1.0, every alignment of these two sequences that contains gaps scores below 40.0 and is eliminated.
>>> aligner.gap_score = -1.0
>>> alignments = aligner.align(target, query)
>>> print(len(alignments))
1
>>> print(alignments[0])
target            0 AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA 52
                  0 .....||.||||||.||||||.|||.|||||||.|.||||.||||||||||| 52
query             0 DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFSSKA 52
>>> print(alignments.score)
40.0
As you can see, there is only 1 best alignment, which is the result we are looking for.
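To see why a gap score of 0.0 produces so many ties, the counting can be reproduced with a small dynamic-programming sketch in plain Python. This is not Biopython's implementation: count_optimal_alignments is a hypothetical helper that counts distinct traceback paths in a simple global alignment with a single linear gap score, so its totals are not guaranteed to match len(alignments) from PairwiseAligner. The two-letter sequences are made up for illustration.

```python
def count_optimal_alignments(target, query, match=1.0, mismatch=0.0, gap=0.0):
    """Return (best_score, number_of_best_paths) for a global alignment.

    Hypothetical helper for illustration only. Assumes one linear gap
    score for every gap column (no separate open/extend scores).
    """
    rows, cols = len(target) + 1, len(query) + 1
    score = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    count[0][0] = 1
    # First column and first row can only be reached through gaps.
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
        count[i][0] = 1
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap
        count[0][j] = 1
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if target[i - 1] == query[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # aligned pair
                (score[i - 1][j] + gap, count[i - 1][j]),          # gap in query
                (score[i][j - 1] + gap, count[i][j - 1]),          # gap in target
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Add up the paths of every move that ties for the best score.
            count[i][j] = sum(c for s, c in candidates if s == best)
    return score[-1][-1], count[-1][-1]


# Gap and mismatch both score 0.0: the K/R mismatch can also be written
# as two gaps in either order, so three paths tie.
print(count_optimal_alignments("AK", "AR", gap=0.0))   # (1.0, 3)
# A negative gap score leaves only the ungapped alignment.
print(count_optimal_alignments("AK", "AR", gap=-1.0))  # (1.0, 1)
```

With a single mismatch there are already 3 tied paths. Every additional position where a gap costs nothing multiplies the number of ties, which is how two 52-letter sequences reach millions of equally good alignments.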
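Now if we force a gap by deleting the last 3 letters "SKA" from the query sequence, align() finds 2 best alignments, as expected. They differ only in which of the two adjacent "S" letters in the target is paired with the final "S" of the query.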
>>> target = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"
>>> query = "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFS"
>>> alignments = aligner.align(target, query)
>>> print(len(alignments))
2
>>> print(alignments[0])
target            0 AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA 52
                  0 .....||.||||||.||||||.|||.|||||||.|.||||.||||||||--- 52
query             0 DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFS--- 49
>>> print(alignments[1])
target            0 AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA 52
                  0 .....||.||||||.||||||.|||.|||||||.|.||||.|||||||-|-- 52
query             0 DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKF-S-- 49
>>> print(alignments.score)
34.0
So if you are using the align() function, remember to change the default score settings, for example by setting a negative gap score, to keep it from generating too many best alignments.
2023-05-09