Too Many Results from align() Function

Q

Why Are There So Many Results from the align() Function?

✍: FYIcenter.com

A

If you use the default score settings, align() may return a very large number of equally scored alignments. Here is an example using the first and the third sequences from the PF05371_seed.faa file.

fyicenter$ python
>>> from Bio import Align
>>> aligner = Align.PairwiseAligner()

>>> target = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"
>>> query =  "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFSSKA"
>>> alignments = aligner.align(target, query)
>>> print(len(alignments))
13792680

>>> print(alignments[0])
target  0 -----AEPNAATN-YATEAMD-SLKTQAI-DLIS-QTWPVVTT-V-VVAGLV-IRLFKKFSSKA 52
        0 -----|-----|--||||||--||||||--|||--|||||||--|-|-|||--||||||||||| 64
query   0 DGTSTA-----T-SYATEAM-NSLKTQA-TDLI-DQTWPVVT-SVAV-AGL-AIRLFKKFSSKA 52


>>> print(alignments[99999])
target  0 AE-PNAATN-YATEAMDSLKTQA-IDLI-SQTWPVVTT-V-VVAGLV-IRLFKKFSSKA 52
        0 ..-..|-|--||||||.||||||--|||--|||||||--|-|-|||--||||||||||| 59
query   0 DGTSTA-T-SYATEAMNSLKTQAT-DLID-QTWPVVT-SVAV-AGL-AIRLFKKFSSKA 52

>>> print(alignments.score)
40.0

>>> print(len(target))
52

As you can see, the align() function generated 13,792,680 alignments, all with the same best score of 40.0. This happens because, with the score settings used here, a gap and a mismatch both score 0.0, so replacing a mismatch with a pair of gaps costs nothing. If we lower the gap score setting to -1.0, every alignment with unnecessary gaps scores lower and drops out of the set of best alignments.

>>> aligner.gap_score = -1.0
>>> alignments = aligner.align(target, query)
>>> print(len(alignments))
1

>>> print(alignments[0])
target  0 AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA 52
        0 .....||.||||||.||||||.|||.|||||||.|.||||.||||||||||| 52
query   0 DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFSSKA 52

>>> print(alignments.score)
40.0

As you can see, there is only 1 best alignment, which is the result we are looking for.
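To see why the gap score makes such a difference, here is a rough pure-Python sketch that counts how many global alignments tie for the best score under simple match, mismatch and gap scores. It is not Biopython's implementation: the function name count_best_alignments and its parameters are made up for this illustration, and its counts may differ from what align() reports, since Biopython may treat some gap orderings as the same alignment.

```python
def count_best_alignments(a, b, match=1.0, mismatch=0.0, gap=0.0):
    """Return (best_score, number_of_tied_paths) for a global alignment.

    Hypothetical helper: a plain dynamic-programming table with a
    linear gap score, counting every path that reaches the best score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    count[0][0] = 1
    # First column and first row: only gaps, exactly one path each.
    for i in range(1, rows):
        score[i][0] = i * gap
        count[i][0] = 1
    for j in range(1, cols):
        score[0][j] = j * gap
        count[0][j] = 1
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # align a[i-1] with b[j-1]
                (score[i - 1][j] + gap, count[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, count[i][j - 1]),          # gap in a
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Add up the paths from every move that ties for the best score.
            count[i][j] = sum(c for s, c in candidates if s == best)
    return score[-1][-1], count[-1][-1]


# Gap and mismatch both 0.0: every gapped path ties with the ungapped one.
print(count_best_alignments("AAAA", "CCCC", gap=0.0))   # (0.0, 321)
# Gap lowered to -1.0: only the ungapped alignment is left.
print(count_best_alignments("AAAA", "CCCC", gap=-1.0))  # (0.0, 1)
```

Even with 4-letter sequences, a zero gap score already gives 321 tied paths in this sketch, so a count in the millions for 52-letter sequences is not surprising.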

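Now if we force a gap by deleting the last 3 letters, "SKA", from the query sequence, align() finds 2 best alignments, as expected: the final "S" of the query can pair with either of the two adjacent "S" letters near the end of the target.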

>>> target = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"
>>> query =  "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFS"
>>> alignments = aligner.align(target, query)
>>> print(len(alignments))
2

>>> print(alignments[0])
target  0 AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA 52
        0 .....||.||||||.||||||.|||.|||||||.|.||||.||||||||--- 52
query   0 DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFS--- 49

>>> print(alignments[1])
target  0 AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA 52
        0 .....||.||||||.||||||.|||.|||||||.|.||||.|||||||-|-- 52
query   0 DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKF-S-- 49

>>> print(alignments.score)
34.0
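The score of 34.0 can be checked by hand: 37 matching columns at 1.0 each, plus 3 gap columns at -1.0 each. Below is a small sketch that scores the two alignments printed above, column by column. score_alignment is a hypothetical helper written for this note, not part of Biopython, and it assumes the same scores used in this session (match 1.0, mismatch 0.0, gap -1.0).

```python
def score_alignment(row_t, row_q, match=1.0, mismatch=0.0, gap=-1.0):
    """Score two gapped rows of equal length, one column at a time."""
    if len(row_t) != len(row_q):
        raise ValueError("alignment rows must have the same length")
    total = 0.0
    for t, q in zip(row_t, row_q):
        if t == "-" or q == "-":
            total += gap        # column with a gap
        elif t == q:
            total += match      # identical letters
        else:
            total += mismatch   # different letters
    return total


# Rows copied from the two alignments printed above.
target_row  = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"
query_row_0 = "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFS---"
query_row_1 = "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKF-S--"

print(score_alignment(target_row, query_row_0))  # 34.0
print(score_alignment(target_row, query_row_1))  # 34.0
```

Both placements of the final "S" give 37 matches and 3 gap columns, which is why the two alignments tie.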

So if you use the align() function, remember to set a negative gap score instead of relying on the default score settings. Otherwise, it may generate far too many equally good alignments.
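As a minimal sketch of that advice, the session below sets every score explicitly, so the result does not depend on the defaults of the installed Biopython version. It also uses aligner.score() to get the best score without building any alignments, and itertools.islice to look at only the first few results. The cap of 3 is an arbitrary choice for this example.

```python
from itertools import islice

from Bio import Align

target = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"
query = "DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRLFKKFSSKA"

aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = 0.0
aligner.gap_score = -1.0   # penalize gaps so they no longer tie with mismatches

# Best score only, without generating alignment objects.
best_score = aligner.score(target, query)
print(best_score)

# Take at most 3 alignments, however many exist.
alignments = aligner.align(target, query)
first_few = list(islice(alignments, 3))
for alignment in first_few:
    print(alignment)
```

With these settings, the score should match the 40.0 shown earlier, and only one alignment should come back.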

 


2023-05-09