Search for Motif Matches with Bio.motifs
How to Search for Matches in a Target Sequence against a Motif with Bio.motifs?
✍: FYIcenter.com
The Bio.motifs module offers two ways to search a target sequence for segments
that match a motif: exact matching against the motif instances, and approximate matching with the motif PSSM.
1. Use the motif instances to search for exact matches.
fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
... "TACAA",
... "TACGC",
... "TACAC",
... "TACCC",
... "AACCC",
... "AATGC",
... "AATGC",
... ]
>>> m = motifs.create(instances)
>>> test_seq = Seq("TACACTGCATTACAACCCAAGCATTA")
>>> for pos, seq in m.instances.search(test_seq):
...     print("%i %s" % (pos, seq))
...
0 TACAC
10 TACAA
13 AACCC
As you can see, the search finds 3 segments, each identical to one of the sample sequences.
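To make the idea of exact matching concrete, here is a minimal plain-Python sketch that slides a window over the target sequence and reports every segment that equals one of the sample sequences. It is an illustration only, not Biopython's implementation; the function name find_exact_matches is hypothetical.

```python
# Plain-Python sketch of exact instance matching (not Biopython's own code).
instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
test_seq = "TACACTGCATTACAACCCAAGCATTA"


def find_exact_matches(sequence, known_instances):
    """Yield (position, segment) for every window equal to a known instance.

    Hypothetical helper; assumes all instances have the same length.
    """
    width = len(known_instances[0])
    wanted = set(known_instances)  # duplicates such as "AATGC" collapse here
    for pos in range(len(sequence) - width + 1):
        segment = sequence[pos:pos + width]
        if segment in wanted:
            yield pos, segment


exact_hits = list(find_exact_matches(test_seq, instances))
for pos, segment in exact_hits:
    print("%i %s" % (pos, segment))
```

On the same test sequence, this sketch reports positions 0, 10 and 13, the same segments listed in the session above.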
2. Use the motif PSSM to search for approximate matches.
>>> pseudocounts={"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4
A:   0.42   1.49  -2.17  -0.05  -0.75
C:  -2.17  -2.17   1.58   0.42   1.83
G:  -2.17  -2.17  -2.17   0.92  -2.17
T:   0.77  -2.17  -0.05  -2.17  -2.17
>>> for position, score in pssm.search(test_seq, threshold=3.0):
...     print("Position %d: score = %5.3f" % (position, score))
...
Position 0: score = 5.622
Position -20: score = 4.601
Position 10: score = 3.037
Position 13: score = 5.738
Position -6: score = 4.601
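Note that negative positions refer to matches of the motif found on the reverse strand of the test sequence. They follow the Python convention for negative indices, counting backward from the end of the test sequence, so the matching segment is test_seq[pos:pos+len(m)] for both positive and negative positions.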
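The following plain-Python sketch may help show where these numbers come from. It rebuilds the log-odds matrix from the sample sequences, pseudocounts and background used above, then scores every window on both strands. It is a simplified illustration under stated assumptions, not Biopython's implementation; the helper names build_pssm, score, reverse_complement and search_both_strands are hypothetical.

```python
import math

instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
test_seq = "TACACTGCATTACAACCCAAGCATTA"
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_pssm(seqs, pseudo, bg):
    """Return one {letter: log2-odds} dict per motif column."""
    columns = []
    for column in zip(*seqs):
        total = len(column) + sum(pseudo.values())
        columns.append({
            letter: math.log2(
                (column.count(letter) + pseudo[letter]) / total / bg[letter]
            )
            for letter in "ACGT"
        })
    return columns


def score(pssm_columns, segment):
    """Sum the log-odds of each letter at its column."""
    return sum(col[letter] for col, letter in zip(pssm_columns, segment))


def reverse_complement(segment):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[letter] for letter in reversed(segment))


def search_both_strands(pssm_columns, sequence, threshold):
    """Yield (position, score); reverse-strand hits get negative positions."""
    width = len(pssm_columns)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        forward = score(pssm_columns, window)
        if forward >= threshold:
            yield start, forward
        reverse = score(pssm_columns, reverse_complement(window))
        if reverse >= threshold:
            # Same window, reported with a negative (from-the-end) index.
            yield start - len(sequence), reverse


pssm_columns = build_pssm(instances, pseudocounts, background)
hits = list(search_both_strands(pssm_columns, test_seq, threshold=3.0))
for position, hit_score in hits:
    print("Position %d: score = %5.3f" % (position, hit_score))
```

In this sketch, the hit at position -20 corresponds to test_seq[-20:-15], which is "GCATT"; its reverse complement "AATGC" is one of the sample sequences, which is why it scores well on the reverse strand.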
3. Calculate matching scores of all positions. The output only shows scores of forward matches.
>>> scores = pssm.calculate(test_seq)
>>> print(scores)
[ 5.622304 -5.6797 -3.4317725 0.93827754 -6.849625 -2.0406609
 -10.849625 -3.6561453 -0.03370807 -3.9110255 3.0373416 -2.1491852
 -0.6016975 5.7381525 -0.509775 -3.5642228 -8.734148 -0.09919716
 -0.6016975 -2.3942978 -10.849625 -3.6561453 ]
As you can see, the match at position 13, "AACCC", has the highest score of 5.738. The match at the first position, "TACAC", is a close second with a score of 5.622.
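A quick way to check which forward position scores highest is to rank the values. The sketch below copies the scores printed above into a plain Python list, so it runs without Biopython; the variable names are illustrative only.

```python
test_seq = "TACACTGCATTACAACCCAAGCATTA"
motif_length = 5

# Forward-strand scores copied from the pssm.calculate() output above.
scores = [
    5.622304, -5.6797, -3.4317725, 0.93827754, -6.849625, -2.0406609,
    -10.849625, -3.6561453, -0.03370807, -3.9110255, 3.0373416, -2.1491852,
    -0.6016975, 5.7381525, -0.509775, -3.5642228, -8.734148, -0.09919716,
    -0.6016975, -2.3942978, -10.849625, -3.6561453,
]

# Rank (position, score) pairs from the highest score to the lowest.
ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
best_position, best_score = ranked[0]
best_segment = test_seq[best_position:best_position + motif_length]

print("Best position %d (%s): score = %5.3f"
      % (best_position, best_segment, best_score))
```

There is one score per possible start position, that is len(test_seq) - motif_length + 1 = 22 values, and the index of a score in the list is the start position of the scored segment.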
2023-06-19