How to Calculate Sequence Score against PSSM with Bio.motifs?
✍: FYIcenter.com
With the motif PSSM (Position-Specific Scoring Matrix) defined in the previous
tutorial, we can define a matching score of any given sequence against
the motif:
score = sum_over_j(PSSM[S[j], j])
where:
- S is the given sequence, and S[j] is its letter at position j
- PSSM[i, j] is the motif's PSSM entry for letter i at position j
If we treat the matching score as a distribution over sequences, we can calculate its summary statistics: minimum, maximum, mean and standard deviation.
1. Create an example motif with 7 DNA sequences.
fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
...     "TACAA",
...     "TACGC",
...     "TACAC",
...     "TACCC",
...     "AACCC",
...     "AATGC",
...     "AATGC",
... ]
>>> m = motifs.create(instances)
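Conceptually, the motif keeps a count of how often each letter appears at each position of the aligned instances (this is what `m.counts` holds). The sketch below reproduces that tally in plain Python, without Biopython, assuming all instances have the same length and use only A, C, G and T; the names `counts`, `alphabet` and `length` are illustrative, not Biopython attributes.

```python
# Count how often each nucleotide occurs at each position of the
# aligned motif instances (plain Python, no Biopython required).
instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
alphabet = "ACGT"
length = len(instances[0])  # assumes every instance has this length

# counts[letter][j] = number of instances with `letter` at position j
counts = {letter: [0] * length for letter in alphabet}
for seq in instances:
    for j, letter in enumerate(seq):
        counts[letter][j] += 1

for letter in alphabet:
    print(letter + ":", " ".join(str(c) for c in counts[letter]))
```

It prints one row per letter, for example `T: 4 0 2 0 0`, and every column adds up to 7, the number of instances.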
2. Calculate PWM and PSSM.
>>> pseudocounts={"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)
>>> print(pwm)
0 1 2 3 4
A: 0.40 0.84 0.07 0.29 0.18
C: 0.04 0.04 0.60 0.27 0.71
G: 0.04 0.04 0.04 0.38 0.04
T: 0.51 0.07 0.29 0.07 0.07
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
0 1 2 3 4
A: 0.42 1.49 -2.17 -0.05 -0.75
C: -2.17 -2.17 1.58 0.42 1.83
G: -2.17 -2.17 -2.17 0.92 -2.17
T: 0.77 -2.17 -0.05 -2.17 -2.17
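The two tables can be reproduced by hand: each PWM entry is (count + pseudocount) divided by the column total including pseudocounts, and each PSSM entry is the base-2 logarithm of the PWM entry divided by the background probability. A plain-Python sketch, starting from the count matrix of the seven instances (A: 3 7 0 2 1, C: 0 0 5 2 6, G: 0 0 0 3 0, T: 4 0 2 0 0) and assuming the background already sums to 1; the dictionaries here are stand-ins, not Biopython objects.

```python
import math

# Count matrix of the seven example instances (letter -> count per position)
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # assumed to sum to 1
length = len(counts["A"])

# PWM: add pseudocounts, then divide each column by its new total
pwm = {c: [0.0] * length for c in counts}
for j in range(length):
    total = sum(counts[c][j] + pseudocounts[c] for c in counts)
    for c in counts:
        pwm[c][j] = (counts[c][j] + pseudocounts[c]) / total

# PSSM: base-2 log-odds of each PWM entry against the background
pssm = {c: [math.log2(p / background[c]) for p in pwm[c]] for c in pwm}

for c in "ACGT":
    print(c + ":", " ".join("%5.2f" % v for v in pssm[c]))
```

Rounded to two decimals, the printed rows should match the PSSM table above.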
3. Calculate matching scores of some given sequences.
>>> seq = "TACAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
3.037341679708973
>>> seq = "CCGTG"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.849625007211563
>>> seq = "AAAAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-1.071182777069196
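Writing out one term per position only works for a length-5 motif. Below is a sketch of a helper for score = sum_over_j(PSSM[S[j], j]) at any motif length; `match_score` is a hypothetical name, not a Biopython function. It only relies on `pssm[letter, j]` indexing, which the session above uses on the Biopython PSSM, so it should accept that object as well as the plain dictionary keyed by (letter, position) used here, filled with the rounded values printed in step 2.

```python
def match_score(pssm, seq):
    """Return sum over positions j of pssm[seq[j], j].

    Works with any object indexable as pssm[letter, j].
    Assumes len(seq) equals the motif length.
    """
    return sum(pssm[letter, j] for j, letter in enumerate(seq))


# Rounded PSSM values from step 2, keyed by (letter, position)
rows = {
    "A": [0.42, 1.49, -2.17, -0.05, -0.75],
    "C": [-2.17, -2.17, 1.58, 0.42, 1.83],
    "G": [-2.17, -2.17, -2.17, 0.92, -2.17],
    "T": [0.77, -2.17, -0.05, -2.17, -2.17],
}
pssm = {(c, j): v for c, vals in rows.items() for j, v in enumerate(vals)}

for seq in ["TACAA", "CCGTG", "AAAAA"]:
    print(seq, round(match_score(pssm, seq), 2))
```

Because the inputs are rounded, the results can differ from the session in the second decimal (AAAAA gives -1.06 here instead of -1.07). With the Biopython `pssm` object, `match_score(pssm, "TACAA")` should give the unrounded value instead.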
4. Calculate the minimum and maximum possible scores of the PSSM, defined as follows.
# minimum = sum_over_j(min_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.min)
-10.85
# maximum = sum_over_j(max_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.max)
6.59
5. In this example, the minimum possible score is exactly the matching score of the motif's anticonsensus sequence, and the maximum possible score is the matching score of its consensus sequence.
>>> seq = m.anticonsensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.84962500721156
>>> seq = m.consensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
6.594289804260533
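Since each position contributes independently, the highest score comes from taking the top-scoring letter in every column and the lowest from taking the bottom-scoring letter, which is why the maximum and minimum are column-wise sums. The sketch below checks this against an exhaustive search over all 4^5 = 1024 sequences, using the rounded PSSM values from step 2. Note that the best-scoring sequence is defined by the PSSM, while Biopython's `m.consensus` is built from the counts (the most frequent letter per position); they agree here, but with a non-uniform background they need not agree in general.

```python
from itertools import product

# Rounded PSSM values from step 2 (letter -> score per position)
pssm = {
    "A": [0.42, 1.49, -2.17, -0.05, -0.75],
    "C": [-2.17, -2.17, 1.58, 0.42, 1.83],
    "G": [-2.17, -2.17, -2.17, 0.92, -2.17],
    "T": [0.77, -2.17, -0.05, -2.17, -2.17],
}
letters = "ACGT"
length = len(pssm["A"])


def score(seq):
    """Sum of the PSSM entries selected by each letter of seq."""
    return sum(pssm[c][j] for j, c in enumerate(seq))


# Best/worst letter in each column (ties resolved by ACGT order)
best = "".join(max(letters, key=lambda c: pssm[c][j]) for j in range(length))
worst = "".join(min(letters, key=lambda c: pssm[c][j]) for j in range(length))

# Exhaustive check over all 4**length sequences
all_scores = [score("".join(s)) for s in product(letters, repeat=length)]
assert abs(max(all_scores) - score(best)) < 1e-9
assert abs(min(all_scores) - score(worst)) < 1e-9

print("max:", round(score(best), 2), best)
print("min:", round(score(worst), 2), worst)
```

The maximum is reached only by TACGC (6.59), while many sequences tie for the minimum (-10.85) because every zero-count letter gets the same log-odds score in this example.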
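6. Calculate the mean and standard deviation of the PSSM score with respect to the background. The mean is the expected score of a sequence whose letters are drawn from the motif's PWM; it equals the relative entropy (Kullback-Leibler divergence) of the PWM with respect to the background, in bits, and so measures the motif's information content.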
>>> mean = pssm.mean(background)
>>> std = pssm.std(background)
>>> print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
mean = 3.21, standard deviation = 2.59
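A plain-Python sketch of these two statistics, assuming (as the Biopython source appears to do) that each letter at position j is weighted by p = background × 2^PSSM, which is simply the PWM entry, and that positions are independent so the per-position variances add up. It starts from the example's count matrix and pseudocounts; the variable names are illustrative.

```python
import math

# Count matrix, pseudocounts and background of the example motif
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
length = len(counts["A"])

mean = 0.0
variance = 0.0
for j in range(length):
    total = sum(counts[c][j] + pseudocounts[c] for c in counts)
    ex = ex2 = 0.0  # E[score] and E[score^2] at position j
    for c in counts:
        p = (counts[c][j] + pseudocounts[c]) / total  # PWM entry (weight)
        s = math.log2(p / background[c])  # PSSM entry
        ex += p * s
        ex2 += p * s * s
    mean += ex
    variance += ex2 - ex * ex  # positions treated as independent

std = math.sqrt(variance)
print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
```

It prints `mean = 3.21, standard deviation = 2.59`, matching the Biopython output.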
2023-06-19