Motif PSSM with Bio.motifs
How to calculate a motif PSSM with the Bio.motifs module?
✍: FYIcenter.com
PSSM (Position-Specific Scoring Matrix),
also referred to as PSWM (Position-Specific Weight Matrix)
or LSM (Log-odds Scoring Matrix),
measures how much more (or less) frequent each letter is at each
position than expected under a given background frequency.
PSSM can be expressed as:
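PSSM[i,j] = log2(PPM[i,j]/B[i])

where:
PPM[i,j] is the probability of letter i at position j, taken from the Position Probability Matrix.
B[i] is the background frequency of letter i.
log2() is the base-2 logarithm.

A positive score means the letter is more frequent at that position than in the background, a negative score means it is less frequent, and a probability of 0 gives -inf.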
The simplest background frequency model assumes that every letter is equally frequent in the entire population. So for DNA sequences, the simplest background frequency column is B = (Ba, Bc, Bg, Bt) = (0.25, 0.25, 0.25, 0.25).
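Before turning to Biopython, the formula can be sketched in plain Python. This is only an illustration: the helper name pssm_from_ppm and the two-position PPM below are hypothetical and are not part of Bio.motifs.

```python
import math

def pssm_from_ppm(ppm, background):
    """Apply PSSM[i,j] = log2(PPM[i,j]/B[i]) to every entry.

    ppm maps each letter to a list of probabilities, one per position.
    background maps each letter to its background frequency.
    (Hypothetical helper for illustration, not a Biopython function.)
    """
    pssm = {}
    for letter, probs in ppm.items():
        b = background[letter]
        # log2(0) is undefined, so a zero probability is scored as -inf
        pssm[letter] = [
            math.log2(p / b) if p > 0 else float("-inf") for p in probs
        ]
    return pssm

# Made-up PPM with two positions; each column sums to 1
ppm = {
    "A": [1.0, 0.5],
    "C": [0.0, 0.25],
    "G": [0.0, 0.125],
    "T": [0.0, 0.125],
}

# Simplest background model: every letter equally frequent
background = {letter: 0.25 for letter in "ACGT"}

pssm = pssm_from_ppm(ppm, background)
for letter in "ACGT":
    print(letter + ":", pssm[letter])
```

With this input, a letter whose probability equals the background (C at position 1) scores 0, and each halving of the probability lowers the score by 1.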
In Biopython, we can call the log_odds() method on the PPM to calculate the PSSM against the simplest background frequency model. Note that log_odds() uses B = (0.25, 0.25, 0.25, 0.25) by default.
fyicenter$ python
>>> from Bio import motifs
>>> samples = [
... "AAGAAT",
... "ATCATA",
... "AAGTAA",
... "AACAAA",
... "ATTAAA",
... "AAGAAT"
... ]
>>> m = motifs.create(samples)
>>> ppm = m.counts.normalize()
>>> print(ppm)
0 1 2 3 4 5
A: 1.00 0.67 0.00 0.83 0.83 0.67
C: 0.00 0.00 0.33 0.00 0.00 0.00
G: 0.00 0.00 0.50 0.00 0.00 0.00
T: 0.00 0.33 0.17 0.17 0.17 0.33
>>> pssm = ppm.log_odds()
>>> print(pssm)
0 1 2 3 4 5
A: 2.00 1.42 -inf 1.74 1.74 1.42
C: -inf -inf 0.42 -inf -inf -inf
G: -inf -inf 1.00 -inf -inf -inf
T: -inf 0.42 -0.58 -0.58 -0.58 0.42
We can verify the calculation using the math.log(x,2) function for a couple of locations in the matrix.
>>> import math
>>> math.log(ppm["A",0]/0.25, 2)
2.0
>>> math.log(ppm["A",1]/0.25, 2)
1.4150374992788437
To avoid -inf in the PSSM, we can add a set of pseudocounts to the counts when normalizing them into the PPM, so that no probability is exactly 0.
>>> pseudocounts = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> ppm = m.counts.normalize(pseudocounts)
>>> background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> pssm = ppm.log_odds(background)
>>> print(pssm)
0 1 2 3 4 5
A: 1.84 1.28 -2.81 1.58 1.58 1.28
C: -2.81 -2.81 0.36 -2.81 -2.81 -2.81
G: -2.81 -2.81 0.89 -2.81 -2.81 -2.81
T: -2.81 0.36 -0.49 -0.49 -0.49 0.36
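The pseudocount arithmetic can be reproduced without Biopython as a cross-check. The sketch below assumes that each probability is (count + pseudocount) divided by (number of sequences + sum of pseudocounts), which agrees with the matrix printed above. The function name smoothed_pssm is hypothetical.

```python
import math

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
letters = "ACGT"

def smoothed_pssm(samples, letters, pseudocount=0.25, background=0.25):
    """Count letters per position, add a pseudocount, and take log-odds.

    Hypothetical helper for illustration. It assumes the same pseudocount
    and the same background frequency for every letter.
    """
    length = len(samples[0])
    # Column total: one observation per sequence plus all pseudocounts
    total = len(samples) + pseudocount * len(letters)
    pssm = {}
    for letter in letters:
        row = []
        for j in range(length):
            count = sum(1 for s in samples if s[j] == letter)
            prob = (count + pseudocount) / total
            row.append(math.log2(prob / background))
        pssm[letter] = row
    return pssm

pssm = smoothed_pssm(samples, letters)
for letter in letters:
    print(letter + ":", " ".join(f"{score:6.2f}" for score in pssm[letter]))
```

For example, A at position 0 appears in all 6 sequences, so its probability is (6 + 0.25) / (6 + 1) and its score is log2 of that value divided by 0.25, which rounds to 1.84.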
2023-07-01