Motif ICM with Bio.motifs
How to Calculate Motif ICM with the Bio.motifs Module?
✍: FYIcenter.com
An ICM (Information Content Matrix) represents how important each position of a motif is relative to the others.
ICM can be expressed as:
ICM[i,j] = PPM[i,j]*(ICt - U[j])

where:
  PPM[i,j] is the Position Probability Matrix
  ICt is the total IC: log2(n)
  n is the number of letters
  U[j] is the uncertainty per position: - sum_over_i(PPM[i,j]*log2(PPM[i,j]))
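As a quick check of the formula, here is a minimal sketch that applies it to a single position using only the "math" library. The function name icm_column and the column values are illustrative and not part of Biopython. The sketch assumes the usual convention that a letter with probability 0 contributes nothing to the uncertainty.

```python
import math

def icm_column(probs):
    """Return (uncertainty, ICM values) for one motif position.

    probs maps each letter to its probability at this position.
    Letters with probability 0 are skipped, i.e. 0*log2(0) is taken as 0.
    """
    ic_t = math.log2(len(probs))  # total IC: log2(n)
    u = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    icm = {letter: p * (ic_t - u) for letter, p in probs.items()}
    return u, icm

# Illustrative column: A never occurs, C 2 of 6, G 3 of 6, T 1 of 6
column = {"A": 0.0, "C": 2 / 6, "G": 3 / 6, "T": 1 / 6}
u, icm = icm_column(column)
print(round(u, 4))         # 1.4591 (uncertainty in bits)
print(round(icm["G"], 4))  # 0.2704 (ICM value for G)
```

With four letters, ICt is 2, so the information left at this position is 2 - 1.4591, which is then shared among the letters in proportion to their probabilities.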
To calculate motif ICM, we can get the PPM first using Biopython.
fyicenter$ python
>>> from Bio import motifs
>>> samples = [
...   "AAGAAT",
...   "ATCATA",
...   "AAGTAA",
...   "AACAAA",
...   "ATTAAA",
...   "AAGAAT"
... ]
>>> m = motifs.create(samples)
>>> ppm = m.counts.normalize()
>>> print(ppm)
        0      1      2      3      4      5
A:   1.00   0.67   0.00   0.83   0.83   0.67
C:   0.00   0.00   0.33   0.00   0.00   0.00
G:   0.00   0.00   0.50   0.00   0.00   0.00
T:   0.00   0.33   0.17   0.17   0.17   0.33
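To see what the counting and normalizing step amounts to, here is a rough pure Python sketch that builds the same matrix by hand. It is not how Biopython implements it. The helper name position_probability_matrix is hypothetical, and the sketch assumes equal-length sequences over the letters A, C, G, T with no pseudocounts.

```python
from collections import Counter

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
letters = "ACGT"

def position_probability_matrix(seqs, alphabet):
    """Count letters per position, then divide by the number of sequences."""
    length = len(seqs[0])
    ppm = {letter: [] for letter in alphabet}
    for j in range(length):
        counts = Counter(seq[j] for seq in seqs)  # letter counts at position j
        for letter in alphabet:
            ppm[letter].append(counts[letter] / len(seqs))
    return ppm

ppm_manual = position_probability_matrix(samples, letters)
for letter in letters:
    print(letter + ":", " ".join(f"{p:.2f}" for p in ppm_manual[letter]))
```

Each column of the result sums to 1, which is what makes it a probability matrix.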
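Then we can calculate the ICM using the "numpy" and "math" libraries. Zero probabilities produce -inf and nan intermediate values, which numpy.nan_to_num() turns back into 0. NumPy may also print RuntimeWarning messages for these steps, which are not shown here.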
>>> import numpy
>>> import math
>>> n = len(ppm)
>>> n
4
>>> ic_t = math.log(n, 2)
>>> ic_t
2.0

>>> ppm_a = numpy.array([ppm["A"], ppm["C"], ppm["G"], ppm["T"]])
>>> print(ppm_a)
[[1.         0.66666667 0.         0.83333333 0.83333333 0.66666667]
 [0.         0.         0.33333333 0.         0.         0.        ]
 [0.         0.         0.5        0.         0.         0.        ]
 [0.         0.33333333 0.16666667 0.16666667 0.16666667 0.33333333]]

>>> log2_ppm_a = numpy.log2(ppm_a)
>>> print(log2_ppm_a)
[[ 0.         -0.5849625         -inf -0.26303441 -0.26303441 -0.5849625 ]
 [       -inf        -inf -1.5849625         -inf        -inf        -inf]
 [       -inf        -inf -1.                -inf        -inf        -inf]
 [       -inf -1.5849625  -2.5849625  -2.5849625  -2.5849625  -1.5849625 ]]

>>> ppm_log2_ppm_a = ppm_a * log2_ppm_a
>>> print(ppm_log2_ppm_a)
[[ 0.         -0.389975           nan -0.21919534 -0.21919534 -0.389975  ]
 [        nan         nan -0.52832083         nan         nan         nan]
 [        nan         nan -0.5                nan         nan         nan]
 [        nan -0.52832083 -0.43082708 -0.43082708 -0.43082708 -0.52832083]]

>>> ppm_log2_ppm_a = numpy.nan_to_num(ppm_log2_ppm_a)
>>> print(ppm_log2_ppm_a)
[[ 0.         -0.389975    0.         -0.21919534 -0.21919534 -0.389975  ]
 [ 0.          0.         -0.52832083  0.          0.          0.        ]
 [ 0.          0.         -0.5         0.          0.          0.        ]
 [ 0.         -0.52832083 -0.43082708 -0.43082708 -0.43082708 -0.52832083]]

>>> u_a = - numpy.sum(ppm_log2_ppm_a, axis=0)
>>> print(u_a)
[-0.          0.91829583  1.45914792  0.65002242  0.65002242  0.91829583]

>>> icm = ppm_a * (ic_t - u_a)
>>> print(icm)
[[2.         0.72113611 0.         1.12498132 1.12498132 0.72113611]
 [0.         0.         0.18028403 0.         0.         0.        ]
 [0.         0.         0.27042604 0.         0.         0.        ]
 [0.         0.36056806 0.09014201 0.22499626 0.22499626 0.36056806]]
As you can see, the total ICM value (the column sum) of the first position is the highest, 2 bits, so it is the most important, or most conserved, position: every sample has "A" there. The total ICM value of the third position is the lowest, about 0.54 bits, so it is the least important, or least conserved, position.
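The per-position totals discussed above can be checked with a short sketch. It is one possible variation on the session shown earlier, not a Biopython feature. It avoids the -inf and nan intermediates by replacing zero probabilities with 1 before taking the logarithm, since log2(1) = 0. The function name information_content_matrix is hypothetical, and the PPM is entered by hand as counts divided by 6 so the block runs without Biopython.

```python
import numpy as np

# PPM from the tutorial, rows ordered A, C, G, T (letter counts over 6 samples)
ppm_a = np.array([
    [6, 4, 0, 5, 5, 4],
    [0, 0, 2, 0, 0, 0],
    [0, 0, 3, 0, 0, 0],
    [0, 2, 1, 1, 1, 2],
]) / 6

def information_content_matrix(ppm):
    """ICM[i,j] = PPM[i,j] * (log2(n) - U[j]), treating 0*log2(0) as 0."""
    ic_t = np.log2(ppm.shape[0])         # total IC for n letters
    safe = np.where(ppm > 0, ppm, 1.0)   # log2(1) = 0, so zero entries add nothing
    u = -np.sum(ppm * np.log2(safe), axis=0)
    return ppm * (ic_t - u)

icm = information_content_matrix(ppm_a)
totals = icm.sum(axis=0)                 # total information per position, in bits
print(np.round(totals, 4))
print(int(np.argmax(totals)), int(np.argmin(totals)))  # 0 2
```

The last line prints the indexes of the most and least conserved positions, which match the first and third positions named above.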
2023-05-31