Read Motif in JASPAR Format with Bio.motifs
How to read a motif in JASPAR format with the Bio.motifs module?
✍: FYIcenter.com
The Bio.motifs.read() function reads motif files in
several formats, including JASPAR.
1. Download the motif file in JASPAR format by going to https://jaspar.genereg.net/matrix/MA0080.5/ and clicking the "JASPAR" download button. The file MA0080.5.jaspar is saved on your computer.
2. Read the motif file with the read() function.
fyicenter$ python
>>> from Bio import motifs
>>> with open("MA0080.5.jaspar") as handle:
...     m = motifs.read(handle, "jaspar")
...
>>> len(m)
20
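To make clearer what read() has to parse, here is a minimal pure-Python sketch of a reader for JASPAR-style text. It is not Biopython's implementation. The helper name parse_jaspar_text is hypothetical, the file layout (a ">ID name" header followed by one bracketed row of counts per nucleotide) is an assumption about the downloaded file, and the sample text holds only the first 8 of the 20 columns printed in this tutorial.

```python
# Hypothetical helper for illustration only; not Biopython's parser.
# SAMPLE is truncated to the first 8 columns shown in this tutorial,
# and its exact layout is an assumption about the JASPAR download.
SAMPLE = """\
>MA0080.5\tSPI1
A [ 42201 48240 54154 78831 81904 99739 15301 113087 ]
C [ 22587 21262 20183 11424 12269 2914 10958 3425 ]
G [ 38405 34277 37341 25893 25580 13479 100825 12544 ]
T [ 30010 29424 21525 17055 13450 17071 6119 4147 ]
"""


def parse_jaspar_text(text):
    """Return (matrix_id, name, counts) from JASPAR-style text."""
    matrix_id, name, counts = None, None, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: matrix ID, then the transcription factor name.
            fields = line[1:].split()
            matrix_id = fields[0]
            name = fields[1] if len(fields) > 1 else None
        else:
            # Count row: nucleotide letter, then numbers inside brackets.
            letter, _, rest = line.partition("[")
            values = rest.replace("]", " ").split()
            counts[letter.strip()] = [float(v) for v in values]
    return matrix_id, name, counts


matrix_id, name, counts = parse_jaspar_text(SAMPLE)
print(matrix_id, name, len(counts["A"]))
```

For real work, motifs.read() remains the better choice, since it returns a motif object with the consensus and counts attributes used in the steps below.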
3. View the motif object structure.
>>> type(m)
<class 'Bio.motifs.jaspar.Motif'>
>>> print(m)
TF name SPI1
Matrix ID MA0080.5
Matrix:
0 1 2 3 4 5 6 7 ...
A: 42201.00 48240.00 54154.00 78831.00 81904.00 99739.00 15301.00 113087.00 ...
C: 22587.00 21262.00 20183.00 11424.00 12269.00 2914.00 10958.00 3425.00 ...
G: 38405.00 34277.00 37341.00 25893.00 25580.00 13479.00 100825.00 12544.00 ...
T: 30010.00 29424.00 21525.00 17055.00 13450.00 17071.00 6119.00 4147.00 ...
4. View the consensus and anticonsensus sequences.
>>> m.consensus
Seq('AAAAAAGAGGAAGTGAAAAA')
>>> m.anticonsensus
Seq('CCCCCCTCCCTCTCTTCCCC')
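The consensus and anticonsensus can be related back to the count matrix with a small pure-Python sketch: at each position, take the nucleotide with the highest count for the consensus and the lowest count for the anticonsensus. The counts below are the first 8 columns printed in step 3, and the helper name pick_per_column is hypothetical. Biopython's own handling of ties is not reproduced here; none occur in this data.

```python
# First 8 columns of the count matrix printed in step 3.
counts = {
    "A": [42201, 48240, 54154, 78831, 81904, 99739, 15301, 113087],
    "C": [22587, 21262, 20183, 11424, 12269, 2914, 10958, 3425],
    "G": [38405, 34277, 37341, 25893, 25580, 13479, 100825, 12544],
    "T": [30010, 29424, 21525, 17055, 13450, 17071, 6119, 4147],
}


def pick_per_column(counts, chooser):
    """Apply chooser (max or min) to the counts at each position.

    Hypothetical helper; returns one letter per column.
    """
    letters = sorted(counts)
    length = len(counts[letters[0]])
    return "".join(
        chooser(letters, key=lambda base: counts[base][i])
        for i in range(length)
    )


consensus = pick_per_column(counts, max)
anticonsensus = pick_per_column(counts, min)
print(consensus)
print(anticonsensus)
```

The two strings match the first 8 letters of m.consensus and m.anticonsensus shown above.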
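5. Calculate the total number of sequences used by the motif by adding up the counts at the first position. Each sequence contributes exactly one nucleotide at a given position, so the four counts sum to the number of sequences.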
>>> m.counts[:, 0]
{'A': 42201.0, 'C': 22587.0, 'G': 38405.0, 'T': 30010.0}
>>> sum(m.counts[:, 0].values())
133203.0
So 133,203 DNA sequences were used to create this motif. Those sequences are not included in the input file, so the motif object has no instances.
>>> type(m.instances)
<class 'NoneType'>
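If every input sequence contributes one nucleotide at every position, the total should be the same whichever column is summed. The pure-Python sketch below checks this on the first 8 columns printed in step 3. The variable names are illustrative, and the remaining 12 columns are not checked because the tutorial output elides them.

```python
# First 8 columns of the count matrix printed in step 3.
counts = {
    "A": [42201, 48240, 54154, 78831, 81904, 99739, 15301, 113087],
    "C": [22587, 21262, 20183, 11424, 12269, 2914, 10958, 3425],
    "G": [38405, 34277, 37341, 25893, 25580, 13479, 100825, 12544],
    "T": [30010, 29424, 21525, 17055, 13450, 17071, 6119, 4147],
}

# zip(*rows) turns the four nucleotide rows into per-position columns.
column_totals = [sum(column) for column in zip(*counts.values())]

# A single distinct value means all checked positions agree.
distinct_totals = set(column_totals)
print(column_totals)
print(distinct_totals)
```

All 8 columns sum to 133203, which agrees with the result of sum(m.counts[:, 0].values()).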
2023-07-05