Genomic locations of TF-TF pairs
In this notebook, we go over how to get the genomic locations at which two TFs co-occur. We start by creating a TF-COMB analysis from motif positions:
[1]:
import tfcomb
C = tfcomb.CombObj()
C.TFBS_from_motifs(regions="../data/GM12878_hg38_chr4_ATAC_peaks.bed",
                   motifs="../data/HOCOMOCOv11_HUMAN_motifs.txt",
                   genome="../data/hg38_chr4.fa.gz",
                   threads=4)
C.market_basket()
INFO: Scanning for TFBS with 4 thread(s)...
INFO: Progress: 12%
INFO: Progress: 20%
INFO: Progress: 30%
INFO: Progress: 40%
INFO: Progress: 50%
INFO: Progress: 60%
INFO: Progress: 70%
INFO: Progress: 80%
INFO: Progress: 91%
INFO: Finished!
INFO: Processing scanned TFBS
INFO: Identified 165810 TFBS (401 unique names) within given regions
Internal counts for 'TF_counts' were not set. Please run .count_within() to obtain TF-TF co-occurrence counts.
WARNING: No counts found in <CombObj>. Running <CombObj>.count_within() with standard parameters.
INFO: Setting up binding sites for counting
INFO: Counting co-occurrences within sites
INFO: Counting co-occurrence within background
INFO: Running with multiprocessing threads == 1. To change this, give 'threads' in the parameter of the function.
INFO: Progress: 10%
INFO: Progress: 20%
INFO: Progress: 30%
INFO: Progress: 40%
INFO: Progress: 50%
INFO: Progress: 60%
INFO: Progress: 70%
INFO: Progress: 80%
INFO: Progress: 90%
INFO: Done finding co-occurrences! Run .market_basket() to estimate significant pairs
INFO: Market basket analysis is done! Results are found in <CombObj>.rules
[2]:
C.rules.head()
[2]:
| | TF1 | TF2 | TF1_TF2_count | TF1_count | TF2_count | cosine | zscore |
---|---|---|---|---|---|---|---|
POU3F2-SMARCA5 | POU3F2 | SMARCA5 | 239 | 302 | 241 | 0.885902 | 129.586528 |
SMARCA5-POU3F2 | SMARCA5 | POU3F2 | 239 | 241 | 302 | 0.885902 | 129.586528 |
POU2F1-SMARCA5 | POU2F1 | SMARCA5 | 263 | 426 | 241 | 0.820810 | 135.355691 |
SMARCA5-POU2F1 | SMARCA5 | POU2F1 | 263 | 241 | 426 | 0.820810 | 135.355691 |
SMARCA5-ZNF582 | SMARCA5 | ZNF582 | 172 | 241 | 195 | 0.793419 | 117.370387 |
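The cosine values in the table above are consistent with the cosine similarity of the two occurrence counts, that is, TF1_TF2_count divided by the square root of TF1_count × TF2_count. The sketch below recomputes them from the counts shown. The helper name `cosine_from_counts` is hypothetical and not part of tfcomb, and the formula is inferred from the numbers in the table rather than taken from the tfcomb source.

```python
import math

def cosine_from_counts(pair_count, tf1_count, tf2_count):
    """Cosine similarity from co-occurrence and single-TF counts (assumed formula)."""
    return pair_count / math.sqrt(tf1_count * tf2_count)

# Counts copied from the first, third and fifth rows of C.rules.head()
rows = [
    ("POU3F2", "SMARCA5", 239, 302, 241),
    ("POU2F1", "SMARCA5", 263, 426, 241),
    ("SMARCA5", "ZNF582", 172, 241, 195),
]

cosines = {
    (tf1, tf2): cosine_from_counts(n12, n1, n2)
    for tf1, tf2, n12, n1, n2 in rows
}

for (tf1, tf2), value in cosines.items():
    print(f"{tf1}-{tf2}: {value:.6f}")
```

Because this formula is symmetric in the two TFs, a pair and its reverse get the same score, which matches the duplicated rows in the table (for example POU3F2-SMARCA5 and SMARCA5-POU3F2).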
Getting locations for a selected TF-TF pair
We choose the highest-ranking TF pair from .rules:
[3]:
TF1, TF2 = C.rules.iloc[0, [0,1]]
TF1, TF2
[3]:
('POU3F2', 'SMARCA5')
We can now apply get_pair_locations() to get the genomic locations at which this TF pair co-occurs:
[4]:
pairs = C.get_pair_locations((TF1, TF2))
[5]:
pairs[:10]
[5]:
TFBSPairList([<TFBSPair | TFBS1: (chr4,49092715,49092730,SMARCA5,13.72436,-) | TFBS2: (chr4,49092743,49092775,POU3F2,11.25323,+) | distance: 13 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092715,49092730,SMARCA5,13.72436,-) | TFBS2: (chr4,49092788,49092865,POU3F2,11.46462,+) | distance: 58 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092743,49092775,POU3F2,11.25323,+) | TFBS2: (chr4,49092785,49092880,SMARCA5,15.00425,-) | distance: 10 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49092745,49092780,SMARCA5,11.81125,-) | TFBS2: (chr4,49092788,49092865,POU3F2,11.46462,+) | distance: 8 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092785,49092880,SMARCA5,15.00425,-) | TFBS2: (chr4,49092893,49092930,POU3F2,11.46462,+) | distance: 13 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092788,49092865,POU3F2,11.46462,+) | TFBS2: (chr4,49092885,49092930,SMARCA5,12.69622,-) | distance: 20 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49096721,49096746,SMARCA5,12.97124,-) | TFBS2: (chr4,49096779,49096836,POU3F2,11.46462,+) | distance: 33 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49096721,49096746,SMARCA5,12.97124,-) | TFBS2: (chr4,49096839,49096896,POU3F2,11.63189,+) | distance: 93 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49096744,49096761,POU3F2,10.58175,+) | TFBS2: (chr4,49096771,49096836,SMARCA5,13.02056,-) | distance: 10 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49096744,49096761,POU3F2,10.58175,+) | TFBS2: (chr4,49096841,49096916,SMARCA5,13.70506,-) | distance: 80 | orientation: convergent >])
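In the output above, the reported distance equals the start of the second site minus the end of the first, and the orientation label follows the strands of the two sites. The sketch below reproduces both for two of the listed pairs. It is an illustration only: `Site`, `pair_distance` and `pair_orientation` are hypothetical names, not tfcomb classes or functions, and the rules are inferred from this output, assuming the first site lies upstream of the second without overlap.

```python
from typing import NamedTuple

class Site(NamedTuple):
    """Minimal stand-in for one binding site (hypothetical, not a tfcomb class)."""
    chrom: str
    start: int
    end: int
    name: str
    strand: str

def pair_distance(site1, site2):
    """Gap between the end of the first site and the start of the second."""
    return site2.start - site1.end

def pair_orientation(site1, site2):
    """Orientation label inferred from the strands seen in the output above."""
    strands = (site1.strand, site2.strand)
    if strands == ("+", "-"):
        return "convergent"
    if strands == ("-", "+"):
        return "divergent"
    # Same-strand pairs do not occur in the output shown, so no label is assumed.
    return "not shown in this output"

# First and third pairs from pairs[:10]
first = (Site("chr4", 49092715, 49092730, "SMARCA5", "-"),
         Site("chr4", 49092743, 49092775, "POU3F2", "+"))
third = (Site("chr4", 49092743, 49092775, "POU3F2", "+"),
         Site("chr4", 49092785, 49092880, "SMARCA5", "-"))

for s1, s2 in (first, third):
    print(s1.name, s2.name, pair_distance(s1, s2), pair_orientation(s1, s2))
```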
We can write these locations to a file:
[6]:
pairs.write_bed("TFBS_pair_positions.bed", fmt="bed")
# Show the first rows of the file
import pandas as pd
pd.read_csv("TFBS_pair_positions.bed", sep="\t", header=None, nrows=5)
[6]:
| | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | chr4 | 49092715 | 49092730 | SMARCA5 | . | - |
1 | chr4 | 49092743 | 49092775 | POU3F2 | . | + |
2 | chr4 | 49092715 | 49092730 | SMARCA5 | . | - |
3 | chr4 | 49092788 | 49092865 | POU3F2 | . | + |
4 | chr4 | 49092743 | 49092775 | POU3F2 | . | + |
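Note that the same site can appear on several lines of the BED output, since one site may take part in more than one pair (rows 0 and 2 above are identical, as are rows 1 and 4). If a downstream tool expects each site only once, the lines can be de-duplicated. The following is a minimal sketch using only the standard library; it parses the five rows shown above from an in-memory string instead of the real file, so it runs on its own.

```python
import csv
import io

# The five rows shown above, rebuilt as tab-separated BED text
rows = [
    ("chr4", 49092715, 49092730, "SMARCA5", ".", "-"),
    ("chr4", 49092743, 49092775, "POU3F2", ".", "+"),
    ("chr4", 49092715, 49092730, "SMARCA5", ".", "-"),
    ("chr4", 49092788, 49092865, "POU3F2", ".", "+"),
    ("chr4", 49092743, 49092775, "POU3F2", ".", "+"),
]
bed_text = "\n".join("\t".join(str(field) for field in row) for row in rows)

# Parse the text as a BED file and keep the first occurrence of every line;
# dict.fromkeys preserves the original order
reader = csv.reader(io.StringIO(bed_text), delimiter="\t")
unique_sites = list(dict.fromkeys(tuple(fields) for fields in reader))

for site in unique_sites:
    print("\t".join(site))
```

To apply the same idea to the file written above, the `io.StringIO(bed_text)` handle could be replaced by `open("TFBS_pair_positions.bed")`.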
The pairs can also be written in ‘bedpe’ format, which contains the positions of both sites of a pair on a single line:
[7]:
pairs.write_bed("TFBS_pair_positions.bedpe", fmt="bedpe")
pd.read_csv("TFBS_pair_positions.bedpe", sep="\t", header=None, nrows=5)
[7]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
0 | chr4 | 49092715 | 49092730 | chr4 | 49092743 | 49092775 | SMARCA5-POU3F2 | 13 | - | + |
1 | chr4 | 49092715 | 49092730 | chr4 | 49092788 | 49092865 | SMARCA5-POU3F2 | 58 | - | + |
2 | chr4 | 49092743 | 49092775 | chr4 | 49092785 | 49092880 | POU3F2-SMARCA5 | 10 | + | - |
3 | chr4 | 49092745 | 49092780 | chr4 | 49092788 | 49092865 | SMARCA5-POU3F2 | 8 | - | + |
4 | chr4 | 49092785 | 49092880 | chr4 | 49092893 | 49092930 | SMARCA5-POU3F2 | 13 | - | + |
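Since each bedpe line carries the pair name and the distance, the file is convenient for simple filtering. The sketch below keeps pairs whose sites lie at most 15 bp apart and counts them per pair name. The column names are labels chosen here for readability (the file itself has no header), the 15 bp cutoff is an arbitrary example value, and the rows are the five shown above, parsed from an in-memory string so that the example is self-contained.

```python
import csv
import io
from collections import Counter

# Hypothetical column labels for the ten bedpe columns shown above
COLUMNS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
           "name", "distance", "strand1", "strand2"]

rows = [
    ("chr4", 49092715, 49092730, "chr4", 49092743, 49092775, "SMARCA5-POU3F2", 13, "-", "+"),
    ("chr4", 49092715, 49092730, "chr4", 49092788, 49092865, "SMARCA5-POU3F2", 58, "-", "+"),
    ("chr4", 49092743, 49092775, "chr4", 49092785, 49092880, "POU3F2-SMARCA5", 10, "+", "-"),
    ("chr4", 49092745, 49092780, "chr4", 49092788, 49092865, "SMARCA5-POU3F2", 8, "-", "+"),
    ("chr4", 49092785, 49092880, "chr4", 49092893, 49092930, "SMARCA5-POU3F2", 13, "-", "+"),
]
bedpe_text = "\n".join("\t".join(str(field) for field in row) for row in rows)

MAX_DISTANCE = 15  # example cutoff in bp

reader = csv.DictReader(io.StringIO(bedpe_text), fieldnames=COLUMNS, delimiter="\t")
close_pairs = [record for record in reader if int(record["distance"]) <= MAX_DISTANCE]

# Number of close pairs per TF order
counts = Counter(record["name"] for record in close_pairs)
print(len(close_pairs), dict(counts))
```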