Genomic locations of TF-TF pairs

In this notebook, we show how to retrieve the genomic locations at which two TFs co-occur. We start by setting up a TF-COMB analysis from motif positions:

[1]:
import tfcomb

C = tfcomb.CombObj()
C.TFBS_from_motifs(regions="../data/GM12878_hg38_chr4_ATAC_peaks.bed",
                   motifs="../data/HOCOMOCOv11_HUMAN_motifs.txt",
                   genome="../data/hg38_chr4.fa.gz",
                   threads=4)
C.market_basket()
INFO: Scanning for TFBS with 4 thread(s)...
INFO: Progress: 12%
INFO: Progress: 20%
INFO: Progress: 30%
INFO: Progress: 40%
INFO: Progress: 50%
INFO: Progress: 60%
INFO: Progress: 70%
INFO: Progress: 80%
INFO: Progress: 91%
INFO: Finished!
INFO: Processing scanned TFBS
INFO: Identified 165810 TFBS (401 unique names) within given regions
Internal counts for 'TF_counts' were not set. Please run .count_within() to obtain TF-TF co-occurrence counts.
WARNING: No counts found in <CombObj>. Running <CombObj>.count_within() with standard parameters.
INFO: Setting up binding sites for counting
INFO: Counting co-occurrences within sites
INFO: Counting co-occurrence within background
INFO: Running with multiprocessing threads == 1. To change this, give 'threads' in the parameter of the function.
INFO: Progress: 10%
INFO: Progress: 20%
INFO: Progress: 30%
INFO: Progress: 40%
INFO: Progress: 50%
INFO: Progress: 60%
INFO: Progress: 70%
INFO: Progress: 80%
INFO: Progress: 90%
INFO: Done finding co-occurrences! Run .market_basket() to estimate significant pairs
INFO: Market basket analysis is done! Results are found in <CombObj>.rules
[2]:
C.rules.head()
[2]:
                    TF1      TF2  TF1_TF2_count  TF1_count  TF2_count    cosine      zscore
POU3F2-SMARCA5   POU3F2  SMARCA5            239        302        241  0.885902  129.586528
SMARCA5-POU3F2  SMARCA5   POU3F2            239        241        302  0.885902  129.586528
POU2F1-SMARCA5   POU2F1  SMARCA5            263        426        241  0.820810  135.355691
SMARCA5-POU2F1  SMARCA5   POU2F1            263        241        426  0.820810  135.355691
SMARCA5-ZNF582  SMARCA5   ZNF582            172        241        195  0.793419  117.370387
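
As a quick sanity check on the table above, the cosine column appears to equal the pair count divided by the geometric mean of the two single-TF counts. The sketch below recomputes it for the rows shown. The helper name `cosine_score` is hypothetical and not part of TF-COMB, and the formula is inferred from the printed numbers rather than taken from the library's code.

```python
import math

def cosine_score(pair_count, tf1_count, tf2_count):
    """Hypothetical helper: co-occurrence count over the geometric mean of single counts."""
    return pair_count / math.sqrt(tf1_count * tf2_count)

# (TF1, TF2, TF1_TF2_count, TF1_count, TF2_count, cosine reported in the table)
rows = [
    ("POU3F2", "SMARCA5", 239, 302, 241, 0.885902),
    ("POU2F1", "SMARCA5", 263, 426, 241, 0.820810),
    ("SMARCA5", "ZNF582", 172, 241, 195, 0.793419),
]

for tf1, tf2, pair_count, tf1_count, tf2_count, reported in rows:
    recomputed = cosine_score(pair_count, tf1_count, tf2_count)
    print(f"{tf1}-{tf2}: recomputed={recomputed:.6f} reported={reported:.6f}")
```

Because the expression is symmetric in the two counts, the A-B and B-A rows share the same cosine value, which matches the duplicated scores in the table.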

Getting locations for a selected TF-TF pair

We choose the highest-ranking TF pair, i.e. the first row of .rules:

[3]:
TF1, TF2 = C.rules.iloc[0, [0,1]]
TF1, TF2
[3]:
('POU3F2', 'SMARCA5')
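
The unpacking above relies on TF1 and TF2 being the first two columns of .rules. Below is a minimal sketch of the same selection on a stand-in table, built by hand from the rows shown earlier rather than from a real CombObj, together with label-based alternatives that do not depend on column order.

```python
import pandas as pd

# Stand-in for C.rules (assumption: same index and column names as printed above)
rules = pd.DataFrame(
    {
        "TF1": ["POU3F2", "SMARCA5", "POU2F1"],
        "TF2": ["SMARCA5", "POU3F2", "SMARCA5"],
        "cosine": [0.885902, 0.885902, 0.820810],
    },
    index=["POU3F2-SMARCA5", "SMARCA5-POU3F2", "POU2F1-SMARCA5"],
)

# Positional selection, as in the cell above: first row, first two columns
TF1, TF2 = rules.iloc[0, [0, 1]]

# Same row, but columns chosen by name
TF1_named, TF2_named = rules.iloc[0][["TF1", "TF2"]]

# A specific pair chosen by its index label
other_TF1, other_TF2 = rules.loc["POU2F1-SMARCA5", ["TF1", "TF2"]]

print((TF1, TF2), (other_TF1, other_TF2))
```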

We can now use get_pair_locations() to get the genomic locations where this TF-TF pair co-occurs:

[4]:
pairs = C.get_pair_locations((TF1, TF2))
[5]:
pairs[:10]
[5]:
TFBSPairList([<TFBSPair | TFBS1: (chr4,49092715,49092730,SMARCA5,13.72436,-) | TFBS2: (chr4,49092743,49092775,POU3F2,11.25323,+) | distance: 13 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092715,49092730,SMARCA5,13.72436,-) | TFBS2: (chr4,49092788,49092865,POU3F2,11.46462,+) | distance: 58 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092743,49092775,POU3F2,11.25323,+) | TFBS2: (chr4,49092785,49092880,SMARCA5,15.00425,-) | distance: 10 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49092745,49092780,SMARCA5,11.81125,-) | TFBS2: (chr4,49092788,49092865,POU3F2,11.46462,+) | distance: 8 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092785,49092880,SMARCA5,15.00425,-) | TFBS2: (chr4,49092893,49092930,POU3F2,11.46462,+) | distance: 13 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092788,49092865,POU3F2,11.46462,+) | TFBS2: (chr4,49092885,49092930,SMARCA5,12.69622,-) | distance: 20 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49096721,49096746,SMARCA5,12.97124,-) | TFBS2: (chr4,49096779,49096836,POU3F2,11.46462,+) | distance: 33 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49096721,49096746,SMARCA5,12.97124,-) | TFBS2: (chr4,49096839,49096896,POU3F2,11.63189,+) | distance: 93 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49096744,49096761,POU3F2,10.58175,+) | TFBS2: (chr4,49096771,49096836,SMARCA5,13.02056,-) | distance: 10 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49096744,49096761,POU3F2,10.58175,+) | TFBS2: (chr4,49096841,49096916,SMARCA5,13.70506,-) | distance: 80 | orientation: convergent >])
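
In the output above, the reported distance matches the gap between the end of the first site and the start of the second, and the orientation label follows from the two strands ('-' then '+' is divergent, '+' then '-' is convergent). The sketch below reproduces those two fields for two of the printed pairs. `Site`, `gap`, and `orientation` are hypothetical helpers, not TF-COMB API; same-strand pairs do not appear in the output, so the sketch returns a placeholder label for them.

```python
from collections import namedtuple

# Hypothetical container mirroring the fields printed for each TFBS
Site = namedtuple("Site", ["chrom", "start", "end", "name", "score", "strand"])

def gap(site1, site2):
    """Gap between two non-overlapping sites, with site1 located before site2."""
    return site2.start - site1.end

def orientation(site1, site2):
    """Label inferred from the printed output; not TF-COMB's own implementation."""
    strands = (site1.strand, site2.strand)
    if strands == ("+", "-"):
        return "convergent"  # sites point toward each other
    if strands == ("-", "+"):
        return "divergent"   # sites point away from each other
    return "same strand"     # placeholder: this case is not shown in the output

# First and third pairs from the output above
smarca5_a = Site("chr4", 49092715, 49092730, "SMARCA5", 13.72436, "-")
pou3f2_a = Site("chr4", 49092743, 49092775, "POU3F2", 11.25323, "+")
smarca5_b = Site("chr4", 49092785, 49092880, "SMARCA5", 15.00425, "-")

print(gap(smarca5_a, pou3f2_a), orientation(smarca5_a, pou3f2_a))
print(gap(pou3f2_a, smarca5_b), orientation(pou3f2_a, smarca5_b))
```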

We can write these locations to a file:

[6]:
pairs.write_bed("TFBS_pair_positions.bed", fmt="bed")

# Show the first lines of the written file
import pandas as pd
pd.read_csv("TFBS_pair_positions.bed", sep="\t", header=None, nrows=5)
[6]:
      0         1         2        3  4  5
0  chr4  49092715  49092730  SMARCA5  .  -
1  chr4  49092743  49092775   POU3F2  .  +
2  chr4  49092715  49092730  SMARCA5  .  -
3  chr4  49092788  49092865   POU3F2  .  +
4  chr4  49092743  49092775   POU3F2  .  +
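
In the 'bed' output, each pair appears to contribute one line per site, so a site that takes part in several pairs is listed more than once (rows 0 and 2 above are identical). If unique sites are needed, one option is to drop the repeats after loading. The sketch below does this on the five rows shown, using an in-memory copy instead of the written file; the column names are assumptions based on the standard six-column BED layout.

```python
import io
import pandas as pd

# The five rows shown above, rebuilt as tab-separated text
rows = [
    ("chr4", 49092715, 49092730, "SMARCA5", ".", "-"),
    ("chr4", 49092743, 49092775, "POU3F2", ".", "+"),
    ("chr4", 49092715, 49092730, "SMARCA5", ".", "-"),
    ("chr4", 49092788, 49092865, "POU3F2", ".", "+"),
    ("chr4", 49092743, 49092775, "POU3F2", ".", "+"),
]
bed_text = "\n".join("\t".join(map(str, row)) for row in rows)

# Assumed column names (standard BED6 order)
bed_columns = ["chrom", "start", "end", "name", "score", "strand"]
bed = pd.read_csv(io.StringIO(bed_text), sep="\t", header=None, names=bed_columns)

# Keep each site once and sort by position
unique_sites = (
    bed.drop_duplicates()
       .sort_values(["chrom", "start", "end"])
       .reset_index(drop=True)
)
print(unique_sites)
```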

The pairs can also be written in ‘bedpe’ format, in which each line contains the positions of both sites of a pair:

[7]:
pairs.write_bed("TFBS_pair_positions.bedpe", fmt="bedpe")

pd.read_csv("TFBS_pair_positions.bedpe", sep="\t", header=None, nrows=5)
[7]:
      0         1         2     3         4         5               6   7  8  9
0  chr4  49092715  49092730  chr4  49092743  49092775  SMARCA5-POU3F2  13  -  +
1  chr4  49092715  49092730  chr4  49092788  49092865  SMARCA5-POU3F2  58  -  +
2  chr4  49092743  49092775  chr4  49092785  49092880  POU3F2-SMARCA5  10  +  -
3  chr4  49092745  49092780  chr4  49092788  49092865  SMARCA5-POU3F2   8  -  +
4  chr4  49092785  49092880  chr4  49092893  49092930  SMARCA5-POU3F2  13  -  +
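
The ten columns follow the usual BEDPE order (two sets of coordinates, a name, a score, and two strands), with the pair distance stored in the score position. The sketch below names the columns and keeps only closely spaced pairs. It works on an in-memory copy of the five rows shown; the column names are chosen here for readability and the 10 bp cutoff is arbitrary.

```python
import io
import pandas as pd

# The five rows shown above, rebuilt as tab-separated text
rows = [
    ("chr4", 49092715, 49092730, "chr4", 49092743, 49092775, "SMARCA5-POU3F2", 13, "-", "+"),
    ("chr4", 49092715, 49092730, "chr4", 49092788, 49092865, "SMARCA5-POU3F2", 58, "-", "+"),
    ("chr4", 49092743, 49092775, "chr4", 49092785, 49092880, "POU3F2-SMARCA5", 10, "+", "-"),
    ("chr4", 49092745, 49092780, "chr4", 49092788, 49092865, "SMARCA5-POU3F2", 8, "-", "+"),
    ("chr4", 49092785, 49092880, "chr4", 49092893, 49092930, "SMARCA5-POU3F2", 13, "-", "+"),
]
bedpe_text = "\n".join("\t".join(map(str, row)) for row in rows)

# Assumed column names; "distance" occupies the BEDPE score column
bedpe_columns = [
    "chrom1", "start1", "end1",
    "chrom2", "start2", "end2",
    "name", "distance", "strand1", "strand2",
]
bedpe = pd.read_csv(io.StringIO(bedpe_text), sep="\t", header=None, names=bedpe_columns)

# Keep pairs whose sites lie at most 10 bp apart (arbitrary example cutoff)
close_pairs = bedpe[bedpe["distance"] <= 10]
print(close_pairs[["name", "start1", "end2", "distance", "strand1", "strand2"]])
```

To apply the same steps to the written file, replace the `io.StringIO(...)` argument with the path "TFBS_pair_positions.bedpe".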