Genomic locations of TF-TF pairs

In this notebook, we show how to obtain the genomic locations where two TFs co-occur. We start by creating a TF-COMB analysis from motif positions:

[1]:

import tfcomb

C = tfcomb.CombObj()
C.TFBS_from_motifs(regions="../data/GM12878_hg38_chr4_ATAC_peaks.bed",
                   motifs="../data/HOCOMOCOv11_HUMAN_motifs.txt",
                   genome="../data/hg38_chr4.fa.gz",
                   threads=4)
C.market_basket()

INFO: Scanning for TFBS with 4 thread(s)...
INFO: Progress: 12%
INFO: Progress: 20%
INFO: Progress: 30%
INFO: Progress: 40%
INFO: Progress: 50%
INFO: Progress: 60%
INFO: Progress: 70%
INFO: Progress: 80%
INFO: Progress: 91%
INFO: Finished!
INFO: Processing scanned TFBS
INFO: Identified 165810 TFBS (401 unique names) within given regions
Internal counts for 'TF_counts' were not set. Please run .count_within() to obtain TF-TF co-occurrence counts.
WARNING: No counts found in <CombObj>. Running <CombObj>.count_within() with standard parameters.
INFO: Setting up binding sites for counting
INFO: Counting co-occurrences within sites
INFO: Counting co-occurrence within background
INFO: Running with multiprocessing threads == 1. To change this, give 'threads' in the parameter of the function.
INFO: Progress: 10%
INFO: Progress: 20%
INFO: Progress: 30%
INFO: Progress: 40%
INFO: Progress: 50%
INFO: Progress: 60%
INFO: Progress: 70%
INFO: Progress: 80%
INFO: Progress: 90%
INFO: Done finding co-occurrences! Run .market_basket() to estimate significant pairs
INFO: Market basket analysis is done! Results are found in <CombObj>.rules

[2]:

C.rules.head()

[2]:

	TF1	TF2	TF1_TF2_count	TF1_count	TF2_count	cosine	zscore
POU3F2-SMARCA5	POU3F2	SMARCA5	239	302	241	0.885902	129.586528
SMARCA5-POU3F2	SMARCA5	POU3F2	239	241	302	0.885902	129.586528
POU2F1-SMARCA5	POU2F1	SMARCA5	263	426	241	0.820810	135.355691
SMARCA5-POU2F1	SMARCA5	POU2F1	263	241	426	0.820810	135.355691
SMARCA5-ZNF582	SMARCA5	ZNF582	172	241	195	0.793419	117.370387
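The cosine column in this table agrees with the pair count divided by the geometric mean of the two single-TF counts, i.e. TF1_TF2_count / sqrt(TF1_count * TF2_count). The sketch below recomputes it from the counts shown above. The helper name `cosine` is our own, and this is a check against the printed values, not TF-COMB's internal code.

```python
import math


def cosine(pair_count: int, tf1_count: int, tf2_count: int) -> float:
    """Co-occurrence count normalized by the geometric mean of the single-TF counts."""
    return pair_count / math.sqrt(tf1_count * tf2_count)


# (TF1_TF2_count, TF1_count, TF2_count) copied from the C.rules rows above
rows = {
    "POU3F2-SMARCA5": (239, 302, 241),
    "POU2F1-SMARCA5": (263, 426, 241),
    "SMARCA5-ZNF582": (172, 241, 195),
}

for name, counts in rows.items():
    print(name, round(cosine(*counts), 6))
```

Because the measure is symmetric in TF1 and TF2, each pair receives the same value in both orientations (e.g. POU3F2-SMARCA5 and SMARCA5-POU3F2).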

Getting locations for a selected TF-TF pair

We select the highest-ranking TF pair from C.rules:

[3]:

# The first row of C.rules is the top-ranked pair; columns 0 and 1 hold TF1 and TF2
TF1, TF2 = C.rules.iloc[0, [0,1]]
TF1, TF2

[3]:

('POU3F2', 'SMARCA5')
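Since C.rules lists every pair twice (POU3F2-SMARCA5 and SMARCA5-POU3F2), taking the top N rows directly yields duplicates. A minimal pandas sketch for collecting the top distinct pairs is shown below; it works on a small hand-built copy of the rows above, and the variable names are illustrative only.

```python
import pandas as pd

# A few rows copied from the C.rules output above (the real table is much longer)
rules = pd.DataFrame(
    {
        "TF1": ["POU3F2", "SMARCA5", "POU2F1", "SMARCA5", "SMARCA5"],
        "TF2": ["SMARCA5", "POU3F2", "SMARCA5", "POU2F1", "ZNF582"],
        "cosine": [0.885902, 0.885902, 0.820810, 0.820810, 0.793419],
    },
    index=["POU3F2-SMARCA5", "SMARCA5-POU3F2", "POU2F1-SMARCA5",
           "SMARCA5-POU2F1", "SMARCA5-ZNF582"],
)

# Order-independent key, so that TF1-TF2 and TF2-TF1 map to the same string
pair_key = pd.Series(
    ["-".join(sorted(pair)) for pair in zip(rules["TF1"], rules["TF2"])],
    index=rules.index,
)

# Keep the first occurrence of each pair; the existing (ranked) row order is preserved
unique_rules = rules[~pair_key.duplicated()]
top_pairs = list(zip(unique_rules["TF1"], unique_rules["TF2"]))[:3]
print(top_pairs)
```

Each tuple in `top_pairs` has the same form as the `(TF1, TF2)` argument passed to C.get_pair_locations() in the next cell, so the pairs could be processed in a loop.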

We can now use .get_pair_locations() to retrieve the locations of all binding-site pairs for this TF-TF pair:

[4]:

pairs = C.get_pair_locations((TF1, TF2))

[5]:

pairs[:10]

[5]:

TFBSPairList([<TFBSPair | TFBS1: (chr4,49092715,49092730,SMARCA5,13.72436,-) | TFBS2: (chr4,49092743,49092775,POU3F2,11.25323,+) | distance: 13 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092715,49092730,SMARCA5,13.72436,-) | TFBS2: (chr4,49092788,49092865,POU3F2,11.46462,+) | distance: 58 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092743,49092775,POU3F2,11.25323,+) | TFBS2: (chr4,49092785,49092880,SMARCA5,15.00425,-) | distance: 10 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49092745,49092780,SMARCA5,11.81125,-) | TFBS2: (chr4,49092788,49092865,POU3F2,11.46462,+) | distance: 8 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092785,49092880,SMARCA5,15.00425,-) | TFBS2: (chr4,49092893,49092930,POU3F2,11.46462,+) | distance: 13 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49092788,49092865,POU3F2,11.46462,+) | TFBS2: (chr4,49092885,49092930,SMARCA5,12.69622,-) | distance: 20 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49096721,49096746,SMARCA5,12.97124,-) | TFBS2: (chr4,49096779,49096836,POU3F2,11.46462,+) | distance: 33 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49096721,49096746,SMARCA5,12.97124,-) | TFBS2: (chr4,49096839,49096896,POU3F2,11.63189,+) | distance: 93 | orientation: divergent >, <TFBSPair | TFBS1: (chr4,49096744,49096761,POU3F2,10.58175,+) | TFBS2: (chr4,49096771,49096836,SMARCA5,13.02056,-) | distance: 10 | orientation: convergent >, <TFBSPair | TFBS1: (chr4,49096744,49096761,POU3F2,10.58175,+) | TFBS2: (chr4,49096841,49096916,SMARCA5,13.70506,-) | distance: 80 | orientation: convergent >])
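The distance and orientation reported for each pair can be reproduced from the site coordinates: in the pairs above, the distance equals the start of the downstream site minus the end of the upstream site, and a "+" site followed by a "-" site is labeled convergent, while "-" followed by "+" is divergent. The sketch below re-implements these two rules under that assumption. It is our own illustration, not TF-COMB's code, and the `Site` class and the "same-strand" label are hypothetical.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """Hypothetical stand-in for a TFBS: BED-like coordinates plus TF name and strand."""
    chrom: str
    start: int
    end: int
    name: str
    strand: str


def _ordered(site1: Site, site2: Site) -> tuple:
    # Sort the two sites by genomic start so that the first one is always upstream
    return tuple(sorted((site1, site2), key=lambda s: s.start))


def pair_distance(site1: Site, site2: Site) -> int:
    """Bases between the end of the upstream site and the start of the downstream site.

    Overlapping sites give a negative value in this sketch.
    """
    first, second = _ordered(site1, site2)
    return second.start - first.end


def pair_orientation(site1: Site, site2: Site) -> str:
    first, second = _ordered(site1, site2)
    if (first.strand, second.strand) == ("+", "-"):
        return "convergent"  # the two motifs point towards each other
    if (first.strand, second.strand) == ("-", "+"):
        return "divergent"  # the two motifs point away from each other
    return "same-strand"  # hypothetical label for +/+ and -/- pairs


# Sites from the first and third pairs in the TFBSPairList above
smarca5 = Site("chr4", 49092715, 49092730, "SMARCA5", "-")
pou3f2 = Site("chr4", 49092743, 49092775, "POU3F2", "+")
smarca5_b = Site("chr4", 49092785, 49092880, "SMARCA5", "-")

print(pair_distance(smarca5, pou3f2), pair_orientation(smarca5, pou3f2))      # 13 divergent
print(pair_distance(pou3f2, smarca5_b), pair_orientation(pou3f2, smarca5_b))  # 10 convergent
```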

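We can write these locations to a BED file. Each pair contributes two lines, one per binding site, so a site that takes part in several pairs appears more than once: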

[6]:

pairs.write_bed("TFBS_pair_positions.bed", fmt="bed")

# Show the first five lines of the file
import pandas as pd
pd.read_csv("TFBS_pair_positions.bed", sep="\t", header=None, nrows=5)

[6]:

	0	1	2	3	4	5
0	chr4	49092715	49092730	SMARCA5	.	-
1	chr4	49092743	49092775	POU3F2	.	+
2	chr4	49092715	49092730	SMARCA5	.	-
3	chr4	49092788	49092865	POU3F2	.	+
4	chr4	49092743	49092775	POU3F2	.	+
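To work with the sites pair by pair, or with the distinct sites only, the BED file can be regrouped with pandas. The sketch below assumes the two-lines-per-pair layout seen above and uses a copy of the first four lines instead of the file on disk. The column names are our own labels.

```python
import io

import pandas as pd

# First four lines of TFBS_pair_positions.bed, copied from the output above
bed_text = (
    "chr4\t49092715\t49092730\tSMARCA5\t.\t-\n"
    "chr4\t49092743\t49092775\tPOU3F2\t.\t+\n"
    "chr4\t49092715\t49092730\tSMARCA5\t.\t-\n"
    "chr4\t49092788\t49092865\tPOU3F2\t.\t+\n"
)
columns = ["chrom", "start", "end", "name", "score", "strand"]
bed = pd.read_csv(io.StringIO(bed_text), sep="\t", header=None, names=columns)

# Consecutive lines belong together: lines 0-1 are pair 0, lines 2-3 are pair 1, ...
bed["pair_id"] = bed.index // 2

# Drop the repeated sites to keep each binding site once
unique_sites = bed.drop_duplicates(subset=columns)
print(unique_sites[["chrom", "start", "end", "name", "strand"]])
```

With the real file, `io.StringIO(bed_text)` would be replaced by the path "TFBS_pair_positions.bed".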

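The pairs can also be written in BEDPE format, which stores both sites of a pair on a single line. In the output below, the columns hold the coordinates of the first and second site, the pair name, the distance between the sites and the strands of the two sites: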

[7]:

pairs.write_bed("TFBS_pair_positions.bedpe", fmt="bedpe")

pd.read_csv("TFBS_pair_positions.bedpe", sep="\t", header=None, nrows=5)

[7]:

	0	1	2	3	4	5	6	7	8	9
0	chr4	49092715	49092730	chr4	49092743	49092775	SMARCA5-POU3F2	13	-	+
1	chr4	49092715	49092730	chr4	49092788	49092865	SMARCA5-POU3F2	58	-	+
2	chr4	49092743	49092775	chr4	49092785	49092880	POU3F2-SMARCA5	10	+	-
3	chr4	49092745	49092780	chr4	49092788	49092865	SMARCA5-POU3F2	8	-	+
4	chr4	49092785	49092880	chr4	49092893	49092930	SMARCA5-POU3F2	13	-	+
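Because each BEDPE line describes a whole pair, it is convenient for filtering pairs, for example by distance, or for collapsing each pair into a single interval. The sketch below uses a copy of the five lines above; the column names are our own labels (the eighth column matches the distances reported in the TFBSPairList), and the 10 bp cutoff is arbitrary.

```python
import io

import pandas as pd

# The five BEDPE lines shown above, rebuilt as tab-separated text
rows = [
    "chr4 49092715 49092730 chr4 49092743 49092775 SMARCA5-POU3F2 13 - +",
    "chr4 49092715 49092730 chr4 49092788 49092865 SMARCA5-POU3F2 58 - +",
    "chr4 49092743 49092775 chr4 49092785 49092880 POU3F2-SMARCA5 10 + -",
    "chr4 49092745 49092780 chr4 49092788 49092865 SMARCA5-POU3F2 8 - +",
    "chr4 49092785 49092880 chr4 49092893 49092930 SMARCA5-POU3F2 13 - +",
]
bedpe_text = "\n".join("\t".join(row.split()) for row in rows)

# Hypothetical column labels; TF-COMB writes the file without a header
columns = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
           "name", "distance", "strand1", "strand2"]
bedpe = pd.read_csv(io.StringIO(bedpe_text), sep="\t", header=None, names=columns)

# Keep only closely spaced pairs (10 bp is an arbitrary cutoff for illustration)
close = bedpe[bedpe["distance"] <= 10].copy()

# One interval per pair, spanning from the leftmost start to the rightmost end
close["span_start"] = close[["start1", "start2"]].min(axis=1)
close["span_end"] = close[["end1", "end2"]].max(axis=1)
print(close[["chrom1", "span_start", "span_end", "name", "distance"]])
```

The spanning intervals could, for instance, be saved as a regular BED file for viewing in a genome browser.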