Orientation analysis

This is an example of an orientation analysis using .TFBS from motif sites.

Create CombObj and fill it with .TFBS from motif scanning

[1]:

import tfcomb
C = tfcomb.CombObj(verbosity=0)

[2]:

C.TFBS_from_motifs(regions="../data/GM12878_hg38_chr4_ATAC_peaks.bed",
                   motifs="../data/HOCOMOCOv11_HUMAN_motifs.txt",
                   genome="../data/hg38_chr4_masked.fa.gz",
                   threads=8)

For this analysis, we will run count_within() with the stranded option turned on:

[3]:

C.count_within(stranded=True, threads=8)
C.market_basket()

[4]:

C.rules

[4]:

	TF1	TF2	TF1_TF2_count	TF1_count	TF2_count	cosine	zscore
KLF1(-)-KLF9(-)	KLF1(-)	KLF9(-)	145	291	209	0.587961	66.330168
KLF9(-)-KLF1(-)	KLF9(-)	KLF1(-)	145	209	291	0.587961	66.330168
KLF1(-)-KLF12(-)	KLF1(-)	KLF12(-)	152	291	240	0.575164	61.565308
KLF12(-)-KLF1(-)	KLF12(-)	KLF1(-)	152	240	291	0.575164	61.565308
KLF12(-)-KLF9(-)	KLF12(-)	KLF9(-)	127	240	209	0.567055	62.215597
...	...	...	...	...	...	...	...
STAT2(+)-SP1(+)	STAT2(+)	SP1(+)	1	556	636	0.001682	-5.135968
NFE2L1(+)-SP2(+)	NFE2L1(+)	SP2(+)	1	451	798	0.001667	-4.294191
SP2(+)-NFE2L1(+)	SP2(+)	NFE2L1(+)	1	798	451	0.001667	-4.294191
BCL11A(+)-SP1(-)	BCL11A(+)	SP1(-)	1	562	653	0.001651	-4.399691
SP1(-)-BCL11A(+)	SP1(-)	BCL11A(+)	1	653	562	0.001651	-4.399691

226676 rows × 7 columns
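As a quick sanity check on the table above, the cosine column can be reproduced from the three count columns. The sketch below assumes the score is the pair count divided by the geometric mean of the two single-TF counts. That formula is inferred from the displayed values rather than quoted from the tfcomb source, and `cosine_score` is a hypothetical helper name.

```python
import math

def cosine_score(pair_count: int, tf1_count: int, tf2_count: int) -> float:
    """Cosine-style co-occurrence score (assumed formula, inferred from the table)."""
    return pair_count / math.sqrt(tf1_count * tf2_count)

# Counts copied from the first two distinct rules in the table above.
klf1_klf9 = cosine_score(145, 291, 209)   # table shows 0.587961
klf1_klf12 = cosine_score(152, 291, 240)  # table shows 0.575164

print(f"KLF1(-)-KLF9(-):  {klf1_klf9:.6f}")
print(f"KLF1(-)-KLF12(-): {klf1_klf12:.6f}")
```

Because the formula is symmetric in TF1 and TF2, mirrored rules such as KLF1(-)-KLF9(-) and KLF9(-)-KLF1(-) receive the same score, which matches the duplicated values in the table.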

Analyze preferential orientation of motifs

First, we run the orientation analysis on the rules found:

[5]:

df = C.analyze_orientation()

INFO: Rules are symmetric - scenarios counted are: ['same', 'opposite']

[6]:

df.head()

[6]:

	TF1	TF2	TF1_TF2_count	same	opposite	std	pvalue
SP3-SP4	SP3	SP4	534	0.758427	0.241573	0.365471	7.004689e-33
PATZ1-SP1	PATZ1	SP1	631	0.730586	0.269414	0.326098	4.936837e-31
PATZ1-SP3	PATZ1	SP3	642	0.725857	0.274143	0.319410	2.479952e-30
SP1-SP3	SP1	SP3	756	0.705026	0.294974	0.289951	1.751909e-29
KLF1-KLF9	KLF1	KLF9	243	0.851852	0.148148	0.497594	5.347598e-28
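The std column summarizes how unevenly a pair is spread across the scenarios. The values shown above are consistent with the sample standard deviation (n - 1 in the denominator) of the scenario fractions. This is an observation from the displayed numbers, not a statement about tfcomb internals. A minimal check using only the standard library:

```python
import statistics

# Scenario fractions copied from the table above.
sp3_sp4 = [0.758427, 0.241573]    # same, opposite
klf1_klf9 = [0.851852, 0.148148]  # same, opposite

# statistics.stdev computes the sample standard deviation (divides by n - 1).
sp3_sp4_std = statistics.stdev(sp3_sp4)      # table shows 0.365471
klf1_klf9_std = statistics.stdev(klf1_klf9)  # table shows 0.497594

print(f"SP3-SP4 std:   {sp3_sp4_std:.6f}")
print(f"KLF1-KLF9 std: {klf1_klf9_std:.6f}")
```

A pair with no orientation preference (0.5 / 0.5) would have a std of 0, so larger values indicate a stronger preference.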

We can subset these on pvalue and number of sites:

[7]:

selected = df[(df["pvalue"] < 0.01) & (df["TF1_TF2_count"] > 50)]

[8]:

# Number of TF pairs with significant differences in orientation
selected.shape[0]

[8]:

We can also use the .loc indexer of the pandas DataFrame to show the results for a subset of TF1-TF2 pairs:

[9]:

df.loc[["EGR1-IRF4", "SP1-TAF1"]]

[9]:

	TF1	TF2	TF1_TF2_count	same	opposite	std	pvalue
EGR1-IRF4	EGR1	IRF4	15	0.866667	0.133333	0.518545	0.004509
SP1-TAF1	SP1	TAF1	153	0.679739	0.320261	0.254189	0.000009
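The p-values in this two-scenario table are consistent with a chi-square goodness-of-fit test against an even split between "same" and "opposite". That is an inference from the displayed numbers and not a documented guarantee. The sketch below recovers the counts for EGR1-IRF4 from the fractions (15 sites, 13 same, 2 opposite) and uses the closed form for one degree of freedom. `even_split_pvalue` is a hypothetical helper name.

```python
import math

def even_split_pvalue(count_a: int, count_b: int) -> float:
    """Chi-square goodness-of-fit p-value for two scenarios against a 50/50 split.

    Assumed test, inferred from the table. With one degree of freedom the
    chi-square survival function equals erfc(sqrt(chi2 / 2)).
    """
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    return math.erfc(math.sqrt(chi2 / 2))

# EGR1-IRF4: 0.866667 * 15 = 13 sites in "same", 2 in "opposite".
egr1_irf4_p = even_split_pvalue(13, 2)  # table shows 0.004509

print(f"EGR1-IRF4 p-value: {egr1_irf4_p:.6f}")
```

Note that with only 15 sites this pair passes a p-value cutoff of 0.01 but would be removed by the TF1_TF2_count > 50 filter used earlier.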

Visualization of orientation preference

[10]:

_ = selected.plot_heatmap()

../_images/examples_Orientation_analysis_18_0.png

We can inspect subsets of the selected pairs, for example those with the lowest fraction of sites in the same or in the opposite orientation:

[11]:

selected.sort_values("same").head(5)

[11]:

	TF1	TF2	TF1_TF2_count	same	opposite	std	pvalue
KLF3-ZIC3	KLF3	ZIC3	66	0.166667	0.833333	0.471405	6.093838e-08
SP1-ZFX	SP1	ZFX	81	0.222222	0.777778	0.392837	5.733031e-07
PATZ1-ZFX	PATZ1	ZFX	74	0.229730	0.770270	0.382220	3.320871e-06
SP4-ZFX	SP4	ZFX	52	0.250000	0.750000	0.353553	3.114910e-04
ASCL1-WT1	ASCL1	WT1	61	0.262295	0.737705	0.336166	2.047606e-04

[12]:

selected.sort_values("opposite").head(5)

[12]:

	TF1	TF2	TF1_TF2_count	same	opposite	std	pvalue
KLF4-KLF5	KLF4	KLF5	63	0.920635	0.079365	0.594868	2.432643e-11
ETV4-KLF3	ETV4	KLF3	57	0.912281	0.087719	0.583053	4.806291e-10
KLF9-KLF9	KLF9	KLF9	123	0.886179	0.113821	0.546139	1.072707e-17
KLF4-MAZ	KLF4	MAZ	70	0.871429	0.128571	0.525279	5.126299e-10
KLF4-KLF9	KLF4	KLF9	74	0.864865	0.135135	0.515997	3.443424e-10

Extended analysis with directional=True

The first analysis does not take into account the relative order of TF1 and TF2, e.g. whether the orientation “same” represents “TF1-TF2” or “TF2-TF1”. Setting directional=True in count_within() distinguishes these cases:

[13]:

C.count_within(directional=True, stranded=True, threads=8)
C.market_basket()

[14]:

df = C.analyze_orientation()

INFO: Rules are directional - scenarios counted are: ['TF1-TF2', 'TF2-TF1', 'convergent', 'divergent']

[15]:

df.head()

[15]:

	TF1	TF2	TF1_TF2_count	TF1-TF2	TF2-TF1	convergent	divergent	std	pvalue
SP2-SP2	SP2	SP2	1077	0.395543	0.395543	0.102136	0.106778	0.168069	8.140464e-79
SP1-SP1	SP1	SP1	687	0.417758	0.417758	0.075691	0.088792	0.193784	8.390630e-67
SP3-SP3	SP3	SP3	718	0.412256	0.412256	0.094708	0.080780	0.187444	2.559523e-65
PATZ1-PATZ1	PATZ1	PATZ1	547	0.422303	0.422303	0.078611	0.076782	0.198960	4.875297e-56
SP4-SP4	SP4	SP4	371	0.444744	0.444744	0.070081	0.040431	0.225196	1.132384e-48
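To make the four scenario names concrete, here is a minimal sketch of how a single pair of sites could be assigned to a scenario from its positions and strands. It follows one common reading of the terms: sites on the same strand are ordered in the reading direction of that strand, "+" followed by "-" is convergent, and "-" followed by "+" is divergent. The `Site` type and `classify_pair` function are hypothetical and are not part of tfcomb, whose internal definition may differ.

```python
from typing import NamedTuple

class Site(NamedTuple):
    name: str    # TF name
    start: int   # start coordinate of the motif site
    strand: str  # "+" or "-"

def classify_pair(a: Site, b: Site) -> str:
    """Assign a pair of sites to an orientation scenario (assumed definitions)."""
    left, right = sorted((a, b), key=lambda site: site.start)
    if left.strand == "+" and right.strand == "-":
        return "convergent"  # motifs point toward each other
    if left.strand == "-" and right.strand == "+":
        return "divergent"   # motifs point away from each other
    # Same strand: order the names in the reading direction of that strand.
    first, second = (left, right) if left.strand == "+" else (right, left)
    return f"{first.name}-{second.name}"

print(classify_pair(Site("KLF4", 100, "+"), Site("MAZ", 130, "+")))  # same strand, KLF4 first
print(classify_pair(Site("KLF4", 100, "-"), Site("MAZ", 130, "-")))  # same strand, MAZ first
print(classify_pair(Site("KLF4", 100, "+"), Site("MAZ", 130, "-")))
print(classify_pair(Site("KLF4", 100, "-"), Site("MAZ", 130, "+")))
```

Under this reading, a pair of identical TFs (such as SP2-SP2) cannot distinguish "TF1-TF2" from "TF2-TF1", which fits the equal values in those two columns for the self-pairs above.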

Similarly to the first analysis, we can select the significant pairs and visualize the preferences for orientation:

[16]:

selected = df[(df["pvalue"] < 0.05) & (df["TF1_TF2_count"] > 50)]

[17]:

_ = selected.plot_heatmap()

../_images/examples_Orientation_analysis_30_0.png

In-depth look at preferential orientation

By sorting the selected co-occurring TF pairs, it is also possible to visualize the top pairs within each scenario as seen below.

TFs specific in TF1-TF2 orientation

[18]:

selected.sort_values("TF1-TF2", ascending=False).head()

[18]:

	TF1	TF2	TF1_TF2_count	TF1-TF2	TF2-TF1	convergent	divergent	std	pvalue
KLF9-ZNF341	KLF9	ZNF341	80	0.500000	0.337500	0.100000	0.062500	0.206408	6.866468e-09
KLF1-KLF4	KLF1	KLF4	97	0.494845	0.360825	0.041237	0.103093	0.214005	1.575098e-11
KLF4-KLF4	KLF4	KLF4	56	0.482143	0.482143	0.017857	0.017857	0.268055	1.851271e-10
BCL11A-SPIB	BCL11A	SPIB	56	0.482143	0.321429	0.107143	0.089286	0.187287	3.069277e-05
KLF5-ZNF341	KLF5	ZNF341	61	0.475410	0.278689	0.114754	0.131148	0.167382	1.331723e-04

TFs specific in TF2-TF1 orientation

[19]:

selected.sort_values("TF2-TF1", ascending=False).head()

[19]:

	TF1	TF2	TF1_TF2_count	TF1-TF2	TF2-TF1	convergent	divergent	std	pvalue
KLF4-MAZ	KLF4	MAZ	70	0.328571	0.542857	0.042857	0.085714	0.232262	7.933608e-10
EGR1-KLF9	EGR1	KLF9	72	0.277778	0.541667	0.138889	0.041667	0.217248	7.288757e-09
KLF4-KLF5	KLF4	KLF5	63	0.380952	0.539683	0.000000	0.079365	0.253430	1.621951e-10
ETV4-KLF3	ETV4	KLF3	57	0.403509	0.508772	0.017544	0.070175	0.242831	9.055097e-09
E2F6-SP4	E2F6	SP4	80	0.275000	0.500000	0.137500	0.087500	0.184560	3.725792e-07

TFs specific in convergent orientation

[20]:

selected.sort_values("convergent", ascending=False).head()

[20]:

	TF1	TF2	TF1_TF2_count	TF1-TF2	TF2-TF1	convergent	divergent	std	pvalue
ASCL1-SP3	ASCL1	SP3	55	0.127273	0.181818	0.454545	0.236364	0.143452	0.003533
MAZ-ZFX	MAZ	ZFX	54	0.203704	0.111111	0.444444	0.240741	0.140627	0.005055
SP1-ZFX	SP1	ZFX	81	0.111111	0.111111	0.419753	0.358025	0.162343	0.000011
AR-IRF1	AR	IRF1	51	0.274510	0.098039	0.411765	0.215686	0.130433	0.015372
ASCL1-WT1	ASCL1	WT1	61	0.147541	0.114754	0.409836	0.327869	0.141892	0.002055

TFs specific in divergent orientation

[21]:

selected.sort_values("divergent", ascending=False).head()

[21]:

	TF1	TF2	TF1_TF2_count	TF1-TF2	TF2-TF1	convergent	divergent	std	pvalue
STAT2-ZFP28	STAT2	ZFP28	55	0.127273	0.254545	0.181818	0.436364	0.134738	7.445704e-03
KLF3-ZIC3	KLF3	ZIC3	66	0.030303	0.136364	0.409091	0.424242	0.197358	9.148367e-07
IRF3-ZFP28	IRF3	ZFP28	58	0.206897	0.155172	0.224138	0.413793	0.113059	3.069838e-02
NFE2L1-STAT2	NFE2L1	STAT2	68	0.176471	0.220588	0.205882	0.397059	0.099740	4.364185e-02
SP4-ZFX	SP4	ZFX	52	0.192308	0.057692	0.365385	0.384615	0.154645	1.883579e-03
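The directional scenarios refine the earlier two-scenario split. For the pairs that appear in both analyses, "TF1-TF2" plus "TF2-TF1" matches the earlier "same" fraction, and "convergent" plus "divergent" matches "opposite". The check below uses values copied from the tables in this example. It shows that the relation holds for these rows, not that it is guaranteed for every pair.

```python
# Fractions copied from the tables in this example.
# Order: TF1-TF2, TF2-TF1, convergent, divergent, same, opposite
rows = {
    "KLF4-KLF5": (0.380952, 0.539683, 0.000000, 0.079365, 0.920635, 0.079365),
    "KLF3-ZIC3": (0.030303, 0.136364, 0.409091, 0.424242, 0.166667, 0.833333),
    "SP1-ZFX":   (0.111111, 0.111111, 0.419753, 0.358025, 0.222222, 0.777778),
}

TOLERANCE = 2e-6  # the tables are rounded to six decimals

consistent = {}
for pair, (tf1_tf2, tf2_tf1, conv, div, same, opposite) in rows.items():
    same_ok = abs((tf1_tf2 + tf2_tf1) - same) < TOLERANCE
    opposite_ok = abs((conv + div) - opposite) < TOLERANCE
    consistent[pair] = same_ok and opposite_ok
    print(f"{pair}: same matches={same_ok}, opposite matches={opposite_ok}")
```

This means a pair such as KLF3-ZIC3, which looked strongly "opposite" in the first analysis, is split almost evenly between convergent and divergent once direction is taken into account.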