Orientation analysis
This is an example of a orientation analysis using .TFBS from motif sites.
Create CombObj and fill it with .TFBS from motif scanning
[1]:
import tfcomb
C = tfcomb.CombObj(verbosity=0)
[2]:
C.TFBS_from_motifs(regions="../data/GM12878_hg38_chr4_ATAC_peaks.bed",
motifs="../data/HOCOMOCOv11_HUMAN_motifs.txt",
genome="../data/hg38_chr4_masked.fa.gz",
threads=8)
For this analysis, we will run count_within() with the stranded option turned on:
[3]:
C.count_within(stranded=True, threads=8)
C.market_basket()
[4]:
C.rules
[4]:
TF1 | TF2 | TF1_TF2_count | TF1_count | TF2_count | cosine | zscore | |
---|---|---|---|---|---|---|---|
KLF1(-)-KLF9(-) | KLF1(-) | KLF9(-) | 145 | 291 | 209 | 0.587961 | 66.330168 |
KLF9(-)-KLF1(-) | KLF9(-) | KLF1(-) | 145 | 209 | 291 | 0.587961 | 66.330168 |
KLF1(-)-KLF12(-) | KLF1(-) | KLF12(-) | 152 | 291 | 240 | 0.575164 | 61.565308 |
KLF12(-)-KLF1(-) | KLF12(-) | KLF1(-) | 152 | 240 | 291 | 0.575164 | 61.565308 |
KLF12(-)-KLF9(-) | KLF12(-) | KLF9(-) | 127 | 240 | 209 | 0.567055 | 62.215597 |
... | ... | ... | ... | ... | ... | ... | ... |
STAT2(+)-SP1(+) | STAT2(+) | SP1(+) | 1 | 556 | 636 | 0.001682 | -5.135968 |
NFE2L1(+)-SP2(+) | NFE2L1(+) | SP2(+) | 1 | 451 | 798 | 0.001667 | -4.294191 |
SP2(+)-NFE2L1(+) | SP2(+) | NFE2L1(+) | 1 | 798 | 451 | 0.001667 | -4.294191 |
BCL11A(+)-SP1(-) | BCL11A(+) | SP1(-) | 1 | 562 | 653 | 0.001651 | -4.399691 |
SP1(-)-BCL11A(+) | SP1(-) | BCL11A(+) | 1 | 653 | 562 | 0.001651 | -4.399691 |
226676 rows × 7 columns
Analyze preferential orientation of motifs
First, we create a directionality analysis for the rules found:
[5]:
df = C.analyze_orientation()
INFO: Rules are symmetric - scenarios counted are: ['same', 'opposite']
[6]:
df.head()
[6]:
TF1 | TF2 | TF1_TF2_count | same | opposite | std | pvalue | |
---|---|---|---|---|---|---|---|
SP3-SP4 | SP3 | SP4 | 534 | 0.758427 | 0.241573 | 0.365471 | 7.004689e-33 |
PATZ1-SP1 | PATZ1 | SP1 | 631 | 0.730586 | 0.269414 | 0.326098 | 4.936837e-31 |
PATZ1-SP3 | PATZ1 | SP3 | 642 | 0.725857 | 0.274143 | 0.319410 | 2.479952e-30 |
SP1-SP3 | SP1 | SP3 | 756 | 0.705026 | 0.294974 | 0.289951 | 1.751909e-29 |
KLF1-KLF9 | KLF1 | KLF9 | 243 | 0.851852 | 0.148148 | 0.497594 | 5.347598e-28 |
We can subset these on pvalue and number of sites:
[7]:
selected = df[(df["pvalue"] < 0.01) & (df["TF1_TF2_count"] > 50)]
[8]:
#Number of TF pairs with significant differences in orientation
selected.shape[0]
[8]:
476
We can also use the .loc-operator of the pandas dataframe to show the results of a subset of TF1-TF2-pairs:
[9]:
df.loc[["EGR1-IRF4", "SP1-TAF1"]]
[9]:
TF1 | TF2 | TF1_TF2_count | same | opposite | std | pvalue | |
---|---|---|---|---|---|---|---|
EGR1-IRF4 | EGR1 | IRF4 | 15 | 0.866667 | 0.133333 | 0.518545 | 0.004509 |
SP1-TAF1 | SP1 | TAF1 | 153 | 0.679739 | 0.320261 | 0.254189 | 0.000009 |
Visualization of orientation preference
[10]:
_ = selected.plot_heatmap()
We can select the subsets by investigating the selected pairs:
[11]:
selected.sort_values("same").head(5)
[11]:
TF1 | TF2 | TF1_TF2_count | same | opposite | std | pvalue | |
---|---|---|---|---|---|---|---|
KLF3-ZIC3 | KLF3 | ZIC3 | 66 | 0.166667 | 0.833333 | 0.471405 | 6.093838e-08 |
SP1-ZFX | SP1 | ZFX | 81 | 0.222222 | 0.777778 | 0.392837 | 5.733031e-07 |
PATZ1-ZFX | PATZ1 | ZFX | 74 | 0.229730 | 0.770270 | 0.382220 | 3.320871e-06 |
SP4-ZFX | SP4 | ZFX | 52 | 0.250000 | 0.750000 | 0.353553 | 3.114910e-04 |
ASCL1-WT1 | ASCL1 | WT1 | 61 | 0.262295 | 0.737705 | 0.336166 | 2.047606e-04 |
[12]:
selected.sort_values("opposite").head(5)
[12]:
TF1 | TF2 | TF1_TF2_count | same | opposite | std | pvalue | |
---|---|---|---|---|---|---|---|
KLF4-KLF5 | KLF4 | KLF5 | 63 | 0.920635 | 0.079365 | 0.594868 | 2.432643e-11 |
ETV4-KLF3 | ETV4 | KLF3 | 57 | 0.912281 | 0.087719 | 0.583053 | 4.806291e-10 |
KLF9-KLF9 | KLF9 | KLF9 | 123 | 0.886179 | 0.113821 | 0.546139 | 1.072707e-17 |
KLF4-MAZ | KLF4 | MAZ | 70 | 0.871429 | 0.128571 | 0.525279 | 5.126299e-10 |
KLF4-KLF9 | KLF4 | KLF9 | 74 | 0.864865 | 0.135135 | 0.515997 | 3.443424e-10 |
Extended analysis with directional=True
The first analysis presented does not take into account the relative order of TF1-TF2, e.g. if the orientation “same” represents “TF1-TF2” or
[13]:
C.count_within(directional=True, stranded=True, threads=8)
C.market_basket()
[14]:
df = C.analyze_orientation()
INFO: Rules are directional - scenarios counted are: ['TF1-TF2', 'TF2-TF1', 'convergent', 'divergent']
[15]:
df.head()
[15]:
TF1 | TF2 | TF1_TF2_count | TF1-TF2 | TF2-TF1 | convergent | divergent | std | pvalue | |
---|---|---|---|---|---|---|---|---|---|
SP2-SP2 | SP2 | SP2 | 1077 | 0.395543 | 0.395543 | 0.102136 | 0.106778 | 0.168069 | 8.140464e-79 |
SP1-SP1 | SP1 | SP1 | 687 | 0.417758 | 0.417758 | 0.075691 | 0.088792 | 0.193784 | 8.390630e-67 |
SP3-SP3 | SP3 | SP3 | 718 | 0.412256 | 0.412256 | 0.094708 | 0.080780 | 0.187444 | 2.559523e-65 |
PATZ1-PATZ1 | PATZ1 | PATZ1 | 547 | 0.422303 | 0.422303 | 0.078611 | 0.076782 | 0.198960 | 4.875297e-56 |
SP4-SP4 | SP4 | SP4 | 371 | 0.444744 | 0.444744 | 0.070081 | 0.040431 | 0.225196 | 1.132384e-48 |
Similarly to the first analysis, we can select the significant pairs and visualize the preferences for orientation:
[16]:
selected = df[(df["pvalue"] < 0.05) & (df["TF1_TF2_count"] > 50)]
[17]:
_ = selected.plot_heatmap()
In-depth look at preferential orientation
By sorting the selected co-occurring TF pairs, it is also possible to visualize the top pairs within each scenario as seen below.
TFs specific in TF1-TF2 orientation
[18]:
selected.sort_values("TF1-TF2", ascending=False).head()
[18]:
TF1 | TF2 | TF1_TF2_count | TF1-TF2 | TF2-TF1 | convergent | divergent | std | pvalue | |
---|---|---|---|---|---|---|---|---|---|
KLF9-ZNF341 | KLF9 | ZNF341 | 80 | 0.500000 | 0.337500 | 0.100000 | 0.062500 | 0.206408 | 6.866468e-09 |
KLF1-KLF4 | KLF1 | KLF4 | 97 | 0.494845 | 0.360825 | 0.041237 | 0.103093 | 0.214005 | 1.575098e-11 |
KLF4-KLF4 | KLF4 | KLF4 | 56 | 0.482143 | 0.482143 | 0.017857 | 0.017857 | 0.268055 | 1.851271e-10 |
BCL11A-SPIB | BCL11A | SPIB | 56 | 0.482143 | 0.321429 | 0.107143 | 0.089286 | 0.187287 | 3.069277e-05 |
KLF5-ZNF341 | KLF5 | ZNF341 | 61 | 0.475410 | 0.278689 | 0.114754 | 0.131148 | 0.167382 | 1.331723e-04 |
TFs specific in TF2-TF2 orientation
[19]:
selected.sort_values("TF2-TF1", ascending=False).head()
[19]:
TF1 | TF2 | TF1_TF2_count | TF1-TF2 | TF2-TF1 | convergent | divergent | std | pvalue | |
---|---|---|---|---|---|---|---|---|---|
KLF4-MAZ | KLF4 | MAZ | 70 | 0.328571 | 0.542857 | 0.042857 | 0.085714 | 0.232262 | 7.933608e-10 |
EGR1-KLF9 | EGR1 | KLF9 | 72 | 0.277778 | 0.541667 | 0.138889 | 0.041667 | 0.217248 | 7.288757e-09 |
KLF4-KLF5 | KLF4 | KLF5 | 63 | 0.380952 | 0.539683 | 0.000000 | 0.079365 | 0.253430 | 1.621951e-10 |
ETV4-KLF3 | ETV4 | KLF3 | 57 | 0.403509 | 0.508772 | 0.017544 | 0.070175 | 0.242831 | 9.055097e-09 |
E2F6-SP4 | E2F6 | SP4 | 80 | 0.275000 | 0.500000 | 0.137500 | 0.087500 | 0.184560 | 3.725792e-07 |
TFs specific in convergent orientation
[20]:
selected.sort_values("convergent", ascending=False).head()
[20]:
TF1 | TF2 | TF1_TF2_count | TF1-TF2 | TF2-TF1 | convergent | divergent | std | pvalue | |
---|---|---|---|---|---|---|---|---|---|
ASCL1-SP3 | ASCL1 | SP3 | 55 | 0.127273 | 0.181818 | 0.454545 | 0.236364 | 0.143452 | 0.003533 |
MAZ-ZFX | MAZ | ZFX | 54 | 0.203704 | 0.111111 | 0.444444 | 0.240741 | 0.140627 | 0.005055 |
SP1-ZFX | SP1 | ZFX | 81 | 0.111111 | 0.111111 | 0.419753 | 0.358025 | 0.162343 | 0.000011 |
AR-IRF1 | AR | IRF1 | 51 | 0.274510 | 0.098039 | 0.411765 | 0.215686 | 0.130433 | 0.015372 |
ASCL1-WT1 | ASCL1 | WT1 | 61 | 0.147541 | 0.114754 | 0.409836 | 0.327869 | 0.141892 | 0.002055 |
TFs specific in divergent orientation
[21]:
selected.sort_values("divergent", ascending=False).head()
[21]:
TF1 | TF2 | TF1_TF2_count | TF1-TF2 | TF2-TF1 | convergent | divergent | std | pvalue | |
---|---|---|---|---|---|---|---|---|---|
STAT2-ZFP28 | STAT2 | ZFP28 | 55 | 0.127273 | 0.254545 | 0.181818 | 0.436364 | 0.134738 | 7.445704e-03 |
KLF3-ZIC3 | KLF3 | ZIC3 | 66 | 0.030303 | 0.136364 | 0.409091 | 0.424242 | 0.197358 | 9.148367e-07 |
IRF3-ZFP28 | IRF3 | ZFP28 | 58 | 0.206897 | 0.155172 | 0.224138 | 0.413793 | 0.113059 | 3.069838e-02 |
NFE2L1-STAT2 | NFE2L1 | STAT2 | 68 | 0.176471 | 0.220588 | 0.205882 | 0.397059 | 0.099740 | 4.364185e-02 |
SP4-ZFX | SP4 | ZFX | 52 | 0.192308 | 0.057692 | 0.365385 | 0.384615 | 0.154645 | 1.883579e-03 |