tfcomb.annotation module
- tfcomb.annotation.annotate_regions(regions, gtf, config=None, best=True, threads=1, verbosity=1)
Annotate regions with genes from .gtf using UROPA [1].
- Parameters:
regions (tobias.utils.regions.RegionList() or pandas.DataFrame) – A RegionList object with positions of genomic elements e.g. TFBS or a DataFrame containing chr/start/stop-coordinates. If DataFrame, the function assumes that the order of columns is: ‘chromosome’, ‘start’, ‘end’, ‘id’, ‘score’, ‘strand’.
gtf (str) – Path to .gtf file containing genomic elements for annotation.
config (dict, optional) – A dictionary indicating how regions should be annotated. Default is to annotate feature ‘gene’ within -10000;1000bp of the gene start. See ‘Examples’ for how to set up a custom configuration dictionary.
best (boolean) – Whether to return the best annotation or all valid annotations. Default: True (only best are kept).
threads (int, optional) – Number of threads to use for multiprocessing. Default: 1.
verbosity (int, optional) – Level of verbosity of logger. One of 0, 1, 2. Default: 1.
- Returns:
DataFrame including regions and annotation information. If no annotation is applicable, a warning is displayed and None is returned.
- Return type:
pd.DataFrame or None
References
Examples
>>> custom_config = {"queries": [{"distance": [10000, 1000],
...                               "feature_anchor": "start",
...                               "feature": "gene"}],
...                  "priority": True,
...                  "show_attributes": "all"}
# Annotate regions (data/ refers to the data directory of the tfcomb GitHub repository)
# BED files are tab-separated and have no header row
>>> regions = pd.read_csv("data/GM12878_hg38_chr4_ATAC_peaks.bed", sep="\t", header=None)
>>> annotate_regions(regions, gtf="data/chr4_genes.gtf", config=custom_config)
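The default behaviour described for `config` (feature ‘gene’, within -10000;1000bp of the gene start) can be written out as a dictionary using only the keys shown in the example above. The following is a minimal sketch: the helper name `make_gene_config` is hypothetical and not part of tfcomb, and it assumes that the two `distance` values are the upstream and downstream limits around the anchor.

```python
def make_gene_config(upstream=10000, downstream=1000, feature="gene", anchor="start"):
    """Build a configuration dict with the same keys as the example above.

    Hypothetical helper, not part of tfcomb. Assumes the two 'distance'
    values are the upstream and downstream limits around the anchor.
    """
    if upstream < 0 or downstream < 0:
        raise ValueError("distances must be non-negative")
    query = {
        "distance": [upstream, downstream],
        "feature_anchor": anchor,
        "feature": feature,
    }
    return {"queries": [query], "priority": True, "show_attributes": "all"}


# With the default arguments this reproduces the example dictionary
config = make_gene_config()
```

A dictionary built this way could then be passed as `config=` to `annotate_regions`.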
- tfcomb.annotation.get_annotated_genes(regions, attribute='gene_name')
Get list of genes from the list of annotated regions from annotate_regions().
- Parameters:
regions (RegionList() or list of OneTFBS objects) –
attribute (str) – The name of the attribute in the 9th column of the .gtf file. Default: ‘gene_name’.
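To make the `attribute` parameter concrete: the 9th column of a .gtf file holds `key "value";` pairs, and `attribute` names one of those keys. The sketch below parses such a column with a regular expression. It illustrates the file format only and is not how tfcomb extracts the value; the function name `parse_gtf_attributes` and the example record are made up.

```python
import re

# Matches pairs such as: gene_name "ABC1";
# Unquoted values (e.g. numeric fields) are skipped by this simple pattern.
_ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(attribute_column):
    """Turn the 9th GTF column into a dict of attribute name -> value.

    If a key occurs more than once, the last value wins.
    """
    return dict(_ATTRIBUTE_PATTERN.findall(attribute_column))


# Made-up attribute column for illustration
example = 'gene_id "G0001"; gene_name "GENE1"; gene_biotype "protein_coding";'
attributes = parse_gtf_attributes(example)
gene_name = attributes.get("gene_name")
```

Choosing a different `attribute`, for example ‘gene_id’, corresponds to looking up a different key in this dictionary.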
- class tfcomb.annotation.GOAnalysis(*args: Any, **kwargs: Any)
Bases: DataFrame
- aspect_translation = {'BP': 'Biological Process', 'CC': 'Cellular Component', 'MF': 'Molecular Function'}
- enrichment(genes, organism='hsapiens', background=None, propagate_counts=True, min_depth=1, verbosity=1)
Perform a GO-term enrichment based on a list of genes. This is a TF-COMB wrapper for goatools.
- Parameters:
genes (list) – A list of gene ids.
organism (str, optional) – The organism from which the gene ids originate. Default: ‘hsapiens’.
background (list, optional) – A specific list of background gene ids to use. Default: The list of protein coding genes of the ‘organism’ given.
propagate_counts (bool) – Whether to propagate counts up the tree to parent GO’s. Default: True.
min_depth (int) – Minimum depth of GO-terms to show in output table. Default: 1.
verbosity (int, optional) – Level of verbosity of logger. Default: 1.
- Returns:
GOAnalysis object containing enrichment results
Reference
https://www.nature.com/articles/s41598-018-28948-z
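The idea behind `propagate_counts` (counting a gene for the parent GO terms of every term it is annotated with) can be illustrated on a toy hierarchy. This is a conceptual sketch in plain Python, not the goatools or TF-COMB implementation; the term and gene names are made up.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` in a DAG given as term -> list of parents."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(gene_to_terms, parents):
    """Add every ancestor term to each gene's annotation set."""
    propagated = {}
    for gene, terms in gene_to_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        propagated[gene] = full
    return propagated


# Toy hierarchy with made-up term names: child -> parents
parents = {"term_c": ["term_b"], "term_b": ["term_a"], "term_a": []}
gene_to_terms = {"gene1": {"term_c"}, "gene2": {"term_b"}}

propagated = propagate(gene_to_terms, parents)

# Number of genes counted for each term after propagation
counts = {}
for terms in propagated.values():
    for term in terms:
        counts[term] = counts.get(term, 0) + 1
```

Without propagation, `term_a` would have no genes in this toy example; with propagation, both genes count towards it.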
- plot_bubble(aspect='BP', n_terms=20, threshold=0.05, title=None, save=None)
Plot a bubble-style plot of GO-enrichment results.
- Parameters:
aspect (str) – The aspect for which GO-terms should be shown. Must be one of [“BP”, “MF”, “CC”]. Default: “BP”.
n_terms (int) – Maximum number of terms to show in graph. Default: 20.
threshold (float between 0-1) – FDR threshold for significant GO-terms. Default: 0.05.
title (str) – Custom title for the plot. Default: ‘GO-terms for <aspect>’.
save (str, optional) – Save the plot to the file given in ‘save’. Default: None.
- Return type:
ax
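One way to picture how `aspect`, `threshold` and `n_terms` could narrow down an enrichment table before plotting is sketched below. This is an assumption-laden illustration, not the code behind `plot_bubble`: the function `select_terms`, the keys `"aspect"`, `"fdr"` and `"name"`, the sort order and the `<=` comparison are all hypothetical, and the rows are made-up values.

```python
def select_terms(rows, aspect="BP", n_terms=20, threshold=0.05):
    """Pick the terms to display: one aspect, FDR at or below the threshold,
    most significant first, at most `n_terms` entries.

    Hypothetical helper; row keys are assumptions for this sketch.
    """
    if aspect not in ("BP", "MF", "CC"):
        raise ValueError("aspect must be one of 'BP', 'MF', 'CC'")
    significant = [
        row for row in rows
        if row["aspect"] == aspect and row["fdr"] <= threshold
    ]
    significant.sort(key=lambda row: row["fdr"])
    return significant[:n_terms]


# Made-up rows for illustration only
rows = [
    {"name": "term 1", "aspect": "BP", "fdr": 0.20},
    {"name": "term 2", "aspect": "BP", "fdr": 0.01},
    {"name": "term 3", "aspect": "MF", "fdr": 0.001},
    {"name": "term 4", "aspect": "BP", "fdr": 0.03},
]
selected = select_terms(rows, aspect="BP", n_terms=20, threshold=0.05)
selected_names = [row["name"] for row in selected]
```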