tfcomb.annotation module

tfcomb.annotation.annotate_regions(regions, gtf, config=None, best=True, threads=1, verbosity=1)[source]

Annotate regions with genes from .gtf using UROPA [1].

Parameters:
  • regions (tobias.utils.regions.RegionList() or pandas.DataFrame) – A RegionList object with positions of genomic elements (e.g. TFBS), or a DataFrame containing chromosome/start/end coordinates. If a DataFrame is given, the function assumes that the order of columns is: ‘chromosome’, ‘start’, ‘end’, ‘id’, ‘score’, ‘strand’.

  • gtf (str) – Path to .gtf file containing genomic elements for annotation.

  • config (dict, optional) – A dictionary indicating how regions should be annotated. Default is to annotate the feature ‘gene’ within -10000 to +1000 bp of the gene start. See ‘Examples’ for how to set up a custom configuration dictionary.

  • best (bool, optional) – Whether to return only the best annotation or all valid annotations. Default: True (only the best annotations are kept).

  • threads (int, optional) – Number of threads to use for multiprocessing. Default: 1.

  • verbosity (int, optional) – Level of verbosity of the logger. One of 0, 1, 2. Default: 1.

Returns:

DataFrame containing the regions and their annotation information. If no annotation is applicable, a warning is displayed and None is returned.

Return type:

pd.DataFrame or None

References

Examples

>>> custom_config = {"queries": [{"distance": [10000, 1000],
...                               "feature_anchor": "start",
...                               "feature": "gene"}],
...                  "priority": True,
...                  "show_attributes": "all"}
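A configuration dictionary of the shape shown above can also be assembled programmatically. The following is a minimal sketch: `make_config` is a hypothetical helper, not part of tfcomb, and it only reuses the keys that appear in the example above. Listing more than one feature assumes that several entries under "queries" are accepted.

```python
def make_config(features, upstream=10000, downstream=1000, anchor="start"):
    """Build a config dict with one query per feature (hypothetical helper)."""
    queries = [
        {
            "distance": [upstream, downstream],  # same order as in the example above
            "feature_anchor": anchor,
            "feature": feature,
        }
        for feature in features
    ]
    # Top-level keys mirror the documented example.
    return {"queries": queries, "priority": True, "show_attributes": "all"}


custom_config = make_config(["gene"])
```

The resulting dictionary can be passed as the `config` argument in the same way as the hand-written one.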

# Annotate regions (data/ refers to the data directory of the TF-COMB GitHub repository)

>>> regions = pd.read_csv("data/GM12878_hg38_chr4_ATAC_peaks.bed", sep="\t", header=None)
>>> annotate_regions(regions, gtf="data/chr4_genes.gtf",
...                  config=custom_config)

tfcomb.annotation.get_annotated_genes(regions, attribute='gene_name')[source]

Get the list of genes from the annotated regions returned by annotate_regions().

Parameters:
  • regions (RegionList() or list of OneTFBS objects) – The annotated regions from which to collect genes.

  • attribute (str) – The name of the attribute in the 9th column of the .gtf file. Default: ‘gene_name’.
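To illustrate what "an attribute in the 9th column of the .gtf file" refers to, here is a small sketch that parses the attribute column of a single GTF line into a dictionary. `parse_gtf_attributes` is a hypothetical helper, not a tfcomb function, and the GTF line is made up for illustration.

```python
import re

# Matches quoted attributes such as: gene_name "EXAMPLE1";
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(gtf_line):
    """Return the quoted key/value pairs from the 9th column of a GTF line."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("A GTF line needs 9 tab-separated columns")
    # Unquoted (numeric) attributes are skipped; repeated keys keep the last value.
    return dict(ATTRIBUTE_PATTERN.findall(fields[8]))


# Made-up example line; identifiers and coordinates are placeholders.
line = 'chr4\tsource\tgene\t1000\t2000\t.\t+\t.\tgene_id "GENE0001"; gene_name "EXAMPLE1";'
attributes = parse_gtf_attributes(line)
gene_name = attributes["gene_name"]
```

With the default `attribute='gene_name'`, the value stored under the `gene_name` key is the kind of entry that would be collected for each annotated region.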

class tfcomb.annotation.GOAnalysis(*args: Any, **kwargs: Any)[source]

Bases: DataFrame

aspect_translation = {'BP': 'Biological Process', 'CC': 'Cellular Component', 'MF': 'Molecular Function'}

enrichment(genes, organism='hsapiens', background=None, propagate_counts=True, min_depth=1, verbosity=1)[source]

Perform a GO-term enrichment based on a list of genes. This is a TF-COMB wrapper for goatools.

Parameters:
  • genes (list) – A list of gene ids.

  • organism (str, optional) – The organism from which the genes originate. Defaults to ‘hsapiens’.

  • background (list, optional) – A specific list of background gene ids to use. Default: The list of protein coding genes of the ‘organism’ given.

  • propagate_counts (bool) – Whether to propagate counts up the tree to parent GO terms. Default: True.

  • min_depth (int) – Minimum depth of GO-terms to show in output table. Default: 1.

  • verbosity (int, optional) – Default: 1.

Returns:

GOAnalysis object containing enrichment results.

Reference

https://www.nature.com/articles/s41598-018-28948-z
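The effect of `propagate_counts` can be pictured with a toy ontology: when counts are propagated, a gene annotated to a term also counts towards every ancestor of that term. The sketch below is a conceptual illustration only, not the goatools implementation; the term ids, gene names and helper functions are all made up.

```python
def ancestors(term, parents):
    """Collect all ancestors of a term in a DAG given as child -> list of parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(gene2terms, parents):
    """Extend each gene's term set with the ancestors of its annotated terms."""
    propagated = {}
    for gene, terms in gene2terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        propagated[gene] = full
    return propagated


# Toy ontology: GO:C -> GO:B -> GO:A and GO:D -> GO:A (placeholder ids).
parents = {"GO:C": ["GO:B"], "GO:B": ["GO:A"], "GO:D": ["GO:A"]}
gene2terms = {"gene1": {"GO:C"}, "gene2": {"GO:D"}}

propagated = propagate(gene2terms, parents)
count_root = sum("GO:A" in terms for terms in propagated.values())
```

Without propagation, no gene in this toy example is annotated to the root term; with propagation, both genes count towards it.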

plot_bubble(aspect='BP', n_terms=20, threshold=0.05, title=None, save=None)[source]

Plot a bubble-style plot of GO-enrichment results.

Parameters:
  • aspect (str) – The aspect for which GO-terms should be shown. Must be one of [“BP”, “MF”, “CC”]. Default: “BP”.

  • n_terms (int) – Maximum number of terms to show in the graph. Default: 20.

  • threshold (float between 0-1) – FDR threshold for significant GO-terms. Default: 0.05.

  • title (str) – Custom title for the plot. Default: ‘GO-terms for <aspect>’.

  • save (str, optional) – Save the plot to the file given in ‘save’. Default: None.

Return type:

ax
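The interplay of `aspect`, `threshold` and `n_terms` can be sketched as a simple selection step over an enrichment table. This is an assumption about how such a selection might work, not the actual plot_bubble code: the field names ("name", "aspect", "fdr"), the inclusive comparison and the ordering by FDR are all hypothetical.

```python
def select_terms(records, aspect="BP", n_terms=20, threshold=0.05):
    """Pick the terms of one aspect that pass the FDR threshold (hypothetical helper)."""
    if aspect not in ("BP", "MF", "CC"):
        raise ValueError("aspect must be one of 'BP', 'MF', 'CC'")
    significant = [
        record for record in records
        if record["aspect"] == aspect and record["fdr"] <= threshold
    ]
    # Assumed ordering: most significant terms first, capped at n_terms.
    significant.sort(key=lambda record: record["fdr"])
    return significant[:n_terms]


# Made-up enrichment records for illustration.
records = [
    {"name": "term 1", "aspect": "BP", "fdr": 0.001},
    {"name": "term 2", "aspect": "BP", "fdr": 0.2},
    {"name": "term 3", "aspect": "MF", "fdr": 0.01},
    {"name": "term 4", "aspect": "BP", "fdr": 0.03},
]
selected = [record["name"] for record in select_terms(records, aspect="BP", n_terms=20)]
```

Here only the two "BP" terms below the 0.05 threshold remain; lowering `n_terms` to 1 would keep just the most significant one.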

compare(compare_table)[source]

IN PROGRESS: Plot a comparison of two GO-term analyses.

Parameters:

compare_table (GOAnalysis object) –