tfcomb.objects module

class tfcomb.objects.CombObj(verbosity=1)[source]

Bases: object

The main class for collecting and working with co-occurring TFs.

Examples

>>> C = tfcomb.objects.CombObj()

# Verbosity of the output log can be set using the ‘verbosity’ parameter:

>>> C = tfcomb.objects.CombObj(verbosity=2)

copy()[source]

Returns a deep copy of the CombObj

set_verbosity(level)[source]

Set the verbosity level for logging after creating the CombObj.

Parameters:

level (int) – A value between 0-3 where 0 (only errors), 1 (info), 2 (debug), 3 (spam debug). Default: 1.

set_prefix(prefix)[source]

Sets the .prefix variable of the object. Useful when comparing two objects in a DiffCombObj.

Parameters:

prefix (str) – A string to add as .prefix for this object, e.g. ‘control’, ‘treatment’ or ‘analysis1’.

check_pair(pair)[source]

Checks if a pair is valid and present.

Parameters:

pair (tuple(str,str)) – TF names for which the test should be performed, e.g. (“NFYA”, “NFYB”).

to_pickle(path)[source]

Save the CombObj to a pickle file.

Parameters:

path (str) – Path to the output pickle file e.g. ‘my_combobj.pkl’.

See also

from_pickle

from_pickle(path)[source]

Import a CombObj from a pickle file.

Parameters:

path (str) – Path to an existing pickle file to read.

Raises:

InputError – If read object is not an instance of CombObj.

See also

to_pickle

TFBS_from_motifs(regions, motifs, genome, motif_pvalue=1e-05, motif_naming='name', gc=0.5, resolve_overlapping='merge', extend_bp=0, threads=1, overwrite=False, _suffix='')[source]

Function to calculate TFBS from motifs and genome fasta within the given genomic regions.

Parameters:
  • regions (str or tobias.utils.regions.RegionList) – Path to a .bed-file containing regions or a tobias-format RegionList object.

  • motifs (str or tobias.utils.motifs.MotifList) – Path to a file containing JASPAR/MEME-style motifs or a tobias-format MotifList object.

  • genome (str) – Path to the genome fasta-file to use for scan.

  • motif_pvalue (float, optional) – The pvalue threshold for the motif search. Default: 1e-05.

  • motif_naming (str, optional) – How to name TFs based on input motifs. Must be one of: ‘name’, ‘id’, ‘name_id’ or ‘id_name’. Default: “name”.

  • gc (float between 0-1, optional) – Set GC-content for the motif background model. Default: 0.5.

  • resolve_overlapping (str, optional) – Control how to treat overlapping occurrences of the same TF. Must be one of “merge”, “highest_score” or “off”. If “highest_score”, the highest scoring overlapping site is kept. If “merge”, the sites are merged, keeping the information of the first site. If “off”, overlapping TFBS are kept. Default: “merge”.

  • extend_bp (int, optional) – Extend input regions with ‘extend_bp’ before scanning. Default: 0.

  • threads (int, optional) – How many threads to use for multiprocessing. Default: 1.

  • overwrite (boolean, optional) – Whether to overwrite existing sites within .TFBS. Default: False (sites are appended to .TFBS).

Returns:

Fills the object’s .TFBS variable in place.

Return type:

None

TFBS_from_bed(bed_file, overwrite=False)[source]

Fills the .TFBS attribute using a precalculated set of binding sites e.g. from ChIP-seq.

Parameters:
  • bed_file (str) – A path to a .bed-file with precalculated binding sites. The 4th column of the file should contain the name of the TF in question.

  • overwrite (boolean) – Whether to overwrite existing sites within .TFBS. Default: False (sites are appended to .TFBS).

Returns:

The .TFBS variable is filled in place

Return type:

None

TFBS_from_TOBIAS(bindetect_path, condition, overwrite=False)[source]

Fills the .TFBS variable with pre-calculated bound binding sites from TOBIAS BINDetect.

Parameters:
  • bindetect_path (str) – Path to the BINDetect-output folder containing <TF1>, <TF2>, <TF3> (…) folders.

  • condition (str) – Name of condition to use for fetching bound sites.

  • overwrite (boolean) – Whether to overwrite existing sites within .TFBS. Default: False (sites are appended to .TFBS).

Returns:

The .TFBS variable is filled in place

Return type:

None

Raises:

InputError – If no files are found in path or if condition is not one of the available conditions.

cluster_TFBS(threshold=0.5, merge_overlapping=True)[source]

Cluster TFBS based on overlap of individual binding sites. This can be used to pre-process motif-derived TFBS into TF “families” of TFs with similar motifs. This changes the .name attribute of each site within .TFBS to represent the cluster (or the original TF name if no cluster was found).

Parameters:
  • threshold (float from 0-1, optional) – The threshold to set when clustering binding sites. Default: 0.5.

  • merge_overlapping (bool, optional) – Whether to merge overlapping sites following clustering. If True, overlapping sites from the same cluster will be merged to one site (spanning site1-start -> site2-end). If False, the original sites (but with updated names) will be kept in .TFBS. Default: True.

Returns:

The .TFBS names are updated in place.

Return type:

None

subset_TFBS(names=None, regions=None)[source]

Subset .TFBS in object to specific regions or TF names. Can be used to select only a subset of TFBS (e.g. only in promoters) to run analysis on. Note: Either ‘names’ or ‘regions’ must be given - not both.

Parameters:
  • names (list of strings, optional) – A list of names to keep. Default: None.

  • regions (str or RegionList, optional) – Path to a .bed-file containing regions or a tobias-format RegionList object. Default: None.

Returns:

The .TFBS attribute is updated in place.

Return type:

None

TFBS_to_bed(path)[source]

Writes out the .TFBS regions to a .bed-file. This is a wrapper for the tobias.utils.regions.RegionList().write_bed() utility.

Parameters:

path (str) – File path to write .bed-file to.

count_within(min_dist=0, max_dist=100, min_overlap=0, max_overlap=0, stranded=False, directional=False, binarize=False, anchor='inner', n_background=50, threads=1)[source]

Count co-occurrences between TFBS. This function requires .TFBS to be filled by either TFBS_from_motifs, TFBS_from_bed or TFBS_from_TOBIAS. This function can be followed by .market_basket to calculate association rules.

Parameters:
  • min_dist (int) – Minimum distance between two TFBS to be counted as co-occurring. Distances are calculated depending on the ‘anchor’ given. Default: 0.

  • max_dist (int) – Maximum distance between two TFBS to be counted as co-occurring. Distances are calculated depending on the ‘anchor’ given. Default: 100.

  • min_overlap (float between 0-1, optional) – Minimum overlap fraction needed between sites, e.g. 0 = no overlap needed, 1 = full overlap needed. Default: 0.

  • max_overlap (float between 0-1, optional) – Controls how much overlap is allowed for individual sites. A value of 0 indicates that overlapping TFBS will not be saved as co-occurring. Float values between 0-1 indicate the fraction of overlap allowed (the overlap is always calculated as a fraction of the smallest TFBS). A value of 1 allows all overlaps. Default: 0 (no overlap allowed).

  • stranded (bool) – Whether to take strand of TFBSs into account. Default: False.

  • directional (bool) – Decide if direction of found pairs should be taken into account, e.g. whether “<—TF1—> <—TF2—>” is only counted as TF1-TF2 (directional=True) or also as TF2-TF1 (directional=False). Default: False.

  • binarize (bool, optional) – Whether to count a TF1-TF2 more than once per window (e.g. in the case of “<TF1> <TF2> <TF2> (…)”). Default: False.

  • anchor (str, optional) – The anchor to use for calculating distance. Must be one of [“inner”, “outer”, “center”]. Default: “inner”.

  • n_background (int, optional) – Number of random co-occurrence backgrounds to obtain. This number affects the runtime of .count_within, but ‘threads’ can be used to speed up background calculation. Default: 50.

  • threads (int, optional) – Number of threads to use. Default: 1.

Returns:

Fills the object variables .TF_counts and .pair_counts.

Return type:

None

Raises:

ValueError – If .TFBS has not been filled.
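The counting rules above can be illustrated with a small pure-Python sketch. This is not TF-COMB's implementation: `count_pairs` is a hypothetical helper that handles one chromosome, ignores strand and background estimation, and assumes that anchor="inner" means the gap between the end of the upstream site and the start of the downstream site.

```python
from collections import Counter


def count_pairs(sites, min_dist=0, max_dist=100, directional=False):
    """Count co-occurring TF pairs among sites on a single chromosome.

    sites: list of (start, end, name) tuples.
    Assumption: distance is the gap between the end of the upstream site
    and the start of the downstream site (one reading of anchor="inner").
    Overlapping sites are skipped, mirroring max_overlap=0.
    """
    sites = sorted(sites)
    counts = Counter()
    for i, (start1, end1, name1) in enumerate(sites):
        for start2, end2, name2 in sites[i + 1:]:
            dist = start2 - end1
            if dist > max_dist:
                break  # sites are sorted by start, so later sites are farther away
            if dist < 0 or dist < min_dist:
                continue  # overlapping or too close
            counts[(name1, name2)] += 1
            if not directional and name1 != name2:
                # Without direction, the pair is also counted as TF2-TF1.
                counts[(name2, name1)] += 1
    return counts


sites = [(0, 10, "A"), (20, 30, "B"), (200, 210, "C")]
undirected = count_pairs(sites)
directed = count_pairs(sites, directional=True)
```

With these three sites, only A and B lie within 100 bp of each other, so the undirected count contains both A-B and B-A while the directional count contains A-B only.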

get_pair_locations(pair, TF1_strand=None, TF2_strand=None, **kwargs)[source]

Get genomic locations of a particular TF pair. Requires .TFBS to be filled. If ‘count_within’ was run, the parameters used within the latest ‘count_within’ run are used. Else, the default values of tfcomb.utils.get_pair_locations() are used. Both options can be overwritten by setting kwargs.

Parameters:
  • pair (tuple) – Name of TF1, TF2 in pair.

  • TF1_strand (str, optional) – Strand of TF1 in pair. Default: None (strand is not taken into account).

  • TF2_strand (str, optional) – Strand of TF2 in pair. Default: None (strand is not taken into account).

  • kwargs (arguments) – Any additional arguments are passed to tfcomb.utils.get_pair_locations.

Return type:

tfcomb.utils.TFBSPairList

market_basket(measure='cosine', threads=1, keep_zero=False, n_baskets=1000000.0, _show_columns=['TF1_TF2_count', 'TF1_count', 'TF2_count'])[source]

Runs market basket analysis on the TF1-TF2 counts. Requires prior run of .count_within().

Parameters:
  • measure (str or list of strings, optional) – The measure(s) to use for market basket analysis. Can be any of: [“cosine”, “confidence”, “lift”, “jaccard”]. Default: ‘cosine’.

  • threads (int, optional) – Threads to use for multiprocessing. This is passed to .count_within() in case the <CombObj> does not contain any counts yet. Default: 1.

  • keep_zero (bool, optional) – Whether to keep rules with 0 occurrences in .rules table. Default: False (remove 0-rules).

  • n_baskets (int, optional) – The number of baskets used for calculating market basket measures. Default: 1e6.

Raises:

InputError – If the measure given is not within available measures.
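As a rough guide to what the four measures express, the sketch below computes them from raw counts using the standard association-rule definitions. The function name `association_measures` is hypothetical, and TF-COMB's internal calculation may differ in detail (for example in how supports are derived from n_baskets).

```python
import math


def association_measures(pair_count, tf1_count, tf2_count, n_baskets=1e6):
    """Standard association-rule measures for a single TF1 -> TF2 rule.

    Assumption: textbook definitions, not taken from the TF-COMB source.
    """
    confidence = pair_count / tf1_count  # P(TF2 | TF1)
    return {
        "cosine": pair_count / math.sqrt(tf1_count * tf2_count),
        "confidence": confidence,
        # Lift compares the confidence to the overall frequency of TF2.
        "lift": confidence / (tf2_count / n_baskets),
        "jaccard": pair_count / (tf1_count + tf2_count - pair_count),
    }


measures = association_measures(50, 100, 400, n_baskets=1000)
```

Note that cosine and jaccard are symmetric in TF1 and TF2, whereas confidence depends on the direction of the rule.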

reduce_TFBS()[source]

Reduce TFBS to the TFs present in .rules.

Return type:

None - changes .TFBS in place

simplify_rules()[source]

Simplify rules so that TF1-TF2 and TF2-TF1 pairs only occur once within .rules. This is useful for association metrics such as ‘cosine’, where the association of TF1->TF2 equals TF2->TF1. This function keeps the first unique pair occurring within the rules table.
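The idea of keeping the first unique pair can be sketched in plain Python as follows. The helper `simplify_pairs` and the list-of-dicts layout are assumptions for illustration; the actual method operates on the .rules table.

```python
def simplify_pairs(rules):
    """Keep only the first occurrence of each unordered TF pair.

    rules: list of dicts with at least the keys "TF1" and "TF2"
    (a stand-in for the rows of a rules table).
    """
    seen = set()
    kept = []
    for rule in rules:
        # A frozenset ignores order, so TF1-TF2 and TF2-TF1 share one key.
        key = frozenset((rule["TF1"], rule["TF2"]))
        if key not in seen:
            seen.add(key)
            kept.append(rule)
    return kept


rules = [
    {"TF1": "NFYA", "TF2": "NFYB", "cosine": 0.8},
    {"TF1": "NFYB", "TF2": "NFYA", "cosine": 0.8},
    {"TF1": "SP1", "TF2": "NFYA", "cosine": 0.3},
]
simplified = simplify_pairs(rules)  # keeps the first and third rule
```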

select_TF_rules(TF_list, TF1=True, TF2=True, reduce_TFBS=True, inplace=False, how='inner')[source]

Select rules based on a list of TF names. The parameters TF1/TF2 can be used to select for which TF to create the selection on (by default: both TF1 and TF2).

Parameters:
  • TF_list (list) – List of TF names fitting to TF1/TF2 within .rules.

  • TF1 (bool, optional) – Whether to subset the rules containing ‘TF_list’ TFs within “TF1”. Default: True.

  • TF2 (bool, optional) – Whether to subset the rules containing ‘TF_list’ TFs within “TF2”. Default: True.

  • reduce_TFBS (bool, optional) – Whether to reduce the .TFBS of the new object to the TFs remaining in .rules after selection. Setting this to ‘False’ will improve speed, but also increase memory consumption. Default: True.

  • inplace (bool, optional) – Whether to make the selection on the current CombObj. If False, a new CombObj containing the selected rules is returned. Default: False.

  • how (string, optional) – How to join the TF1 and TF2 subsets. Default: “inner”.

Raises:

InputError – If both TF1 and TF2 are False or if no rules were selected based on input.

Returns:

  • If inplace == False; tfcomb.objects.CombObj() – An object containing a subset of <Combobj>.rules.

  • if inplace == True; – Returns None

select_custom_rules(custom_list, reduce_TFBS=True)[source]

Select rules based on a custom list of TF pairs.

Parameters:
  • custom_list (list of strings) – List of TF pairs (e.g. a string “TF1-TF2”) fitting to TF1/TF2 combination within .rules.

  • reduce_TFBS (bool, optional) – Whether to reduce the .TFBS of the new object to the TFs remaining in .rules after selection. Setting this to ‘False’ will improve speed, but also increase memory consumption. Default: True.

Returns:

An object containing a subset of <Combobj>.rules

Return type:

tfcomb.objects.CombObj()

select_top_rules(n, reduce_TFBS=True)[source]

Select the top ‘n’ rules within .rules. By default, the .rules are sorted for the measure value, so n=100 will select the top 100 highest values for the measure (e.g. cosine).

Parameters:
  • n (int) – The number of rules to select.

  • reduce_TFBS (bool, optional) – Whether to reduce the .TFBS of the new object to the TFs remaining in .rules after selection. Setting this to ‘False’ will improve speed, but also increase memory consumption. Default: True.

Returns:

An object containing a subset of <Combobj>.rules

Return type:

tfcomb.objects.CombObj()

select_significant_rules(x='cosine', y='zscore', x_threshold=None, x_threshold_percent=0.05, y_threshold=None, y_threshold_percent=0.05, reduce_TFBS=True, plot=True, **kwargs)[source]

Make selection of rules based on distribution of x/y-measures

Parameters:
  • x (str, optional) – The name of the column within .rules containing the measure to be selected on. Default: ‘cosine’.

  • y (str, optional) – The name of the column within .rules containing the pvalue to be selected on. Default: ‘zscore’

  • x_threshold (float, optional) – A minimum threshold for the x-axis measure to be selected. If None, the threshold will be estimated from the data. Default: None.

  • x_threshold_percent (float between 0-1, optional) – If x_threshold is not given, x_threshold_percent controls the strictness of the automatic threshold selection. Default: 0.05.

  • y_threshold (float, optional) – A minimum threshold for the y-axis measure to be selected. If None, the threshold will be estimated from the data. Default: None.

  • y_threshold_percent (float between 0-1, optional) – If y_threshold is not given, y_threshold_percent controls the strictness of the automatic threshold selection. Default: 0.05.

  • reduce_TFBS (bool, optional) – Whether to reduce the .TFBS of the new object to the TFs remaining in .rules after selection. Setting this to ‘False’ will improve speed, but also increase memory consumption. Default: True.

  • plot (bool, optional) – Whether to show the ‘measure vs. pvalue’-plot or not. Default: True.

  • kwargs (arguments) – Additional arguments are forwarded to tfcomb.plotting.scatter

Returns:

An object containing a subset of <obj>.rules

Return type:

tfcomb.objects.CombObj()

integrate_data(table, merge='pair', TF1_col='TF1', TF2_col='TF2', prefix=None)[source]

Function to add external data to object rules.

Parameters:
  • table (str or pandas.DataFrame) – A table containing data to add to .rules. If table is a string, ‘table’ is assumed to be the path to a tab-separated table containing a header line and rows of data.

  • merge (str) – Which information to merge - must be one of “pair”, “TF1” or “TF2”. The option “pair” is used to merge information about TF-TF pairs such as protein-protein-interactions. The options “TF1” and “TF2” can be used to include TF-specific information such as expression levels.

  • TF1_col (str, optional) – The column in table corresponding to “TF1” name. If merge == “TF2”, ‘TF1’ is ignored. Default: “TF1”.

  • TF2_col (str, optional) – The column in table corresponding to “TF2” name. If merge == “TF1”, ‘TF2’ is ignored. Default: “TF2”.

  • prefix (str, optional) – A prefix to add to the columns. Can be useful for adding the same information to both TF1 and TF2 (e.g. by using “TF1” and “TF2” prefixes), or adding same-name columns from different tables. Default: None (no prefix).

plot_TFBS(**kwargs)[source]

This is a wrapper for the plotting function tfcomb.plotting.genome_view

Parameters:

kwargs (arguments) – All arguments are passed to tfcomb.plotting.genome_view. Please see the documentation for input parameters.

plot_heatmap(n_rules=20, color_by='cosine', sort_by=None, **kwargs)[source]

Plot a heatmap of rules and their attribute values. This is a wrapper for the plotting function tfcomb.plotting.heatmap.

Parameters:
  • n_rules (int, optional) – The number of rules to show. The first n_rules rules of .rules are taken. Default: 20.

  • color_by (str, optional) – A column within .rules to color the heatmap by. Note: Can be different than sort_by. Default: “cosine”.

  • sort_by (str, optional) – A column within .rules to sort by before choosing n_rules. Default: None (rules are not sorted before selection).

  • kwargs (arguments) – Any additional arguments are passed to tfcomb.plotting.heatmap.

plot_bubble(n_rules=20, yaxis='cosine', color_by='TF1_TF2_count', size_by=None, sort_by=None, **kwargs)[source]

Plot a bubble-style scatterplot of the object rules. This is a wrapper for the plotting function tfcomb.plotting.bubble.

Parameters:
  • n_rules (int, optional) – The number of rules to show. The first n_rules rules of .rules are taken. Default: 20.

  • yaxis (str, optional) – A column within .rules to depict on the y-axis of the plot. Default: “cosine”.

  • color_by (str, optional) – A column within .rules to color points in the plot by. Default: “TF1_TF2_count”.

  • size_by (str, optional) – A column within .rules to size points in the plot by. Default: None.

  • sort_by (str, optional) – A column within .rules to sort by before choosing n_rules. Default: None (rules are not sorted before selection).

  • unique (bool, optional) – Only show unique pairs in plot, e.g. only the first occurrence of TF1-TF2 / TF2-TF1. Default: True.

  • kwargs (arguments) – Any additional arguments are passed to tfcomb.plotting.bubble.

plot_scatter(x, y, hue=None, **kwargs)[source]

Plot a scatterplot of information from .rules.

Parameters:
  • x (str) – The name of the column in .rules containing values to plot on x-axis.

  • y (str) – The name of the column in .rules containing values to plot on y-axis.

create_distObj()[source]

Creates a distObject, useful for manual analysis. Fills self.distObj.

analyze_distances(parent_directory=None, threads=4, correction=True, scale=True, **kwargs)[source]

Runs the standard distance analysis workflow. Use create_distObj to run custom workflow steps with more options.

analyze_orientation()[source]

Analyze preferred orientation of sites in .TFBS. This is a wrapper for tfcomb.analysis.orientation().

Return type:

pd.DataFrame

build_network(**kwargs)[source]

Builds a TF-TF co-occurrence network for the rules within object. This is a wrapper for the tfcomb.network.build_nx_network() function, which uses the python networkx package.

Parameters:

kwargs (arguments) – Any additional arguments are passed to tfcomb.network.build_nx_network().

Return type:

None - fills the .network attribute of the CombObj with a networkx.Graph object

cluster_network(method='louvain', weight=None)[source]

Creates a clustering of nodes within network and add a new node attribute “cluster” to the network.

Parameters:
  • method (str, one of ["louvain", "blockmodel"]) – The clustering method to use. Default: “louvain”.

  • weight (str, optional) – The name of the edge attribute to use as weight. Default: None (not weighted).

plot_network(color_node_by='TF1_count', color_edge_by='cosine', size_edge_by='TF1_TF2_count', **kwargs)[source]

Plot the rules in .rules as a network using Graphviz for python. This function is a wrapper for building the network (using tfcomb.network.build_network) and subsequently plotting the network (using tfcomb.plotting.network).

Parameters:
  • color_node_by (str, optional) – A column in .rules or .TF_table to color nodes by. Default: ‘TF1_count’.

  • color_edge_by (str, optional) – A column in .rules to color edges by. Default: ‘cosine’.

  • size_edge_by (str, optional) – A column in rules to size edge width by. Default: ‘TF1_TF2_count’.

  • kwargs (arguments) – All other arguments are passed to tfcomb.plotting.network.

compare(obj_to_compare, measure='cosine', join='inner', normalize=True)[source]

Utility function to create a DiffCombObj directly from a comparison between this CombObj and another CombObj. Requires .market_basket() run on both objects. Runs DiffCombObj.normalize (if chosen) and DiffCombObj.calculate_foldchanges() under the hood.

Note

Set .prefix for each object to get proper naming of output log2fc columns.

Parameters:
  • obj_to_compare (tfcomb.objects.CombObj) – Another CombObj to compare to the current CombObj.

  • measure (str, optional) – The measure to compare between objects. Default: ‘cosine’.

  • join (string) – How to join the TF names of the two objects. Must be one of “inner” or “outer”. If “inner”, only TFs present in both objects are retained. If “outer”, TFs from both objects are used, and any missing counts are set to 0. Default: “inner”.

  • normalize (bool, optional) – Whether to normalize values between objects. Default: True.

Return type:

DiffCombObj

class tfcomb.objects.DiffCombObj(objects=[], measure='cosine', join='inner', fillna=True, verbosity=1)[source]

Bases: object

add_object(obj, join='inner', fillna=True)[source]

Add one CombObj to the DiffCombObj.

Parameters:
  • obj (CombObj) – An instance of CombObj

  • join (string) – How to join the TF names of the two objects. Must be one of “inner” or “outer”. If “inner”, only TFs present in both objects are retained. If “outer”, TFs from both objects are used, and any missing counts are set to 0. Default: “inner”.

  • fillna (bool) – If “join” == “outer”, there can be missing counts for individual rules. If fillna == True, these counts are set to 0. Else, the counts are NA. Default: True.

Returns:

Object is added in place

Return type:

None
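The effect of ‘join’ and ‘fillna’ can be sketched with plain dictionaries. This is a simplified illustration only: the hypothetical helper `join_counts` joins on rule keys rather than on TF names, and it uses None in place of NA.

```python
def join_counts(counts_a, counts_b, join="inner", fillna=True):
    """Combine two {rule: count} dicts into {rule: (count_a, count_b)}.

    Assumption: a simplified stand-in for joining two objects' rules.
    """
    if join == "inner":
        keys = counts_a.keys() & counts_b.keys()  # rules present in both
    elif join == "outer":
        keys = counts_a.keys() | counts_b.keys()  # rules present in either
    else:
        raise ValueError('join must be "inner" or "outer"')
    missing = 0 if fillna else None  # None stands in for NA
    return {
        key: (counts_a.get(key, missing), counts_b.get(key, missing))
        for key in sorted(keys)
    }


control = {"A-B": 10, "A-C": 4}
treatment = {"A-B": 7, "B-C": 2}
inner = join_counts(control, treatment, join="inner")
outer = join_counts(control, treatment, join="outer")
```

Here the inner join keeps only "A-B", while the outer join keeps all three rules and fills the missing counts with 0.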

normalize()[source]

Normalize the values for the DiffCombObj given measure (.measure) using quantile normalization. Overwrites the <prefix>_<measure> columns in .rules with the normalized values.
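For readers unfamiliar with quantile normalization, a minimal pure-Python sketch of the general technique is given here. It is not the code used by TF-COMB; `quantile_normalize` is a hypothetical helper, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equally long numeric lists.

    Assumption: ties are broken by position, a simplification of the
    tie handling found in full implementations.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the values sharing the same rank across all columns.
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(columns) for rank in range(n)
    ]
    normalized = []
    for col in columns:
        # Indices of the column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


control = [1, 3, 2]
treatment = [10, 30, 20]
normalized = quantile_normalize([control, treatment])
```

After normalization both columns share the same distribution of values, so differences between objects reflect changes in rank rather than in overall scale.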

calculate_foldchanges(pseudo=0.01)[source]

Calculate measure foldchanges between objects in DiffCombObj. The measure is chosen at the creation of the DiffCombObj and defaults to ‘cosine’.

Parameters:

pseudo (float, optional) – Set the pseudocount to add to all values before log2-foldchange transformation. Default: 0.01.

See also

tfcomb.DiffCombObj.normalize
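The role of the pseudocount can be seen in a short sketch of a log2 fold change between two measure values. The helper `log2_foldchange` is hypothetical, and which object serves as numerator is an assumption made for illustration.

```python
import math


def log2_foldchange(value_a, value_b, pseudo=0.01):
    """log2 fold change of value_b relative to value_a.

    The pseudocount keeps the ratio defined when a value is 0.
    Assumption: value_b is the numerator; the library may order
    the contrast differently.
    """
    return math.log2((value_b + pseudo) / (value_a + pseudo))


unchanged = log2_foldchange(0.5, 0.5)   # 0.0, no change between objects
gained = log2_foldchange(0.0, 0.03)     # finite even though value_a is 0
```

A larger pseudocount shrinks fold changes between small values toward 0, which dampens noise from rules with weak association in both objects.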

select_rules(contrast=None, measure='cosine', measure_threshold=None, measure_threshold_percent=0.05, mean_threshold=None, mean_threshold_percent=0.05, plot=True, **kwargs)[source]

Select differentially regulated rules using an MA-plot on the basis of measure and mean of measures per contrast.

Parameters:
  • contrast (tuple) – Name of the contrast to use in tuple format e.g. (<prefix1>,<prefix2>). Default: None (the first contrast is shown).

  • measure (str, optional) – The measure to use for selecting rules. Default: “cosine” (internally converted to <prefix1>/<prefix2>_<measure>_log2fc).

  • measure_threshold (tuple, optional) – Threshold for ‘measure’ for selecting rules. Default: None (the threshold is estimated automatically)

  • measure_threshold_percent (float between 0-1) – If measure_threshold is not set, measure_threshold_percent controls the strictness of the automatic threshold. If you increase this value, more differential rules will be found and vice versa. Default: 0.05.

  • mean_threshold (float, optional) – Threshold for ‘mean’ for selecting rules. Default: None (the threshold is estimated automatically)

  • mean_threshold_percent (float between 0-1) – If mean_threshold is not set, mean_threshold_percent controls the strictness of the automatic threshold. If you increase this value, more differential rules will be found and vice versa. Default: 0.05.

  • plot (boolean, optional) – Whether to plot the volcano plot. Default: True.

  • kwargs (arguments, optional) – Additional arguments are passed to tfcomb.plotting.scatter.

Returns:

An object containing a subset of <DiffCombobj>.rules

Return type:

tfcomb.objects.DiffCombObj()

See also

tfcomb.plotting.volcano

plot_correlation(method='pearson', save=None, **kwargs)[source]

Plot correlation of ‘measure’ between rules across objects.

Parameters:
  • method (str, optional) – Either ‘pearson’ or ‘spearman’. Default: ‘pearson’.

  • save (str, optional) – Save the plot to the file given in ‘save’. Default: None.

  • kwargs (arguments, optional) – Additional arguments are passed to sns.clustermap.

plot_rules_heatmap(**kwargs)[source]

Plot a heatmap of size n_rules x n_objects

plot_heatmap(contrast=None, n_rules=10, color_by='cosine_log2fc', sort_by=None, **kwargs)[source]

Functionality to plot a heatmap of differentially co-occurring TF pairs for a certain contrast.

Parameters:
  • contrast (tuple, optional) – Name of the contrast to use in tuple format e.g. (<prefix1>,<prefix2>). Default: None (the first contrast is shown).

  • n_rules (int, optional) – Number of rules to show from each contrast (default: 10). Note: This is the number of rules either up/down, meaning that the rules shown are n_rules * 2.

  • color_by (str, optional) – The measure or column to color the heatmap by. Default: “cosine_log2fc” (converted to “<prefix1>/<prefix2>_<color_by>”).

  • sort_by (str, optional) – Column in .rules to sort rules by. Default: None (keep sort)

  • kwargs (arguments, optional) – Additional arguments are passed to tfcomb.plotting.heatmap.

plot_bubble(contrast=None, n_rules=20, yaxis='cosine_log2fc', color_by=None, size_by=None, **kwargs)[source]

Plot bubble scatterplot of information within .rules.

Parameters:
  • contrast (tuple, optional) – Name of the contrast to use in tuple format e.g. (<prefix1>,<prefix2>). Default: None (the first contrast is shown).

  • n_rules (int, optional) – Number of rules to show (in each direction). Default: 20.

  • yaxis (str, optional) – Measure to show on the y-axis. Default: “cosine_log2fc”.

  • color_by (str, optional) – If column is not in rules, the string is supposed to be in the form “prefix1/prefix2_<color_by>”. Default: None.

  • size_by (str, optional) – Column to size bubbles by. Default: None.

  • kwargs (arguments) – Any additional arguments are passed to tfcomb.plotting.bubble.

plot_network(contrast=None, color_node_by=None, size_node_by=None, color_edge_by='cosine_log2fc', size_edge_by=None, **kwargs)[source]

Plot the network of differential co-occurring TFs.

Parameters:
  • contrast (tuple) – Name of the contrast to use in tuple format e.g. (<prefix1>,<prefix2>). Default: None (the first contrast is shown).

  • color_node_by (str, optional) – Name of measure to color node by. If column is not in .rules, the name will be internally converted to “prefix1/prefix2_<color_node_by>”. Default: None.

  • size_node_by (str, optional) – Column in .rules to size_node_by. If column is not in .rules, the name will be internally converted to “prefix1/prefix2_<size_node_by>” Default: None.

  • color_edge_by (str, optional) – The name of measure or column to color edge by (will be internally converted to “prefix1/prefix2_<color_edge_by>”). Default: “cosine_log2fc”.

  • size_edge_by (str, optional) – The name of measure or column to size edge by. Default: None.

  • kwargs (arguments) – Any additional arguments are passed to tfcomb.plotting.network.

Return type:

dot network object

to_pickle(path: str)[source]

Save the DiffCombObj to a pickle file.

Parameters:

path (str) – Path to the output pickle file e.g. ‘my_diff_comb_obj.pkl’.

See also

from_pickle

from_pickle(path: str)[source]

Import a DiffCombObj from a pickle file.

Parameters:

path (str) – Path to an existing pickle file to read.

Raises:

InputError – If read object is not an instance of DiffCombObj.

See also

to_pickle

class tfcomb.objects.DistObj(verbosity=1)[source]

Bases: object

The main class for analyzing preferred binding distances for co-occurring TFs.

Examples

>>> D = tfcomb.objects.DistObj()

# Verbosity of the output log can be set using the ‘verbosity’ parameter:

>>> D = tfcomb.objects.DistObj(verbosity=2)

set_verbosity(level)[source]

Set the verbosity level for logging after creating the DistObj.

Parameters:

level (int) – A value between 0-3 where 0 (only errors), 1 (info), 2 (debug), 3 (spam debug).

Returns:

Sets the verbosity level for the Logger inplace

Return type:

None

fill_rules(comb_obj)[source]

Fill the DistObj from a reference object with all values and parameters needed to perform the standard preferred distance analysis.

Parameters:

comb_obj (tfcomb.objects.CombObj, or any other object containing all necessary rules) – Object from which the rules and parameters should be copied.

Returns:

Copies values and parameters from a combObj or diffCombObj.

Return type:

None

reset_signal()[source]

Resets the signals to their original state.

Returns:

Resets the object datasource variable to the original raw distances

Return type:

None

check_datasource(att)[source]

Utility function to check if distances in .<att> were set. If not, InputError is raised.

Parameters:

att (str) – Attribute name for a dataframe in self.

check_peaks()[source]

Utility function to check if peaks were called. If not, InputError is raised.

check_min_max_dist()[source]

Utility function to check if min and max distance are valid.

static chunk_table(table, n)[source]

Split a pandas dataframe row-wise into n chunks.

Parameters:

n (int) – A positive number of chunks to split table into.

Return type:

list of pd.DataFrames
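Row-wise chunking of this kind is typically used to distribute work across threads. The sketch below shows one way to split rows into n nearly equal chunks, using a plain list in place of a DataFrame; `chunk_rows` is a hypothetical helper and the library's exact chunk boundaries may differ.

```python
def chunk_rows(rows, n):
    """Split a list of rows into n consecutive chunks of near-equal size.

    Assumption: the first chunks receive one extra row when the rows
    do not divide evenly.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    size, extra = divmod(len(rows), n)
    chunks = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


chunks = chunk_rows(list(range(10)), 3)  # chunk sizes 4, 3 and 3
```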

count_distances(directional=None, stranded=None, percentage=False, percentage_bins=100)[source]

Count distances for co-occurring TFs. Can be followed by analyze_distances to determine preferred binding distances.

Parameters:
  • directional (bool or None, optional) – Decide if direction of found pairs should be taken into account, e.g. whether “<—TF1—> <—TF2—>” is only counted as TF1-TF2 (directional=True) or also as TF2-TF1 (directional=False). If directional is None, self.directional will be used. Default: None.

  • stranded (bool or None, optional) – Whether to take strand of TFBS into account when counting distances. If stranded is None, self.stranded will be used. Default: None

  • percentage (bool, optional) – Whether to count distances as bp or percentage of longest TF1/TF2 region. If True, output will be collected in 1-percent increments from 0-1. If False, output depends on the min/max distance values given in the DistObj. Default: False.

Returns:

Fills the object variable .distances.

Return type:

None

scale(how='min-max')[source]

Scale the counted distances per pair. Saves the scaled counts into .scaled and updates .datasource.

Parameters:

how (str, optional) – How to scale the counts. Must be one of: [“min-max”, “fraction”]. If “min-max”, all counts are scaled between 0 and 1. If “fraction”, each count is expressed as a fraction of the sum of all counts, so that the values lie between 0 and 1. Default: “min-max”.
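The two scaling modes can be sketched for the distance counts of a single pair as follows. The helper `scale_counts` is hypothetical, and the reading of “fraction” as division by the total count is an assumption.

```python
def scale_counts(counts, how="min-max"):
    """Scale the distance counts of one TF pair.

    Assumption: "fraction" divides each count by the total count.
    """
    if how == "min-max":
        low, high = min(counts), max(counts)
        span = high - low
        if span == 0:
            return [0.0 for _ in counts]  # constant signal carries no contrast
        return [(count - low) / span for count in counts]
    if how == "fraction":
        total = sum(counts)
        if total == 0:
            return [0.0 for _ in counts]
        return [count / total for count in counts]
    raise ValueError('how must be "min-max" or "fraction"')


minmax = scale_counts([2, 4, 6])                   # [0.0, 0.5, 1.0]
fraction = scale_counts([1, 1, 2], how="fraction")  # [0.25, 0.25, 0.5]
```

Min-max scaling makes the shapes of signals comparable between pairs, whereas fraction scaling preserves the relative weight of each distance within a pair.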

smooth(window_size=3, reduce=True)[source]

Helper function for smoothing all rules with a given window size.

Parameters:
  • window_size (int, optional) – Window size for the rolling smoothing window. A bigger window produces larger flanking ranks at the sides. Default: 3.

  • reduce (bool, optional) – Reduce the distances to the positions with a full window, i.e. if the window size is 3, the first and last distances are removed. This prevents flawed enrichment of peaks at the borders of the distances. Default: True.

Returns:

Fills the object variable .smoothed and updates .datasource

Return type:

None
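A rolling-window smoothing with the ‘reduce’ behaviour described above can be sketched as follows. The hypothetical helper `smooth_signal` assumes a centered rolling mean with an odd window size; the library may use a different window type.

```python
def smooth_signal(values, window_size=3, reduce=True):
    """Centered rolling mean over a list of counts.

    Assumptions: window_size is odd and the window is centered.
    With reduce=True, positions without a full window are dropped;
    with reduce=False, the border positions use a truncated window.
    """
    half = window_size // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        low, high = i - half, i + half + 1
        if low < 0 or high > n:
            if reduce:
                continue  # skip borders that lack a full window
            low, high = max(low, 0), min(high, n)
        window = values[low:high]
        smoothed.append(sum(window) / len(window))
    return smoothed


reduced = smooth_signal([1, 2, 3, 4, 5])                # [2.0, 3.0, 4.0]
padded = smooth_signal([1, 2, 3, 4, 5], reduce=False)   # [1.5, 2.0, 3.0, 4.0, 4.5]
```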

is_smoothed()[source]

Return True if data was smoothed during analysis, False otherwise

Returns:

True if smoothed, False otherwise

Return type:

bool

correct_background(frac=0.66, threads=1)[source]

Corrects the background of distances.

Parameters:
  • frac (float, optional) – Fraction of data used to calculate smooth. Setting this fraction lower will cause stronger smoothing effect. Default: 0.66

  • threads (int, optional) – Number of threads to use in functions. Default: 1.

Returns:

Fills the object variable .corrected

Return type:

None

analyze_signal_all(threads=1, method='zscore', threshold=2, min_count=1, save=None)[source]

After background correction is done, the signal is analyzed for peaks, indicating preferred binding distances. There can be more than one peak (more than one preferred binding distance) per signal. Peaks are called with scipy.signal.find_peaks().

Parameters:
  • threads (int) – Number of threads used. Default: 1.

  • method (str) – Method for transforming counts. Can be one of: “zscore” or “flat”. If “zscore”, the zscore for the pairs is used. If “flat”, no transformation is performed. Default: “zscore”.

  • threshold (float) – The lower threshold for selecting peaks. Default: 2.

  • min_count (int) – Minimum count of TF1-TF2 occurrences for a preferred distance to be called. Default: 1 (all occurrences are considered).

  • save (str) – Path to save the peaks table to. Default: None (table is not written).

Returns:

Fills the object variables self.peaks and self.peaking_count.

Return type:

None
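
The “zscore” method combined with a threshold can be sketched as follows. This is a simplified stand-in, not the library code: the documented implementation calls scipy.signal.find_peaks(), whereas the sketch uses a plain local-maximum test, and both helper names and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def zscores(signal):
    """Z-score transform of one signal (population standard deviation assumed)."""
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


def call_peaks(signal, threshold=2.0):
    """Indices of strict local maxima whose z-score is at least `threshold`."""
    z = zscores(signal)
    return [
        i for i in range(1, len(z) - 1)
        if z[i] > z[i - 1] and z[i] > z[i + 1] and z[i] >= threshold
    ]


# One pronounced preferred distance at index 5 on a flat background.
signal = [1, 1, 1, 1, 1, 12, 1, 1, 1, 1, 1]
peaks = call_peaks(signal, threshold=2)
no_peaks = call_peaks(signal, threshold=4)
```

Raising the threshold makes peak calling stricter: the same signal yields one peak at threshold 2 and none at threshold 4.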

evaluate_noise(threads=1, method='median', height_multiplier=0.75)[source]

Evaluates the noisiness of the signal. To do so, the peaks are cut out and the remaining signal is analyzed.

Parameters:
  • threads (int) – Number of threads used for evaluation. Default: 1

  • method (str) – Measurement to calculate the noisiness of a signal. One of [“median”, “min_max”]. Default: “median”

  • height_multiplier (float) – Height multiplier (percentage) to calculate cut points. Must be between 0 and 1. Default: 0.75

rank_rules(by=['Distance_percent', 'Peak Heights', 'Noisiness'], calc_mean=True)[source]

Ranks the rules within each specified column.

Parameters:
  • by (list of strings) – Columns for which the rules should be ranked. Default: [“Distance_percent”, “Peak Heights”, “Noisiness”].

  • calc_mean (bool) – True if an extra column containing the mean rank should be calculated, False otherwise. Default: True.

Raises:

InputError – If columns selection (parameter: by) is not valid.

Returns:

Adds a rank column for each criterion given, plus one for the mean rank if calc_mean is True.

Return type:

None
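
Per-column ranking followed by a mean rank can be sketched in plain Python. This is only an illustration under stated assumptions: rank_column is a hypothetical helper, ties are assumed to share the mean of their ranks, and the ranking direction per column (high peak heights and low noisiness ranked first) is a guess, not documented tfcomb behavior.

```python
def rank_column(values, ascending=False):
    """Rank values starting at 1; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Three hypothetical rules with made-up values for two of the default columns.
rules = {
    "Peak Heights": [3.1, 5.0, 2.2],
    "Noisiness": [0.4, 0.1, 0.9],
}
ranks = {
    "Peak Heights": rank_column(rules["Peak Heights"], ascending=False),
    "Noisiness": rank_column(rules["Noisiness"], ascending=True),
}
# Mean rank across all ranked columns, one value per rule.
mean_rank = [sum(col[i] for col in ranks.values()) / len(ranks) for i in range(3)]
```

The mean rank condenses several criteria into one ordering, so a rule only ranks first overall if it does well in every column.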

mean_distance(source='datasource')[source]

Get the mean distance for each rule in .rules.

Return type:

pandas.DataFrame containing “mean_distance” per rule.

max_distance(source='datasource')[source]

Get the distance with the maximum signal for each rule in .rules.

Parameters:

source (str) – The name of the datasource to use for calculation. Default: “datasource” (the current state of data).

Return type:

pandas.DataFrame containing “max_distance” per rule.
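
The difference between the two summaries can be sketched for a single rule. The helpers below are hypothetical and assume that the mean distance is weighted by the signal at each distance and that the first distance wins when several share the maximum; neither detail is confirmed by the documentation above.

```python
def mean_distance(distances, counts):
    """Count-weighted mean distance (assumed definition)."""
    total = sum(counts)
    return sum(d * c for d, c in zip(distances, counts)) / total


def max_distance(distances, counts):
    """Distance whose count is largest; the first one wins on ties."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return distances[best]


distances = [0, 1, 2, 3, 4]
symmetric = [1, 2, 10, 2, 1]   # mean and maximum coincide
skewed = [10, 1, 1, 1, 7]      # mean is pulled away from the maximum

mean_sym = mean_distance(distances, symmetric)
max_sym = max_distance(distances, symmetric)
mean_skew = mean_distance(distances, skewed)
max_skew = max_distance(distances, skewed)
```

For a symmetric signal both values agree, while a signal with two separated maxima has a mean distance that lies between them and may not correspond to a preferred distance at all.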

analyze_hubs()[source]

Counts the number of distinct partners with which each transcription factor forms at least one peak.

Returns:

A pandas Series with the TF as index and the partner count as integer values.

Return type:

pd.Series
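
The partner count can be sketched with a set per TF. This is an illustration only: count_partners is a hypothetical helper, the input is assumed to be one (TF1, TF2) tuple per called peak, and counting both orientations of a pair toward both TFs is an assumption.

```python
from collections import defaultdict


def count_partners(peak_pairs):
    """Map each TF to the number of distinct partners it has a peak with."""
    partners = defaultdict(set)
    for tf1, tf2 in peak_pairs:
        # A set ignores repeated peaks for the same pair.
        partners[tf1].add(tf2)
        partners[tf2].add(tf1)
    return {tf: len(others) for tf, others in partners.items()}


# NFYA-NFYB appears twice (two peaks) but counts as a single partner.
peak_pairs = [("NFYA", "NFYB"), ("NFYA", "SP1"), ("NFYA", "NFYB"), ("SP1", "KLF4")]
hubs = count_partners(peak_pairs)
```

TFs with a high partner count are the hubs, i.e. factors with preferred distances to many different partners.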

count_peaks()[source]

Counts the number of identified distance peaks per rule.

Returns:

A dataframe containing ‘n_peaks’ (column) for each TF1-TF2 rule (index)

Return type:

pd.DataFrame

classify_rules()[source]

Classify all rules: True if at least one peak was found, False otherwise.

Returns:

Fills the object variable .classified

Return type:

None

get_periodicity()[source]

Calculate periodicity for all rules via autocorrelation.

Returns:

Fills the object variables .autocorrelation and .periodicity

Return type:

None
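
How autocorrelation exposes periodicity can be sketched on a toy signal. This is a generic textbook estimator, not necessarily the one tfcomb uses: the autocorrelation helper and the choice of the best non-zero lag as the period are assumptions made for illustration.

```python
def autocorrelation(signal, max_lag):
    """Normalized autocorrelation for lags 0..max_lag (generic estimator)."""
    n = len(signal)
    mu = sum(signal) / n
    centered = [x - mu for x in signal]
    var = sum(c * c for c in centered)
    if var == 0:
        # A constant signal carries no periodic information.
        return [0.0] * (max_lag + 1)
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        acf.append(cov / var)
    return acf


# Toy distance counts with an enrichment every 4 bp.
signal = [1, 0, 0, 0] * 5
acf = autocorrelation(signal, max_lag=8)
# The non-zero lag with the highest autocorrelation estimates the period.
best_lag = max(range(1, len(acf)), key=lambda lag: acf[lag])
```

The autocorrelation is 1 at lag 0 by construction, so the period is read from the highest value at a non-zero lag, which is lag 4 for this toy signal.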

plot_autocorrelation(pair)[source]

Plot the autocorrelation for a pair, which shows the lag of periodicity in the counted distances.

Parameters:

pair (tuple(str, str)) – TF names to plot, e.g. (“NFYA”, “NFYB”).

build_network()[source]

Builds a TF-TF co-occurrence network for the rules within object.

Returns:

fills the .network attribute of the CombObj with a networkx.Graph object

Return type:

None

See also

tfcomb.network.build_nx_network

plot_bg_estimation(pair)[source]

Plot the background estimation for a pair. Useful for debugging the background estimation.

Parameters:

pair (tuple(str, str)) – TF names to plot, e.g. (“NFYA”, “NFYB”).

plot(pair, method='peaks', style='hist', show_peaks=True, save=None, config=None, collapse=None, ax=None, color='tab:blue', max_dist=None, **kwargs)[source]

Produces different plots.

Parameters:
  • pair (tuple(str, str)) – TF names to plot, e.g. (“NFYA”, “NFYB”).

  • method (str, optional) – Plotting method. One of:

    - ‘peaks’: Shows the z-score signal and any peaks found by analyze_signal_all.

    - ‘correction’: Shows the fit of the lowess curve to the data.

    - ‘datasource’, ‘distances’, ‘scaled’, ‘corrected’, ‘smoothed’: Shows the signal of the counts given in the .<method> table.

    Default: ‘peaks’.

  • style (str, optional) – What style to plot the datasource in. Can be one of: [“hist”, “kde”, “line”]. Default: “hist”.

  • show_peaks (bool, optional) – Whether to show the identified peak(s) (if any were found) in the plot. Default: True.

  • save (str, optional) – Path to save the plots to. If save is None plots won’t be plotted. Default: None

  • config (dict, optional) –

    Config for some plotting methods.

    e.g. {“nbins”: 100} for histogram-like plots or {“bwadjust”: 0.1} for a kde (density) plot.

    If set to None, the default parameters listed below are used.

    Possible parameters:

    [hist]: n_bins, Default: self.max_dist - self.min_dist + 1

    [kde]: bwadjust, Default: 0.1 (see seaborn.kdeplot())

    Default: None

  • collapse (str, optional) – Method used to collapse negative data. One of [“min”, “max”, “mean”, “sum”], or None if negative data should not be collapsed. See ._collapse_negative() for more information.

  • ax (plt.axis) – Plot to an existing axis object. Default: None (a new axis will be created).

  • color (str, optional) – Color of the plot hist/line/kde. Default: “tab:blue”.

  • max_dist (int, optional) – Option to set the max_dist independent of the max_dist used for counting distances. Default: None (max_dist is not changed).

  • kwargs (arguments) – Additional arguments are passed to plt.hist().

plot_network(color_node_by='TF1_count', color_edge_by='Distance', size_edge_by='Distance_percent', **kwargs)[source]

Plot the rules in .rules as a network using Graphviz for Python. This function is a wrapper that builds the network (using tfcomb.network.build_network) and then plots it (using tfcomb.plotting.network).

Parameters:
  • color_node_by (str, optional) – A column in .rules or .TF_table to color nodes by. Default: ‘TF1_count’.

  • color_edge_by (str, optional) – A column in .rules to color edges by. Default: ‘Distance’.

  • size_edge_by (str, optional) – A column in .rules to size edge width by. Default: ‘Distance_percent’.

  • **kwargs (arguments) – All other arguments are passed to tfcomb.plotting.network.