tfcomb.utils module
- class tfcomb.utils.OneTFBS(lst=[])[source]
Bases: list
Collects location information about one single TFBS
- class tfcomb.utils.TFBSPair(TFBS1, TFBS2, anchor='inner', simplify=False)[source]
Bases: object
Collects information about a co-occurring pair of TFBS
- class tfcomb.utils.TFBSPairList(iterable=(), /)[source]
Bases: list
Class for collecting and analyzing a list of TFBSPair objects
- write_bed(outfile, fmt='bed', merge=False)[source]
Write the locations of (TF1, TF2) pairs to a bed-file.
- Parameters:
locations (list) – The output of get_pair_locations().
outfile (str) – The path which the pair locations should be written to.
fmt (str, optional) – The format of the output file. Must be one of “bed” or “bedpe”. If “bed”, the TF1/TF2 sites are written individually (see merge option to merge sites). If “bedpe”, the sites are written in BEDPE format. Default: “bed”.
merge (bool, optional) – If fmt="bed", 'merge' controls how the locations are written out. If True, each pair is written as one region spanning TF1.start-TF2.end. If False, TF1/TF2 sites are written individually. Default: False.
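The effect of 'merge' on fmt="bed" output can be pictured with a small standalone sketch. It does not call tfcomb: the tuple layout (chrom, start, end, name), the helper name pair_to_bed_lines and the "TF1-TF2" label of the merged region are assumptions made for illustration, and TF1 is assumed to lie upstream of TF2.

```python
def pair_to_bed_lines(tf1, tf2, merge=False):
    """Format one (TF1, TF2) pair as BED lines (hypothetical helper, not tfcomb code)."""
    chrom1, start1, end1, name1 = tf1
    chrom2, start2, end2, name2 = tf2
    if merge:
        # One region spanning TF1.start-TF2.end; the joined name is an assumption.
        return [f"{chrom1}\t{start1}\t{end2}\t{name1}-{name2}"]
    # merge=False: each site is written individually.
    return [
        f"{chrom1}\t{start1}\t{end1}\t{name1}",
        f"{chrom2}\t{start2}\t{end2}\t{name2}",
    ]


if __name__ == "__main__":
    tf1 = ("chr1", 100, 110, "A")
    tf2 = ("chr1", 130, 142, "B")
    print("\n".join(pair_to_bed_lines(tf1, tf2)))
    print("\n".join(pair_to_bed_lines(tf1, tf2, merge=True)))
```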
- remove(value)[source]
Remove first occurrence of value.
Raises ValueError if the value is not present.
- pop(index=-1)[source]
Remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
- property bigwig_path
Get path to bigwig file.
- property plotting_tables
Getter for plotting_tables. Will compute if necessary.
- comp_plotting_tables(flank=100, align='center')[source]
Prepare pair and score tables for plotting.
- Parameters:
flank (int or tuple, default 100) – Window size of TFBSpair. Adds given amount of bases in both directions counted from alignment anchor (see align) between binding sites. Use a tuple of ints to set left and right flank independently.
align (str, default 'center') – Position from which the flanking regions are added. Must be one of 'center', 'left', 'right'.
- 'center': Midpoint between binding positions (rounded down if uneven).
- 'left': End of first binding position in pair.
- 'right': Start of second binding position in pair.
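How 'flank' and 'align' might combine into a plotting window can be sketched as follows. This is not the tfcomb implementation: the function name window_for_pair is hypothetical, and the midpoint is assumed to be taken between the end of the first site and the start of the second.

```python
def window_for_pair(site1_end, site2_start, flank=100, align="center"):
    """Return (window_start, window_end) around the alignment anchor (illustrative only)."""
    # An int applies to both sides; a tuple sets left and right flank independently.
    left_flank, right_flank = (flank, flank) if isinstance(flank, int) else flank

    if align == "center":
        # Midpoint between the binding positions, rounded down if uneven.
        anchor = (site1_end + site2_start) // 2
    elif align == "left":
        anchor = site1_end      # end of first binding position
    elif align == "right":
        anchor = site2_start    # start of second binding position
    else:
        raise ValueError("align must be one of 'center', 'left', 'right'")

    return anchor - left_flank, anchor + right_flank


if __name__ == "__main__":
    print(window_for_pair(100, 121, flank=10))                       # symmetric window
    print(window_for_pair(100, 121, flank=(5, 20), align="right"))   # asymmetric window
```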
- pairMap(logNorm_cbar=None, show_binding=True, flank_plot='strand', figsize=(7, 14), output=None, flank=None, align=None, alpha=0.7, cmap='seismic', show_diagonal=True, legend_name_score='Bigwig Score', xtick_num=10, log=<ufunc 'log1p'>, dpi=300)[source]
Create a heatmap of TF binding pairs sorted for distance.
- Parameters:
logNorm_cbar (str, default None) – Type of normalization for the colorbar. One of [None, "centerLogNorm", "SymLogNorm"].
- SymLogNorm: Use matplotlib.colors.SymLogNorm. This does not center to 0.
- centerLogNorm: Use a custom variant of matplotlib.colors.SymLogNorm taken from Stack Overflow. Note that this produces an unusual-looking colorbar.
show_binding (bool, default True) – Shows the TF binding positions as a grey background.
flank_plot (str, default 'strand') – [“strand”, “orientation”, None] Decide if the plots flanking the heatmap should be colored for strand, strand-orientation or disabled.
figsize (int tuple, default (7, 14)) – Figure dimensions.
output (str, default None) – Save plot to given file.
flank (int or int tuple, default None) – Bases added to both sides counted from center. Forwarded to comp_plotting_tables().
align (str, default None) – Alignment of pairs. One of [‘left’, ‘right’, ‘center’]. Forwarded to comp_plotting_tables().
alpha (float, default 0.7) – Alpha value for diagonal lines, TF binding positions and center line.
cmap (matplotlib colormap name or object, or list of colors, default 'seismic') – Color palette used in the main heatmap. Forwarded to seaborn.heatmap(cmap)
show_diagonal (boolean, default True) – Shows diagonal lines for identifying preference in binding distance.
legend_name_score (str, default 'Bigwig Score') – Name of the score legend (upper legend).
xtick_num (int, default 10) – Number of ticks shown on the x-axis. Disable ticks with None or values < 0.
log (function, default numpy.log1p) – Function applied to each row of scores. Zeros are excluded; for negative numbers the function is applied to the absolute value and the sign is restored afterwards. Any of the numpy log functions can be used, for example numpy.log, numpy.log10 or numpy.log1p. Set to None to disable.
dpi (float, default 300) – The resolution of the figure in dots-per-inch.
- Returns:
Object containing the finished pairMap.
- Return type:
matplotlib.gridspec.GridSpec
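The sign handling described for the 'log' parameter of pairMap can be illustrated with a standalone sketch. It uses math.log1p from the standard library in place of numpy.log1p, and the helper name signed_log is hypothetical; the actual row-wise implementation in tfcomb may differ.

```python
import math


def signed_log(row, func=math.log1p):
    """Apply func to |value|, skip zeros, and restore the sign afterwards (illustrative only)."""
    if func is None:
        # None disables the transformation.
        return list(row)
    out = []
    for value in row:
        if value == 0:
            out.append(0.0)  # zeros are excluded from the transform
        else:
            sign = 1.0 if value > 0 else -1.0
            out.append(sign * func(abs(value)))
    return out


if __name__ == "__main__":
    print(signed_log([0, 9, -9, 0.5]))
```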
- pairTrack(dist=None, start=None, end=None, ymin=None, ymax=None, ylabel='Bigwig signal', output=None, flank=None, align=None, figsize=(6, 4), dpi=70, _ret_param=False)[source]
Create an aggregated footprint track on the paired binding sites. Either aggregate all sites for a specific distance or give a range of sites that should be aggregated. If the second approach spans multiple distances the binding locations are shown as a range as well.
- Parameters:
dist (int or int list, default None) – Show track for one or more distances between binding sites.
start (int, default None) – Start of the range of sites that should be aggregated. If set, 'dist' is ignored.
end (int, default None) – End of the range of sites that should be aggregated. If set, 'dist' is ignored.
ymin (int, default None) – Y-axis minimum limit.
ymax (int, default None) – Y-axis maximum limit.
ylabel (str, default 'Bigwig signal') – Label for the y-axis.
output (str, default None) – Save plot to given file.
flank (int or int tuple, default None) – Bases added to both sides counted from center. Forwarded to comp_plotting_tables().
align (str, default None) – Alignment of pairs. One of [‘left’, ‘right’, ‘center’]. Forwarded to comp_plotting_tables().
figsize (int tuple, default (6, 4)) – Figure dimensions.
dpi (float, default 70) – The resolution of the figure in dots-per-inch.
_ret_param (bool, default False) – Intended for internal use by the animation. If True, the function returns a dict of the function call parameters used to create the plot.
- Returns:
Return axes object of the plot.
- Return type:
matplotlib.axes._subplots.AxesSubplot or dict
- pairTrackAnimation(site_num=None, step=10, ymin=None, ymax=None, ylabel='Bigwig signal', interval=50, repeat_delay=0, repeat=False, output=None, flank=None, align=None, figsize=(6, 4), dpi=70)[source]
Combine a set of pairTrack plots to a .gif.
Note
The memory limit can be increased with the following if necessary. Default is 20 MB.
matplotlib.rcParams['animation.embed_limit'] = 100  # in MB
- Parameters:
site_num (int, default None) – Number of sites to aggregate for every step. If None will aggregate by distance between binding pair.
step (int, default 10) – Step size between aggregations. Will be ignored if site_num=None.
ymin (int, default None) – Y-axis minimum limit.
ymax (int, default None) – Y-axis maximum limit.
ylabel (str, default 'Bigwig signal') – Label for the y-axis.
interval (int, default 50) – Delay between frames in milliseconds
repeat_delay (int, default 0) – The delay in milliseconds between consecutive animation runs, if repeat is True.
repeat (boolean, default False) – Whether the animation repeats when the sequence of frames is completed.
output (str, default None) – Save plot to given file.
flank (int or int tuple, default None) – Bases added to both sides counted from center. Forwarded to comp_plotting_tables().
align (str, default None) – Alignment of pairs. One of [‘left’, ‘right’, ‘center’]. Forwarded to comp_plotting_tables().
figsize (int tuple, default (6, 4)) – Figure dimensions.
dpi (float, default 70) – The resolution of the figure in dots-per-inch.
- Returns:
Gif object ready to display in a jupyter notebook.
- Return type:
IPython.core.display.HTML
- pairLines(x, y, figsize=(6, 4), dpi=70, output=None)[source]
Compare miscellaneous values between TF pairs.
- Parameters:
x (string) – Data to show on the x-axis. Set None to get a list of options.
y (string) – Data to show on the y-axis. Set None to get a list of options.
figsize (int tuple, default (6, 4)) – Figure dimensions.
dpi (float, default 70) – The resolution of the figure in dots-per-inch.
output (str, default None) – Save plot to given file.
- Returns:
Return axes object of the plot.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- plot_distances(groupby='orientation', figsize=None, group_order=None)[source]
Plot the distribution of distances between TFBS-pairs.
- Parameters:
groupby (str) – An attribute of each pair to group distances by. If None, all distances are shown without grouping. Default: “orientation”.
figsize (tuple of ints) – Set the figure size, e.g. (8,10). Default: None (default matplotlib figuresize).
- exception tfcomb.utils.InputError[source]
Bases: Exception
Raises an InputError exception without writing traceback
- exception tfcomb.utils.StopExecution[source]
Bases: Exception
Stop execution of a notebook cell with error message
- tfcomb.utils.check_graphtool()[source]
Utility to check if ‘graph-tool’ is installed on path. Raises an exception (if notebook) or exits (if script) if the module is not installed.
- tfcomb.utils.check_columns(df, columns)[source]
Utility to check whether columns are found within a pandas dataframe.
- Parameters:
df (pandas.DataFrame) – A pandas dataframe to check.
columns (list) – A list of column names to check for within ‘df’.
- Raises:
InputError – If any of the columns are not in ‘df’.
- tfcomb.utils.check_dir(dir_path, create=True)[source]
Check if a dir is writeable.
- Parameters:
dir_path (str) – A path to a directory.
- Raises:
InputError – If dir_path is not writeable.
- tfcomb.utils.check_writeability(file_path)[source]
Check if a file is writeable.
- Parameters:
file_path (str) – A path to a file.
- Raises:
InputError – If file_path is not writeable.
- tfcomb.utils.check_type(obj, allowed, name=None)[source]
Check whether given object is within a list of allowed types.
- Parameters:
obj (object) – Object to check type on
allowed (type or list of types) – A type or a list of object types to be allowed
name (str, optional) – Name of object to be written in error. Default: None (the input is referred to as ‘object’)
- Raises:
InputError – If object type is not within types.
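A minimal sketch of the documented behavior of check_type, written so that it runs without tfcomb installed. The local InputError class is a stand-in for tfcomb.utils.InputError, and the function name check_type_sketch and the error message wording are assumptions.

```python
class InputError(Exception):
    """Stand-in for tfcomb.utils.InputError so the sketch is self-contained."""


def check_type_sketch(obj, allowed, name=None):
    """Raise InputError if obj is not an instance of one of the allowed types."""
    # Accept either a single type or a list/tuple of types.
    allowed_types = tuple(allowed) if isinstance(allowed, (list, tuple)) else (allowed,)
    if not isinstance(obj, allowed_types):
        label = "object" if name is None else name  # default label per the 'name' parameter
        names = ", ".join(t.__name__ for t in allowed_types)
        raise InputError(
            f"'{label}' has type {type(obj).__name__}, but must be one of: {names}"
        )


if __name__ == "__main__":
    check_type_sketch(100, [int, tuple], name="flank")  # passes silently
    try:
        check_type_sketch("100", [int, tuple], name="flank")
    except InputError as err:
        print(err)
```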
- tfcomb.utils.check_string(astring, allowed, name=None)[source]
Check whether given string is within a list of allowed strings.
- Parameters:
astring (str) – A string to check.
allowed (str or list of strings) – A string or list of allowed strings to check against ‘astring’.
name (str, optional) – The name of the string to be written in error. Default: None (the value is referred to as ‘string’).
- Raises:
InputError – If ‘astring’ is not in ‘allowed’.
- tfcomb.utils.check_value(value, vmin=-inf, vmax=inf, integer=False, name=None)[source]
Check whether given ‘value’ is a valid value (or integer) and if it is within the bounds of vmin/vmax.
- Parameters:
value (int or float) – The value to check.
vmin (int or float, optional) – Minimum the value is allowed to be. Default: -infinity (no bound)
vmax (int or float, optional) – Maximum the value is allowed to be. Default: +infinity (no bound)
integer (bool, optional) – Whether value must be an integer. Default: False (value can be float)
name (str, optional) – The name of the value to be written in error. Default: None (the value is referred to as ‘value’).
- Raises:
InputError – If ‘value’ is not a valid value as given by parameters.
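The checks described for check_value can be sketched in a few lines. This is an illustration, not the library code: InputError is a local stand-in, check_value_sketch is a hypothetical name, and treating vmin/vmax as inclusive bounds is an assumption.

```python
import math


class InputError(Exception):
    """Stand-in for tfcomb.utils.InputError so the sketch is self-contained."""


def check_value_sketch(value, vmin=-math.inf, vmax=math.inf, integer=False, name=None):
    """Raise InputError if value is not numeric, not an int when required, or out of bounds."""
    label = "value" if name is None else name
    if not isinstance(value, (int, float)):
        raise InputError(f"'{label}' must be a number, got {value!r}")
    if integer and not isinstance(value, int):
        raise InputError(f"'{label}' must be an integer, got {value!r}")
    # Bounds are treated as inclusive here (assumption).
    if not vmin <= value <= vmax:
        raise InputError(f"'{label}' must be within [{vmin}, {vmax}], got {value!r}")


if __name__ == "__main__":
    check_value_sketch(0.05, vmin=0, vmax=1, name="percent")  # passes silently
    try:
        check_value_sketch(1.5, vmin=0, vmax=1, name="percent")
    except InputError as err:
        print(err)
```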
- tfcomb.utils.log_progress(jobs, logger, n=10)[source]
Log progress of jobs within job list.
- Parameters:
jobs (list) – List of multiprocessing jobs to write progress for.
logger (logger instance) – A logger to use for writing out progress.
n (int, optional) – Maximum number of progress statements to show. Default: 10.
- tfcomb.utils.prepare_motifs(motifs_file, motif_pvalue=0.0001, motif_naming='name')[source]
Read motifs from motifs_file and set threshold/name.
- tfcomb.utils.open_genome(genome_f)[source]
Opens an internal genome object for fetching sequences.
- Parameters:
genome_f (str) – The path to a fasta file.
- Return type:
pysam.FastaFile
- tfcomb.utils.check_boundaries(regions, genome)[source]
Utility to check whether regions are within the boundaries of genome.
- Parameters:
regions (tobias.utils.regions.RegionList) – A RegionList() object containing regions to check.
genome (pysam.FastaFile) – An object (e.g. from open_genome()) to use as reference.
- Raises:
InputError – If a region is not available within genome
- tfcomb.utils.unique_region_names(regions)[source]
Get a list of unique region names within regions.
- Parameters:
regions (tobias.utils.regions.RegionList) – A RegionList() object containing regions with .name attributes.
- Returns:
The list of sorted names from regions.
- Return type:
list
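As a rough illustration of the return value of unique_region_names, the following sketch collects sorted unique names from objects with a .name attribute. The Region namedtuple is a stand-in for the entries of a RegionList, and the function name is hypothetical.

```python
from collections import namedtuple

# Stand-in for a region with a .name attribute (not the tobias RegionList type).
Region = namedtuple("Region", ["chrom", "start", "end", "name"])


def unique_region_names_sketch(regions):
    """Return the sorted list of unique names found in regions."""
    return sorted({region.name for region in regions})


if __name__ == "__main__":
    regions = [
        Region("chr1", 10, 20, "SP1"),
        Region("chr1", 50, 60, "CTCF"),
        Region("chr2", 5, 15, "SP1"),
    ]
    print(unique_region_names_sketch(regions))
```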
- tfcomb.utils.calculate_TFBS(regions, motifs, genome, resolve='merge')[source]
Multiprocessing-safe function to scan for motif occurrences
- Parameters:
genome (str or) – If a string, the genome will be opened.
regions (RegionList()) – A RegionList() object of regions
resolve (str) – How to resolve overlapping sites from the same TF. Must be one of “off”, “highest_score” or “merge”. If “highest_score”, the highest scoring overlapping site is kept. If “merge”, the sites are merged, keeping the information of the first site. If “off”, overlapping TFBS are kept. Default: “merge”.
- Return type:
List of TFBS within regions
- tfcomb.utils.resolve_overlaps(sites, how='merge', per_name=True)[source]
Resolve overlapping sites within a list of genomic regions.
- Parameters:
sites (RegionList) – A list of TFBS/regions with .chrom, .start, .end and .name information.
how (str) – How to resolve the overlapping site. Must be one of “highest_score”, “merge”. If “highest_score”, the highest scoring overlapping site is kept. If “merge”, the sites are merged, keeping the information of the first site. Default: “merge”.
per_name (bool) – Whether to resolve overlapping only per name or across all sites. If ‘True’ overlaps are only resolved if the name of the sites are equal. If ‘False’, overlaps are resolved across all sites. Default: True.
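The two resolution strategies and the per_name switch of resolve_overlaps can be pictured with a simplified, greedy sketch. It is not the tfcomb implementation: the Site dataclass, the function name, the half-open (BED-style) overlap test and the single left-to-right pass are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Site:
    """Minimal stand-in for a TFBS with location, name and score."""
    chrom: str
    start: int
    end: int
    name: str
    score: float = 0.0


def resolve_overlaps_sketch(sites, how="merge", per_name=True):
    """Resolve overlapping sites in one sorted pass (illustrative only)."""
    if how not in ("merge", "highest_score"):
        raise ValueError("how must be 'merge' or 'highest_score'")

    kept = []      # resolved sites
    last_idx = {}  # group key -> index in kept of the latest site of that group
    for site in sorted(sites, key=lambda s: (s.chrom, s.start, s.end)):
        # per_name=True only compares sites that share a name.
        key = (site.chrom, site.name) if per_name else site.chrom
        idx = last_idx.get(key)
        if idx is not None and site.start < kept[idx].end:
            prev = kept[idx]
            if how == "merge":
                # Extend the first site; its name and score are kept.
                kept[idx] = replace(prev, end=max(prev.end, site.end))
            elif site.score > prev.score:
                # Keep only the highest scoring of the overlapping sites.
                kept[idx] = site
        else:
            last_idx[key] = len(kept)
            kept.append(site)
    return kept


if __name__ == "__main__":
    sites = [
        Site("chr1", 10, 20, "A", 1.0),
        Site("chr1", 15, 25, "A", 3.0),
        Site("chr1", 18, 30, "B", 2.0),
    ]
    print(resolve_overlaps_sketch(sites, how="merge", per_name=True))
    print(resolve_overlaps_sketch(sites, how="highest_score", per_name=False))
```

With per_name=True the site named "B" is left untouched even though it overlaps the "A" sites; with per_name=False all three sites compete with each other.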
- tfcomb.utils.add_region_overlap(a, b, att='overlap')[source]
Overlap regions in regionlist ‘a’ with regions from regionlist ‘b’ and add a boolean attribute to the regions in ‘a’ containing overlap status with ‘b’.
- Parameters:
a (list of OneTFBS objects) – A list of objects containing genomic locations.
b (list of OneTFBS objects) – A list of objects containing genomic locations to overlap with ‘a’ regions.
att (str, optional) – The name of the attribute to add to ‘a’ objects. Default: “overlap”.
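A simple way to picture add_region_overlap is the quadratic sketch below, which sets a boolean attribute on every region in 'a'. The objects are SimpleNamespace stand-ins rather than OneTFBS instances, the function name is hypothetical, and half-open coordinates are assumed; the library may use a faster algorithm.

```python
from types import SimpleNamespace


def add_region_overlap_sketch(a, b, att="overlap"):
    """Set attribute att on each region in a to True if it overlaps any region in b."""
    for region in a:
        hit = any(
            region.chrom == other.chrom
            and region.start < other.end
            and other.start < region.end
            for other in b
        )
        setattr(region, att, hit)


if __name__ == "__main__":
    a = [
        SimpleNamespace(chrom="chr1", start=10, end=20),
        SimpleNamespace(chrom="chr1", start=100, end=110),
    ]
    b = [SimpleNamespace(chrom="chr1", start=15, end=30)]
    add_region_overlap_sketch(a, b, att="in_peak")
    print([region.in_peak for region in a])
```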
- tfcomb.utils.shuffle_sites(sites, seed=1)[source]
Shuffle TFBS names to existing positions and update the lengths of the new positions.
- Parameters:
sites (np.array) – An array of sites in shape (n_sites,4), where each row is a site and columns correspond to chromosome, start, end, name.
- Return type:
An array containing shuffled names with site lengths corresponding to original length of sites.
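The idea of moving names to existing positions while keeping each name's original site length can be sketched with plain tuples. This is an illustration under assumptions: it uses Python lists and random.Random instead of a numpy array, the function name is hypothetical, and the new end is assumed to be computed from the existing start.

```python
import random


def shuffle_sites_sketch(sites, seed=1):
    """Shuffle (name, length) labels across existing site positions (illustrative only)."""
    rng = random.Random(seed)
    # Each name travels together with the length of its original site.
    labels = [(name, end - start) for _, start, end, name in sites]
    rng.shuffle(labels)
    # Positions (chrom, start) stay fixed; the end is updated to fit the new name.
    return [
        (chrom, start, start + length, name)
        for (chrom, start, _, _), (name, length) in zip(sites, labels)
    ]


if __name__ == "__main__":
    sites = [("chr1", 0, 10, "A"), ("chr1", 50, 58, "B"), ("chr2", 5, 25, "C")]
    print(shuffle_sites_sketch(sites, seed=1))
```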
- tfcomb.utils.calculate_background(sites, seed=1, directional=False, **kwargs)[source]
Wrapper to shuffle sites and count co-occurrence of the shuffled sites.
- Parameters:
sites (np.array) – An array of sites in shape (n_sites,4), where each row is a site and columns correspond to chromosome, start, end, name.
seed (int, optional) – Seed for shuffling sites. Default: 1.
directional (bool) – Decide if direction of found pairs should be taken into account. Default: False.
kwargs (arguments) – Additional arguments for count_co_occurrence
- tfcomb.utils.get_threshold(data, which='upper', percent=0.05, _n_max=10000, verbosity=0, plot=False)[source]
Function to get upper/lower threshold(s) based on the distribution of data. The threshold is the value at cumulative probability "percent" of the distribution (for the upper threshold, 1-percent).
- Parameters:
data (list or array) – An array of data to find threshold on.
which (str) – Which threshold to calculate. Can be one of “upper”, “lower”, “both”. Default: “upper”.
percent (float between 0-1) – Controls how strict the threshold should be set in comparison to the distribution. Default: 0.05.
- Return type:
If which is one of “upper”/”lower”, get_threshold returns a float. If “both”, get_threshold returns a list of two float thresholds.
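To make the meaning of 'percent' and 'which' concrete, the sketch below swaps in a plain empirical quantile. This is a deliberate simplification and not necessarily how get_threshold derives its thresholds from the distribution of data; the function names and the [lower, upper] ordering for "both" are assumptions.

```python
def empirical_quantile(data, q):
    """Linear-interpolation quantile of data for 0 <= q <= 1."""
    values = sorted(data)
    pos = q * (len(values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + (pos - lo) * (values[hi] - values[lo])


def threshold_sketch(data, which="upper", percent=0.05):
    """Return upper/lower threshold(s) as empirical quantiles (illustrative only)."""
    lower = empirical_quantile(data, percent)
    upper = empirical_quantile(data, 1 - percent)  # upper = 1 - percent
    if which == "upper":
        return upper
    if which == "lower":
        return lower
    if which == "both":
        return [lower, upper]  # ordering is an assumption
    raise ValueError("which must be one of 'upper', 'lower', 'both'")


if __name__ == "__main__":
    data = list(range(101))
    print(threshold_sketch(data, which="both", percent=0.05))
```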
- tfcomb.utils.make_symmetric(matrix)[source]
Make a numpy matrix symmetric by merging x-y and y-x
- tfcomb.utils.set_contrast(contrast, available_contrasts)[source]
Utility function for the plotting functions of tfcomb.objects.DiffCombObj
- tfcomb.utils.analyze_signal_chunks(datasource, threshold)[source]
Evaluating signal for chunks.
- Parameters:
datasource (pd.DataFrame) – A (sub-)Dataframe with the (corrected) distance counts for the pairs
threshold (float) – Threshold for prominence and height in peak calling (see scipy.signal.find_peaks() for detailed information)
- Returns:
list of found peaks in form [TF1, TF2, Distance, Peak Heights, Prominences, Prominence Threshold]
- Return type:
list
See also
tfcomb.object.analyze_signal_all
- tfcomb.utils.evaluate_noise_chunks(signals, peaks, method='median', height_multiplier=0.75)[source]
Evaluate the noisiness of a signal for chunks (a chunk can also be the whole dataset).
- Parameters:
pairs (list(tuples(str,str))) – list of pairs to perform analysis on
signals (pd.Dataframe) – A (sub-)Dataframe containing signal data for pairs
method (str, optional) – Method used to get noise measurement, either "median" or "min_max" allowed. Default: "median"
height_multiplier (float, optional) – Height multiplier (percentage) to calculate cut points. Must be between 0 and 1. Default: 0.75
- Raises:
ValueError – If no signal data is given for a pair
Note
Constraint: The DataFrame with signals needs to contain a signal for each pair given within pairs.
- tfcomb.utils.getAllAttr(object, private=False, functions=False)[source]
Collect all attributes of an object and return as dict.
- Parameters:
private (boolean, default False) – Whether private attributes (everything with a '_' prefix) should be included.
functions (boolean, default False) – Whether callable attributes, i.e. functions, should be included.
- Returns:
Dict of all the object's attributes.
- Return type:
dictionary
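The filtering described by 'private' and 'functions' could look roughly like the following. This is a sketch built on dir() and getattr(), not the library code; the function name and the Demo class are hypothetical.

```python
def get_all_attr_sketch(obj, private=False, functions=False):
    """Collect attributes of obj into a dict, optionally including private and callable ones."""
    result = {}
    for attr_name in dir(obj):
        if not private and attr_name.startswith("_"):
            continue  # skip everything with a '_' prefix
        value = getattr(obj, attr_name)
        if not functions and callable(value):
            continue  # skip methods and other callables
        result[attr_name] = value
    return result


class Demo:
    """Small example class used only for this illustration."""

    def __init__(self):
        self.flank = 100
        self._cache = None

    def compute(self):
        return self.flank * 2


if __name__ == "__main__":
    print(get_all_attr_sketch(Demo()))
    print(sorted(get_all_attr_sketch(Demo(), functions=True)))
```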