Histogram

class histpy.Histogram(edges, contents=None, sumw2=None, labels=None, axis_scale=None, sparse=None, unit=None, track_overflow=None, dtype=None, copy_contents=True)[source]

Bases: object

This is a wrapper of a numpy array with axes and a fill method. Sparse arrays from pydata’s sparse package are also supported.

Like an array, the histogram can have an arbitrary number of dimensions.

Standard numpy array indexing is supported to get the contents (e.g. h[:], h[4], h[[1,3,4]], h[:,5:50:2]). However, the meaning of the -1 index is different: instead of counting from the end, -1 corresponds to the underflow bin. Similarly, an index equal to the number of bins corresponds to the overflow bin.

You can, however, give positions relative to h.end: h[0:h.end] results in all regular bins, h[-1:h.end+1] also includes the underflow/overflow bins, and h[h.end] gives you the contents of the overflow bin. The convenient aliases h.uf = -1, h.of = h.end and h.all = slice(-1,h.end+1) (or slice(0,h.end) if under/overflow are not tracked) are provided.

You can also use an Ellipsis object (...) at the end to specify that the contents from the remaining dimensions are to have the under and overflow bins included, e.g. for a 3D histogram h[1,-1:h.end+1,-1:h.end+1] = h[1,...]. h[:] returns all contents without under/overflow bins and h[...] returns everything, including those special bins.
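The indexing convention above can be pictured as an offset into a padded storage array. The following is a minimal sketch in plain Python, not histpy's implementation: `to_storage_index` is a hypothetical helper, and the padded-list layout is an assumption made purely for illustration.

```python
def to_storage_index(i, nbins, track_overflow=True):
    """Map a logical bin index to a position in (possibly padded) storage.

    Hypothetical helper. Assumes storage is laid out as
    [underflow, bin 0, ..., bin nbins-1, overflow] when overflow is tracked.
    """
    lo, hi = (-1, nbins) if track_overflow else (0, nbins - 1)
    if not lo <= i <= hi:
        raise IndexError(f"bin index {i} out of range [{lo}, {hi}]")
    return i + 1 if track_overflow else i


nbins = 4
storage = [10, 1, 2, 3, 4, 20]  # underflow=10, four regular bins, overflow=20

underflow = storage[to_storage_index(-1, nbins)]    # analogous to h[h.uf]
overflow = storage[to_storage_index(nbins, nbins)]  # analogous to h[h.of]
regular = [storage[to_storage_index(i, nbins)] for i in range(nbins)]  # h[0:h.end]
```

With tracking disabled in this sketch, asking for index -1 raises an IndexError, mirroring the behavior described for untracked axes.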

If no initial contents are provided, all axes track under/overflow by default. If contents are provided, an axis tracks under/overflow by default if the provided contents has two more bins in that dimension than axis.nbins. You can specify that certain axes will/will not track under/overflow with the track_overflow keyword. Note that attempting to access an underflow/overflow bin on an axis that is not tracked will result in an IndexError.

If sumw2 is not None, then the histogram will keep track of the sum of the weights squared. You should use this feature if you are using weighted data and are concerned about error propagation. You can access the sum of squared weights with h.sumw2[item], where item is interpreted the same way as in h[item]. h.bin_error[item] returns sqrt(sumw2) (or sqrt(contents) if sumw2 was not specified).

The binary operators +, -, * and / are supported for Histograms and correctly propagate error if sumw2 is present. The other operand can be a Histogram, a scalar or an array of appropriate size. Note that h += h0 is more efficient than h = h + h0, since the latter involves the instantiation of a new histogram. Unary negation of a Histogram is supported, as is dividing a scalar by a Histogram.
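To make the role of sumw2 concrete, here is a minimal plain-Python sketch of the textbook propagation rules for a single bin: variances add when independent histograms are added, and scale by c² when the contents are multiplied by a constant c. It does not call histpy, and the exact formulas histpy applies for * and / between two histograms are not reproduced here.

```python
import math

# Weights of the entries that landed in one bin of two histograms
weights_a = [0.5, 2.0]
weights_b = [1.0, 1.0, 3.0]

contents_a, sumw2_a = sum(weights_a), sum(w * w for w in weights_a)
contents_b, sumw2_b = sum(weights_b), sum(w * w for w in weights_b)

# The bin error is the square root of the sum of squared weights
error_a = math.sqrt(sumw2_a)

# Adding two independent histograms: contents add, and so do the variances
contents_sum = contents_a + contents_b
error_sum = math.sqrt(sumw2_a + sumw2_b)

# Scaling by a constant c: the variance scales by c**2
c = 3.0
error_scaled = math.sqrt(c * c * sumw2_a)
```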

Parameters:
  • edges (Axes or array) – Definition of bin edges, as anything that can be processed by Axes. Lower edge value is included in the bin, upper edge value is excluded.

  • contents (array-like or SparseArray) – Initialization of histogram contents. May include overflow/underflow bins if overflow is being tracked; if tracking is enabled and contents does not have these bins, they will be initialized to zeros. If omitted, creates an array of zeros.

  • sumw2 (None, bool or array) – If not None, the histogram will maintain squared weights associated with the elements of contents. These weights are initially zero if sumw2 = True but may instead be initialized explicitly with an array. Arithmetic between two histograms with squared weights propagates these weights to the result according to propagation-of-error theory.

  • labels (array of str) – Optionally label the axes for easier indexing

  • axis_scale (str or array) – Bin center mode e.g. “linear” or “log”. See Axis.axis_scale.

  • sparse (bool) – indicate if contents and sumw2 should be maintained as dense or sparse arrays. If specified, contents and sumw2 will be converted to the specified sparsity if needed (but attempting to densify a sparse matrix will fail to avoid unexpected memory blowups). If not specified, the Histogram’s sparsity follows that of the provided contents, or is dense if no contents are provided.

  • unit (Unit-like) – unit of contents; if not specified, inferred from contents if u.Quantity or None otherwise

  • track_overflow (bool, array-like, or dict) –

    Whether to allocate space to track the underflow and overflow bins. Acceptable forms include

    • a single boolean value (applies to all axes)

    • a 1-D array-like with a boolean value per axis

    • a dictionary specifying boolean values for a set of named/numbered axes. For axes not in the dict, the default is True.

    If this parameter is not provided, the default behavior depends on the value of the contents argument.

    • if contents is not provided, overflow is not tracked on any axis.

    • if contents is provided, each axis tracks overflow if contents includes overflow/underflow bins for that axis, i.e., if its size along that axis is two more than the axis’ number of bins.

  • dtype – Numpy datatype or None type of contents array; if None, use type of provided contents, or default (float64) if none provided.

  • copy_contents (bool) – if True (default), the arrays passed as contents and sumw2 are copied. If False, numpy arrays or Quantity arrays passed as contents and sumw2 will not be copied unless necessary; hence, the Histogram’s memory may alias these values.
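The accepted forms of track_overflow, and the rule for inferring tracking from the shape of contents, can be sketched as follows. Both functions are hypothetical helpers written for illustration and are not part of histpy; the sketch covers only the cases the parameter description states explicitly.

```python
def infer_from_contents(contents_shape, nbins):
    """An axis tracks overflow if contents has two extra bins along it."""
    return [size == n + 2 for size, n in zip(contents_shape, nbins)]


def normalize_track_overflow(track_overflow, ndim, labels=()):
    """Expand a bool, a per-axis sequence, or a dict into one bool per axis."""
    if isinstance(track_overflow, bool):
        return [track_overflow] * ndim
    if isinstance(track_overflow, dict):
        result = [True] * ndim  # axes missing from the dict default to True
        for key, value in track_overflow.items():
            # Keys may be axis labels (str) or axis numbers (int)
            axis = list(labels).index(key) if isinstance(key, str) else key
            result[axis] = bool(value)
        return result
    result = [bool(v) for v in track_overflow]
    if len(result) != ndim:
        raise ValueError("expected one boolean per axis")
    return result


# Contents of shape (12, 5) for axes with 10 and 5 bins:
# only the first axis carries underflow/overflow bins.
inferred = infer_from_contents((12, 5), (10, 5))
by_label = normalize_track_overflow({"y": False}, 3, labels=["x", "y", "z"])
```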

set_sumw2(sumw2, copy=True)[source]

Set the sumw2 array of a Histogram to track the sum of squared weights. If not None/False, sumw2 must be either an array-like with the same shape as contents or a Histogram with the same axes as contents. It will be coerced to have the same units, sparsity, dtype, and overflow tracking as the base Histogram.

Parameters:
  • sumw2 – values for weights: True for all zeros, or a Histogram, or an array-like; None or False to remove any existing sumw2

  • copy (bool) – If True (default), copy the object passed as sumw2; otherwise, create a view into the object if possible.

copy()[source]

Make a deep copy of a Histogram. The copy shares no writable members with the original; the only shared members are those that will never be mutated.

This function preserves subclass types if called from a derived class. Subclasses with additional data members may override this function; if they do not, their data members will be deepcopied.

astype(dtype, copy=True)[source]

Cast the contents and, if present, the sumw2 of a Histogram to a different data type. If the new type differs from the old type, we always return a copy; otherwise, we return a copy if copy=True or the original if copy=False.

track_overflow(track_overflow=None)[source]

Obtain an array specifying whether each axis is tracking underflows and overflows. If input is not None, adjust the track_overflow settings to those provided.

Parameters:

track_overflow (bool, array-like, or dict) – Optional. New overflow tracking settings

Returns:

np.ndarray with copy of current overflow tracking settings

We return a copy to external callers because it is unsafe to modify a live track_overflow array in place; any updates must be fed back to track_overflow (internally, to _update_track_overflow) to take effect.

to(unit, equivalencies=[], update=True, copy=True)[source]

Convert a Histogram to a different unit.

Parameters:
  • unit (unit-like) – Unit to convert to.

  • equivalencies (list or tuple) – A list of equivalence pairs to try if the units are not directly convertible.

  • update (bool) – If update is False, only the units will be changed without updating the contents accordingly

  • copy (bool) – If True (default), then the value is copied. Otherwise, a copy will only be made if necessary.

property is_sparse

Return True if the underlying histogram contents array is sparse, or False if dense.

to_dense()[source]

Return a dense copy of a histogram

todense()

Return a dense copy of a histogram

to_sparse()[source]

Return a sparse copy of a histogram.

tosparse()

Return a sparse copy of a histogram.

property contents

Equivalent to h[:]. Does not include under and overflow bins.

property full_contents

Equivalent to h[...]. The size of each axis can be nbins or nbins+2, depending on the track_overflow parameters

property shape

Tuple with length of each axis

property axes

Underlying axes object

property axis

Equivalent to self.axes[0], but fails if ndim > 1

expand_dims(*args, **kwargs)[source]

Same as h.axes.expand_dims().

broadcast(*args, **kwargs)[source]

Same as h.axes.broadcast().

expand_dict(*args, **kwargs)[source]

Same as h.axes.expand_dict().

interp(*values, kind='linear')[source]

Perform multilinear interpolation of one or more values relative to the contents of this Histogram. The center of histogram bin (i1,…,in) is assumed to have the value h[i1,…,in] for purposes of interpolation.

If all axes of the histogram have log scale, multilinear interpolation is performed in the log domain. Hence, for example, interpolating a value halfway between two bin centers along a log-scale axis returns the geometric mean of the values in those bins. If log-domain interpolation is requested, the histogram’s contents should all be > 0 to avoid warnings or errors.

Interpolation will raise an error if called on a histogram containing both log- and linear/symmetric-scale axes, as the result of multilinear interpolation is ill-defined in this case.

Parameters:
  • values (scalar or array-like) – Value(s) to interpolate. A single value may be given as ndim coordinates in separate arguments or as a single array-like. Multiple values may be given as ndim array-likes of coordinates in separate arguments or as a single array-like containing the same.

  • kind (string) – “linear” (default) if multilinear interpolation is to be done using this Histogram’s contents, or “log” if it is to be done on the log of the contents and then converted back to the linear domain.

Returns:

interpolated values (scalar or array of same shape as values)
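The geometric-mean property of log-domain interpolation mentioned above can be checked with a one-dimensional sketch. This is plain Python rather than a call to interp(); `interp_loglog` is a hypothetical helper, and "halfway between two bin centers" is taken here to mean halfway in the logarithm of the coordinate, which is an assumption.

```python
import math


def interp_loglog(x, x0, x1, y0, y1):
    """Interpolate log(y) linearly against log(x) between two bin centers."""
    t = (math.log(x) - math.log(x0)) / (math.log(x1) - math.log(x0))
    return math.exp((1 - t) * math.log(y0) + t * math.log(y1))


x0, x1 = 1.0, 100.0          # two adjacent bin centers on a log-scale axis
y0, y1 = 4.0, 9.0            # contents of those bins (must be > 0)
x_mid = math.sqrt(x0 * x1)   # halfway between the centers in log space

y_mid = interp_loglog(x_mid, x0, x1, y0, y1)  # geometric mean of 4 and 9
```

A content of zero or below would make math.log fail, which is why log-domain interpolation needs strictly positive contents.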

fill(*values, weight=None, warn_overflow=True)[source]

Add an entry to the histogram. Can be weighted.

Follows the same convention as find_bin().

Parameters:
  • values (float or array) – Value of entry

  • weight (float) – Value weight in histogram. Defaults to 1 in whatever units the histogram has

  • warn_overflow (bool) – Enable/disable warnings when an underflow or overflow occurs, i.e. when one or more of the input values falls beyond the range of the corresponding axis.

Note

Note that weight needs to be specified explicitly by keyword; otherwise it will be considered a value, and an IndexError will be thrown.
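The filling convention (lower edge included, upper edge excluded, out-of-range values routed to the underflow or overflow bin) can be sketched for one axis in plain Python. `find_bin_sketch` and `fill_sketch` are hypothetical stand-ins for illustration, not histpy's own code, and the padded storage layout is an assumption.

```python
from bisect import bisect_right


def find_bin_sketch(edges, value):
    """Return the bin index: -1 for underflow, len(edges) - 1 for overflow.

    bisect_right makes the lower edge inclusive and the upper edge exclusive.
    """
    return bisect_right(edges, value) - 1


def fill_sketch(storage, edges, value, weight=1.0):
    """Add weight to the bin holding value; storage is padded by one bin per side."""
    storage[find_bin_sketch(edges, value) + 1] += weight


edges = [0.0, 1.0, 2.0, 3.0]          # three regular bins
storage = [0.0] * (len(edges) + 1)    # underflow + 3 bins + overflow

for v in (0.0, 0.5, 1.0, 3.0, -2.0):
    fill_sketch(storage, edges, v)
fill_sketch(storage, edges, 2.5, weight=0.25)  # weight passed by keyword
```

Here the value 3.0 sits on the upper edge of the last bin and therefore lands in the overflow bin, while 1.0 belongs to the second bin rather than the first.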

clear()[source]

Set all counts to 0

project(*axis)[source]

Return a histogram containing a projection of the current one.

Parameters:

axis (axis index/label or array-like of same) – axis or axes onto which the histogram will be projected. Omitted axes will be summed over. The axes of the projected histogram will have the order specified by this argument, so project() can be used to permute a Histogram’s axes (whether or not some are projected away).

Returns:

Projected histogram (a new object, not a view)

Return type:

Histogram
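As an illustration of what projection does to the contents, here is a minimal sketch using nested lists for a 2D histogram with axes assumed to be labeled "x" and "y". It uses plain Python rather than histpy and ignores under/overflow bins.

```python
contents = [
    [1, 2, 3],
    [4, 5, 6],
]  # axis 0 ("x") has 2 bins, axis 1 ("y") has 3 bins

# Projecting onto "x" sums over the omitted axis "y"
proj_x = [sum(row) for row in contents]

# Projecting onto "y" sums over the omitted axis "x"
proj_y = [sum(col) for col in zip(*contents)]

# Projecting onto ("y", "x") sums nothing and only permutes the axes
proj_yx = [list(col) for col in zip(*contents)]
```

Summing out "y" directly, which is how project_out() is described, gives the same contents as proj_x in this sketch.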

project_out(*axis)[source]

Return a histogram containing a projection that sums over the specified axes of the current one, leaving the rest intact.

Parameters:

axis (axis index/label or array-like of same) – axis or axes that will be projected out of the histogram. Omitted axes will be retained in their current order.

Returns:

Projected histogram (a new object, not a view)

Return type:

Histogram

static concatenate(edges, histograms, label=None, track_overflow=None)[source]

Generate a Histogram H from a list of histograms h_1 … h_n. We create a new first axis (index 0) of length equal to the list and set H[i] = h_i, so the new histogram has one more dimension than the inputs.

For this operation to be well-defined, the axes of all input histograms must be equal, and they must all have the same sparsity; if any input has a unit, all must have compatible units. If any input is a subclass of Histogram, all must have the same subclass type.

If all inputs have sumw2, the output will as well; otherwise, all sumw2 values are discarded.

Parameters:
  • edges (Axes or array) – Definition of bin edges of the new dimension

  • histograms (list of Histogram) – List of histograms to fill contents. Might or might not include under/overflow bins.

  • label (str) – Label the new dimension

  • track_overflow (bool) – Track underflow and overflow on the newly created axis. Defaults to True if number of histograms is 2 + # bins on new axis, or False otherwise.

Returns:

new object of the same type as histograms[0] (Histogram or subclass)

class OpType(value)[source]

Bases: Enum

An enumeration.

rebin(*ngroup)[source]

Rebin a histogram by grouping adjacent bins into one on each axis

If an axis does not have overflow tracking enabled, any partial group along that axis will be discarded. If it does have overflow tracking enabled, any partial group’s sum will be added to the axis’ underflow bin if it is on the left, or to the overflow bin if it is on the right.

For histograms with multiple axes, the result of rebinning is equivalent to rebinning the input on the first axis, then rebinning the result on the second axis, and so forth for all axes.

Parameters:

ngroup (int or array-like) – number of adjacent bins to combine for each axis. If this value is > 0 for an axis, binning starts from left side of contents, so the last partial group (if any) is on the right; if < 0, binning starts from right side, so last partial group (if any) is on the left.

Returns:

a new, rebinned Histogram
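The treatment of partial groups can be sketched for a single axis in plain Python. `rebin_1d` is a hypothetical helper that follows the rules described above; it is not histpy's implementation, and it passes the underflow and overflow values separately for clarity.

```python
def rebin_1d(contents, ngroup, track_overflow=True, underflow=0, overflow=0):
    """Group adjacent bins; return (rebinned, underflow, overflow)."""
    if ngroup == 0:
        raise ValueError("ngroup must be nonzero")
    n = abs(ngroup)
    remainder = len(contents) % n
    if ngroup > 0:
        # Grouping starts on the left, so the partial group is on the right
        split = len(contents) - remainder
        body, partial = contents[:split], contents[split:]
        if track_overflow:
            overflow += sum(partial)
    else:
        # Grouping starts on the right, so the partial group is on the left
        body, partial = contents[remainder:], contents[:remainder]
        if track_overflow:
            underflow += sum(partial)
    # Without overflow tracking the partial group is simply dropped
    rebinned = [sum(body[i:i + n]) for i in range(0, len(body), n)]
    return rebinned, underflow, overflow


from_left = rebin_1d([1, 2, 3, 4, 5], 2)
from_right = rebin_1d([1, 2, 3, 4, 5], -2)
untracked = rebin_1d([1, 2, 3, 4, 5], 2, track_overflow=False)
```

With five bins grouped in pairs, the leftover bin goes to the overflow bin for ngroup=2 and to the underflow bin for ngroup=-2.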

plot(ax=None, errorbars=None, colorbar=True, label_axes=True, **kwargs)[source]

Quick plot of the histogram contents.

Under/overflow bins are not included. Only 1D and 2D histograms are supported.

Histograms with a HealpixAxis will automatically be plotted as a map, passing all kwargs to mhealpy’s HealpixMap.plot().

Parameters:
  • ax (matplotlib.axes) – Axes on which to draw the histogram. A new one will be created by default.

  • errorbars (bool or None) – Include errorbars for 1D histograms. The default is to plot them if sumw2 is available

  • colorbar (bool) – Draw colorbar in 2D plots

  • label_axes (bool) – Label plots axes. Histogram axes must be labeled.

  • **kwargs – Passed to matplotlib.errorbar() (1D) or matplotlib.pcolormesh (2D)

draw(ax=None, errorbars=None, colorbar=True, label_axes=True, **kwargs)

Quick plot of the histogram contents.

Under/overflow bins are not included. Only 1D and 2D histograms are supported.

Histograms with a HealpixAxis will automatically be plotted as a map, passing all kwargs to mhealpy’s HealpixMap.plot().

Parameters:
  • ax (matplotlib.axes) – Axes on which to draw the histogram. A new one will be created by default.

  • errorbars (bool or None) – Include errorbars for 1D histograms. The default is to plot them if sumw2 is available

  • colorbar (bool) – Draw colorbar in 2D plots

  • label_axes (bool) – Label plots axes. Histogram axes must be labeled.

  • **kwargs – Passed to matplotlib.errorbar() (1D) or matplotlib.pcolormesh (2D)

fit(f, lo_lim=None, hi_lim=None, **kwargs)[source]

Fit histogram data using least squares.

This is a convenient call to scipy.optimize.curve_fit. Sigma corresponds to the output of h.bin_error. Empty bins (i.e. bins whose error equals 0) are ignored.

Parameters:
  • f (callable) – Function f(x, …) that takes the independent variable x as its first argument, followed by the parameters to be fitted. For a k-dimensional histogram it should handle arrays of shape (k,) or (k,N). The inputs and outputs must be unitless.

  • lo_lim (float or array) – Low axis limit to fit. One value per axis.

  • hi_lim (float or array) – High axis limit to fit. One value per axis.

  • **kwargs – Passed to scipy.optimize.curve_fit
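The way bin errors enter a least-squares fit, and why zero-error bins must be skipped, can be seen in a small self-contained sketch. Note that this swaps in a closed-form weighted straight-line fit written in plain Python; it is not scipy.optimize.curve_fit and not what fit() executes. The data are made up for illustration, with errors taken as sqrt(contents) as when sumw2 is absent.

```python
import math

centers = [0.5, 1.5, 2.5, 3.5]        # bin centers
contents = [4.0, 0.0, 12.0, 16.0]     # the second bin is empty
errors = [math.sqrt(c) for c in contents]

# Drop bins with zero error: their weight 1/sigma**2 would be infinite
points = [(x, y, 1.0 / (e * e))
          for x, y, e in zip(centers, contents, errors) if e > 0]

# Weighted sums for the model y = intercept + slope * x
S = sum(w for _, _, w in points)
Sx = sum(w * x for x, _, w in points)
Sy = sum(w * y for _, y, w in points)
Sxx = sum(w * x * x for x, _, w in points)
Sxy = sum(w * x * y for x, y, w in points)

slope = (S * Sxy - Sx * Sy) / (S * Sxx - Sx * Sx)
intercept = (Sy - slope * Sx) / S
```

The three non-empty bins lie on the line y = 2 + 4x, so the fit recovers those parameters and the empty bin has no influence on the result.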

write(filename, name='hist', overwrite=False)[source]

Write the histogram to a group in an HDF5 file. If the file already exists, the group is appended to it.

Parameters:
  • filename (str) – Path to file

  • name (str) – Name of group to save histogram (can be any HDF5 path)

  • overwrite (bool) – Delete and overwrite the group if it already exists.

classmethod open(filename, name='hist')[source]

Read a Histogram from a specified group in an HDF5 file.