nums.numpy.api.stats module

nums.numpy.api.stats.average(a, axis=None, weights=None, returned=False)[source]

Compute the weighted average along the specified axis.

This docstring was copied from numpy.average.

Some inconsistencies with the NumS version may exist.

Parameters
  • a (BlockArray) – Array containing data to be averaged. If a is not an array, a conversion is attempted.

  • axis (None or int or tuple of ints, optional) – Axis or axes along which to average a. The default, axis=None, will average over all of the elements of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, averaging is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.

  • weights (BlockArray, optional) –

    An array of weights associated with the values in a. Each value in a contributes to the average according to its associated weight. The weights array can either be 1-D (in which case its length must be the size of a along the given axis) or of the same shape as a. If weights=None, then all data in a are assumed to have a weight equal to one. The 1-D calculation is:

    avg = sum(a * weights) / sum(weights)
    

    The only constraint on weights is that sum(weights) must not be 0.

  • returned (bool, optional) – Default is False. If True, the tuple (average, sum_of_weights) is returned, otherwise only the average is returned. If weights=None, sum_of_weights is equivalent to the number of elements over which the average is taken.

Returns

retval, [sum_of_weights] – Return the average along the specified axis. When returned is True, return a tuple with the average as the first element and the sum of the weights as the second element. sum_of_weights is of the same type as retval. The result dtype follows a general pattern. If weights is None, the result dtype will be that of a, or float64 if a is integral. Otherwise, if weights is not None and a is non-integral, the result type will be the type of lowest precision capable of representing values of both a and weights. If a happens to be integral, the previous rules still apply but the result dtype will at least be float.

Return type

array_type or double

Raises
  • ZeroDivisionError – When all weights along axis are zero. See numpy.ma.average for a version robust to this type of error.

  • TypeError – When the length of 1D weights is not the same as the size of a along axis.

See also

mean

Notes

Only a single ‘axis’ is currently supported.

1D weights broadcasting is currently not supported.

A ZeroDivisionError is raised when weights along one or more axes sum to zero.
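To make the 1-D formula avg = sum(a * weights) / sum(weights) concrete, here is a minimal plain-Python sketch. It does not use NumS, the helper name weighted_average is hypothetical, and it only mimics the weights and returned arguments for flat sequences.

```python
def weighted_average(a, weights=None, returned=False):
    """Plain-Python sketch of the 1-D weighted average (not the NumS implementation)."""
    values = [float(x) for x in a]
    if weights is None:
        # With no weights, every element counts once.
        w = [1.0] * len(values)
    else:
        w = [float(x) for x in weights]
        if len(w) != len(values):
            raise TypeError("weights must have the same length as a")
    total_weight = sum(w)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    avg = sum(x * wi for x, wi in zip(values, w)) / total_weight
    return (avg, total_weight) if returned else avg


plain = weighted_average([1, 2, 3, 4])
weighted, weight_sum = weighted_average([1, 2, 3, 4], weights=[4, 3, 2, 1], returned=True)
print(plain, weighted, weight_sum)  # 2.5 2.0 10.0
```

With weights=None the returned sum of weights equals the number of elements, which matches the description of the returned parameter.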

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

>>> data = nps.arange(1, 5)  
>>> data.get()  
array([1, 2, 3, 4])
>>> nps.average(data).get()  
array(2.5)
>>> data = nps.arange(6).reshape((3,2))  
>>> data.get()  
array([[0, 1],
       [2, 3],
       [4, 5]])
nums.numpy.api.stats.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, dtype=None)[source]

Estimate a covariance matrix, given data and weights.

This docstring was copied from numpy.cov.

Some inconsistencies with the NumS version may exist.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, \(X = [x_1, x_2, ... x_N]^T\), then the covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\).

See the notes for an outline of the algorithm.

Parameters
  • m (BlockArray) – A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

  • y (BlockArray, optional) – An additional set of variables and observations. y has the same form as that of m.

  • rowvar (bool, optional) – If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

  • bias (bool, optional) – Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

  • ddof (int, optional) – If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

  • fweights (BlockArray, int, optional) – 1-D array of integer frequency weights; the number of times each observation vector should be repeated.

  • aweights (BlockArray, optional) – 1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

Returns

out – The covariance matrix of the variables.

Return type

BlockArray

See also

corrcoef

Normalized covariance matrix

Notes

Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:

>>> m = nps.arange(10, dtype=nps.float64)  
>>> f = nps.arange(10) * 2  
>>> a = nps.arange(10) ** 2.  
>>> ddof = 1  
>>> w = f * a  
>>> v1 = nps.sum(w)  
>>> v2 = nps.sum(w * a)  
>>> m -= nps.sum(m * w, axis=None, keepdims=True) / v1  
>>> cov = nps.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)  

Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) goes over to 1 / (nps.sum(f) - ddof) as it should.
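This limiting case can be checked numerically. The sketch below uses plain Python with exact fractions rather than NumS. The variable names mirror the steps above, and the frequency weights are arbitrary example values chosen for illustration.

```python
from fractions import Fraction

f = [Fraction(2 * i) for i in range(1, 6)]  # frequency weights (arbitrary example values)
a = [Fraction(1)] * len(f)                  # aweights all equal to one
ddof = 1

w = [fi * ai for fi, ai in zip(f, a)]
v1 = sum(w)
v2 = sum(wi * ai for wi, ai in zip(w, a))

general_factor = v1 / (v1**2 - ddof * v2)
simple_factor = 1 / (sum(f) - ddof)
print(general_factor == simple_factor)  # True
```

With a == 1 both v1 and v2 reduce to sum(f), so the factor simplifies to 1 / (sum(f) - ddof).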

y, ddof, fweights, and aweights are not supported.

Only 2-dimensional arrays are supported.

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

Consider two variables, \(x_0\) and \(x_1\), which correlate perfectly, but in opposite directions:

>>> x = nps.array([[0, 2], [1, 1], [2, 0]]).T  
>>> x.get()  
array([[0, 1, 2],
       [2, 1, 0]])

Note how \(x_0\) increases while \(x_1\) decreases. The covariance matrix shows this clearly:

>>> nps.cov(x).get()  
array([[ 1., -1.],
       [-1.,  1.]])

Note that element \(C_{0,1}\), which shows the correlation between \(x_0\) and \(x_1\), is negative.
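For reference, the same matrix can be reproduced without NumS. This plain-Python sketch assumes rowvar=True and the default normalization by N - 1; the helper name cov_rows is hypothetical.

```python
def cov_rows(m):
    """Covariance matrix of a list of rows, one variable per row (sketch, rowvar=True)."""
    n_obs = len(m[0])
    # Center each variable by subtracting its mean.
    centered = []
    for row in m:
        mean = sum(row) / n_obs
        centered.append([x - mean for x in row])
    # C[i][j] = sum_k centered[i][k] * centered[j][k] / (N - 1)
    return [
        [sum(p * q for p, q in zip(ri, rj)) / (n_obs - 1) for rj in centered]
        for ri in centered
    ]


x = [[0, 1, 2], [2, 1, 0]]
c = cov_rows(x)
print(c)  # [[1.0, -1.0], [-1.0, 1.0]]
```

The diagonal entries are the variances of each row, and the off-diagonal entries are negative because the two rows move in opposite directions.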

nums.numpy.api.stats.mean(a, axis=None, dtype=None, out=None, keepdims=False)[source]

Compute the arithmetic mean along the specified axis.

This docstring was copied from numpy.mean.

Some inconsistencies with the NumS version may exist.

Parameters
  • a (BlockArray) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

  • axis (None or int or tuple of ints, optional) – Axis or axes along which the means are computed. The default is to compute the mean of the flattened array. If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.

  • out (BlockArray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See ufuncs-output-type for more details.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of BlockArray; any non-default value will be. If the sub-class’ method does not implement keepdims, an exception will be raised.

Returns

m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.

Return type

BlockArray, see dtype parameter above

See also

average

Weighted average

std, var, nanmean, nanstd, nanvar

Notes

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32. Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.

By default, float16 results are computed using float32 intermediates for extra precision.
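The definition in these notes (sum of the elements along the axis divided by their count) can be sketched in plain Python for a 2-D nested list. The helper mean_2d is hypothetical, does not use NumS, and handles only axis=None, 0, or 1.

```python
def mean_2d(rows, axis=None):
    """Arithmetic mean of a 2-D nested list (sketch; axis is None, 0, or 1)."""
    if axis is None:
        # Flatten first, then average every element.
        flat = [x for row in rows for x in row]
        return sum(flat) / len(flat)
    if axis == 0:
        # Reduce over rows: one mean per column.
        return [sum(col) / len(col) for col in zip(*rows)]
    if axis == 1:
        # Reduce over columns: one mean per row.
        return [sum(row) / len(row) for row in rows]
    raise ValueError("axis must be None, 0, or 1")


a = [[1, 2], [3, 4]]
print(mean_2d(a), mean_2d(a, axis=0), mean_2d(a, axis=1))  # 2.5 [2.0, 3.0] [1.5, 3.5]
```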

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

>>> a = nps.array([[1, 2], [3, 4]])  
>>> nps.mean(a).get()  
array(2.5)
>>> nps.mean(a, axis=0).get()  
array([2., 3.])
>>> nps.mean(a, axis=1).get()  
array([1.5, 3.5])
nums.numpy.api.stats.median(a, axis=None, out=None, keepdims=False)[source]

Compute the median along the specified axis.

This docstring was copied from numpy.median.

Some inconsistencies with the NumS version may exist.

Returns the median of the array elements.

Parameters
  • a (BlockArray) – Input array or object that can be converted to an array.

  • axis ({int, sequence of int, None}, optional) – Axis or axes along which the medians are computed. The default is to compute the median along a flattened version of the array.

  • out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array a.

Returns

median – A new array holding the result. If the input contains integers or floats smaller than float64, then the output data-type is nps.float64. Otherwise, the data-type of the output is the same as that of the input. If out is specified, that array is returned instead.

Return type

BlockArray

See also

mean, percentile

Notes

Given a vector V of length N, the median of V is the middle value of a sorted copy of V, V_sorted, i.e., V_sorted[(N-1)/2] when N is odd, and the average of the two middle values of V_sorted when N is even.

‘axis’ is currently not supported.

‘out’ is currently not supported.

‘keepdims’ is currently not supported.
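The sorted-copy definition in these notes translates directly into a few lines of plain Python. This is a sketch for flat sequences only; median_sorted is a hypothetical helper, not part of NumS.

```python
def median_sorted(v):
    """Median via a sorted copy, following the definition in the notes (sketch)."""
    v_sorted = sorted(v)
    n = len(v_sorted)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = (n - 1) // 2
    if n % 2 == 1:
        # Odd length: the single middle value.
        return float(v_sorted[mid])
    # Even length: average the two middle values.
    return (v_sorted[mid] + v_sorted[mid + 1]) / 2


odd = median_sorted([7, 1, 3])
even = median_sorted([10, 7, 4, 3, 2, 1])
print(odd, even)  # 3.0 3.5
```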

nums.numpy.api.stats.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)[source]

Compute the q-th percentile of the data along the specified axis.

This docstring was copied from numpy.percentile.

Some inconsistencies with the NumS version may exist.

Returns the q-th percentile(s) of the array elements.

Parameters
  • a (BlockArray) – Input array or object that can be converted to an array.

  • q (float) – Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.

  • axis ({int, tuple of int, None}, optional) – Axis or axes along which the percentiles are computed. The default is to compute the percentile(s) along a flattened version of the array.

  • out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary.

  • overwrite_input (bool, optional) – If True, then allow the input array a to be modified by intermediate calculations, to save memory. In this case, the contents of the input a after this function completes is undefined.

  • interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) –

    This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j:

    • 'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

    • 'lower': i.

    • 'higher': j.

    • 'nearest': i or j, whichever is nearest.

    • 'midpoint': (i + j) / 2.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array a.

Returns

percentile – If q is a single percentile and axis=None, then the result is a scalar. If multiple percentiles are given, first axis of the result corresponds to the percentiles. The other axes are the axes that remain after the reduction of a. If the input contains integers or floats smaller than float64, the output data-type is float64. Otherwise, the output data-type is the same as that of the input. If out is specified, that array is returned instead.

Return type

BlockArray

See also

mean

median

equivalent to percentile(..., 50)

nanpercentile

quantile

equivalent to percentile, except with q in the range [0, 1].

Notes

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the percentile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.

‘axis’ is currently not supported.

‘out’ is currently not supported.

‘overwrite_input’ is currently not supported.

Only ‘linear’ ‘interpolation’ is currently supported.

‘keepdims’ is currently not supported.
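Since only linear interpolation over the flattened array is supported, the computation described in these notes can be sketched in plain Python as follows. The helper percentile_linear is hypothetical, and the position formula q / 100 * (N - 1) is the convention assumed here for locating the percentile in the sorted copy.

```python
import math


def percentile_linear(values, q):
    """q-th percentile of a flat sequence with linear interpolation (sketch)."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100 inclusive")
    v_sorted = sorted(values)
    # Fractional position of the percentile in the sorted copy.
    pos = (q / 100) * (len(v_sorted) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(v_sorted) - 1)
    fraction = pos - lower
    i, j = v_sorted[lower], v_sorted[upper]
    return i + (j - i) * fraction


data = [10, 7, 4, 3, 2, 1]  # flattened version of a 2 x 3 array
p0, p50, p100 = (percentile_linear(data, q) for q in (0, 50, 100))
print(p0, p50, p100)  # 1.0 3.5 10.0
```

The three results are the minimum, the median, and the maximum of the data, as stated in the notes.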

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

>>> a = nps.array([[10, 7, 4], [3, 2, 1]])  
>>> a.get()  
array([[10,  7,  4],
       [ 3,  2,  1]])
nums.numpy.api.stats.quantile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)[source]

Compute the q-th quantile of the data along the specified axis.

This docstring was copied from numpy.quantile.

Some inconsistencies with the NumS version may exist.

Parameters
  • a (BlockArray) – Input array or object that can be converted to an array.

  • q (BlockArray of float) – Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive.

  • axis ({int, tuple of int, None}, optional) – Axis or axes along which the quantiles are computed. The default is to compute the quantile(s) along a flattened version of the array.

  • out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary.

  • overwrite_input (bool, optional) – If True, then allow the input array a to be modified by intermediate calculations, to save memory. In this case, the contents of the input a after this function completes is undefined.

  • interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) –

    This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i < j:

    • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

    • lower: i.

    • higher: j.

    • nearest: i or j, whichever is nearest.

    • midpoint: (i + j) / 2.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array a.

Returns

quantile – If q is a single quantile and axis=None, then the result is a scalar. If multiple quantiles are given, first axis of the result corresponds to the quantiles. The other axes are the axes that remain after the reduction of a. If the input contains integers or floats smaller than float64, the output data-type is float64. Otherwise, the output data-type is the same as that of the input. If out is specified, that array is returned instead.

Return type

BlockArray

See also

mean

percentile

equivalent to quantile, but with q in the range [0, 100].

median

equivalent to quantile(..., 0.5)

nanquantile

Notes

Given a vector V of length N, the q-th quantile of V is the value q of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the quantile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q=0.5, the same as the minimum if q=0.0 and the same as the maximum if q=1.0.

‘axis’ is currently not supported.

‘out’ is currently not supported.

‘overwrite_input’ is currently not supported.

Only ‘linear’ ‘interpolation’ is currently supported.

‘keepdims’ is currently not supported.
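To illustrate how the five interpolation rules listed under Parameters differ, here is a plain-Python sketch for flat sequences. It is not the NumS implementation (which supports only linear). The helper name quantile_sketch is hypothetical, and the tie rule used for nearest is an arbitrary choice made for this example.

```python
import math


def quantile_sketch(values, q, interpolation="linear"):
    """q-th quantile of a flat sequence, q in [0, 1] (sketch of the five rules)."""
    if not values:
        raise ValueError("quantile of an empty sequence is undefined")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1 inclusive")
    v_sorted = sorted(values)
    pos = q * (len(v_sorted) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    fraction = pos - lower
    i, j = v_sorted[lower], v_sorted[upper]
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Tie handling (fraction == 0.5) is an arbitrary choice in this sketch.
        return i if fraction <= 0.5 else j
    raise ValueError(f"unknown interpolation: {interpolation!r}")


data = [10, 7, 4, 3, 2, 1]
results = {
    name: quantile_sketch(data, 0.25, name)
    for name in ("linear", "lower", "higher", "midpoint", "nearest")
}
print(results)
# {'linear': 2.25, 'lower': 2, 'higher': 3, 'midpoint': 2.5, 'nearest': 2}
```

Here the 0.25 quantile falls between the sorted values 2 and 3, a quarter of the way from 2, so each rule picks a different point in that interval.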

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

>>> a = nps.array([[10, 7, 4], [3, 2, 1]])  
>>> a.get()  
array([[10,  7,  4],
       [ 3,  2,  1]])
nums.numpy.api.stats.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)[source]

Compute the standard deviation along the specified axis.

This docstring was copied from numpy.std.

Some inconsistencies with the NumS version may exist.

Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.

Parameters
  • a (BlockArray) – Calculate the standard deviation of these values.

  • axis (None or int or tuple of ints, optional) – Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array. If this is a tuple of ints, a standard deviation is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (dtype, optional) – Type to use in computing the standard deviation. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.

  • out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.

  • ddof (int, optional) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the std method of sub-classes of BlockArray; any non-default value will be. If the sub-class’ method does not implement keepdims, an exception will be raised.

Returns

standard_deviation – If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array.

Return type

BlockArray, see dtype parameter above.

See also

var, mean, nanmean, nanstd, nanvar

Notes

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.

‘out’ is currently not supported.
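The formula std = sqrt(mean(abs(x - x.mean())**2)) and the effect of ddof can be sketched in plain Python. The helper std_sketch is hypothetical, works only on flat real sequences, and does not use NumS.

```python
import math


def std_sketch(values, ddof=0):
    """Standard deviation of a flat real sequence with divisor N - ddof (sketch)."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = [abs(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_dev) / (n - ddof))


data = [1, 2, 3, 4]
population = std_sketch(data)      # ddof=0, divisor N
sample = std_sketch(data, ddof=1)  # ddof=1, divisor N - 1
print(population, sample)
```

With ddof=1 the divisor shrinks from N to N - 1, so the second value is always at least as large as the first.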

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

>>> a = nps.array([[1, 2], [3, 4]])  
>>> nps.std(a).get()  
array(1.1180339887498949) # may vary
>>> nps.std(a, axis=0).get()  
array([1.,  1.])
>>> nps.std(a, axis=1).get()  
array([0.5,  0.5])
nums.numpy.api.stats.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)[source]

Compute the variance along the specified axis.

This docstring was copied from numpy.var.

Some inconsistencies with the NumS version may exist.

Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.

Parameters
  • a (BlockArray) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.

  • axis (None or int or tuple of ints, optional) – Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array. If this is a tuple of ints, a variance is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float; for arrays of float types it is the same as the array type.

  • out (BlockArray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.

  • ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.

  • keepdims (bool, optional) –

    If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

    If the default value is passed, then keepdims will not be passed through to the var method of sub-classes of BlockArray; any non-default value will be. If the sub-class’ method does not implement keepdims, an exception will be raised.

Returns

variance – If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.

Return type

BlockArray, see dtype parameter above

See also

std, mean, nanmean, nanstd, nanvar

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).

The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.

‘out’ is currently not supported.
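A short plain-Python sketch of var = mean(abs(x - x.mean())**2) shows both the ddof divisor and why complex input yields a real result. The helper var_sketch is hypothetical and handles only flat sequences; it is not the NumS implementation.

```python
def var_sketch(values, ddof=0):
    """Variance of a flat sequence with divisor N - ddof (sketch, real or complex)."""
    n = len(values)
    mean = sum(values) / n
    # abs() is taken before squaring, so complex input gives a real result.
    return sum(abs(x - mean) ** 2 for x in values) / (n - ddof)


real_var = var_sketch([1, 2, 3, 4])
sample_var = var_sketch([1, 2, 3, 4], ddof=1)
complex_var = var_sketch([1 + 1j, 1 - 1j])
print(real_var, sample_var, complex_var)
```

For the complex input the deviations from the mean are 1j and -1j, each with absolute value 1, so the variance is the real number 1.0.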

Examples

The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().

>>> a = nps.array([[1, 2], [3, 4]]) 
>>> nps.var(a).get()  
array(1.25)
>>> nps.var(a, axis=0).get()  
array([1.,  1.])
>>> nps.var(a, axis=1).get()  
array([0.25,  0.25])