nums.numpy.api.stats module
-
nums.numpy.api.stats.average(a, axis=None, weights=None, returned=False)
Compute the weighted average along the specified axis.
This docstring was copied from numpy.average.
Some inconsistencies with the NumS version may exist.
- Parameters
a (BlockArray) – Array containing data to be averaged. If a is not an array, a conversion is attempted.
axis (None or int or tuple of ints, optional) – Axis or axes along which to average a. The default, axis=None, will average over all of the elements of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, averaging is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.
weights (BlockArray, optional) –
An array of weights associated with the values in a. Each value in a contributes to the average according to its associated weight. The weights array can either be 1-D (in which case its length must be the size of a along the given axis) or of the same shape as a. If weights=None, then all data in a are assumed to have a weight equal to one. The 1-D calculation is:
avg = sum(a * weights) / sum(weights)
The only constraint on weights is that sum(weights) must not be 0.
returned (bool, optional) – Default is False. If True, the tuple (average, sum_of_weights) is returned, otherwise only the average is returned. If weights=None, sum_of_weights is equivalent to the number of elements over which the average is taken.
- Returns
retval, [sum_of_weights] – Return the average along the specified axis. When returned is True, return a tuple with the average as the first element and the sum of the weights as the second element. sum_of_weights is of the same type as retval. The result dtype follows a general pattern. If weights is None, the result dtype will be that of a, or float64 if a is integral. Otherwise, if weights is not None and a is non-integral, the result type will be the type of lowest precision capable of representing values of both a and weights. If a happens to be integral, the previous rules still apply but the result dtype will at least be float.
- Return type
array_type or double
- Raises
ZeroDivisionError – When all weights along axis are zero. See numpy.ma.average for a version robust to this type of error.
TypeError – When the length of 1D weights is not the same as the shape of a along axis.
See also
Notes
Only single ‘axis’ is currently supported.
1D weights broadcasting is currently not supported.
Weights along one or more axes sum to zero.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
>>> data = nps.arange(1, 5)
>>> data.get()
array([1, 2, 3, 4])
>>> nps.average(data).get()
array(2.5)
>>> data = nps.arange(6).reshape((3,2))
>>> data.get()
array([[0, 1],
       [2, 3],
       [4, 5]])
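The 1-D calculation given under weights, avg = sum(a * weights) / sum(weights), can be sketched in plain Python. This is only an illustration of the formula, not the NumS implementation: the helper name weighted_average is hypothetical, and it always hands back the sum of weights, as returned=True would.

```python
def weighted_average(values, weights=None):
    """Return (average, sum_of_weights) for a 1-D sequence.

    Hypothetical helper illustrating avg = sum(a * weights) / sum(weights).
    """
    if weights is None:
        # With no weights, every element counts exactly once.
        weights = [1] * len(values)
    if len(weights) != len(values):
        raise TypeError("weights must have the same length as values")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    avg = sum(v * w for v, w in zip(values, weights)) / total_weight
    return avg, total_weight


data = [1, 2, 3, 4]
plain_avg, n_elements = weighted_average(data)
front_loaded_avg, weight_sum = weighted_average(data, weights=[4, 3, 2, 1])
print(plain_avg, n_elements)
print(front_loaded_avg, weight_sum)
```

With weights=None the second value is simply the element count, which matches the description of sum_of_weights above.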
-
nums.numpy.api.stats.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, dtype=None)
Estimate a covariance matrix, given data and weights.
This docstring was copied from numpy.cov.
Some inconsistencies with the NumS version may exist.
Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, \(X = [x_1, x_2, ... x_N]^T\), then the covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\).
See the notes for an outline of the algorithm.
- Parameters
m (BlockArray) – A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.
y (BlockArray, optional) – An additional set of variables and observations. y has the same form as that of m.
rowvar (bool, optional) – If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
bias (bool, optional) – Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.
ddof (int, optional) – If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.
fweights (BlockArray, int, optional) – 1-D array of integer frequency weights; the number of times each observation vector should be repeated.
aweights (BlockArray, optional) – 1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.
- Returns
out – The covariance matrix of the variables.
- Return type
See also
corrcoef
Normalized covariance matrix
Notes
Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:
>>> m = nps.arange(10, dtype=nps.float64)
>>> f = nps.arange(10) * 2
>>> a = nps.arange(10) ** 2.
>>> ddof = 1
>>> w = f * a
>>> v1 = nps.sum(w)
>>> v2 = nps.sum(w * a)
>>> m -= nps.sum(m * w, axis=None, keepdims=True) / v1
>>> cov = nps.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)
Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) goes over to 1 / (nps.sum(f) - ddof).get() as it should.
y, ddof, fweights, and aweights are not supported.
Only 2-dimensional arrays are supported.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
Consider two variables, \(x_0\) and \(x_1\), which correlate perfectly, but in opposite directions:
>>> x = nps.array([[0, 2], [1, 1], [2, 0]]).T
>>> x.get()
array([[0, 1, 2],
       [2, 1, 0]])
Note how \(x_0\) increases while \(x_1\) decreases. The covariance matrix shows this clearly:
>>> nps.cov(x).get()
array([[ 1., -1.],
       [-1.,  1.]])
Note that element \(C_{0,1}\), which shows the correlation between \(x_0\) and \(x_1\), is negative.
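The effect of bias on the normalization can be checked with a small pure-Python sketch of the unweighted case. It assumes the rowvar=True layout (each row is a variable) and uses a hypothetical helper, covariance_matrix. It is not how NumS computes the result on BlockArrays.

```python
def covariance_matrix(rows, bias=False):
    """Covariance of variables stored as rows (rowvar=True layout).

    Hypothetical helper: normalizes by N - 1 by default, or by N when
    bias is True, where N is the number of observations per variable.
    """
    n_obs = len(rows[0])
    divisor = n_obs if bias else n_obs - 1
    # Center each variable on its own mean.
    centered = []
    for row in rows:
        row_mean = sum(row) / n_obs
        centered.append([value - row_mean for value in row])
    # C[i][j] is the normalized dot product of centered variables i and j.
    return [
        [sum(p * q for p, q in zip(ci, cj)) / divisor for cj in centered]
        for ci in centered
    ]


x = [[0, 1, 2], [2, 1, 0]]
unbiased = covariance_matrix(x)
biased = covariance_matrix(x, bias=True)
print(unbiased)
print(biased)
```

The unbiased result reproduces the matrix shown in the doctest above. The biased one shrinks every entry by the factor (N - 1) / N.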
-
nums.numpy.api.stats.mean(a, axis=None, dtype=None, out=None, keepdims=False)
Compute the arithmetic mean along the specified axis.
This docstring was copied from numpy.mean.
Some inconsistencies with the NumS version may exist.
- Parameters
a (BlockArray) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
axis (None or int or tuple of ints, optional) – Axis or axes along which the means are computed. The default is to compute the mean of the flattened array. If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.
dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.
out (BlockArray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See ufuncs-output-type for more details.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of BlockArray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.
- Returns
m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.
- Return type
BlockArray, see dtype parameter above
Notes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
By default, float16 results are computed using float32 intermediates for extra precision.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
>>> a = nps.array([[1, 2], [3, 4]])
>>> nps.mean(a).get()
array(2.5)
>>> nps.mean(a, axis=0).get()
array([2., 3.])
>>> nps.mean(a, axis=1).get()
array([1.5, 3.5])
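To make the axis semantics concrete, here is a plain-Python sketch that reproduces the three results above on a nested list. The helper mean_along_axis is hypothetical and handles only 2-D input. It illustrates what each axis value reduces over, not how NumS executes the reduction.

```python
def mean_along_axis(matrix, axis=None):
    """Arithmetic mean of a 2-D nested list.

    Hypothetical helper: axis=None averages every element, axis=0
    averages down each column, and axis=1 averages across each row.
    """
    if axis is None:
        flat = [value for row in matrix for value in row]
        return sum(flat) / len(flat)
    if axis == 0:
        # zip(*matrix) yields the columns of the matrix.
        return [sum(col) / len(col) for col in zip(*matrix)]
    if axis == 1:
        return [sum(row) / len(row) for row in matrix]
    raise ValueError("axis must be None, 0, or 1 for a 2-D input")


a = [[1, 2], [3, 4]]
overall = mean_along_axis(a)
per_column = mean_along_axis(a, axis=0)
per_row = mean_along_axis(a, axis=1)
print(overall, per_column, per_row)
```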
-
nums.numpy.api.stats.median(a, axis=None, out=None, keepdims=False)
Compute the median along the specified axis.
This docstring was copied from numpy.median.
Some inconsistencies with the NumS version may exist.
Returns the median of the array elements.
- Parameters
a (BlockArray) – Input array or object that can be converted to an array.
axis ({int, sequence of int, None}, optional) – Axis or axes along which the medians are computed. The default is to compute the median along a flattened version of the array.
out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
- Returns
median – A new array holding the result. If the input contains integers or floats smaller than float64, then the output data-type is nps.float64. Otherwise, the data-type of the output is the same as that of the input. If out is specified, that array is returned instead.
- Return type
See also
Notes
Given a vector V of length N, the median of V is the middle value of a sorted copy of V, V_sorted, i.e., V_sorted[(N-1)/2], when N is odd, and the average of the two middle values of V_sorted when N is even.
‘axis’ is currently not supported.
‘out’ is currently not supported.
‘keepdims’ is currently not supported.
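The odd and even cases described in the notes can be sketched in plain Python. The helper median_of is hypothetical and works on a flat sequence, matching the flattened default since ‘axis’ is not supported. The standard library's statistics.median is used only as a cross-check.

```python
import statistics


def median_of(values):
    """Median of a flat sequence via a sorted copy (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = (n - 1) // 2
    if n % 2 == 1:
        # Odd length: the single middle value, V_sorted[(N-1)/2].
        return ordered[mid]
    # Even length: the average of the two middle values.
    return (ordered[mid] + ordered[mid + 1]) / 2


odd_case = median_of([7, 1, 5])
even_case = median_of([10, 7, 4, 3, 2, 1])
print(odd_case, statistics.median([7, 1, 5]))
print(even_case, statistics.median([10, 7, 4, 3, 2, 1]))
```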
-
nums.numpy.api.stats.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
Compute the q-th percentile of the data along the specified axis.
This docstring was copied from numpy.percentile.
Some inconsistencies with the NumS version may exist.
Returns the q-th percentile(s) of the array elements.
- Parameters
a (BlockArray) – Input array or object that can be converted to an array.
q (float) – Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.
axis ({int, tuple of int, None}, optional) – Axis or axes along which the percentiles are computed. The default is to compute the percentile(s) along a flattened version of the array.
out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary.
overwrite_input (bool, optional) – If True, then allow the input array a to be modified by intermediate calculations, to save memory. In this case, the contents of the input a after this function completes is undefined.
interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) – This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j:
'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
'lower': i.
'higher': j.
'nearest': i or j, whichever is nearest.
'midpoint': (i + j) / 2.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array a.
- Returns
percentile – If q is a single percentile and axis=None, then the result is a scalar. If multiple percentiles are given, first axis of the result corresponds to the percentiles. The other axes are the axes that remain after the reduction of a. If the input contains integers or floats smaller than float64, the output data-type is float64. Otherwise, the output data-type is the same as that of the input. If out is specified, that array is returned instead.
- Return type
See also
median
equivalent to percentile(..., 50)
nanpercentile
quantile
equivalent to percentile, except with q in the range [0, 1].
Notes
Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the percentile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.
‘axis’ is currently not supported.
‘out’ is currently not supported.
‘overwrite_input’ is currently not supported.
Only ‘linear’ interpolation is currently supported.
‘keepdims’ is currently not supported.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
>>> a = nps.array([[10, 7, 4], [3, 2, 1]])
>>> a.get()
array([[10,  7,  4],
       [ 3,  2,  1]])
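As a rough illustration of the 'linear' rule i + (j - i) * fraction, the sketch below computes percentiles over the flattened data from the example above. It assumes the fractional index is (N - 1) * q / 100 into the sorted copy, as the notes describe. The helper percentile_linear is hypothetical and is not the NumS code path.

```python
import math


def percentile_linear(values, q):
    """q-th percentile of a flat sequence with linear interpolation.

    Hypothetical helper; q must lie in [0, 100].
    """
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100 inclusive")
    ordered = sorted(values)
    # Fractional index into the sorted copy.
    position = (len(ordered) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    i, j = ordered[lower], ordered[upper]
    return i + (j - i) * fraction


flat = [10, 7, 4, 3, 2, 1]
p0 = percentile_linear(flat, 0)
p25 = percentile_linear(flat, 25)
p50 = percentile_linear(flat, 50)
p100 = percentile_linear(flat, 100)
print(p0, p25, p50, p100)
```

The three boundary cases line up with the notes: q=0 gives the minimum, q=50 the median, and q=100 the maximum.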
-
nums.numpy.api.stats.quantile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
Compute the q-th quantile of the data along the specified axis.
This docstring was copied from numpy.quantile.
Some inconsistencies with the NumS version may exist.
- Parameters
a (BlockArray) – Input array or object that can be converted to an array.
q (BlockArray of float) – Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive.
axis ({int, tuple of int, None}, optional) – Axis or axes along which the quantiles are computed. The default is to compute the quantile(s) along a flattened version of the array.
out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary.
overwrite_input (bool, optional) – If True, then allow the input array a to be modified by intermediate calculations, to save memory. In this case, the contents of the input a after this function completes is undefined.
interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) – This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i < j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j, whichever is nearest.
midpoint: (i + j) / 2.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array a.
- Returns
quantile – If q is a single quantile and axis=None, then the result is a scalar. If multiple quantiles are given, first axis of the result corresponds to the quantiles. The other axes are the axes that remain after the reduction of a. If the input contains integers or floats smaller than float, the output data-type is float. Otherwise, the output data-type is the same as that of the input. If out is specified, that array is returned instead.
- Return type
See also
percentile
equivalent to quantile, but with q in the range [0, 100].
median
equivalent to quantile(..., 0.5)
nanquantile
Notes
Given a vector V of length N, the q-th quantile of V is the value q of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the quantile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q=0.5, the same as the minimum if q=0.0 and the same as the maximum if q=1.0.
‘axis’ is currently not supported.
‘out’ is currently not supported.
‘overwrite_input’ is currently not supported.
Only ‘linear’ interpolation is currently supported.
‘keepdims’ is currently not supported.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
>>> a = nps.array([[10, 7, 4], [3, 2, 1]])
>>> a.get()
array([[10,  7,  4],
       [ 3,  2,  1]])
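The five interpolation rules listed under the interpolation parameter differ only in how they combine the two neighbors i and j. The sketch below spells them out for a flat sequence. It is a hypothetical helper for illustration (NumS itself currently supports only linear), and its tie-breaking for nearest when the fraction is exactly 0.5 is an assumption, not documented behavior.

```python
import math


def quantile(values, q, interpolation="linear"):
    """q-th quantile of a flat sequence, q in [0, 1] (hypothetical helper)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1 inclusive")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    i, j = ordered[lower], ordered[upper]
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Assumption: an exact tie (fraction == 0.5) goes to the lower neighbor.
        return i if fraction <= 0.5 else j
    raise ValueError(f"unknown interpolation: {interpolation!r}")


flat = [10, 7, 4, 3, 2, 1]
methods = ("linear", "lower", "higher", "midpoint", "nearest")
first_quartile = {name: quantile(flat, 0.25, name) for name in methods}
median_value = quantile(flat, 0.5)
print(first_quartile)
print(median_value)
```

Here q=0.25 lands at fractional index 1.25 in the sorted copy, between the values 2 and 3, so each method picks a different point in that interval.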
-
nums.numpy.api.stats.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Compute the standard deviation along the specified axis.
This docstring was copied from numpy.std.
Some inconsistencies with the NumS version may exist.
Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
- Parameters
a (BlockArray) – Calculate the standard deviation of these values.
axis (None or int or tuple of ints, optional) – Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array. If this is a tuple of ints, a standard deviation is performed over multiple axes, instead of a single axis or all the axes as before.
dtype (dtype, optional) – Type to use in computing the standard deviation. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
out (BlockArray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.
ddof (int, optional) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the std method of sub-classes of BlockArray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.
- Returns
standard_deviation – If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array.
- Return type
BlockArray, see dtype parameter above.
Notes
The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).
The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
‘out’ is currently not supported.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
>>> a = nps.array([[1, 2], [3, 4]])
>>> nps.std(a).get()
array(1.1180339887498949) # may vary
>>> nps.std(a, axis=0).get()
array([1., 1.])
>>> nps.std(a, axis=1).get()
array([0.5, 0.5])
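The role of ddof in the divisor N - ddof can be seen in a short plain-Python sketch. The helper std_with_ddof is hypothetical. The standard library's statistics.pstdev and statistics.stdev are used as reference points for ddof=0 and ddof=1 respectively.

```python
import math
import statistics


def std_with_ddof(values, ddof=0):
    """Standard deviation of a flat sequence with divisor N - ddof.

    Hypothetical helper following std = sqrt(mean(abs(x - x.mean())**2)).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum(abs(v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


data = [1, 2, 3, 4]
population_std = std_with_ddof(data)        # divisor N
sample_std = std_with_ddof(data, ddof=1)    # divisor N - 1
print(population_std, statistics.pstdev(data))
print(sample_std, statistics.stdev(data))
```

The ddof=0 value agrees with the flattened result in the doctest above. The ddof=1 value is larger because the same sum of squared deviations is divided by a smaller number.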
-
nums.numpy.api.stats.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Compute the variance along the specified axis.
This docstring was copied from numpy.var.
Some inconsistencies with the NumS version may exist.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
- Parameters
a (BlockArray) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
axis (None or int or tuple of ints, optional) – Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array. If this is a tuple of ints, a variance is performed over multiple axes, instead of a single axis or all the axes as before.
dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float; for arrays of float types it is the same as the array type.
out (BlockArray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the var method of sub-classes of BlockArray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.
- Returns
variance – If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.
- Return type
BlockArray, see dtype parameter above
Notes
The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).
The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.
‘out’ is currently not supported.
Examples
The doctests shown below are copied from NumPy. They won’t show the correct result until you call get().
>>> a = nps.array([[1, 2], [3, 4]])
>>> nps.var(a).get()
array(1.25)
>>> nps.var(a, axis=0).get()
array([1., 1.])
>>> nps.var(a, axis=1).get()
array([0.25, 0.25])
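The remark about complex input follows directly from var = mean(abs(x - x.mean())**2): the absolute value turns each deviation into a real magnitude before squaring. A minimal plain-Python sketch, using a hypothetical helper named variance and not the NumS implementation:

```python
def variance(values, ddof=0):
    """Variance of a flat sequence with divisor N - ddof (hypothetical helper).

    abs() is applied before squaring, so complex input still yields a
    real, nonnegative result.
    """
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) ** 2 for v in values) / (n - ddof)


real_var = variance([1, 2, 3, 4])
sample_var = variance([1, 2, 3, 4], ddof=1)
complex_var = variance([1 + 1j, 1 - 1j])
print(real_var, sample_var, complex_var)
```

The first value matches the flattened doctest result above. The complex case returns an ordinary float even though every input element is complex.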