stats
stats.analysis
AveragePointwiseEuclideanMetric | Computes the average of pointwise Euclidean distances between two sequential data.
QuickBundles(threshold[, metric, …]) | Clusters streamlines using QuickBundles [Garyfallidis12].
Streamlines | alias of nibabel.streamlines.array_sequence.ArraySequence
cKDTree(data[, leafsize, compact_nodes, …]) | kd-tree for quick nearest-neighbor lookup.
bundle_analysis(model_bundle_folder, …[, …]) | Applies statistical analysis on bundles and saves the results in a directory specified by out_dir.
dti_measures(bundle, metric, dt, pname, …) | Calculates DTI measures (e.g. FA, MD) per point on streamlines and saves them in an HDF5 file.
load_nifti(fname[, return_img, …]) | Load data and other information from a Nifti file.
load_peaks(fname[, verbose]) | Load a PeaksAndMetrics HDF5 file (PAM5).
load_trk(filename[, lazy_load]) | Loads tractogram files (*.trk).
map_coordinates(input, coordinates[, …]) | Map the input array to new coordinates by interpolation.
optional_package(name[, trip_msg]) | Return package-like thing and module setup for package name.
peak_values(bundle, peaks, dt, pname, bname, …) | Finds the peak direction and peak value of each point on a streamline used while tracking (generating the tractogram) and saves them in an HDF5 file.
set_number_of_points | Change the number of points of streamlines.
transform_streamlines(streamlines, mat[, …]) | Apply affine transformation to streamlines.
AveragePointwiseEuclideanMetric
class dipy.stats.analysis.AveragePointwiseEuclideanMetric
Bases: dipy.segment.metricspeed.SumPointwiseEuclideanMetric
Computes the average of pointwise Euclidean distances between two sequential data.
A sequence of N-dimensional points is represented as a 2D array with shape (nb_points, nb_dimensions). A feature object can be specified in order to calculate the distance between the features, rather than directly between the sequential data.
Notes
The distance between two 2D sequential data:
s1       s2

0*   a    *0
  \       |
   \      |
   1*     |
    |  b  *1
    |      \
    2*      \
        c    *2
is equal to \((a+b+c)/3\) where \(a\) is the Euclidean distance between s1[0] and s2[0], \(b\) between s1[1] and s2[1] and \(c\) between s1[2] and s2[2].
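As a quick numeric check of this formula with plain NumPy (our illustration, not the metric class itself): for two parallel 3-point streamlines one unit apart, \(a = b = c = 1\), so the average distance is 1.
>>> import numpy as np
>>> s1 = np.array([[0., 0.], [0., 1.], [0., 2.]])
>>> s2 = np.array([[1., 0.], [1., 1.], [1., 2.]])
>>> float(np.mean(np.linalg.norm(s1 - s2, axis=1)))
1.0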
Methods
are_compatible | Checks if features can be used by metric.dist based on their shape.
dist | Computes a distance between two data points based on their features.
QuickBundles
class dipy.stats.analysis.QuickBundles(threshold, metric='MDF_12points', max_nb_clusters=2147483647)
Bases: dipy.segment.clustering.Clustering
Clusters streamlines using QuickBundles [Garyfallidis12].
Given a list of streamlines, the QuickBundles algorithm sequentially assigns each streamline to its closest bundle in \(\mathcal{O}(Nk)\), where \(N\) is the number of streamlines and \(k\) is the final number of bundles. If for a given streamline its closest bundle is farther than threshold, a new bundle is created and the streamline is assigned to it, unless the number of bundles has already reached max_nb_clusters.
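As a conceptual illustration of this sequential assignment, here is a minimal sketch of our own (dist stands in for the configured metric; the real implementation also updates bundle centroids, which is omitted here):

def quickbundles_sketch(streamlines, threshold, dist, max_nb_clusters):
    # Each bundle holds a representative centroid and its member streamlines.
    bundles = []
    for s in streamlines:
        # One O(k) scan over the current bundles per streamline: O(Nk) total.
        best = min(bundles, key=lambda b: dist(b['centroid'], s), default=None)
        if best is not None and dist(best['centroid'], s) <= threshold:
            best['members'].append(s)
        elif len(bundles) < max_nb_clusters:
            # Closest bundle is too far (or none exists yet): open a new one.
            bundles.append({'centroid': s, 'members': [s]})
        else:
            # Cluster budget exhausted: fall back to the closest bundle.
            best['members'].append(s)
    return bundles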
References
[Garyfallidis12] Garyfallidis E. et al., "QuickBundles, a method for tractography simplification", Frontiers in Neuroscience, vol 6, no 175, 2012.
Examples
>>> from dipy.segment.clustering import QuickBundles
>>> from dipy.data import get_fnames
>>> from nibabel import trackvis as tv
>>> streams, hdr = tv.read(get_fnames('fornix'))
>>> streamlines = [i[0] for i in streams]
>>> # Segment fornix with a threshold of 10mm and streamlines resampled
>>> # to 12 points.
>>> qb = QuickBundles(threshold=10.)
>>> clusters = qb.cluster(streamlines)
>>> len(clusters)
4
>>> list(map(len, clusters))
[61, 191, 47, 1]
>>> # Resampling streamlines differently is done explicitly as follows.
>>> # Note this has an impact on the speed and the accuracy (tradeoff).
>>> from dipy.segment.metric import ResampleFeature
>>> from dipy.segment.metric import AveragePointwiseEuclideanMetric
>>> feature = ResampleFeature(nb_points=2)
>>> metric = AveragePointwiseEuclideanMetric(feature)
>>> qb = QuickBundles(threshold=10., metric=metric)
>>> clusters = qb.cluster(streamlines)
>>> len(clusters)
4
>>> list(map(len, clusters))
[58, 142, 72, 28]
Methods
cluster(streamlines[, ordering]) | Clusters streamlines into bundles.
__init__(threshold, metric='MDF_12points', max_nb_clusters=2147483647)
Initialize self. See help(type(self)) for accurate signature.
cluster(streamlines, ordering=None)
Clusters streamlines into bundles.
Performs the QuickBundles algorithm using the predefined metric and threshold.
cKDTree
class dipy.stats.analysis.cKDTree(data, leafsize=16, compact_nodes=True, copy_data=False, balanced_tree=True)
Bases: object
kd-tree for quick nearest-neighbor lookup
This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.
The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.
During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.
The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.
For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.
See also
KDTree
Methods
count_neighbors(self, other, r[, p, …]) | Count how many nearby pairs can be formed.
query(self, x[, k, eps, p, …]) | Query the kd-tree for nearest neighbors.
query_ball_point(self, x, r[, p, eps]) | Find all points within distance r of point(s) x.
query_ball_tree(self, other, r[, p, eps]) | Find all pairs of points whose distance is at most r.
query_pairs(self, r[, p, eps]) | Find all pairs of points whose distance is at most r.
sparse_distance_matrix(self, other, max_distance) | Compute a sparse distance matrix.
count_neighbors(self, other, r, p=2., weights=None, cumulative=True)
Count how many nearby pairs can be formed (pair-counting).
Count the number of pairs (x1, x2) that can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.
Data points on self and other are optionally weighted by the weights argument (see below).
The algorithm we implement here is based on [1]. See notes for further discussion.
Notes
Pair-counting is the basic operation used to calculate the two point correlation functions from a data set composed of positions of objects.
The two point correlation function measures the clustering of objects and is widely used in cosmology to quantify the large scale structure in our Universe, but it may be useful for data analysis in other fields where self-similar assembly of objects also occurs.
The Landy-Szalay estimator for the two point correlation function of D measures the clustering signal in D [2]. For example, given the positions of two sets of objects, D (data) containing the clustering signal and R (random) containing no signal, the estimator is

\(\xi(r) = \frac{\langle D, D \rangle - 2 f \langle D, R \rangle + f^2 \langle R, R \rangle}{f^2 \langle R, R \rangle},\)

where the brackets represent counting pairs between two data sets in a finite bin around r (distance), corresponding to setting cumulative=False, and f = float(len(D)) / float(len(R)) is the ratio between the number of objects from data and random.
The algorithm implemented here is loosely based on the dual-tree algorithm described in [1]. We switch between two different pair-cumulation schemes depending on the setting of cumulative. The computing time of the method we use for cumulative == False does not scale with the total number of bins. The algorithm for cumulative == True scales linearly with the number of bins, though it is slightly faster when only 1 or 2 bins are used [5].
As an extension to the naive pair-counting, weighted pair-counting counts the product of weights instead of number of pairs. Weighted pair-counting is used to estimate marked correlation functions ([3], section 2.2), or to properly calculate the average of data per distance bin (e.g. [4], section 2.1 on redshift).
[1] Gray and Moore, "N-body problems in statistical learning", Mining the Sky, 2000, https://arxiv.org/abs/astro-ph/0012333
[2] Landy and Szalay, "Bias and variance of angular correlation functions", The Astrophysical Journal, 1993, http://adsabs.harvard.edu/abs/1993ApJ...412...64L
[3] Sheth, Connolly and Skibba, "Marked correlations in galaxy formation models", arXiv e-print, 2005, https://arxiv.org/abs/astro-ph/0511773
[4] Hawkins, et al., "The 2dF Galaxy Redshift Survey: correlation functions, peculiar velocities and the matter density of the Universe", Monthly Notices of the Royal Astronomical Society, 2002, http://adsabs.harvard.edu/abs/2003MNRAS.346...78H
[5] https://github.com/scipy/scipy/pull/5647#issuecomment-168474926
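Examples
A small unweighted pair-counting example of our own:
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> tree1 = cKDTree(np.array([[0., 0.], [0., 1.], [1., 0.]]))
>>> tree2 = cKDTree(np.array([[0., 0.5]]))
>>> tree1.count_neighbors(tree2, r=0.6)
2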
query(self, x, k=1, eps=0, p=2, distance_upper_bound=np.inf, n_jobs=1)
Query the kd-tree for nearest neighbors.
Notes
If the KD-Tree is periodic, the position x is wrapped into the box.
When the input k is a list, a query for arange(max(k)) is performed, but only columns that store the requested values of k are preserved. This is implemented in a manner that reduces memory usage.
Examples
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = cKDTree(np.c_[x.ravel(), y.ravel()])
To query the nearest neighbours and return squeezed result, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=1)
>>> print(dd, ii)
[2. 0.14142136] [ 0 13]
To query the nearest neighbours and return unsqueezed result, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=[1])
>>> print(dd, ii)
[[2. ]
[0.14142136]] [[ 0]
[13]]
To query the second nearest neighbours and return unsqueezed result, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=[2])
>>> print(dd, ii)
[[2.23606798]
[0.90553851]] [[ 6]
[12]]
To query the first and second nearest neighbours, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=2)
>>> print(dd, ii)
[[2. 2.23606798]
[0.14142136 0.90553851]] [[ 0 6]
[13 12]]
or, be more specific
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=[1, 2])
>>> print(dd, ii)
[[2. 2.23606798]
[0.14142136 0.90553851]] [[ 0 6]
[13 12]]
query_ball_point(self, x, r, p=2., eps=0)
Find all points within distance r of point(s) x.
Notes
If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a cKDTree and using query_ball_tree.
Examples
>>> import numpy as np
>>> from scipy import spatial
>>> x, y = np.mgrid[0:4, 0:4]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = spatial.cKDTree(points)
>>> tree.query_ball_point([2, 0], 1)
[4, 8, 9, 12]
query_ball_tree(self, other, r, p=2., eps=0)
Find all pairs of points between self and other whose distance is at most r.
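Examples
A brief example of our own; one list of neighbor indices in other is returned per point in self:
>>> from scipy.spatial import cKDTree
>>> tree1 = cKDTree([[0., 0.], [1., 0.]])
>>> tree2 = cKDTree([[0., 0.1], [2., 0.]])
>>> tree1.query_ball_tree(tree2, r=0.5)
[[0], []]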
query_pairs(self, r, p=2., eps=0)
Find all pairs of points in self whose distance is at most r.
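Examples
A brief example of our own; the result is a set of index pairs (i, j) with i < j:
>>> from scipy.spatial import cKDTree
>>> tree = cKDTree([[0., 0.], [0., 0.4], [3., 3.]])
>>> sorted(tree.query_pairs(r=0.5))
[(0, 1)]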
sparse_distance_matrix(self, other, max_distance, p=2.)
Compute a sparse distance matrix.
Computes a distance matrix between two cKDTrees, leaving as zero any distance greater than max_distance.
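Examples
A brief example of our own; the result is a sparse (dok) matrix holding only the distances below the cutoff:
>>> from scipy.spatial import cKDTree
>>> tree1 = cKDTree([[0., 0.], [1., 1.]])
>>> tree2 = cKDTree([[0., 0.2]])
>>> m = tree1.sparse_distance_matrix(tree2, max_distance=0.5)
>>> m.nnz
1
>>> float(m[0, 0])
0.2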
dipy.stats.analysis.bundle_analysis(model_bundle_folder, bundle_folder, orig_bundle_folder, metric_folder, group, subject, no_disks=100, out_dir='')
Applies statistical analysis on bundles and saves the results in a directory specified by out_dir.
References
[Chandio19] Chandio, B.Q., S. Koudoro, D. Reagan, J. Harezlak, E. Garyfallidis, "Bundle Analytics: a computational and statistical analyses framework for tractometric studies", Proceedings of: International Society of Magnetic Resonance in Medicine (ISMRM), Montreal, Canada, 2019.
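Examples
A hypothetical invocation (the folder and label names below are illustrative assumptions, not values prescribed by DIPY):
>>> from dipy.stats.analysis import bundle_analysis  # doctest: +SKIP
>>> bundle_analysis('model_bundles', 'rec_bundles', 'org_bundles',
...                 'measures', 'patient', 'sub-01',
...                 no_disks=100, out_dir='bundle_stats')  # doctest: +SKIP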
dipy.stats.analysis.dti_measures(bundle, metric, dt, pname, bname, subject, group, ind, dir)
Calculates DTI measures (e.g. FA, MD) per point on streamlines and saves them in an HDF5 file.
dipy.stats.analysis.load_trk(filename, lazy_load=False)
Loads tractogram files (*.trk).
dipy.stats.analysis.map_coordinates(input, coordinates, output=None, order=3, mode='constant', cval=0.0, prefilter=True)
Map the input array to new coordinates by interpolation.
The array of coordinates is used to find, for each point in the output, the corresponding coordinates in the input. The value of the input at those coordinates is determined by spline interpolation of the requested order.
The shape of the output is derived from that of the coordinate array by dropping the first axis. The values of the array along the first axis are the coordinates in the input array at which the output value is found.
See also
spline_filter, geometric_transform, scipy.interpolate
Examples
>>> import numpy as np
>>> from scipy import ndimage
>>> a = np.arange(12.).reshape((4, 3))
>>> a
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.]])
>>> ndimage.map_coordinates(a, [[0.5, 2], [0.5, 1]], order=1)
array([ 2., 7.])
Above, the interpolated value of a[0.5, 0.5] gives output[0], while a[2, 1] is output[1].
>>> inds = np.array([[0.5, 2], [0.5, 4]])
>>> ndimage.map_coordinates(a, inds, order=1, cval=-33.3)
array([ 2. , -33.3])
>>> ndimage.map_coordinates(a, inds, order=1, mode='nearest')
array([ 2., 8.])
>>> ndimage.map_coordinates(a, inds, order=1, cval=0, output=bool)
array([ True, False], dtype=bool)
dipy.stats.analysis.optional_package(name, trip_msg=None)
Return package-like thing and module setup for package name.
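Examples
A typical usage pattern of our own, assuming the three-value return (module, have-flag, setup function) this helper provides throughout DIPY; if the package is missing, the have-flag is False and attribute access on the returned stand-in raises an informative error carrying trip_msg:
>>> from dipy.utils.optpkg import optional_package
>>> pandas, have_pandas, _ = optional_package('pandas')
>>> isinstance(have_pandas, bool)
True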
dipy.stats.analysis.peak_values(bundle, peaks, dt, pname, bname, subject, group, ind, dir)
Finds the peak direction and peak value of each point on a streamline used while tracking (generating the tractogram) and saves them in an HDF5 file.
dipy.stats.analysis.set_number_of_points()
Change the number of points of streamlines in order to obtain nb_points-1 segments of equal length. Points of streamlines will be modified along the curve.
Examples
>>> from dipy.tracking.streamline import set_number_of_points
>>> import numpy as np
One streamline, a semi-circle:
>>> theta = np.pi*np.linspace(0, 1, 100)
>>> x = np.cos(theta)
>>> y = np.sin(theta)
>>> z = 0 * x
>>> streamline = np.vstack((x, y, z)).T
>>> modified_streamline = set_number_of_points(streamline, 3)
>>> len(modified_streamline)
3
Multiple streamlines:
>>> streamlines = [streamline, streamline[::2]]
>>> new_streamlines = set_number_of_points(streamlines, 10)
>>> [len(s) for s in streamlines]
[100, 50]
>>> [len(s) for s in new_streamlines]
[10, 10]
dipy.stats.analysis.transform_streamlines(streamlines, mat, in_place=False)
Apply affine transformation to streamlines.
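Examples
A short example of our own, applying a pure translation to one streamline:
>>> import numpy as np
>>> from dipy.tracking.streamline import transform_streamlines
>>> streamlines = [np.array([[0., 0., 0.], [1., 0., 0.]])]
>>> affine = np.eye(4)
>>> affine[:3, 3] = [10., 0., 0.]  # translate by 10 along x
>>> moved = transform_streamlines(streamlines, affine)
>>> np.allclose(moved[0], [[10., 0., 0.], [11., 0., 0.]])
True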