stats

Module: stats.analysis

AveragePointwiseEuclideanMetric

Computes the average of pointwise Euclidean distances between two sequences of points.

QuickBundles(threshold[, metric, ...])

Clusters streamlines using QuickBundles [Garyfallidis12].

Streamlines

alias of ArraySequence

cKDTree(data[, leafsize, compact_nodes, ...])

kd-tree for quick nearest-neighbor lookup

afq_profile(data, bundle, affine[, ...])

Calculates a summarized profile of data for a bundle or tract along its length.

anatomical_measures(bundle, metric, dt, ...)

Calculates DTI measures (e.g. FA, MD) per point on streamlines and saves them in an HDF5 file.

assignment_map(target_bundle, model_bundle, ...)

Calculates assignment maps of the target bundle with reference to model bundle centroids.

gaussian_weights(bundle[, n_points, ...])

Calculate weights for each streamline/node in a bundle, based on the Mahalanobis distance from the core of the bundle at that node (the mean, by default).

mahalanobis(u, v, VI)

Compute the Mahalanobis distance between two 1-D arrays.

map_coordinates(input, coordinates[, ...])

Map the input array to new coordinates by interpolation.

optional_package(name[, trip_msg])

Return package-like thing and module setup for package name

orient_by_streamline(streamlines, standard)

Orient a bundle of streamlines to a standard streamline.

peak_values(bundle, peaks, dt, pname, bname, ...)

Finds the generalized fractional anisotropy (GFA) and quantitative anisotropy (QA) values from a peaks object and saves them in an HDF5 file.

save_buan_profiles_hdf5(fname, dt)

Saves the given input DataFrame to an .h5 file.

set_number_of_points

Change the number of points of streamlines

values_from_volume(data, streamlines, affine)

Extract values of a scalar/vector along each streamline from a volume.

AveragePointwiseEuclideanMetric

class dipy.stats.analysis.AveragePointwiseEuclideanMetric

Bases: SumPointwiseEuclideanMetric

Computes the average of pointwise Euclidean distances between two sequences of points.

A sequence of N-dimensional points is represented as a 2D array with shape (nb_points, nb_dimensions). A feature object can be specified in order to calculate the distance between the features, rather than directly between the sequential data.

Parameters:
feature : Feature object, optional

It is used to extract features before computing the distance.

Notes

The distance between two 2D sequential data:

s1       s2

0*   a    *0
  \       |
   \      |
   1*     |
    |  b  *1
    |      \
    2*      \
        c    *2

is equal to \((a+b+c)/3\) where \(a\) is the Euclidean distance between s1[0] and s2[0], \(b\) between s1[1] and s2[1] and \(c\) between s1[2] and s2[2].
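
Examples

An illustrative sketch of that computation (the float32 arrays and the feature-extraction step follow the usual DIPY metric API; treat the exact calls as an assumption, not a prescription):

>>> import numpy as np
>>> from dipy.segment.metricspeed import AveragePointwiseEuclideanMetric
>>> s1 = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=np.float32)
>>> s2 = np.array([[0, 1, 0], [1, 1, 0], [2, 1, 0]], dtype=np.float32)
>>> metric = AveragePointwiseEuclideanMetric()
>>> # Every pointwise distance is 1, so the average is 1.
>>> metric.dist(metric.feature.extract(s1), metric.feature.extract(s2))
1.0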

Attributes:
feature

Feature object used to extract features from sequential data

is_order_invariant

Is this metric invariant to the sequence’s ordering

Methods

are_compatible

Checks if features can be used by metric.dist based on their shape.

dist

Computes a distance between two data points based on their features.

__init__(*args, **kwargs)

QuickBundles

class dipy.stats.analysis.QuickBundles(threshold, metric='MDF_12points', max_nb_clusters=2147483647)

Bases: Clustering

Clusters streamlines using QuickBundles [Garyfallidis12].

Given a list of streamlines, the QuickBundles algorithm sequentially assigns each streamline to its closest bundle in \(\mathcal{O}(Nk)\) where \(N\) is the number of streamlines and \(k\) is the final number of bundles. If for a given streamline its closest bundle is farther than threshold, a new bundle is created and the streamline is assigned to it except if the number of bundles has already exceeded max_nb_clusters.

Parameters:
threshold : float

The maximum distance from a bundle for a streamline to be still considered as part of it.

metric : str or Metric object, optional

The distance metric to use when comparing two streamlines. By default, the Minimum average Direct-Flip (MDF) distance [Garyfallidis12] is used and streamlines are automatically resampled so they have 12 points.

max_nb_clusters : int

Limits the creation of bundles.

References

[Garyfallidis12]

Garyfallidis E. et al., QuickBundles a method for tractography simplification, Frontiers in Neuroscience, vol 6, no 175, 2012.

Examples

>>> from dipy.segment.clustering import QuickBundles
>>> from dipy.data import get_fnames
>>> from dipy.io.streamline import load_tractogram
>>> from dipy.tracking.streamline import Streamlines
>>> fname = get_fnames('fornix')
>>> fornix = load_tractogram(fname, 'same',
...                          bbox_valid_check=False).streamlines
>>> streamlines = Streamlines(fornix)
>>> # Segment fornix with a threshold of 10mm and streamlines resampled
>>> # to 12 points.
>>> qb = QuickBundles(threshold=10.)
>>> clusters = qb.cluster(streamlines)
>>> len(clusters)
4
>>> list(map(len, clusters))
[61, 191, 47, 1]
>>> # Resampling streamlines differently is done explicitly as follows.
>>> # Note this has an impact on the speed and the accuracy (tradeoff).
>>> from dipy.segment.featurespeed import ResampleFeature
>>> from dipy.segment.metricspeed import AveragePointwiseEuclideanMetric
>>> feature = ResampleFeature(nb_points=2)
>>> metric = AveragePointwiseEuclideanMetric(feature)
>>> qb = QuickBundles(threshold=10., metric=metric)
>>> clusters = qb.cluster(streamlines)
>>> len(clusters)
4
>>> list(map(len, clusters))
[58, 142, 72, 28]

Methods

cluster(streamlines[, ordering])

Clusters streamlines into bundles.

__init__(threshold, metric='MDF_12points', max_nb_clusters=2147483647)
cluster(streamlines, ordering=None)

Clusters streamlines into bundles.

Performs the QuickBundles algorithm using the predefined metric and threshold.

Parameters:
streamlines : list of 2D arrays

Each 2D array represents a sequence of 3D points (points, 3).

ordering : iterable of indices

Specifies the order in which data points will be clustered.

Returns:
ClusterMapCentroid object

Result of the clustering.

Streamlines

dipy.stats.analysis.Streamlines

alias of ArraySequence

cKDTree

class dipy.stats.analysis.cKDTree(data, leafsize=16, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)

Bases: object

kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.

Note

cKDTree is functionally identical to KDTree. Prior to SciPy v1.6.0, cKDTree had better performance and slightly different functionality but now the two names exist only for backward-compatibility reasons. If compatibility with SciPy < 1.6 is not a concern, prefer KDTree.

Parameters:
data : array_like, shape (n, m)

The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles, and so modifying this data will result in bogus results. The data are also copied if the kd-tree is built with copy_data=True.

leafsize : positive int, optional

The number of points at which the algorithm switches over to brute-force. Default: 16.

compact_nodes : bool, optional

If True, the kd-tree is built to shrink the hyperrectangles to the actual data range. This usually gives a more compact tree that is robust against degenerated input data and gives faster queries at the expense of longer build time. Default: True.

copy_data : bool, optional

If True the data is always copied to protect the kd-tree against data corruption. Default: False.

balanced_tree : bool, optional

If True, the median is used to split the hyperrectangles instead of the midpoint. This usually gives a more compact tree and faster queries at the expense of longer build time. Default: True.

boxsize : array_like or scalar, optional

Apply an m-dimensional toroidal topology to the KDTree. The topology is generated by \(x_i + n_i L_i\) where \(n_i\) are integers and \(L_i\) is the boxsize along the i-th dimension. The input data shall be wrapped into \([0, L_i)\). A ValueError is raised if any of the data is outside of this bound.

Notes

The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary tree, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.

During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.

The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.
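
Examples

A minimal sketch of building a tree and looking up a nearest neighbor:

>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> points = np.array([[0., 0.], [1., 0.], [0., 1.], [2., 2.]])
>>> tree = cKDTree(points)
>>> dist, idx = tree.query([0.9, 0.1])  # nearest neighbor of (0.9, 0.1)
>>> int(idx)  # points[1] = (1, 0) is closest
1
>>> bool(np.isclose(dist, np.sqrt(0.02)))
True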

Attributes:
data : ndarray, shape (n, m)

The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles. The data are also copied if the kd-tree is built with copy_data=True.

leafsize : positive int

The number of points at which the algorithm switches over to brute-force.

m : int

The dimension of a single data-point.

n : int

The number of data points.

maxes : ndarray, shape (m,)

The maximum value in each dimension of the n data points.

mins : ndarray, shape (m,)

The minimum value in each dimension of the n data points.

tree : object, class cKDTreeNode

This attribute exposes a Python view of the root node in the cKDTree object. A full Python view of the kd-tree is created dynamically on the first access. This attribute allows you to create your own query functions in Python.

size : int

The number of nodes in the tree.

Methods

count_neighbors(self, other, r[, p, ...])

Count how many nearby pairs can be formed.

query(self, x[, k, eps, p, ...])

Query the kd-tree for nearest neighbors

query_ball_point(self, x, r[, p, eps, ...])

Find all points within distance r of point(s) x.

query_ball_tree(self, other, r[, p, eps])

Find all pairs of points between self and other whose distance is at most r

query_pairs(self, r[, p, eps])

Find all pairs of points in self whose distance is at most r.

sparse_distance_matrix(self, other, max_distance)

Compute a sparse distance matrix

__init__(*args, **kwargs)
boxsize
count_neighbors(self, other, r, p=2., weights=None, cumulative=True)

Count how many nearby pairs can be formed.

Count the number of pairs (x1, x2) that can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.

Data points on self and other are optionally weighted by the weights argument. (See below)

This is adapted from the “two-point correlation” algorithm described by Gray and Moore [1]. See notes for further discussion.

Parameters:
other : cKDTree instance

The other tree to draw points from, can be the same tree as self.

r : float or one-dimensional array of floats

The radius to produce a count for. Multiple radii are searched with a single tree traversal. If the count is non-cumulative (cumulative=False), r defines the edges of the bins, and must be non-decreasing.

p : float, optional

1<=p<=infinity. Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a ValueError if overflow can occur.

weights : tuple, array_like, or None, optional

If None, the pair-counting is unweighted. If given as a tuple, weights[0] is the weights of points in self, and weights[1] is the weights of points in other; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in self and other. For this to make sense, self and other must be the same tree. If self and other are two different trees, a ValueError is raised. Default: None

cumulative : bool, optional

Whether the returned counts are cumulative. When cumulative is set to False the algorithm is optimized to work with a large number of bins (>10) specified by r. When cumulative is set to True, the algorithm is optimized to work with a small number of r. Default: True

Returns:
result : scalar or 1-D array

The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If cumulative is False, result[i] contains the counts with (-inf if i == 0 else r[i-1]) < R <= r[i]

Notes

Pair-counting is the basic operation used to calculate the two-point correlation functions from a data set composed of the positions of objects.

The two-point correlation function measures the clustering of objects and is widely used in cosmology to quantify the large-scale structure of our Universe, but it may be useful for data analysis in other fields where self-similar assembly of objects also occurs.

The Landy-Szalay estimator for the two-point correlation function of D measures the clustering signal in D. [2]

For example, given the position of two sets of objects,

  • objects D (data) contains the clustering signal, and

  • objects R (random) that contains no signal,

\[\xi(r) = \frac{<D, D> - 2 f <D, R> + f^2<R, R>}{f^2<R, R>},\]

where the brackets represent counting pairs between two data sets in a finite bin around r (distance), corresponding to setting cumulative=False, and f = float(len(D)) / float(len(R)) is the ratio between the number of objects from data and random.

The algorithm implemented here is loosely based on the dual-tree algorithm described in [1]. We switch between two different pair-cumulation schemes depending on the setting of cumulative. The computing time of the method we use for cumulative == False does not scale with the total number of bins. The algorithm for cumulative == True scales linearly with the number of bins, though it is slightly faster when only 1 or 2 bins are used [5].

As an extension to the naive pair-counting, weighted pair-counting counts the product of weights instead of number of pairs. Weighted pair-counting is used to estimate marked correlation functions ([3], section 2.2), or to properly calculate the average of data per distance bin (e.g. [4], section 2.1 on redshift).

[1]

Gray and Moore, “N-body problems in statistical learning”, Mining the sky, 2000, :arxiv:`astro-ph/0012333`

[2]

Landy and Szalay, “Bias and variance of angular correlation functions”, The Astrophysical Journal, 1993, :doi:`10.1086/172900`

[3]

Sheth, Connolly and Skibba, “Marked correlations in galaxy formation models”, 2005, :arxiv:`astro-ph/0511773`

[4]

Hawkins, et al., “The 2dF Galaxy Redshift Survey: correlation functions, peculiar velocities and the matter density of the Universe”, Monthly Notices of the Royal Astronomical Society, 2002, :doi:`10.1046/j.1365-2966.2003.07063.x`

Examples

You can count the number of neighbor pairs between two kd-trees within a distance:

>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> rng = np.random.default_rng()
>>> points1 = rng.random((5, 2))
>>> points2 = rng.random((5, 2))
>>> kd_tree1 = cKDTree(points1)
>>> kd_tree2 = cKDTree(points2)
>>> kd_tree1.count_neighbors(kd_tree2, 0.2)
1

This number is the same as the total number of pairs found by query_ball_tree:

>>> indexes = kd_tree1.query_ball_tree(kd_tree2, r=0.2)
>>> sum([len(i) for i in indexes])
1
data
indices
leafsize
m
maxes
mins
n
query(self, x, k=1, eps=0, p=2, distance_upper_bound=np.inf, workers=1)

Query the kd-tree for nearest neighbors

Parameters:
x : array_like, last dimension self.m

An array of points to query.

k : list of integer or integer

The list of k-th nearest neighbors to return. If k is an integer it is treated as a list of [1, … k] (range(1, k+1)). Note that the counting starts from 1.

eps : non-negative float

Return approximate nearest neighbors; the k-th returned value is guaranteed to be no further than (1+eps) times the distance to the real k-th nearest neighbor.

p : float, 1 <= p <= infinity

Which Minkowski p-norm to use. 1 is the sum-of-absolute-values “Manhattan” distance, 2 is the usual Euclidean distance, and infinity is the maximum-coordinate-difference distance. A finite large p may cause a ValueError if overflow can occur.

distance_upper_bound : nonnegative float

Return only neighbors within this distance. This is used to prune tree searches, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

workers : int, optional

Number of workers to use for parallel processing. If -1 is given all CPU threads are used. Default: 1.

Changed in version 1.9.0: The “n_jobs” argument was renamed “workers”. The old name “n_jobs” was deprecated in SciPy 1.6.0 and was removed in SciPy 1.9.0.

Returns:
d : array of floats

The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). When k == 1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances.

i : ndarray of ints

The index of each neighbor in self.data. If x has shape tuple+(self.m,), then i has shape tuple+(k,). When k == 1, the last dimension of the output is squeezed. Missing neighbors are indicated with self.n.

Notes

If the KD-Tree is periodic, the position x is wrapped into the box.

When the input k is a list, a query for arange(max(k)) is performed, but only columns that store the requested values of k are preserved. This is implemented in a manner that reduces memory usage.

Examples

>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = cKDTree(np.c_[x.ravel(), y.ravel()])

To query the nearest neighbours and return squeezed result, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=1)
>>> print(dd, ii, sep='\n')
[2.         0.2236068]
[ 0 13]

To query the nearest neighbours and return unsqueezed result, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=[1])
>>> print(dd, ii, sep='\n')
[[2.        ]
 [0.2236068]]
[[ 0]
 [13]]

To query the second nearest neighbours and return unsqueezed result, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=[2])
>>> print(dd, ii, sep='\n')
[[2.23606798]
 [0.80622577]]
[[ 6]
 [19]]

To query the first and second nearest neighbours, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=2)
>>> print(dd, ii, sep='\n')
[[2.         2.23606798]
 [0.2236068  0.80622577]]
[[ 0  6]
 [13 19]]

or, be more specific

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=[1, 2])
>>> print(dd, ii, sep='\n')
[[2.         2.23606798]
 [0.2236068  0.80622577]]
[[ 0  6]
 [13 19]]
query_ball_point(self, x, r, p=2., eps=0, workers=1, return_sorted=None, return_length=False)

Find all points within distance r of point(s) x.

Parameters:
x : array_like, shape tuple + (self.m,)

The point or points to search for neighbors of.

r : array_like, float

The radius of points to return, shall broadcast to the length of x.

p : float, optional

Which Minkowski p-norm to use. Should be in the range [1, inf]. A finite large p may cause a ValueError if overflow can occur.

eps : nonnegative float, optional

Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps).

workers : int, optional

Number of jobs to schedule for parallel processing. If -1 is given all processors are used. Default: 1.

Changed in version 1.9.0: The “n_jobs” argument was renamed “workers”. The old name “n_jobs” was deprecated in SciPy 1.6.0 and was removed in SciPy 1.9.0.

return_sorted : bool, optional

Sorts returned indices if True and does not sort them if False. If None, does not sort single point queries, but does sort multi-point queries, which was the behavior before this option was added.

New in version 1.2.0.

return_length : bool, optional

Return the number of points inside the radius instead of a list of the indices.

New in version 1.3.0.

Returns:
results : list or array of lists

If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Notes

If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a cKDTree and using query_ball_tree.

Examples

>>> import numpy as np
>>> from scipy import spatial
>>> x, y = np.mgrid[0:4, 0:4]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = spatial.cKDTree(points)
>>> tree.query_ball_point([2, 0], 1)
[4, 8, 9, 12]

Query multiple points and plot the results:

>>> import matplotlib.pyplot as plt
>>> points = np.asarray(points)
>>> plt.plot(points[:,0], points[:,1], '.')
>>> for results in tree.query_ball_point(([2, 0], [3, 3]), 1):
...     nearby_points = points[results]
...     plt.plot(nearby_points[:,0], nearby_points[:,1], 'o')
>>> plt.margins(0.1, 0.1)
>>> plt.show()
query_ball_tree(self, other, r, p=2., eps=0)

Find all pairs of points between self and other whose distance is at most r

Parameters:
other : cKDTree instance

The tree containing points to search against.

r : float

The maximum distance, has to be positive.

p : float, optional

Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity. A finite large p may cause a ValueError if overflow can occur.

eps : float, optional

Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

Returns:
results : list of lists

For each element self.data[i] of this tree, results[i] is a list of the indices of its neighbors in other.data.

Examples

You can search all pairs of points between two kd-trees within a distance:

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> rng = np.random.default_rng()
>>> points1 = rng.random((15, 2))
>>> points2 = rng.random((15, 2))
>>> plt.figure(figsize=(6, 6))
>>> plt.plot(points1[:, 0], points1[:, 1], "xk", markersize=14)
>>> plt.plot(points2[:, 0], points2[:, 1], "og", markersize=14)
>>> kd_tree1 = cKDTree(points1)
>>> kd_tree2 = cKDTree(points2)
>>> indexes = kd_tree1.query_ball_tree(kd_tree2, r=0.2)
>>> for i in range(len(indexes)):
...     for j in indexes[i]:
...         plt.plot([points1[i, 0], points2[j, 0]],
...             [points1[i, 1], points2[j, 1]], "-r")
>>> plt.show()
query_pairs(self, r, p=2., eps=0, output_type='set')

Find all pairs of points in self whose distance is at most r.

Parameters:
r : positive float

The maximum distance.

p : float, optional

Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity. A finite large p may cause a ValueError if overflow can occur.

eps : float, optional

Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

output_type : string, optional

Choose the output container, ‘set’ or ‘ndarray’. Default: ‘set’

Returns:
results : set or ndarray

Set of pairs (i, j), with i < j, for which the corresponding positions are close. If output_type is ‘ndarray’, an ndarray is returned instead of a set.

Examples

You can search all pairs of points in a kd-tree within a distance:

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> rng = np.random.default_rng()
>>> points = rng.random((20, 2))
>>> plt.figure(figsize=(6, 6))
>>> plt.plot(points[:, 0], points[:, 1], "xk", markersize=14)
>>> kd_tree = cKDTree(points)
>>> pairs = kd_tree.query_pairs(r=0.2)
>>> for (i, j) in pairs:
...     plt.plot([points[i, 0], points[j, 0]],
...             [points[i, 1], points[j, 1]], "-r")
>>> plt.show()
size
sparse_distance_matrix(self, other, max_distance, p=2., output_type='dok_matrix')

Compute a sparse distance matrix

Computes a distance matrix between two cKDTrees, leaving as zero any distance greater than max_distance.

Parameters:
other : cKDTree
max_distance : positive float
p : float, 1 <= p <= infinity

Which Minkowski p-norm to use. A finite large p may cause a ValueError if overflow can occur.

output_type : string, optional

Which container to use for output data. Options: ‘dok_matrix’, ‘coo_matrix’, ‘dict’, or ‘ndarray’. Default: ‘dok_matrix’.

Returns:
result : dok_matrix, coo_matrix, dict, or ndarray

Sparse matrix representing the results in “dictionary of keys” format. If a dict is returned the keys are (i, j) tuples of indices. If output_type is ‘ndarray’ a record array with fields ‘i’, ‘j’, and ‘v’ is returned.

Examples

You can compute a sparse distance matrix between two kd-trees:

>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> rng = np.random.default_rng()
>>> points1 = rng.random((5, 2))
>>> points2 = rng.random((5, 2))
>>> kd_tree1 = cKDTree(points1)
>>> kd_tree2 = cKDTree(points2)
>>> sdm = kd_tree1.sparse_distance_matrix(kd_tree2, 0.3)
>>> sdm.toarray()
array([[0.        , 0.        , 0.12295571, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.28942611, 0.        , 0.        , 0.2333084 , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.24617575, 0.29571802, 0.26836782, 0.        , 0.        ]])

You can check that distances above max_distance are zeros:

>>> from scipy.spatial import distance_matrix
>>> distance_matrix(points1, points2)
array([[0.56906522, 0.39923701, 0.12295571, 0.8658745 , 0.79428925],
       [0.37327919, 0.7225693 , 0.87665969, 0.32580855, 0.75679479],
       [0.28942611, 0.30088013, 0.6395831 , 0.2333084 , 0.33630734],
       [0.31994999, 0.72658602, 0.71124834, 0.55396483, 0.90785663],
       [0.24617575, 0.29571802, 0.26836782, 0.57714465, 0.6473269 ]])
tree

afq_profile

dipy.stats.analysis.afq_profile(data, bundle, affine, n_points=100, profile_stat=np.average, orient_by=None, weights=None, **weights_kwarg)

Calculates a summarized profile of data for a bundle or tract along its length.

Follows the approach outlined in [Yeatman2012].

Parameters:
data : 3D volume

The statistic to sample with the streamlines.

bundle : StreamLines class instance

The collection of streamlines (possibly already resampled into an array for each to have the same length) with which we are resampling. See Note below about orienting the streamlines.

affine : array_like (4, 4)

The mapping from voxel coordinates to streamline points. The voxel_to_rasmm matrix, typically from a NIFTI file.

n_points : int, optional

The number of points to sample along the bundle. Default: 100.

orient_by : streamline, optional

A streamline to use as a standard to orient all of the streamlines in the bundle according to.

weights : 1D array, 2D array, or callable, optional

Weight each streamline (1D) or each node (2D) when calculating the tract-profiles. Must sum to 1 across streamlines (in each node if relevant). If callable, this is a function that calculates weights.

profile_stat : callable

The statistic used to average the profile across streamlines. If weights is not None, this must take weights as a keyword argument. The default, np.average, is the same as np.mean but takes weights as a keyword argument.

weights_kwarg : key-word arguments

Additional key-word arguments to pass to the weight-calculating function. Only to be used if weights is a callable.

Returns:
ndarray : a 1D array with the profile of data along the length of bundle.

Notes

Before providing a bundle as input to this function, you will need to make sure that the streamlines in the bundle are all oriented in the same orientation relative to the bundle (use orient_by_streamline()).

References

[Yeatman2012]

Yeatman, Jason D., Robert F. Dougherty, Nathaniel J. Myall, Brian A. Wandell, and Heidi M. Feldman. 2012. “Tract Profiles of White Matter Properties: Automating Fiber-Tract Quantification” PloS One 7 (11): e49790.
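
Examples

A hedged sketch with synthetic inputs (a random volume standing in for, e.g., an FA map, an identity affine, and two short, consistently oriented streamlines; real use passes the image's voxel_to_rasmm affine):

>>> import numpy as np
>>> from dipy.stats.analysis import afq_profile
>>> rng = np.random.default_rng(42)
>>> data = rng.random((10, 10, 10))  # synthetic "scalar" volume
>>> bundle = [np.array([[1., 1., 1.], [4., 4., 4.], [8., 8., 8.]]),
...           np.array([[1., 1., 2.], [4., 4., 5.], [8., 8., 8.]])]
>>> profile = afq_profile(data, bundle, np.eye(4), n_points=100)
>>> profile.shape  # one summary value per sampled node along the bundle
(100,)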

anatomical_measures

dipy.stats.analysis.anatomical_measures(bundle, metric, dt, pname, bname, subject, group_id, ind, dir_name)

Calculates DTI measures (e.g. FA, MD) per point on streamlines and saves them in an HDF5 file.

Parameters:
bundle : string

Name of bundle being analyzed

metric : matrix of float values

DTI metric, e.g. FA or MD.

dt : DataFrame

DataFrame to be populated

pname : string

Name of the dti metric

bname : string

Name of bundle being analyzed.

subject : string

subject number as a string (e.g. 10001)

group_id : integer

Which group the subject belongs to: 1 for patient, 0 for control.

ind : integer list

Tells which disk number each point belongs to.

dir_name : string

path of output directory

assignment_map

dipy.stats.analysis.assignment_map(target_bundle, model_bundle, no_disks)

Calculates assignment maps of the target bundle with reference to model bundle centroids.

Parameters:
target_bundle : streamlines

target bundle extracted from subject data in common space

model_bundle : streamlines

atlas bundle used as reference

no_disks : integer, optional

Number of disks used for dividing bundle into disks. (Default 100)

References

[Chandio2020]

Chandio, B.Q., Risacher, S.L., Pestilli, F., Bullock, D., Yeh, F.C., Koudoro, S., Rokem, A., Harezlak, J., and Garyfallidis, E. Bundle analytics, a computational framework for investigating the shapes and profiles of brain pathways across populations. Sci Rep 10, 17149 (2020).
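
Examples

A hedged sketch with synthetic straight-line bundles (float32 arrays wrapped in Streamlines, which is what DIPY's clustering code expects; in real use both bundles are registered to a common space):

>>> import numpy as np
>>> from dipy.stats.analysis import assignment_map
>>> from dipy.tracking.streamline import Streamlines
>>> line = np.array([np.linspace(0, 100, 20),
...                  np.zeros(20), np.zeros(20)], dtype=np.float32).T
>>> model = Streamlines([line.copy() for _ in range(5)])
>>> target = Streamlines([line + np.float32([0, 1, 0]) for _ in range(10)])
>>> indx = np.asarray(assignment_map(target, model, 10))
>>> indx.shape  # one disk label per point in the target bundle
(200,)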

gaussian_weights

dipy.stats.analysis.gaussian_weights(bundle, n_points=100, return_mahalnobis=False, stat=np.mean)

Calculate weights for each streamline/node in a bundle, based on the Mahalanobis distance from the core of the bundle at that node (the mean, by default).

Parameters:
bundle : Streamlines

The streamlines to weight.

n_points : int, optional

The number of points to resample to. If the `bundle` is an array, this input is ignored. Default: 100.

return_mahalnobis : bool, optional

Whether to return the Mahalanobis distance instead of the weights. Default: False.

stat : callable, optional

The statistic used to calculate the central tendency of streamlines in each node. Can be one of {np.mean, np.median} or other functions that have similar API. Default: np.mean

Returns:
w : array of shape (n_streamlines, n_points)

Weights for each node in each streamline, calculated as the inverse of its Mahalanobis distance relative to the distribution of coordinates at that node position across streamlines.
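
Examples

A hedged sketch on a synthetic bundle (twenty noisy copies of a straight line; with real data, pass the bundle of streamlines directly):

>>> import numpy as np
>>> from dipy.stats.analysis import gaussian_weights
>>> rng = np.random.default_rng(0)
>>> core = np.stack([np.linspace(0, 100, 100)] * 3, axis=1)
>>> bundle = [core + rng.normal(scale=1.0, size=core.shape)
...           for _ in range(20)]
>>> w = gaussian_weights(bundle, n_points=100)
>>> w.shape
(20, 100)
>>> bool(np.allclose(w.sum(axis=0), 1))  # weights sum to 1 at each node
True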

mahalanobis

dipy.stats.analysis.mahalanobis(u, v, VI)

Compute the Mahalanobis distance between two 1-D arrays.

The Mahalanobis distance between 1-D arrays u and v, is defined as

\[\sqrt{ (u-v) V^{-1} (u-v)^T }\]

where V is the covariance matrix. Note that the argument VI is the inverse of V.

Parameters:
u : (N,) array_like

Input array.

v : (N,) array_like

Input array.

VI : array_like

The inverse of the covariance matrix.

Returns:
mahalanobis : double

The Mahalanobis distance between vectors u and v.

Examples

>>> from scipy.spatial import distance
>>> iv = [[1, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 1]]
>>> distance.mahalanobis([1, 0, 0], [0, 1, 0], iv)
1.0
>>> distance.mahalanobis([0, 2, 0], [0, 1, 0], iv)
1.0
>>> distance.mahalanobis([2, 0, 0], [0, 1, 0], iv)
1.7320508075688772

map_coordinates

dipy.stats.analysis.map_coordinates(input, coordinates, output=None, order=3, mode='constant', cval=0.0, prefilter=True)

Map the input array to new coordinates by interpolation.

The array of coordinates is used to find, for each point in the output, the corresponding coordinates in the input. The value of the input at those coordinates is determined by spline interpolation of the requested order.

The shape of the output is derived from that of the coordinate array by dropping the first axis. The values of the array along the first axis are the coordinates in the input array at which the output value is found.

Parameters:
input : array_like

The input array.

coordinates : array_like

The coordinates at which input is evaluated.

output : array or dtype, optional

The array in which to place the output, or the dtype of the returned array. By default an array of the same dtype as input will be created.

order : int, optional

The order of the spline interpolation, default is 3. The order has to be in the range 0-5.

mode : {‘reflect’, ‘grid-mirror’, ‘constant’, ‘grid-constant’, ‘nearest’, ‘mirror’, ‘grid-wrap’, ‘wrap’}, optional

The mode parameter determines how the input array is extended beyond its boundaries. Default is ‘constant’. Behavior for each valid value is as follows (see additional plots and details on boundary modes):

‘reflect’ (d c b a | a b c d | d c b a)

The input is extended by reflecting about the edge of the last pixel. This mode is also sometimes referred to as half-sample symmetric.

‘grid-mirror’

This is a synonym for ‘reflect’.

‘constant’ (k k k k | a b c d | k k k k)

The input is extended by filling all values beyond the edge with the same constant value, defined by the cval parameter. No interpolation is performed beyond the edges of the input.

‘grid-constant’ (k k k k | a b c d | k k k k)

The input is extended by filling all values beyond the edge with the same constant value, defined by the cval parameter. Interpolation occurs for samples outside the input’s extent as well.

‘nearest’ (a a a a | a b c d | d d d d)

The input is extended by replicating the last pixel.

‘mirror’ (d c b | a b c d | c b a)

The input is extended by reflecting about the center of the last pixel. This mode is also sometimes referred to as whole-sample symmetric.

‘grid-wrap’ (a b c d | a b c d | a b c d)

The input is extended by wrapping around to the opposite edge.

‘wrap’ (d b c d | a b c d | b c a b)

The input is extended by wrapping around to the opposite edge, but in a way such that the last point and initial point exactly overlap. In this case it is not well defined which sample will be chosen at the point of overlap.

cval : scalar, optional

Value to fill past edges of input if mode is ‘constant’. Default is 0.0.

prefilter : bool, optional

Determines if the input array is prefiltered with spline_filter before interpolation. The default is True, which will create a temporary float64 array of filtered values if order > 1. If setting this to False, the output will be slightly blurred if order > 1, unless the input is prefiltered, i.e. it is the result of calling spline_filter on the original input.

Returns:
map_coordinates : ndarray

The result of transforming the input. The shape of the output is derived from that of coordinates by dropping the first axis.

See also

spline_filter, geometric_transform, scipy.interpolate

Notes

For complex-valued input, this function maps the real and imaginary components independently.

New in version 1.6.0: Complex-valued support added.

Examples

>>> from scipy import ndimage
>>> import numpy as np
>>> a = np.arange(12.).reshape((4, 3))
>>> a
array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]])
>>> ndimage.map_coordinates(a, [[0.5, 2], [0.5, 1]], order=1)
array([ 2.,  7.])

Above, the interpolated value of a[0.5, 0.5] gives output[0], while a[2, 1] is output[1].

>>> inds = np.array([[0.5, 2], [0.5, 4]])
>>> ndimage.map_coordinates(a, inds, order=1, cval=-33.3)
array([  2. , -33.3])
>>> ndimage.map_coordinates(a, inds, order=1, mode='nearest')
array([ 2.,  8.])
>>> ndimage.map_coordinates(a, inds, order=1, cval=0, output=bool)
array([ True, False], dtype=bool)

optional_package

dipy.stats.analysis.optional_package(name, trip_msg=None)

Return package-like thing and module setup for package name

Parameters:
name : str

package name

trip_msg : None or str

Message to give when someone tries to use the returned package, but we could not import it, and have returned a TripWire object instead. Default message if None.

Returns:
pkg_like : module or TripWire instance

If we can import the package, return it. Otherwise return an object raising an error when accessed

have_pkg : bool

True if the import of the package was successful, False otherwise.

module_setup : function

callable usually set as setup_module in calling namespace, to allow skipping tests.

Examples

Typical use would be something like this at the top of a module using an optional package:

>>> from dipy.utils.optpkg import optional_package
>>> pkg, have_pkg, setup_module = optional_package('not_a_package')

Of course in this case the package doesn’t exist, and so, in the module:

>>> have_pkg
False

and

>>> pkg.some_function() 
Traceback (most recent call last):
    ...
TripWireError: We need package not_a_package for these functions, but
``import not_a_package`` raised an ImportError

If the module does exist, we get the module:

>>> pkg, _, _ = optional_package('os')
>>> hasattr(pkg, 'path')
True

Or a submodule, if that’s what we asked for:

>>> subpkg, _, _ = optional_package('os.path')
>>> hasattr(subpkg, 'dirname')
True

orient_by_streamline

dipy.stats.analysis.orient_by_streamline(streamlines, standard, n_points=12, in_place=False, as_generator=False)

Orient a bundle of streamlines to a standard streamline.

Parameters:
streamlines : Streamlines, list

The input streamlines to orient.

standard : Streamlines, list, or ndarray

This provides the standard orientation according to which the streamlines in the provided bundle should be reoriented.

n_points : int, optional

The number of samples to apply to each of the streamlines.

in_place : bool

Whether to make the change in-place in the original input (and return a reference), or to make a copy of the list and return this copy, with the relevant streamlines reoriented. Default: False.

as_generator : bool

Whether to return a generator as output. Default: False

Returns:
Streamlines : with each individual array oriented to be as similar as possible to the standard.
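
Examples

A minimal sketch: a flipped copy of a streamline is reoriented to match the standard (synthetic three-point streamlines, for illustration only):

>>> import numpy as np
>>> from dipy.tracking.streamline import orient_by_streamline
>>> standard = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=np.float32)
>>> flipped = standard[::-1].copy()
>>> out = orient_by_streamline([standard, flipped], standard)
>>> bool(np.allclose(out[1], standard))  # the flipped streamline was re-flipped
True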

peak_values

dipy.stats.analysis.peak_values(bundle, peaks, dt, pname, bname, subject, group_id, ind, dir_name)

Finds the generalized fractional anisotropy (GFA) and quantitative anisotropy (QA) values from a peaks object (e.g. CSA) for every point on a streamline used while tracking, and saves them in an HDF5 file.

Parameters:
bundle : string

Name of bundle being analyzed

peaks : peaks object

contains peak directions and values

dt : DataFrame

DataFrame to be populated

pname : string

Name of the dti metric

bname : string

Name of bundle being analyzed.

subject : string

subject number as a string (e.g. 10001)

group_id : integer

Which group the subject belongs to: 1 for patient, 0 for control.

ind : integer list

Tells which disk number each point belongs to.

dir_name : string

path of output directory

save_buan_profiles_hdf5

dipy.stats.analysis.save_buan_profiles_hdf5(fname, dt)

Saves the given input DataFrame to an .h5 file.

Parameters:
fname : string

file name for saving the hdf5 file

dt : Pandas DataFrame

DataFrame to be saved as .h5 file

set_number_of_points

dipy.stats.analysis.set_number_of_points()

Change the number of points of streamlines (either by downsampling or upsampling).

Change the number of points of streamlines in order to obtain nb_points-1 segments of equal length. Points of streamlines will be modified along the curve.

Parameters:
streamlines : ndarray, list, or dipy.tracking.Streamlines

If ndarray, must have shape (N, 3) where N is the number of points of the streamline. If list, each item must be an ndarray of shape (Ni, 3) where Ni is the number of points of streamline i. If dipy.tracking.Streamlines, its common_shape must be 3.

nb_points : int

integer representing number of points wanted along the curve.

Returns:
new_streamlines : ndarray, list, or dipy.tracking.Streamlines

Results of the downsampling or upsampling process.

Examples

>>> from dipy.tracking.streamline import set_number_of_points
>>> import numpy as np

One streamline, a semi-circle:

>>> theta = np.pi*np.linspace(0, 1, 100)
>>> x = np.cos(theta)
>>> y = np.sin(theta)
>>> z = 0 * x
>>> streamline = np.vstack((x, y, z)).T
>>> modified_streamline = set_number_of_points(streamline, 3)
>>> len(modified_streamline)
3

Multiple streamlines:

>>> streamlines = [streamline, streamline[::2]]
>>> new_streamlines = set_number_of_points(streamlines, 10)
>>> [len(s) for s in streamlines]
[100, 50]
>>> [len(s) for s in new_streamlines]
[10, 10]

values_from_volume

dipy.stats.analysis.values_from_volume(data, streamlines, affine)

Extract values of a scalar/vector along each streamline from a volume.

Parameters:
data : 3D or 4D array

Scalar (for 3D) and vector (for 4D) values to be extracted. For 4D data, interpolation will be done on the 3 spatial dimensions in each volume.

streamlines : ndarray or list

If array, of shape (n_streamlines, n_nodes, 3). If list, of length n_streamlines, with an (n_nodes, 3) array in each element of the list.

affine : array_like (4, 4)

The mapping from voxel coordinates to streamline points. The voxel_to_rasmm matrix, typically from a NIFTI file.

Returns:
array or list (depending on the input) : values interpolated at each coordinate along the length of each streamline.

Notes

Values are extracted from the image based on the 3D coordinates of the nodes that comprise the points in the streamline, without any interpolation into segments between the nodes. Using this function with streamlines that have been resampled into a very small number of nodes will result in very few values.
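
Examples

A minimal sketch with an identity affine, so the streamline coordinates are already voxel coordinates (real use passes the image's voxel_to_rasmm affine):

>>> import numpy as np
>>> from dipy.stats.analysis import values_from_volume
>>> data = np.arange(27, dtype=float).reshape(3, 3, 3)
>>> sls = [np.array([[0., 0., 0.], [1., 1., 1.], [2., 2., 2.]])]
>>> vals = values_from_volume(data, sls, np.eye(4))
>>> np.asarray(vals)  # one value sampled at each node
array([[ 0., 13., 26.]])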