stats
stats.analysis
AveragePointwiseEuclideanMetric | Computes the average of pointwise Euclidean distances between two sequential data.
QuickBundles(threshold[, metric, …]) | Clusters streamlines using QuickBundles [Garyfallidis12].
Streamlines | alias of nibabel.streamlines.array_sequence.ArraySequence
cKDTree(data[, leafsize, compact_nodes, …]) | kd-tree for quick nearest-neighbor lookup.
bundle_analysis(model_bundle_folder, …[, …]) | Applies statistical analysis on bundles and saves the results in a directory specified by out_dir.
dti_measures(bundle, metric, dt, pname, …) | Calculates DTI measures (e.g. FA, MD) per point on streamlines and saves them in an HDF5 file.
load_nifti(fname[, return_img, …]) | Load data and other information from a Nifti file.
load_peaks(fname[, verbose]) | Load a PeaksAndMetrics HDF5 file (PAM5).
load_trk(filename[, lazy_load]) | Loads tractogram files (*.trk).
map_coordinates(input, coordinates[, …]) | Map the input array to new coordinates by interpolation.
optional_package(name[, trip_msg]) | Return package-like thing and module setup for package name.
peak_values(bundle, peaks, dt, pname, bname, …) | Finds the peak direction and peak value of each point on a streamline used while tracking (generating the tractogram) and saves them in an HDF5 file.
set_number_of_points | Change the number of points of streamlines.
transform_streamlines(streamlines, mat[, …]) | Apply affine transformation to streamlines.
AveragePointwiseEuclideanMetric
class dipy.stats.analysis.AveragePointwiseEuclideanMetric
Bases: dipy.segment.metricspeed.SumPointwiseEuclideanMetric
Computes the average of pointwise Euclidean distances between two sequential data.
A sequence of N-dimensional points is represented as a 2D array with shape (nb_points, nb_dimensions). A feature object can be specified in order to calculate the distance between the features, rather than directly between the sequential data.
Notes
The distance between two 2D sequential data:
s1       s2

0*   a    *0
  \       |
   \      |
   1*     |
    |  b  *1
    |      \
    2*      \
        c    *2
is equal to \((a+b+c)/3\) where \(a\) is the Euclidean distance between s1[0] and s2[0], \(b\) between s1[1] and s2[1] and \(c\) between s1[2] and s2[2].
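As a quick numeric check of this formula with plain NumPy (our illustration, not the metric class itself): for two parallel 3-point streamlines one unit apart, \(a = b = c = 1\), so the average distance is 1.
>>> import numpy as np
>>> s1 = np.array([[0., 0.], [0., 1.], [0., 2.]])
>>> s2 = np.array([[1., 0.], [1., 1.], [1., 2.]])
>>> float(np.mean(np.linalg.norm(s1 - s2, axis=1)))
1.0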
Methods
are_compatible | Checks if features can be used by metric.dist based on their shape.
dist | Computes a distance between two data points based on their features.
QuickBundles
class dipy.stats.analysis.QuickBundles(threshold, metric='MDF_12points', max_nb_clusters=2147483647)
Bases: dipy.segment.clustering.Clustering
Clusters streamlines using QuickBundles [Garyfallidis12].
Given a list of streamlines, the QuickBundles algorithm sequentially assigns each streamline to its closest bundle in \(\mathcal{O}(Nk)\), where \(N\) is the number of streamlines and \(k\) is the final number of bundles. If for a given streamline its closest bundle is farther than threshold, a new bundle is created and the streamline is assigned to it, unless the number of bundles has already reached max_nb_clusters.
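As a conceptual illustration of this sequential assignment, here is a minimal sketch of our own (dist stands in for the configured metric; the real implementation also updates bundle centroids, which is omitted here):

def quickbundles_sketch(streamlines, threshold, dist, max_nb_clusters):
    # Each bundle holds a representative centroid and its member streamlines.
    bundles = []
    for s in streamlines:
        # One O(k) scan over the current bundles per streamline: O(Nk) total.
        best = min(bundles, key=lambda b: dist(b['centroid'], s), default=None)
        if best is not None and dist(best['centroid'], s) <= threshold:
            best['members'].append(s)
        elif len(bundles) < max_nb_clusters:
            # Closest bundle is too far (or none exists yet): open a new one.
            bundles.append({'centroid': s, 'members': [s]})
        else:
            # Cluster budget exhausted: fall back to the closest bundle.
            best['members'].append(s)
    return bundles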
References
[Garyfallidis12] Garyfallidis E. et al., "QuickBundles, a method for tractography simplification", Frontiers in Neuroscience, vol 6, no 175, 2012.
Examples
>>> from dipy.segment.clustering import QuickBundles
>>> from dipy.data import get_fnames
>>> from nibabel import trackvis as tv
>>> streams, hdr = tv.read(get_fnames('fornix'))
>>> streamlines = [i[0] for i in streams]
>>> # Segment fornix with a threshold of 10mm and streamlines resampled
>>> # to 12 points.
>>> qb = QuickBundles(threshold=10.)
>>> clusters = qb.cluster(streamlines)
>>> len(clusters)
4
>>> list(map(len, clusters))
[61, 191, 47, 1]
>>> # Resampling streamlines differently is done explicitly as follows.
>>> # Note this has an impact on the speed and the accuracy (tradeoff).
>>> from dipy.segment.metric import ResampleFeature
>>> from dipy.segment.metric import AveragePointwiseEuclideanMetric
>>> feature = ResampleFeature(nb_points=2)
>>> metric = AveragePointwiseEuclideanMetric(feature)
>>> qb = QuickBundles(threshold=10., metric=metric)
>>> clusters = qb.cluster(streamlines)
>>> len(clusters)
4
>>> list(map(len, clusters))
[58, 142, 72, 28]
Methods
cluster(streamlines[, ordering]) | Clusters streamlines into bundles.
__init__(threshold, metric='MDF_12points', max_nb_clusters=2147483647)
Initialize self. See help(type(self)) for accurate signature.
cluster(streamlines, ordering=None)
Clusters streamlines into bundles.
Performs the QuickBundles algorithm using the predefined metric and threshold.
cKDTree
class dipy.stats.analysis.cKDTree(data, leafsize=16, compact_nodes=True, copy_data=False, balanced_tree=True)
Bases: object
kd-tree for quick nearest-neighbor lookup
This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.
The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.
During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.
The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.
For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.
See also
KDTree
Methods
count_neighbors(self, other, r[, p, …]) | Count how many nearby pairs can be formed.
query(self, x[, k, eps, p, …]) | Query the kd-tree for nearest neighbors.
query_ball_point(self, x, r[, p, eps]) | Find all points within distance r of point(s) x.
query_ball_tree(self, other, r[, p, eps]) | Find all pairs of points whose distance is at most r.
query_pairs(self, r[, p, eps]) | Find all pairs of points whose distance is at most r.
sparse_distance_matrix(self, other, max_distance) | Compute a sparse distance matrix.
count_neighbors(self, other, r, p=2., weights=None, cumulative=True)
Count how many nearby pairs can be formed (pair-counting).
Count the number of pairs (x1, x2) that can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.
Data points on self and other are optionally weighted by the weights argument (see below).
The algorithm we implement here is based on [1]. See notes for further discussion.
Notes
Pair-counting is the basic operation used to calculate the two point correlation functions from a data set composed of positions of objects.
The two point correlation function measures the clustering of objects and is widely used in cosmology to quantify the large scale structure in our Universe, but it may be useful for data analysis in other fields where self-similar assembly of objects also occurs.
The Landy-Szalay estimator for the two point correlation function of D measures the clustering signal in D [2]. For example, given the positions of two sets of objects, D (data) containing the clustering signal and R (random) containing no signal, the estimator is

\(\xi(r) = \frac{\langle D, D \rangle - 2 f \langle D, R \rangle + f^2 \langle R, R \rangle}{f^2 \langle R, R \rangle},\)

where the brackets represent counting pairs between two data sets in a finite bin around r (distance), corresponding to setting cumulative=False, and f = float(len(D)) / float(len(R)) is the ratio between the number of objects from data and random.
The algorithm implemented here is loosely based on the dual-tree algorithm described in [1]. We switch between two different pair-cumulation schemes depending on the setting of cumulative. The computing time of the method we use for cumulative == False does not scale with the total number of bins. The algorithm for cumulative == True scales linearly with the number of bins, though it is slightly faster when only 1 or 2 bins are used [5].
As an extension to the naive pair-counting, weighted pair-counting counts the product of weights instead of number of pairs. Weighted pair-counting is used to estimate marked correlation functions ([3], section 2.2), or to properly calculate the average of data per distance bin (e.g. [4], section 2.1 on redshift).
[1] Gray and Moore, "N-body problems in statistical learning", Mining the Sky, 2000, https://arxiv.org/abs/astro-ph/0012333
[2] Landy and Szalay, "Bias and variance of angular correlation functions", The Astrophysical Journal, 1993, http://adsabs.harvard.edu/abs/1993ApJ...412...64L
[3] Sheth, Connolly and Skibba, "Marked correlations in galaxy formation models", arXiv e-print, 2005, https://arxiv.org/abs/astro-ph/0511773
[4] Hawkins, et al., "The 2dF Galaxy Redshift Survey: correlation functions, peculiar velocities and the matter density of the Universe", Monthly Notices of the Royal Astronomical Society, 2002, http://adsabs.harvard.edu/abs/2003MNRAS.346...78H
[5] https://github.com/scipy/scipy/pull/5647#issuecomment-168474926
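Examples
A small unweighted pair-counting example of our own:
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> tree1 = cKDTree(np.array([[0., 0.], [0., 1.], [1., 0.]]))
>>> tree2 = cKDTree(np.array([[0., 0.5]]))
>>> tree1.count_neighbors(tree2, r=0.6)
2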
query(self, x, k=1, eps=0, p=2, distance_upper_bound=np.inf, n_jobs=1)
Query the kd-tree for nearest neighbors.
Notes
If the KD-Tree is periodic, the position x is wrapped into the box.
When the input k is a list, a query for arange(max(k)) is performed, but only columns that store the requested values of k are preserved. This is implemented in a manner that reduces memory usage.
Examples
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = cKDTree(np.c_[x.ravel(), y.ravel()])
To query the nearest neighbours and return squeezed result, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=1)
>>> print(dd, ii)
[2. 0.14142136] [ 0 13]
To query the nearest neighbours and return unsqueezed result, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=[1])
>>> print(dd, ii)
[[2. ]
[0.14142136]] [[ 0]
[13]]
To query the second nearest neighbours and return unsqueezed result, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=[2])
>>> print(dd, ii)
[[2.23606798]
[0.90553851]] [[ 6]
[12]]
To query the first and second nearest neighbours, use
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=2)
>>> print(dd, ii)
[[2. 2.23606798]
[0.14142136 0.90553851]] [[ 0 6]
[13 12]]
or, be more specific
>>> dd, ii = tree.query([[0, 0], [2.1, 2.9]], k=[1, 2])
>>> print(dd, ii)
[[2. 2.23606798]
[0.14142136 0.90553851]] [[ 0 6]
[13 12]]
query_ball_point(self, x, r, p=2., eps=0)
Find all points within distance r of point(s) x.
Notes
If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a cKDTree and using query_ball_tree.
Examples
>>> import numpy as np
>>> from scipy import spatial
>>> x, y = np.mgrid[0:4, 0:4]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = spatial.cKDTree(points)
>>> tree.query_ball_point([2, 0], 1)
[4, 8, 9, 12]
query_ball_tree(self, other, r, p=2., eps=0)
Find all pairs of points between self and other whose distance is at most r.
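Examples
A brief example of our own; one list of neighbor indices in other is returned per point in self:
>>> from scipy.spatial import cKDTree
>>> tree1 = cKDTree([[0., 0.], [1., 0.]])
>>> tree2 = cKDTree([[0., 0.1], [2., 0.]])
>>> tree1.query_ball_tree(tree2, r=0.5)
[[0], []]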
query_pairs(self, r, p=2., eps=0)
Find all pairs of points in self whose distance is at most r.
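Examples
A brief example of our own; the result is a set of index pairs (i, j) with i < j:
>>> from scipy.spatial import cKDTree
>>> tree = cKDTree([[0., 0.], [0., 0.4], [3., 3.]])
>>> sorted(tree.query_pairs(r=0.5))
[(0, 1)]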
sparse_distance_matrix(self, other, max_distance, p=2.)
Compute a sparse distance matrix.
Computes a distance matrix between two cKDTrees, leaving as zero any distance greater than max_distance.
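Examples
A brief example of our own; the result is a sparse (dok) matrix holding only the distances below the cutoff:
>>> from scipy.spatial import cKDTree
>>> tree1 = cKDTree([[0., 0.], [1., 1.]])
>>> tree2 = cKDTree([[0., 0.2]])
>>> m = tree1.sparse_distance_matrix(tree2, max_distance=0.5)
>>> m.nnz
1
>>> float(m[0, 0])
0.2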
dipy.stats.analysis.bundle_analysis(model_bundle_folder, bundle_folder, orig_bundle_folder, metric_folder, group, subject, no_disks=100, out_dir='')
Applies statistical analysis on bundles and saves the results in a directory specified by out_dir.
References
[Chandio19] Chandio, B.Q., S. Koudoro, D. Reagan, J. Harezlak, E. Garyfallidis, "Bundle Analytics: a computational and statistical analyses framework for tractometric studies", Proceedings of: International Society of Magnetic Resonance in Medicine (ISMRM), Montreal, Canada, 2019.
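Examples
A hypothetical invocation (the folder and label names below are illustrative assumptions, not values prescribed by DIPY):
>>> from dipy.stats.analysis import bundle_analysis  # doctest: +SKIP
>>> bundle_analysis('model_bundles', 'rec_bundles', 'org_bundles',
...                 'measures', 'patient', 'sub-01',
...                 no_disks=100, out_dir='bundle_stats')  # doctest: +SKIP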
dipy.stats.analysis.dti_measures(bundle, metric, dt, pname, bname, subject, group, ind, dir)
Calculates DTI measures (e.g. FA, MD) per point on streamlines and saves them in an HDF5 file.
dipy.stats.analysis.load_trk(filename, lazy_load=False)
Loads tractogram files (*.trk).
dipy.stats.analysis.map_coordinates(input, coordinates, output=None, order=3, mode='constant', cval=0.0, prefilter=True)
Map the input array to new coordinates by interpolation.
The array of coordinates is used to find, for each point in the output, the corresponding coordinates in the input. The value of the input at those coordinates is determined by spline interpolation of the requested order.
The shape of the output is derived from that of the coordinate array by dropping the first axis. The values of the array along the first axis are the coordinates in the input array at which the output value is found.
See also
spline_filter, geometric_transform, scipy.interpolate
Examples
>>> import numpy as np
>>> from scipy import ndimage
>>> a = np.arange(12.).reshape((4, 3))
>>> a
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.]])
>>> ndimage.map_coordinates(a, [[0.5, 2], [0.5, 1]], order=1)
array([ 2., 7.])
Above, the interpolated value of a[0.5, 0.5] gives output[0], while a[2, 1] is output[1].
>>> inds = np.array([[0.5, 2], [0.5, 4]])
>>> ndimage.map_coordinates(a, inds, order=1, cval=-33.3)
array([ 2. , -33.3])
>>> ndimage.map_coordinates(a, inds, order=1, mode='nearest')
array([ 2., 8.])
>>> ndimage.map_coordinates(a, inds, order=1, cval=0, output=bool)
array([ True, False], dtype=bool)
dipy.stats.analysis.optional_package(name, trip_msg=None)
Return package-like thing and module setup for package name.
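Examples
A typical usage pattern of our own, assuming the three-value return (module, have-flag, setup function) this helper provides throughout DIPY; if the package is missing, the have-flag is False and attribute access on the returned stand-in raises an informative error carrying trip_msg:
>>> from dipy.utils.optpkg import optional_package
>>> pandas, have_pandas, _ = optional_package('pandas')
>>> isinstance(have_pandas, bool)
True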
dipy.stats.analysis.peak_values(bundle, peaks, dt, pname, bname, subject, group, ind, dir)
Finds the peak direction and peak value of each point on a streamline used while tracking (generating the tractogram) and saves them in an HDF5 file.
dipy.stats.analysis.set_number_of_points()
Change the number of points of streamlines in order to obtain nb_points-1 segments of equal length. Points of streamlines will be modified along the curve.
Examples
>>> from dipy.tracking.streamline import set_number_of_points
>>> import numpy as np
One streamline, a semi-circle:
>>> theta = np.pi*np.linspace(0, 1, 100)
>>> x = np.cos(theta)
>>> y = np.sin(theta)
>>> z = 0 * x
>>> streamline = np.vstack((x, y, z)).T
>>> modified_streamline = set_number_of_points(streamline, 3)
>>> len(modified_streamline)
3
Multiple streamlines:
>>> streamlines = [streamline, streamline[::2]]
>>> new_streamlines = set_number_of_points(streamlines, 10)
>>> [len(s) for s in streamlines]
[100, 50]
>>> [len(s) for s in new_streamlines]
[10, 10]
dipy.stats.analysis.transform_streamlines(streamlines, mat, in_place=False)
Apply affine transformation to streamlines.
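Examples
A short example of our own, applying a pure translation to one streamline:
>>> import numpy as np
>>> from dipy.tracking.streamline import transform_streamlines
>>> streamlines = [np.array([[0., 0., 0.], [1., 0., 0.]])]
>>> affine = np.eye(4)
>>> affine[:3, 3] = [10., 0., 0.]  # translate by 10 along x
>>> moved = transform_streamlines(streamlines, affine)
>>> np.allclose(moved[0], [[10., 0., 0.], [11., 0., 0.]])
True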