package scipy

  1. Overview
  2. Docs
Legend:
Library
Module
Module type
Parameter
Class
Class type
type tag = [
  1. | `CKDTree
]
type t = [ `CKDTree | `Object ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val create : ?leafsize:Py.Object.t -> ?compact_nodes:bool -> ?copy_data:bool -> ?balanced_tree:bool -> ?boxsize: [ `Bool of bool | `F of float | `S of string | `Ndarray of [> `Ndarray ] Np.Obj.t | `I of int ] -> data:[> `Ndarray ] Np.Obj.t -> unit -> t

cKDTree(data, leafsize=16, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)

kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.

The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.

During construction, the axis and splitting point are chosen by the 'sliding midpoint' rule, which ensures that the cells do not all become long and thin.

The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

Parameters ---------- data : array_like, shape (n,m) The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles, and so modifying this data will result in bogus results. The data are also copied if the kd-tree is built with copy_data=True. leafsize : positive int, optional The number of points at which the algorithm switches over to brute-force. Default: 16. compact_nodes : bool, optional If True, the kd-tree is built to shrink the hyperrectangles to the actual data range. This usually gives a more compact tree that is robust against degenerated input data and gives faster queries at the expense of longer build time. Default: True. copy_data : bool, optional If True the data is always copied to protect the kd-tree against data corruption. Default: False. balanced_tree : bool, optional If True, the median is used to split the hyperrectangles instead of the midpoint. This usually gives a more compact tree and faster queries at the expense of longer build time. Default: True. boxsize : array_like or scalar, optional Apply a m-d toroidal topology to the KDTree.. The topology is generated by :math:`x_i + n_i L_i` where :math:`n_i` are integers and :math:`L_i` is the boxsize along i-th dimension. The input data shall be wrapped into :math:`0, L_i)`. A ValueError is raised if any of the data is outside of this bound. Attributes ---------- data : ndarray, shape (n,m) The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles. The data are also copied if the kd-tree is built with `copy_data=True`. leafsize : positive int The number of points at which the algorithm switches over to brute-force. m : int The dimension of a single data-point. n : int The number of data points. maxes : ndarray, shape (m,) The maximum value in each dimension of the n data points. mins : ndarray, shape (m,) The minimum value in each dimension of the n data points. tree : object, class cKDTreeNode This attribute exposes a Python view of the root node in the cKDTree object. A full Python view of the kd-tree is created dynamically on the first access. This attribute allows you to create your own query functions in Python. size : int The number of nodes in the tree. See Also -------- KDTree : Implementation of `cKDTree` in pure Python

val count_neighbors : ?p:float -> ?weights:[ `Ndarray of [> `Ndarray ] Np.Obj.t | `Tuple of Py.Object.t ] -> ?cumulative:bool -> other:Py.Object.t -> r:[ `F of float | `Ndarray of [> `Ndarray ] Np.Obj.t ] -> [> tag ] Obj.t -> Py.Object.t

count_neighbors(self, other, r, p=2., weights=None, cumulative=True)

Count how many nearby pairs can be formed. (pair-counting)

Count the number of pairs (x1,x2) can be formed, with x1 drawn from self and x2 drawn from ``other``, and where ``distance(x1, x2, p) <= r``.

Data points on self and other are optionally weighted by the ``weights`` argument. (See below)

The algorithm we implement here is based on 1_. See notes for further discussion.

Parameters ---------- other : cKDTree instance The other tree to draw points from, can be the same tree as self. r : float or one-dimensional array of floats The radius to produce a count for. Multiple radii are searched with a single tree traversal. If the count is non-cumulative(``cumulative=False``), ``r`` defines the edges of the bins, and must be non-decreasing. p : float, optional 1<=p<=infinity. Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a ValueError if overflow can occur. weights : tuple, array_like, or None, optional If None, the pair-counting is unweighted. If given as a tuple, weights0 is the weights of points in ``self``, and weights1 is the weights of points in ``other``; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in ``self`` and ``other``. For this to make sense, ``self`` and ``other`` must be the same tree. If ``self`` and ``other`` are two different trees, a ``ValueError`` is raised. Default: None cumulative : bool, optional Whether the returned counts are cumulative. When cumulative is set to ``False`` the algorithm is optimized to work with a large number of bins (>10) specified by ``r``. When ``cumulative`` is set to True, the algorithm is optimized to work with a small number of ``r``. Default: True

Returns ------- result : scalar or 1-D array The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If cumulative is False, ``resulti`` contains the counts with ``(-inf if i == 0 else ri-1) < R <= ri``

Notes ----- Pair-counting is the basic operation used to calculate the two point correlation functions from a data set composed of position of objects.

Two point correlation function measures the clustering of objects and is widely used in cosmology to quantify the large scale structure in our Universe, but it may be useful for data analysis in other fields where self-similar assembly of objects also occur.

The Landy-Szalay estimator for the two point correlation function of ``D`` measures the clustering signal in ``D``. 2_

For example, given the position of two sets of objects,

  • objects ``D`` (data) contains the clustering signal, and
  • objects ``R`` (random) that contains no signal,

.. math::

\xi(r) = \frac<D, D> - 2 f <D, R> + f^2<R, R>f^2<R, R>,

where the brackets represents counting pairs between two data sets in a finite bin around ``r`` (distance), corresponding to setting `cumulative=False`, and ``f = float(len(D)) / float(len(R))`` is the ratio between number of objects from data and random.

The algorithm implemented here is loosely based on the dual-tree algorithm described in 1_. We switch between two different pair-cumulation scheme depending on the setting of ``cumulative``. The computing time of the method we use when for ``cumulative == False`` does not scale with the total number of bins. The algorithm for ``cumulative == True`` scales linearly with the number of bins, though it is slightly faster when only 1 or 2 bins are used. 5_.

As an extension to the naive pair-counting, weighted pair-counting counts the product of weights instead of number of pairs. Weighted pair-counting is used to estimate marked correlation functions (3_, section 2.2), or to properly calculate the average of data per distance bin (e.g. 4_, section 2.1 on redshift).

.. 1 Gray and Moore, 'N-body problems in statistical learning', Mining the sky, 2000, https://arxiv.org/abs/astro-ph/0012333

.. 2 Landy and Szalay, 'Bias and variance of angular correlation functions', The Astrophysical Journal, 1993, http://adsabs.harvard.edu/abs/1993ApJ...412...64L

.. 3 Sheth, Connolly and Skibba, 'Marked correlations in galaxy formation models', Arxiv e-print, 2005, https://arxiv.org/abs/astro-ph/0511773

.. 4 Hawkins, et al., 'The 2dF Galaxy Redshift Survey: correlation functions, peculiar velocities and the matter density of the Universe', Monthly Notices of the Royal Astronomical Society, 2002, http://adsabs.harvard.edu/abs/2003MNRAS.346...78H

.. 5 https://github.com/scipy/scipy/pull/5647#issuecomment-168474926

val query_ball_point : ?p:float -> ?eps:Py.Object.t -> x: [ `Ndarray of [> `Ndarray ] Np.Obj.t | `Shape_tuple_self_m_ of Py.Object.t ] -> r:[ `F of float | `Ndarray of [> `Ndarray ] Np.Obj.t ] -> [> tag ] Obj.t -> Py.Object.t

query_ball_point(self, x, r, p=2., eps=0)

Find all points within distance r of point(s) x.

Parameters ---------- x : array_like, shape tuple + (self.m,) The point or points to search for neighbors of. r : array_like, float The radius of points to return, shall broadcast to the length of x. p : float, optional Which Minkowski p-norm to use. Should be in the range 1, inf. A finite large p may cause a ValueError if overflow can occur. eps : nonnegative float, optional Approximate search. Branches of the tree are not explored if their nearest points are further than ``r / (1 + eps)``, and branches are added in bulk if their furthest points are nearer than ``r * (1 + eps)``. n_jobs : int, optional Number of jobs to schedule for parallel processing. If -1 is given all processors are used. Default: 1. return_sorted : bool, optional Sorts returned indicies if True and does not sort them if False. If None, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added.

.. versionadded:: 1.2.0 return_length: bool, optional Return the number of points inside the radius instead of a list of the indices. .. versionadded:: 1.3.0

Returns ------- results : list or array of lists If `x` is a single point, returns a list of the indices of the neighbors of `x`. If `x` is an array of points, returns an object array of shape tuple containing lists of neighbors.

Notes ----- If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a cKDTree and using query_ball_tree.

Examples -------- >>> from scipy import spatial >>> x, y = np.mgrid0:4, 0:4 >>> points = np.c_x.ravel(), y.ravel() >>> tree = spatial.cKDTree(points) >>> tree.query_ball_point(2, 0, 1) 4, 8, 9, 12

val query_ball_tree : ?p:float -> ?eps:float -> other:Py.Object.t -> r:float -> [> tag ] Obj.t -> Py.Object.t

query_ball_tree(self, other, r, p=2., eps=0)

Find all pairs of points whose distance is at most r

Parameters ---------- other : cKDTree instance The tree containing points to search against. r : float The maximum distance, has to be positive. p : float, optional Which Minkowski norm to use. `p` has to meet the condition ``1 <= p <= infinity``. A finite large p may cause a ValueError if overflow can occur. eps : float, optional Approximate search. Branches of the tree are not explored if their nearest points are further than ``r/(1+eps)``, and branches are added in bulk if their furthest points are nearer than ``r * (1+eps)``. `eps` has to be non-negative.

Returns ------- results : list of lists For each element ``self.datai`` of this tree, ``resultsi`` is a list of the indices of its neighbors in ``other.data``.

val query_pairs : ?p:float -> ?eps:float -> r:float -> [> tag ] Obj.t -> Py.Object.t

query_pairs(self, r, p=2., eps=0)

Find all pairs of points whose distance is at most r.

Parameters ---------- r : positive float The maximum distance. p : float, optional Which Minkowski norm to use. ``p`` has to meet the condition ``1 <= p <= infinity``. A finite large p may cause a ValueError if overflow can occur. eps : float, optional Approximate search. Branches of the tree are not explored if their nearest points are further than ``r/(1+eps)``, and branches are added in bulk if their furthest points are nearer than ``r * (1+eps)``. `eps` has to be non-negative. output_type : string, optional Choose the output container, 'set' or 'ndarray'. Default: 'set'

Returns ------- results : set or ndarray Set of pairs ``(i,j)``, with ``i < j``, for which the corresponding positions are close. If output_type is 'ndarray', an ndarry is returned instead of a set.

val sparse_distance_matrix : ?p:[ `F of float | `T1_p_infinity of Py.Object.t ] -> other:Py.Object.t -> max_distance:float -> [> tag ] Obj.t -> Py.Object.t

sparse_distance_matrix(self, other, max_distance, p=2.)

Compute a sparse distance matrix

Computes a distance matrix between two cKDTrees, leaving as zero any distance greater than max_distance.

Parameters ---------- other : cKDTree

max_distance : positive float

p : float, 1<=p<=infinity Which Minkowski p-norm to use. A finite large p may cause a ValueError if overflow can occur.

output_type : string, optional Which container to use for output data. Options: 'dok_matrix', 'coo_matrix', 'dict', or 'ndarray'. Default: 'dok_matrix'.

Returns ------- result : dok_matrix, coo_matrix, dict or ndarray Sparse matrix representing the results in 'dictionary of keys' format. If a dict is returned the keys are (i,j) tuples of indices. If output_type is 'ndarray' a record array with fields 'i', 'j', and 'v' is returned,

val data : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Attribute data: get value or raise Not_found if None.

val data_opt : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t option

Attribute data: get value as an option.

val leafsize : t -> Py.Object.t

Attribute leafsize: get value or raise Not_found if None.

val leafsize_opt : t -> Py.Object.t option

Attribute leafsize: get value as an option.

val m : t -> int

Attribute m: get value or raise Not_found if None.

val m_opt : t -> int option

Attribute m: get value as an option.

val n : t -> int

Attribute n: get value or raise Not_found if None.

val n_opt : t -> int option

Attribute n: get value as an option.

val maxes : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Attribute maxes: get value or raise Not_found if None.

val maxes_opt : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t option

Attribute maxes: get value as an option.

val mins : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Attribute mins: get value or raise Not_found if None.

val mins_opt : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t option

Attribute mins: get value as an option.

val tree : t -> Py.Object.t

Attribute tree: get value or raise Not_found if None.

val tree_opt : t -> Py.Object.t option

Attribute tree: get value as an option.

val size : t -> int

Attribute size: get value or raise Not_found if None.

val size_opt : t -> int option

Attribute size: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.