package sklearn

val get_py : string -> Py.Object.t

Get an attribute of this module as a Py.Object.t. This is useful to pass a Python function to another function.
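A minimal sketch (the attribute name is purely illustrative, and this page is assumed to document Sklearn.Covariance): get_py can fetch the Python-level object behind an attribute of this module, e.g. to supply it as the cov_computation_method argument of fast_mcd below, which expects a raw Py.Object.t.

    (* Fetch the Python object for an attribute of this module.
       "empirical_covariance" is used here only as an example name. *)
    let py_fn : Py.Object.t = Sklearn.Covariance.get_py "empirical_covariance"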

module EllipticEnvelope : sig ... end
module EmpiricalCovariance : sig ... end
module GraphicalLasso : sig ... end
module GraphicalLassoCV : sig ... end
module LedoitWolf : sig ... end
module MinCovDet : sig ... end
module OAS : sig ... end
module ShrunkCovariance : sig ... end
val empirical_covariance : ?assume_centered:bool -> x:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Computes the maximum likelihood covariance estimator.

Parameters
----------
X : ndarray of shape (n_samples, n_features)
    Data from which to compute the covariance estimate.

assume_centered : bool, default=False
    If True, data will not be centered before computation. Useful when
    working with data whose mean is almost, but not exactly, zero. If
    False, data will be centered before computation.

Returns
-------
covariance : ndarray of shape (n_features, n_features)
    Empirical covariance (Maximum Likelihood Estimator).

Examples
--------
>>> from sklearn.covariance import empirical_covariance
>>> X = [[1,1,1],[1,1,1],[1,1,1],
...      [0,0,0],[0,0,0],[0,0,0]]
>>> empirical_covariance(X)
array([[0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25]])
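The same computation through these bindings might look as follows. This is a sketch: Np.matrixf is assumed here as the float-matrix constructor of the np bindings; check your np version for the exact helper name.

    (* Build a 6x3 float matrix and compute its empirical covariance. *)
    let () =
      let x = Np.matrixf [| [|1.; 1.; 1.|]; [|1.; 1.; 1.|]; [|1.; 1.; 1.|];
                            [|0.; 0.; 0.|]; [|0.; 0.; 0.|]; [|0.; 0.; 0.|] |] in
      let cov = Sklearn.Covariance.empirical_covariance ~x () in
      (* Expected: a 3x3 matrix with every entry equal to 0.25. *)
      ignore cov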

val fast_mcd : ?support_fraction:float -> ?cov_computation_method:Py.Object.t -> ?random_state:int -> x:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t * [> `ArrayLike ] Np.Obj.t * Py.Object.t

Estimates the Minimum Covariance Determinant matrix.

Read more in the :ref:`User Guide <robust_covariance>`.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    The data matrix, with p features and n samples.

support_fraction : float, default=None
    The proportion of points to be included in the support of the raw
    MCD estimate. Default is `None`, which implies that the minimum
    value of `support_fraction` will be used within the algorithm:
    `(n_samples + n_features + 1) / 2`. This parameter must be in the
    range (0, 1).

cov_computation_method : callable, default=:func:`sklearn.covariance.empirical_covariance`
    The function which will be used to compute the covariance. Must
    return an array of shape (n_features, n_features).

random_state : int or RandomState instance, default=None
    Determines the pseudo random number generator for shuffling the
    data. Pass an int for reproducible results across multiple function
    calls. See :term:`Glossary <random_state>`.

Returns
-------
location : ndarray of shape (n_features,)
    Robust location of the data.

covariance : ndarray of shape (n_features, n_features)
    Robust covariance of the features.

support : ndarray of shape (n_samples,), dtype=bool
    A mask of the observations that have been used to compute the
    robust location and covariance estimates of the data set.

Notes
-----
The FastMCD algorithm was introduced by Rousseeuw and Van Driessen in 'A Fast Algorithm for the Minimum Covariance Determinant Estimator, 1999, American Statistical Association and the American Society for Quality, TECHNOMETRICS'. The principle is to compute robust estimates on random subsets before pooling them into larger subsets, and finally into the full data set. Depending on the size of the initial sample, we have one, two or three such computation levels.

Note that only raw estimates are returned. If one is interested in the correction and reweighting steps described in [RousseeuwVan]_, see the MinCovDet object.

References
----------

.. [RousseeuwVan] A Fast Algorithm for the Minimum Covariance Determinant Estimator, 1999, American Statistical Association and the American Society for Quality, TECHNOMETRICS

.. [Butler1993] R. W. Butler, P. L. Davies and M. Jhun, Asymptotics For The Minimum Covariance Determinant Estimator, The Annals of Statistics, 1993, Vol. 21, No. 3, 1385-1400
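A hedged OCaml sketch of calling fast_mcd (again assuming the Np.matrixf constructor; the data values are illustrative):

    (* Robust location, covariance and support mask from the raw MCD. *)
    let () =
      let x = Np.matrixf [| [|0.; 0.|]; [|0.1; 0.2|]; [| -0.1; 0.1|]; [|10.; 10.|] |] in
      let location, covariance, support =
        Sklearn.Covariance.fast_mcd ~random_state:0 ~x ()
      in
      (* support flags the observations used for the raw estimates. *)
      ignore (location, covariance, support)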

val graphical_lasso : ?cov_init:[> `ArrayLike ] Np.Obj.t -> ?mode:[ `Cd | `Lars ] -> ?tol:float -> ?enet_tol:float -> ?max_iter:int -> ?verbose:int -> ?return_costs:bool -> ?eps:float -> ?return_n_iter:bool -> emp_cov:[> `ArrayLike ] Np.Obj.t -> alpha:float -> unit -> [> `ArrayLike ] Np.Obj.t * [> `ArrayLike ] Np.Obj.t * Py.Object.t * int

L1-penalized covariance estimator.

Read more in the :ref:`User Guide <sparse_inverse_covariance>`.

.. versionchanged:: v0.20
    graph_lasso has been renamed to graphical_lasso

Parameters
----------
emp_cov : ndarray of shape (n_features, n_features)
    Empirical covariance from which to compute the covariance estimate.

alpha : float
    The regularization parameter: the higher alpha, the more
    regularization, the sparser the inverse covariance. Range is (0, inf].

cov_init : array of shape (n_features, n_features), default=None
    The initial guess for the covariance.

mode : {'cd', 'lars'}, default='cd'
    The Lasso solver to use: coordinate descent or LARS. Use LARS for
    very sparse underlying graphs, where p > n. Elsewhere prefer cd
    which is more numerically stable.

tol : float, default=1e-4
    The tolerance to declare convergence: if the dual gap goes below
    this value, iterations are stopped. Range is (0, inf].

enet_tol : float, default=1e-4
    The tolerance for the elastic net solver used to calculate the
    descent direction. This parameter controls the accuracy of the
    search direction for a given column update, not of the overall
    parameter estimate. Only used for mode='cd'. Range is (0, inf].

max_iter : int, default=100
    The maximum number of iterations.

verbose : bool, default=False
    If verbose is True, the objective function and dual gap are printed
    at each iteration.

return_costs : bool, default=False
    If return_costs is True, the objective function and dual gap at
    each iteration are returned.

eps : float, default=eps
    The machine-precision regularization in the computation of the
    Cholesky diagonal factors. Increase this for very ill-conditioned
    systems. Default is `np.finfo(np.float64).eps`.

return_n_iter : bool, default=False
    Whether or not to return the number of iterations.

Returns
-------
covariance : ndarray of shape (n_features, n_features)
    The estimated covariance matrix.

precision : ndarray of shape (n_features, n_features)
    The estimated (sparse) precision matrix.

costs : list of (objective, dual_gap) pairs
    The list of values of the objective function and the dual gap at
    each iteration. Returned only if return_costs is True.

n_iter : int
    Number of iterations. Returned only if `return_n_iter` is set to True.

See Also
--------
GraphicalLasso, GraphicalLassoCV

Notes
-----
The algorithm employed to solve this problem is the GLasso algorithm, from the Friedman 2008 Biostatistics paper. It is the same algorithm as in the R `glasso` package.

One possible difference with the `glasso` R package is that the diagonal coefficients are not penalized.
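A hedged OCaml sketch chaining empirical_covariance into graphical_lasso (assuming Np.matrixf; since the binding's return type is a 4-tuple, return_costs and return_n_iter are passed explicitly here so the Python side returns all four values):

    (* Estimate a sparse precision matrix from an empirical covariance. *)
    let () =
      let x = Np.matrixf [| [|0.; 1.|]; [|1.; 0.|]; [|0.5; 0.5|] |] in
      let emp_cov = Sklearn.Covariance.empirical_covariance ~x () in
      let covariance, precision, _costs, _n_iter =
        Sklearn.Covariance.graphical_lasso
          ~return_costs:true ~return_n_iter:true ~emp_cov ~alpha:0.5 ()
      in
      ignore (covariance, precision)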

val ledoit_wolf : ?assume_centered:bool -> ?block_size:int -> x:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t * float

Estimates the shrunk Ledoit-Wolf covariance matrix.

Read more in the :ref:`User Guide <shrunk_covariance>`.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    Data from which to compute the covariance estimate.

assume_centered : bool, default=False
    If True, data will not be centered before computation. Useful when
    working with data whose mean is almost, but not exactly, zero. If
    False, data will be centered before computation.

block_size : int, default=1000
    Size of the blocks into which the covariance matrix will be split.
    This is purely a memory optimization and does not affect results.

Returns
-------
shrunk_cov : ndarray of shape (n_features, n_features)
    Shrunk covariance.

shrinkage : float
    Coefficient in the convex combination used for the computation of
    the shrunk estimate.

Notes
-----
The regularized (shrunk) covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features
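In OCaml, again assuming the Np.matrixf constructor:

    (* Shrunk covariance plus the shrinkage coefficient used. *)
    let () =
      let x = Np.matrixf [| [|1.; 0.|]; [|0.; 1.|]; [|1.; 1.|] |] in
      let shrunk_cov, shrinkage = Sklearn.Covariance.ledoit_wolf ~x () in
      ignore shrunk_cov;
      Printf.printf "shrinkage = %f\n" shrinkage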

val ledoit_wolf_shrinkage : ?assume_centered:bool -> ?block_size:int -> x:[> `ArrayLike ] Np.Obj.t -> unit -> float

Estimates the shrinkage coefficient used in the Ledoit-Wolf shrunk covariance estimate.

Read more in the :ref:`User Guide <shrunk_covariance>`.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    Data from which to compute the Ledoit-Wolf shrunk covariance shrinkage.

assume_centered : bool, default=False
    If True, data will not be centered before computation. Useful when
    working with data whose mean is almost, but not exactly, zero. If
    False, data will be centered before computation.

block_size : int, default=1000
    Size of the blocks into which the covariance matrix will be split.

Returns
-------
shrinkage : float
    Coefficient in the convex combination used for the computation of
    the shrunk estimate.

Notes
-----
The regularized (shrunk) covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features
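A one-line OCaml sketch (same Np.matrixf assumption), returning only the coefficient without the shrunk matrix:

    let () =
      let x = Np.matrixf [| [|1.; 0.|]; [|0.; 1.|]; [|1.; 1.|] |] in
      let shrinkage = Sklearn.Covariance.ledoit_wolf_shrinkage ~x () in
      Printf.printf "shrinkage = %f\n" shrinkage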

val log_likelihood : emp_cov:[> `ArrayLike ] Np.Obj.t -> precision:[> `ArrayLike ] Np.Obj.t -> unit -> float

Computes the sample mean of the log-likelihood under a covariance model.

Computes the empirical expected log-likelihood, accounting for the normalization terms and scaling, allowing for comparison beyond this software package.

Parameters
----------
emp_cov : ndarray of shape (n_features, n_features)
    Maximum Likelihood Estimator of covariance.

precision : ndarray of shape (n_features, n_features)
    The precision matrix of the covariance model to be tested.

Returns
-------
log_likelihood_ : float
    Sample mean of the log-likelihood.
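A small OCaml wrapper sketch showing the intended pairing of inputs (emp_cov from empirical_covariance, precision from a fitted model such as graphical_lasso):

    (* Mean log-likelihood of a covariance model on observed data. *)
    let score ~emp_cov ~precision =
      Sklearn.Covariance.log_likelihood ~emp_cov ~precision ()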

val oas : ?assume_centered:bool -> x:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t * float

Estimate covariance with the Oracle Approximating Shrinkage algorithm.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    Data from which to compute the covariance estimate.

assume_centered : bool, default=False
    If True, data will not be centered before computation. Useful when
    working with data whose mean is almost, but not exactly, zero. If
    False, data will be centered before computation.

Returns
-------
shrunk_cov : array-like of shape (n_features, n_features)
    Shrunk covariance.

shrinkage : float
    Coefficient in the convex combination used for the computation of
    the shrunk estimate.

Notes
-----
The regularized (shrunk) covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features

The formula we used to implement the OAS is slightly modified compared to the one given in the article. See :class:`OAS` for more details.
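In OCaml (same Np.matrixf assumption):

    (* OAS shrunk covariance and its shrinkage coefficient. *)
    let () =
      let x = Np.matrixf [| [|1.; 0.|]; [|0.; 1.|]; [|1.; 1.|] |] in
      let shrunk_cov, shrinkage = Sklearn.Covariance.oas ~x () in
      ignore shrunk_cov;
      Printf.printf "shrinkage = %f\n" shrinkage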

val shrunk_covariance : ?shrinkage:float -> emp_cov:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Calculates a covariance matrix shrunk on the diagonal.

Read more in the :ref:`User Guide <shrunk_covariance>`.

Parameters
----------
emp_cov : array-like of shape (n_features, n_features)
    Covariance matrix to be shrunk.

shrinkage : float, default=0.1
    Coefficient in the convex combination used for the computation of
    the shrunk estimate. Range is [0, 1].

Returns
-------
shrunk_cov : ndarray of shape (n_features, n_features)
    Shrunk covariance.

Notes
-----
The regularized (shrunk) covariance is given by:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features
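A closing OCaml sketch applying a fixed shrinkage to an empirical covariance (assuming Np.matrixf):

    (* Shrink the off-diagonal structure toward a scaled identity. *)
    let () =
      let x = Np.matrixf [| [|1.; 0.|]; [|0.; 1.|]; [|1.; 1.|] |] in
      let emp_cov = Sklearn.Covariance.empirical_covariance ~x () in
      let shrunk = Sklearn.Covariance.shrunk_covariance ~shrinkage:0.1 ~emp_cov () in
      ignore shrunk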