package sklearn

type tag = [ `ColumnTransformer ]
type t = [ `BaseEstimator | `ColumnTransformer | `Object | `TransformerMixin ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val as_transformer : t -> [ `TransformerMixin ] Obj.t
val as_estimator : t -> [ `BaseEstimator ] Obj.t
val create : ?remainder: [ `Passthrough | `BaseEstimator of [> `BaseEstimator ] Np.Obj.t | `Drop ] -> ?sparse_threshold:float -> ?n_jobs:int -> ?transformer_weights:Dict.t -> ?verbose:int -> transformers: (string * [> `TransformerMixin ] Np.Obj.t * [ `S of string | `I of int | `Ss of string list | `Is of int list | `Slice of Np.Wrap_utils.Slice.t | `Arr of [> `ArrayLike ] Np.Obj.t | `Callable of Py.Object.t ]) list -> unit -> t

Applies transformers to columns of an array or pandas DataFrame.

This estimator allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.

Read more in the :ref:`User Guide <column_transformer>`.

.. versionadded:: 0.20

Parameters
----------
transformers : list of tuples
    List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.

    name : str
        Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using ``set_params`` and searched in grid search.
    transformer : 'drop', 'passthrough' or estimator
        Estimator must support :term:`fit` and :term:`transform`. Special-cased strings 'drop' and 'passthrough' are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.
    columns : str, array-like of str, int, array-like of int, array-like of bool, slice or callable
        Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where ``transformer`` expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data `X` and can return any of the above. To select multiple columns by name or dtype, you can use :obj:`make_column_selector`.

remainder : 'drop', 'passthrough' or estimator, default='drop'
    By default (``remainder='drop'``), only the columns specified in `transformers` are transformed and combined in the output, and the non-specified columns are dropped. By specifying ``remainder='passthrough'``, all remaining columns that were not specified in `transformers` will be automatically passed through. This subset of columns is concatenated with the output of the transformers. By setting ``remainder`` to be an estimator, the remaining non-specified columns will use the ``remainder`` estimator. The estimator must support :term:`fit` and :term:`transform`. Note that using this feature requires that the DataFrame columns input at :term:`fit` and :term:`transform` have identical order.

sparse_threshold : float, default=0.3
    If the output of the different transformers contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use ``sparse_threshold=0`` to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.

n_jobs : int, default=None
    Number of jobs to run in parallel. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

transformer_weights : dict, default=None
    Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.

verbose : bool, default=False
    If True, the time elapsed while fitting each transformer will be printed as it is completed.
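As an illustration of how ``remainder`` and ``transformer_weights`` interact, here is a minimal sketch written against the underlying Python scikit-learn API rather than the OCaml binding. The transformer name ``"norm"``, the weight ``2.0`` and the input values are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

X = np.array([[0.0, 1.0, 2.0, 2.0],
              [1.0, 1.0, 0.0, 1.0]])

ct = ColumnTransformer(
    # L1-normalize columns 0 and 1 only.
    [("norm", Normalizer(norm="l1"), [0, 1])],
    # Keep the untouched columns 2 and 3 instead of dropping them.
    remainder="passthrough",
    # Multiply the output of the "norm" transformer by 2.
    transformer_weights={"norm": 2.0},
)

Xt = ct.fit_transform(X)
print(Xt)
```

In this sketch the first two output columns hold the normalized values scaled by the weight, and the last two are the original columns 2 and 3, which carry no weight because only ``"norm"`` appears in the dictionary.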

Attributes
----------
transformers_ : list
    The collection of fitted transformers as tuples of (name, fitted_transformer, column). `fitted_transformer` can be an estimator, 'drop', or 'passthrough'. In case there were no columns selected, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form: ('remainder', transformer, remaining_columns) corresponding to the ``remainder`` parameter. If there are remaining columns, then ``len(transformers_)==len(transformers)+1``, otherwise ``len(transformers_)==len(transformers)``.

named_transformers_ : :class:`~sklearn.utils.Bunch`
    Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.

sparse_output_ : bool
    Boolean flag indicating whether the output of ``transform`` is a sparse matrix or a dense numpy array, which depends on the output of the individual transformers and the `sparse_threshold` keyword.
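The fitted attributes can be inspected as in the following sketch, again using the Python scikit-learn API. The transformer name ``"scale"`` and the data are assumptions made for illustration only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0, 5.0],
              [3.0, 30.0, 7.0]])

# Scale columns 0 and 1; column 2 is left to the default remainder handling.
ct = ColumnTransformer([("scale", StandardScaler(), [0, 1])])
ct.fit(X)

# Each entry is a (name, fitted_transformer, columns) tuple.
for name, fitted, columns in ct.transformers_:
    print(name, fitted, columns)

# Look up a fitted transformer by the name given at construction time.
scaler = ct.named_transformers_["scale"]
print(scaler.mean_)

# All outputs here are dense, so the stacked result is dense as well.
print(ct.sparse_output_)
```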

Notes
-----
The order of the columns in the transformed feature matrix follows the order in which the columns are specified in the `transformers` list. Columns of the original feature matrix that are not specified are dropped from the resulting transformed feature matrix, unless ``remainder='passthrough'`` is set. Columns passed through this way are appended to the right of the transformers' output.
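The ordering rule in the note above can be checked with a small sketch in Python scikit-learn. It uses the special ``'passthrough'`` string as the transformer so that values stay recognizable; the names ``"last"`` and ``"first"`` are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer

X = np.array([[10, 11, 12, 13],
              [20, 21, 22, 23]])

ct = ColumnTransformer(
    [
        ("last", "passthrough", [3]),   # listed first, so it comes first in the output
        ("first", "passthrough", [0]),  # listed second
    ],
    remainder="passthrough",            # columns 1 and 2 are appended on the right
)

Xt = ct.fit_transform(X)
print(Xt)
```

The output columns correspond to the input columns 3, 0, 1, 2: the order of the `transformers` list first, then the remaining columns.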

See also
--------
sklearn.compose.make_column_transformer : convenience function for combining the outputs of multiple transformer objects applied to column subsets of the original feature space.
sklearn.compose.make_column_selector : convenience function for selecting columns based on datatype or on column names matched with a regex pattern.

Examples
--------
>>> import numpy as np
>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.preprocessing import Normalizer
>>> ct = ColumnTransformer(
...     [('norm1', Normalizer(norm='l1'), [0, 1]),
...      ('norm2', Normalizer(norm='l1'), slice(2, 4))])
>>> X = np.array([[0., 1., 2., 2.],
...               [1., 1., 0., 1.]])
>>> # Normalizer scales each row of X to unit norm. A separate scaling
>>> # is applied for the two first and two last elements of each
>>> # row independently.
>>> ct.fit_transform(X)
array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])

val fit : ?y:[> `ArrayLike ] Np.Obj.t -> x:[ `Dataframe of Py.Object.t | `Arr of [> `ArrayLike ] Np.Obj.t ] -> [> tag ] Obj.t -> t

Fit all transformers using X.

Parameters
----------
X : array-like, dataframe of shape (n_samples, n_features)
    Input data, of which specified subsets are used to fit the transformers.

y : array-like of shape (n_samples,...), default=None
    Targets for supervised learning.

Returns
-------
self : ColumnTransformer
    This estimator.

val fit_transform : ?y:[> `ArrayLike ] Np.Obj.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Fit all transformers, transform the data and concatenate results.

Parameters
----------
X : array-like, dataframe of shape (n_samples, n_features)
    Input data, of which specified subsets are used to fit the transformers.

y : array-like of shape (n_samples,), default=None
    Targets for supervised learning.

Returns
-------
X_t : array-like, sparse matrix of shape (n_samples, sum_n_components)
    hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.

val get_feature_names : [> tag ] Obj.t -> string list

Get feature names from all transformers.

Returns
-------
feature_names : list of strings
    Names of the features produced by transform.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : dict
    Parameter names mapped to their values.

val set_params : ?kwargs:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

Valid parameter keys can be listed with ``get_params()``.

Returns
-------
self
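Because each transformer is registered under a name, its parameters are addressable as ``<name>__<parameter>``. The sketch below shows this with the Python scikit-learn API; the name ``"scale"`` is an assumption of the example, and ``with_mean`` is a parameter of ``StandardScaler``.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

ct = ColumnTransformer([("scale", StandardScaler(), [0, 1])])

# With deep=True (the default), parameters of nested estimators are included
# under keys prefixed with the transformer name and a double underscore.
nested_keys = sorted(k for k in ct.get_params(deep=True) if k.startswith("scale__"))
print(nested_keys)

# With deep=False, only the ColumnTransformer's own parameters are returned.
shallow = ct.get_params(deep=False)

# set_params accepts the same nested keys and returns the estimator itself.
ct.set_params(scale__with_mean=False)
print(ct.get_params()["scale__with_mean"])
```

The same nested keys are what a grid search would use to tune a parameter of an individual transformer.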

val transform : x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Transform X separately by each transformer, concatenate results.

Parameters
----------
X : array-like, dataframe of shape (n_samples, n_features)
    The data to be transformed by subset.

Returns
-------
X_t : array-like, sparse matrix of shape (n_samples, sum_n_components)
    hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.
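A typical use of ``fit`` followed by ``transform`` is to learn the transformation on one dataset and apply it to another. The following is a minimal sketch in Python scikit-learn; the arrays ``X_train`` and ``X_new`` and the name ``"scale"`` are made up for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.0, 10.0],
                    [2.0, 30.0]])
X_new = np.array([[4.0, 0.0]])

# Standardize column 0 only; column 1 is dropped by the default remainder.
ct = ColumnTransformer([("scale", StandardScaler(), [0])])

# fit returns the estimator itself, so the calls can be chained.
Xt_new = ct.fit(X_train).transform(X_new)
print(Xt_new)
```

Column 0 of ``X_train`` has mean 1 and standard deviation 1, so the new value 4 maps to 3, and the output has a single column.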

val transformers_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute transformers_: get value or raise Not_found if None.

val transformers_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute transformers_: get value as an option.

val named_transformers_ : t -> Dict.t

Attribute named_transformers_: get value or raise Not_found if None.

val named_transformers_opt : t -> Dict.t option

Attribute named_transformers_: get value as an option.

val sparse_output_ : t -> bool

Attribute sparse_output_: get value or raise Not_found if None.

val sparse_output_opt : t -> bool option

Attribute sparse_output_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.