package tdigest

type delta =
  | Merging of float
  | Discrete

delta is the compression factor, the max fraction of mass that can be owned by one centroid (bigger, up to 1.0, means more compression). ~delta:Discrete switches off TDigest behavior and treats the distribution as discrete, with no merging and exact values reported.

type k =
  | Manual
  | Automatic of float

k is a size threshold that triggers recompression as the TDigest grows during input. ~k:Manual disables automatic recompression.

type cx =
  | Always
  | Growth of float

cx (default: 1.1) specifies how often to update cached cumulative totals used for quantile estimation during ingest. This is a tradeoff between performance and accuracy. ~cx:Always will recompute cumulatives on every new datapoint, but the performance drops by 15-25x or even more depending on the size of the dataset.

type t
type info = {
  count : int;
  size : int;
  cumulates_count : int;
  compress_count : int;
  auto_compress_count : int;
}

count: sum of the counts n of all values added so far.

size: size of the internal B-Tree. Calling Tdigest.compress will usually reduce this size.

cumulates_count: number of cumulate operations over the life of this Tdigest instance.

compress_count: number of compression operations over the life of this Tdigest instance.

auto_compress_count: number of compression operations over the life of this Tdigest instance that were not triggered by a manual call to Tdigest.compress.

val create : ?delta:delta -> ?k:k -> ?cx:cx -> unit -> t

Tdigest.create ?delta ?k ?cx ()

Allocate an empty Tdigest instance.

delta (default: 0.01) is the compression factor, the max fraction of mass that can be owned by one centroid (bigger, up to 1.0, means more compression). ~delta:Discrete switches off TDigest behavior and treats the distribution as discrete, with no merging and exact values reported.

k (default: 25) is a size threshold that triggers recompression as the TDigest grows during input. ~k:Manual disables automatic recompression.

cx (default: 1.1) specifies how often to update cached cumulative totals used for quantile estimation during ingest. This is a tradeoff between performance and accuracy. ~cx:Always will recompute cumulatives on every new datapoint, but the performance drops by 15-25x or even more depending on the size of the dataset.
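As an illustration of the three optional parameters, here is a minimal sketch that uses only the constructors and signature listed above. The names td_default, td_discrete and td_manual, and the numeric settings, are hypothetical choices for this example.

```ocaml
(* All defaults, as documented above: delta 0.01, k 25, cx 1.1. *)
let td_default = Tdigest.create ()

(* Discrete mode: no merging, exact values reported. *)
let td_discrete = Tdigest.create ~delta:Tdigest.Discrete ()

(* Stronger compression, no automatic recompression, and cumulative totals
   recomputed on every datapoint (documented as much slower). *)
let td_manual =
  Tdigest.create
    ~delta:(Tdigest.Merging 0.05)
    ~k:Tdigest.Manual
    ~cx:Tdigest.Always
    ()
```

A freshly created instance is empty, so Tdigest.info should report a count of 0 for each of these.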

val info : t -> info

Tdigest.info td returns a record with these fields:

count: sum of the counts n of all values added so far.

size: size of the internal B-Tree. Calling Tdigest.compress will usually reduce this size.

cumulates_count: number of cumulate operations over the life of this Tdigest instance.

compress_count: number of compression operations over the life of this Tdigest instance.

auto_compress_count: number of compression operations over the life of this Tdigest instance that were not triggered by a manual call to Tdigest.compress.

val add : ?n:int -> data:float -> t -> t

Tdigest.add ?n ~data td

Incorporate a value (data) with count n (default: 1), returning a new Tdigest.

val add_list : ?n:int -> float list -> t -> t

Tdigest.add_list ?n ll td

Incorporate a list of values each having count n (default: 1) into a new Tdigest.
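The two ingestion functions can be combined as in the sketch below, which relies only on the signatures of Tdigest.add and Tdigest.add_list shown above. The data values are made up for illustration.

```ocaml
let td = Tdigest.create ()

(* One observation of 10.0 (n defaults to 1). *)
let td = Tdigest.add ~data:10.0 td

(* Three observations of 11.0, recorded in a single call. *)
let td = Tdigest.add ~n:3 ~data:11.0 td

(* Two more values, each with count 1. *)
let td = Tdigest.add_list [ 12.0; 13.0 ] td

(* count is documented as the sum of all n: 1 + 3 + 2. *)
let total = (Tdigest.info td).Tdigest.count
```

Each call returns a Tdigest value, so the result has to be rebound (or piped) rather than discarded.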

val p_rank : t -> float -> t * float option

Tdigest.p_rank td q

For a value q estimate the percentage (0..1) of values <= q.

Returns a new Tdigest to reuse intermediate computations.

val p_ranks : t -> float list -> t * float option list

Same as Tdigest.p_rank but for a list of values.

Returns a new Tdigest to reuse intermediate computations.
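A short sketch of how the returned Tdigest might be threaded through successive rank queries, based on the signatures of Tdigest.p_rank and Tdigest.p_ranks above. The sample data and query points are hypothetical, and the sketch makes no claim about the exact estimates returned.

```ocaml
let td = Tdigest.create () |> Tdigest.add_list [ 1.0; 2.0; 3.0; 4.0; 5.0 ]

(* Keep the returned digest so later queries can reuse cached work. *)
let td, rank = Tdigest.p_rank td 3.0

(* One result per queried value, in the same order as the input list. *)
let td, ranks = Tdigest.p_ranks td [ 0.0; 3.0; 10.0 ]

let () =
  match rank with
  | Some r -> Printf.printf "estimated fraction of values <= 3.0: %f\n" r
  | None -> print_endline "no estimate available"
```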

val percentile : t -> float -> t * float option

Tdigest.percentile td p

For a percentage p (0..1), estimate the smallest value q such that at least a fraction p of the values are <= q.

For discrete distributions, this selects q using the Nearest Rank Method https://en.wikipedia.org/wiki/Percentile#The_Nearest_Rank_method

For continuous distributions, interpolates data values between count-weighted bracketing means.

Returns a new Tdigest to reuse intermediate computations.
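To make the Nearest Rank Method referenced above concrete, here is a standalone sketch over a plain list of values. It does not use the Tdigest API and is not taken from the library's implementation; nearest_rank is a hypothetical helper written for this example.

```ocaml
(* Nearest Rank Method on an exact sample: pick the value at ordinal rank
   ceil(p * n) in the sorted data. Hypothetical helper, not part of Tdigest. *)
let nearest_rank (values : float list) (p : float) : float option =
  match List.sort compare values with
  | [] -> None
  | sorted ->
    let n = List.length sorted in
    let rank = int_of_float (ceil (p *. float_of_int n)) in
    (* Clamp to 1..n so that p = 0. selects the minimum. *)
    let rank = max 1 (min n rank) in
    Some (List.nth sorted (rank - 1))

let sample = [ 15.0; 20.0; 35.0; 40.0; 50.0 ]
let median = nearest_rank sample 0.5 (* rank ceil(2.5) = 3, i.e. 35.0 *)
```

With this method the result is always one of the input values, which matches the "exact values reported" behavior described for ~delta:Discrete.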

val percentiles : t -> float list -> t * float option list

Same as Tdigest.percentile but for a list of percentages.

Returns a new Tdigest to reuse intermediate computations.
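A minimal usage sketch for querying several percentiles at once, assuming only the Tdigest.percentiles signature above. The input data are illustrative and the printed estimates are not shown, since they depend on the digest settings.

```ocaml
let td = Tdigest.create () |> Tdigest.add_list [ 10.0; 11.0; 12.0; 13.0 ]

(* Minimum, quartiles and maximum in one call. *)
let td, quartiles = Tdigest.percentiles td [ 0.; 0.25; 0.5; 0.75; 1. ]

let () =
  List.iter
    (function
      | Some q -> Printf.printf "%f\n" q
      | None -> print_endline "n/a")
    quartiles
```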

val compress : ?delta:delta -> t -> t

Tdigest.compress ?delta td

Manual recompression. Not guaranteed to reduce size further if too few values have been added since the last compression.

delta (default: the value initially passed to Tdigest.create) is the compression level to use for this operation only. It does not alter the delta used by the Tdigest going forward.
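The following sketch shows one way manual recompression could be used together with ~k:Manual, relying only on the signatures of Tdigest.create, Tdigest.compress and Tdigest.info. The input of 1000 values and the one-off delta of 0.1 are arbitrary choices for illustration.

```ocaml
(* Disable automatic recompression and ingest 1000 distinct values. *)
let td =
  Tdigest.create ~k:Tdigest.Manual ()
  |> Tdigest.add_list (List.init 1000 float_of_int)

let before = Tdigest.info td

(* Recompress once with a coarser, one-off delta; the digest keeps its
   original delta for later operations. *)
let td = Tdigest.compress ~delta:(Tdigest.Merging 0.1) td

let after = Tdigest.info td

let () =
  Printf.printf "size before: %d, size after: %d\n"
    before.Tdigest.size after.Tdigest.size
```

Compression changes how values are grouped into centroids, not how many values were recorded, so count is expected to be the same before and after.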

val to_string : t -> t * string

Tdigest.to_string td

Serialize the internal state into a binary string that can be stored or concatenated with other such binary strings.

Use Tdigest.of_string to create a new Tdigest instance from it.

Returns a new Tdigest to reuse intermediate computations.

val of_string : ?delta:delta -> ?k:k -> ?cx:cx -> string -> t

Tdigest.of_string ?delta ?k ?cx str

See Tdigest.create for the meaning of the optional parameters.

Allocate a new Tdigest from a string or concatenation of strings originally created by Tdigest.to_string.
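A round trip through Tdigest.to_string and Tdigest.of_string might look like the sketch below. It assumes only the two signatures above and the documented ability to concatenate serialized strings; the sample values and the names blob_a, blob_b, restored and merged are hypothetical.

```ocaml
let td_a = Tdigest.create () |> Tdigest.add_list [ 1.0; 2.0; 3.0 ]
let td_b = Tdigest.create () |> Tdigest.add_list [ 4.0; 5.0 ]

(* to_string also returns a Tdigest, which should be kept for further use. *)
let td_a, blob_a = Tdigest.to_string td_a
let td_b, blob_b = Tdigest.to_string td_b

(* Rebuild a single digest from its serialized form. *)
let restored = Tdigest.of_string blob_a

(* Build one digest from the concatenation of two serialized digests. *)
let merged = Tdigest.of_string (blob_a ^ blob_b)
```

Since the strings are binary, they should be stored and transported as raw bytes rather than as text.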

module Testing : sig ... end

For internal use