package obandit

OCaml Multi-Armed Bandits

Obandit

module type BanditParam = sig ... end
module type Bandit = sig ... end

The Exp3 Bandit for adversarial regret minimization.
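
As a rough illustration of the idea behind Exp3, the sketch below shows one common textbook formulation: arms are drawn from a mixture of normalized exponential weights and the uniform distribution, and the played arm's weight is updated from an importance-weighted reward estimate. The names exp3_probs and exp3_update and the fixed mixing rate gamma are hypothetical and are not taken from Obandit, whose implementation may use a different variant (for example a decaying learning rate).

```ocaml
(* Hypothetical sketch of one textbook Exp3 formulation; not Obandit's API. *)

(* Arm probabilities: exponential weights mixed with uniform exploration. *)
let exp3_probs ~gamma (weights : float array) =
  let k = float_of_int (Array.length weights) in
  let total = Array.fold_left ( +. ) 0. weights in
  Array.map (fun w -> ((1. -. gamma) *. w /. total) +. (gamma /. k)) weights

(* Update the weight of the played arm in place. The reward is assumed to
   lie in [0, 1]; dividing by the arm's probability keeps the estimate
   unbiased. *)
let exp3_update ~gamma ~(probs : float array) ~arm ~reward
    (weights : float array) =
  let k = float_of_int (Array.length weights) in
  let estimate = reward /. probs.(arm) in
  weights.(arm) <- weights.(arm) *. exp (gamma *. estimate /. k)
```

With equal weights every arm gets the same probability; rewarding an arm raises its weight and hence its future probability.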

The UCB1 Bandit for stochastic regret minimization.
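
For readers unfamiliar with UCB1, the following sketch computes the classical index (empirical mean plus the confidence bonus sqrt(2 ln t / n)) and picks the arm with the largest index. The function names and the array-based state are assumptions made for this example; they do not reflect Obandit's actual types, and rewards are assumed to lie in [0, 1].

```ocaml
(* Hypothetical sketch of the classical UCB1 rule; not Obandit's API. *)

(* Index of one arm at round t: unplayed arms get priority. *)
let ucb1_index ~t ~mean ~pulls =
  if pulls = 0 then infinity
  else mean +. sqrt (2. *. log (float_of_int t) /. float_of_int pulls)

(* Choose the arm with the largest index (first one wins ties). *)
let ucb1_choose ~t ~(means : float array) ~(pulls : int array) =
  let best = ref 0 and best_v = ref neg_infinity in
  Array.iteri
    (fun i m ->
      let v = ucb1_index ~t ~mean:m ~pulls:pulls.(i) in
      if v > !best_v then (best := i; best_v := v))
    means;
  !best
```

The bonus shrinks as an arm is played more often, so rarely played arms are revisited even when their empirical mean is low.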

The Epsilon-Greedy Bandit with a fixed exploration rate.
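
A minimal sketch of the epsilon-greedy rule, assuming the empirical mean reward of each arm is kept in a float array: with probability epsilon a uniformly random arm is explored, otherwise the arm with the best mean is exploited. The names argmax and choose are hypothetical and are not part of Obandit.

```ocaml
(* Hypothetical sketch of epsilon-greedy selection; not Obandit's API. *)

(* Index of the largest element (first one wins ties). *)
let argmax (a : float array) =
  let best = ref 0 in
  Array.iteri (fun i v -> if v > a.(!best) then best := i) a;
  !best

(* Explore with probability epsilon, otherwise exploit the best mean. *)
let choose ~epsilon ~(means : float array) =
  if Random.float 1.0 < epsilon then Random.int (Array.length means)
  else argmax means
```

Because the exploration rate is fixed, the rule keeps sampling suboptimal arms at the same frequency however many rounds have been played.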

module type RangeParam = sig ... end
module WrapRange (R : RangeParam) (P : BanditParam) (B : functor (Pb : BanditParam) -> Bandit) : Bandit

The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled to a range (initially given by a RangeParam). When a value is observed above the range, the bandit algorithm is restarted and the range interval is doubled in that direction.
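
The rescaling and doubling steps described above can be sketched as follows. This is an illustration only: the range record and the functions rescale, widen and observe are hypothetical names, not Obandit's implementation. Widening on the lower side and repeating the doubling until the value fits are assumptions of this sketch.

```ocaml
(* Hypothetical sketch of the doubling trick; not Obandit's implementation. *)
type range = { lower : float; upper : float }

(* Linearly map a reward from [lower, upper] to [0, 1]. *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Double the interval width toward the side where x escaped, repeating
   until x fits. Handling the lower side is an assumption of this sketch. *)
let rec widen r x =
  let width = r.upper -. r.lower in
  if x > r.upper then widen { r with upper = r.upper +. width } x
  else if x < r.lower then widen { r with lower = r.lower -. width } x
  else r

(* Process one raw reward: returns the (possibly widened) range, whether
   the wrapped bandit should be restarted, and the rescaled reward. *)
let observe r x =
  let r' = widen r x in
  (r', r' <> r, rescale r' x)
```

Starting from the range [0, 1], a reward of 1.5 widens the range to [0, 2], signals a restart, and is rescaled to 0.75; a reward of 0.25 leaves the range untouched.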

module WrapRange01 (P : BanditParam) (B : functor (Pb : BanditParam) -> Bandit) : Bandit

The WrapRange01 functor is a convenience alias for WrapRange with an initial "standard" range of [0, 1].
