package obandit


OCaml Multi-Armed Bandits


Obandit

module type BanditParam = sig ... end

module type Bandit = sig ... end

module MakeExp3 (P : BanditParam) : Bandit

The Exp3 Bandit for adversarial regret minimization.

module MakeUCB1 (P : BanditParam) : Bandit

The UCB1 Bandit for stochastic regret minimization.

module MakeEpsilonGreedy (P : BanditParam) : Bandit

The Epsilon-Greedy Bandit with a fixed exploration rate.
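To make the functor pattern above concrete, here is a minimal, self-contained sketch of an epsilon-greedy bandit built from a parameter module. It does not use obandit: the signatures `PARAM` and `BANDIT`, their fields (`k`, `epsilon`, `init`, `select`, `update`) and the functor name `MakeEpsilonGreedySketch` are assumptions made for illustration, since the actual contents of `BanditParam` and `Bandit` are elided on this page.

```ocaml
(* Hypothetical signatures, NOT the ones hidden behind `sig ... end` above. *)
module type PARAM = sig
  val k : int          (* number of arms *)
  val epsilon : float  (* fixed exploration probability in [0, 1] *)
end

module type BANDIT = sig
  type t
  val init : t
  (* [select b] picks the arm to play next. *)
  val select : t -> int
  (* [update b arm reward] records the reward observed for [arm]. *)
  val update : t -> int -> float -> t
end

module MakeEpsilonGreedySketch (P : PARAM) : BANDIT = struct
  (* Per-arm pull counts and running mean rewards. *)
  type t = { counts : int array; means : float array }

  let init = { counts = Array.make P.k 0; means = Array.make P.k 0.0 }

  (* Index of the arm with the highest mean; ties go to the lowest index. *)
  let best_arm b =
    let best = ref 0 in
    Array.iteri (fun i m -> if m > b.means.(!best) then best := i) b.means;
    !best

  (* Explore uniformly with probability epsilon, otherwise exploit. *)
  let select b =
    if Random.float 1.0 < P.epsilon then Random.int P.k else best_arm b

  (* Copy the arrays so that earlier states (including [init]) stay valid. *)
  let update b arm reward =
    let counts = Array.copy b.counts and means = Array.copy b.means in
    counts.(arm) <- counts.(arm) + 1;
    means.(arm) <-
      means.(arm) +. (reward -. means.(arm)) /. float_of_int counts.(arm);
    { counts; means }
end

(* Usage: three arms, 10% exploration. *)
module Greedy = MakeEpsilonGreedySketch (struct let k = 3 let epsilon = 0.1 end)

let () =
  let b = Greedy.update Greedy.init 1 1.0 in
  Printf.printf "next arm: %d\n" (Greedy.select b)
```

The state is kept immutable from the caller's point of view, which is one reasonable design for a functorized bandit; whether obandit makes the same choice is not stated here.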

module type RangeParam = sig ... end

module WrapRange
  (R : RangeParam)
  (P : BanditParam)
  (B : functor (Pb : BanditParam) -> Bandit) : 
  Bandit

The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled using a working range (initially given by a RangeParam). When a reward is observed above the range, the bandit algorithm is restarted and the range interval is doubled in that direction.
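The range bookkeeping behind the doubling trick can be sketched in a few standalone functions. This is an illustration under stated assumptions, not obandit's implementation: the `range` record and the names `rescale`, `grow` and `observe` are hypothetical, the target interval is assumed to be [0, 1], and the sketch also grows the range downward for rewards below it, which the description above does not spell out.

```ocaml
(* Hypothetical range type; the contents of RangeParam are not shown above. *)
type range = { lower : float; upper : float }

(* Linearly map a reward from [lower, upper] to [0, 1] (assumed target). *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Double the interval width in the direction of an out-of-range reward,
   repeating until the reward fits. Requires a non-empty initial range. *)
let rec grow r x =
  let width = r.upper -. r.lower in
  if width <= 0.0 then invalid_arg "grow: empty range"
  else if x > r.upper then grow { r with upper = r.upper +. width } x
  else if x < r.lower then grow { r with lower = r.lower -. width } x
  else r

(* Returns the (possibly grown) range, the rescaled reward, and a flag
   telling the caller whether the wrapped bandit should be restarted. *)
let observe r x =
  let r' = grow r x in
  (r', rescale r' x, r' <> r)

let () =
  let r0 = { lower = 0.0; upper = 1.0 } in
  let (r1, y, restart) = observe r0 3.0 in
  Printf.printf "range = [%g, %g], rescaled = %g, restart = %b\n"
    r1.lower r1.upper y restart
```

In a full wrapper, a `true` restart flag would mean discarding the inner bandit's state and starting it again from its initial state with the new range.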

module WrapRange01
  (P : BanditParam)
  (B : functor (Pb : BanditParam) -> Bandit) : 
  Bandit

The WrapRange01 functor is a convenience alias for WrapRange with an initial "standard" range of [0, 1].
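The shape of these two signatures, a functor that takes another functor as its last argument, can be illustrated with a small self-contained sketch. Everything below is hypothetical: the signatures `PARAM`, `RANGE` and `BANDIT`, the toy algorithm `Sum`, and the `...Sketch` functors are stand-ins, and the wrapper only rescales rewards (it omits the restart and doubling logic). The point is only to show how a `WrapRange01`-style alias could be obtained by fixing the range argument.

```ocaml
(* Hypothetical signatures standing in for the elided `sig ... end` bodies. *)
module type PARAM = sig val k : int end
module type RANGE = sig val lower : float val upper : float end
module type BANDIT = sig
  type t
  val arms : int
  val init : t
  (* [observe b reward] feeds one reward and returns the updated state. *)
  val observe : t -> float -> t
  (* [seen b] is the sum of rewards as the algorithm received them. *)
  val seen : t -> float
end

(* Toy "algorithm" that only accumulates the rewards it receives. *)
module Sum (P : PARAM) : BANDIT = struct
  type t = float
  let arms = P.k
  let init = 0.0
  let observe b x = b +. x
  let seen b = b
end

(* Simplified wrapper: instantiates the algorithm B with P and rescales
   every reward from [R.lower, R.upper] to [0, 1] before passing it on. *)
module WrapRangeSketch (R : RANGE) (P : PARAM)
    (B : functor (Pb : PARAM) -> BANDIT) : BANDIT = struct
  module Inner = B (P)
  type t = Inner.t
  let arms = Inner.arms
  let init = Inner.init
  let observe b x = Inner.observe b ((x -. R.lower) /. (R.upper -. R.lower))
  let seen = Inner.seen
end

(* The alias: the same wrapper with the range fixed to [0, 1]. *)
module Range01 = struct let lower = 0.0 let upper = 1.0 end

module WrapRange01Sketch (P : PARAM)
    (B : functor (Pb : PARAM) -> BANDIT) : BANDIT =
  WrapRangeSketch (Range01) (P) (B)

(* Usage: pass the parameter module and the algorithm functor itself. *)
module W = WrapRange01Sketch (struct let k = 2 end) (Sum)
```

Note that the algorithm is passed unapplied (`Sum`, not `Sum (P)`), so the wrapper decides when and with which parameters to instantiate it.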