package obandit


The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled from a working range, initially given by a RangeParam. When a value is observed above the range, the bandit algorithm is restarted and the range interval is doubled in that direction.
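The rescaling and doubling steps described above can be sketched as follows. This is a minimal illustration, not obandit's implementation: the `range` record and the functions `rescale`, `fit` and `observe` are hypothetical names. The sketch also assumes that values below the range are handled symmetrically and that the interval keeps doubling until the value fits.

```ocaml
(* Hypothetical sketch of the doubling trick; not obandit's actual code. *)

(* Working reward range; assumes lower < upper. *)
type range = { lower : float; upper : float }

(* Linearly map a reward from [lower, upper] to [0, 1]. *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Double the interval width in the direction of an out-of-range value,
   repeating until the value fits (an assumption of this sketch). *)
let rec fit r x =
  let width = r.upper -. r.lower in
  if x > r.upper then fit { r with upper = r.upper +. width } x
  else if x < r.lower then fit { r with lower = r.lower -. width } x
  else r

(* Returns the (possibly widened) range, whether the wrapped bandit
   would have to be restarted, and the reward rescaled to [0, 1]. *)
let observe r x =
  let r' = fit r x in
  (r', r' <> r, rescale r' x)
```

For example, starting from the range [0, 1], observing 3.0 widens the range twice, to [0, 2] and then to [0, 4], and signals a restart.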

Parameters

module R : RangeParam
module P : BanditParam
module B (Pb : BanditParam) : Bandit

Signature

val getAction : float -> int

A mutable bandit.

The getAction function advances the bandit by one step of the bandit game, mutating its internal state. The argument is the reward for the last action and the result is the next action. Rewards are floats in [0,1] and actions are integers in [0,n-1]. The first reward is discarded, since no action has been played yet. To use rewards larger than 1, please use the WrapDoubling functor.
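The calling convention above (feed in the last reward, get back the next action, first reward ignored) can be illustrated with a toy loop. The names `make_bandit` and `play` are hypothetical, and the stand-in bandit below is not one of obandit's algorithms: it simply plays each arm once and then the arm with the best average reward. It only shares the `float -> int` shape of getAction.

```ocaml
(* Hypothetical stand-in with the same shape as getAction : float -> int.
   Not an obandit algorithm: plays each arm once, then the best average. *)
let make_bandit n =
  let sums = Array.make n 0.0 and counts = Array.make n 0 in
  let last = ref (-1) in  (* -1 means no action has been played yet *)
  fun reward ->
    (* Credit the reward to the previous action; the first reward is discarded. *)
    if !last >= 0 then begin
      sums.(!last) <- sums.(!last) +. reward;
      counts.(!last) <- counts.(!last) + 1
    end;
    (* Unplayed arms score infinity, so they are tried first. *)
    let score i =
      if counts.(i) = 0 then infinity
      else sums.(i) /. float_of_int counts.(i)
    in
    let best = ref 0 in
    for i = 1 to n - 1 do
      if score i > score !best then best := i
    done;
    last := !best;
    !best

(* Drive a bandit for a number of rounds; reward_of maps an action in
   [0, n-1] to a reward in [0, 1]. Returns the last action chosen. *)
let play get_action reward_of rounds =
  let action = ref (get_action 0.0) in  (* this first reward is discarded *)
  for _i = 2 to rounds do
    action := get_action (reward_of !action)
  done;
  !action
```

The value passed on the very first call is arbitrary, because the bandit has not played any action that the reward could be attributed to.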
