
The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled according to a working range (initially given by a RangeParam). When a value is observed beyond that range, the bandit algorithm is restarted and the range interval is doubled in that direction.
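The doubling trick described above can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the names `range`, `rescale`, `widen`, `ranged` and `observe` are hypothetical, the target interval [0, 1] for rescaling is an assumption, and the sketch simply drops the reward that triggers a restart. It also assumes `lower < upper`.

```ocaml
(* Hypothetical sketch: none of these names come from the library. *)

(* The working reward range; assumed to satisfy lower < upper. *)
type range = { lower : float; upper : float }

(* Linearly rescale a reward from [lower, upper] to [0, 1]
   (the target interval is an assumption of this sketch). *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Double the width of the interval in the direction of the
   out-of-range value x, repeating until x fits. *)
let rec widen r x =
  let width = r.upper -. r.lower in
  if x > r.upper then widen { r with upper = r.upper +. width } x
  else if x < r.lower then widen { r with lower = r.lower -. width } x
  else r

(* A wrapped state: the working range plus the inner bandit state. *)
type 'b ranged = { range : range; inner : 'b }

(* If the reward is within range, pass the rescaled reward to the inner
   update function. Otherwise widen the range and restart the inner
   state from init; the triggering reward is dropped in this sketch. *)
let observe ~init ~update st x =
  if x >= st.range.lower && x <= st.range.upper then
    { st with inner = update st.inner (rescale st.range x) }
  else
    { range = widen st.range x; inner = init }
```

Here `init` and `update` stand in for whatever the wrapped bandit provides. Restarting from `init` is what keeps the inner algorithm's assumptions valid: every reward it has seen since the last restart was rescaled with the same range.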


module R : RangeParam
module B : Bandit


type bandit = B.bandit
val initialBandit : bandit rangedBandit