package obandit

This functor wraps a bandit algorithm with the doubling trick: all rewards are rescaled according to a current scale, which starts at 1. When a reward above the current scale is observed, the bandit algorithm is restarted and the scale is doubled. This is useful when the reward scale is unknown and may exceed 1.
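The doubling trick described above can be sketched as a small functor. This is a minimal illustration under stated assumptions, not the library's implementation: the MUTABLE_BANDIT signature, its reset function, the RoundRobin toy bandit, and the Doubling functor name are all hypothetical, and the choice to keep doubling until the reward fits is an assumption of this sketch.

```ocaml
(* Hypothetical minimal signature, modeled on the documented getAction value. *)
module type MUTABLE_BANDIT = sig
  val reset : unit -> unit        (* hypothetical: restart the algorithm *)
  val getAction : float -> int    (* expects a reward in [0, 1] *)
end

(* Toy round-robin "bandit" over 3 actions; it ignores rewards and counts
   how many times it has been restarted. Not an obandit algorithm. *)
module RoundRobin = struct
  let next = ref 0
  let resets = ref 0
  let reset () = next := 0; incr resets
  let getAction (_reward : float) =
    let a = !next in
    next := (a + 1) mod 3;
    a
end

(* Doubling wrapper: rescale rewards by the current scale, and restart the
   wrapped bandit with a larger scale when a reward exceeds it. *)
module Doubling (B : MUTABLE_BANDIT) = struct
  let scale = ref 1.0
  let getAction reward =
    if reward > !scale then begin
      (* Assumption: double repeatedly until the reward fits the scale. *)
      while reward > !scale do scale := 2.0 *. !scale done;
      B.reset ()
    end;
    B.getAction (reward /. !scale)
end

module D = Doubling (RoundRobin)

let a0 = D.getAction 0.5   (* fits the initial scale of 1 *)
let a1 = D.getAction 3.0   (* exceeds the scale: scale becomes 4, bandit restarts *)
let a2 = D.getAction 2.0   (* fits the new scale, rescaled to 0.5 *)
```

In this sketch the wrapped bandit only ever sees rewards in [0, 1], at the cost of losing its history on each restart.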

Parameters

module P : BanditParam
module B (Pb : BanditParam) : Bandit

Signature

val getAction : float -> int

A mutable bandit.

Give the positive reward for the last action and choose the next action, encoded as an integer in the range 0 to n-1 for n actions. Rewards should be between 0 and 1. For rewards larger than 1, use the WrapDoubling functor. The first reward is discarded, since no action has been chosen yet.
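The call pattern for a function of type float -> int like getAction can be sketched as the loop below. The getAction defined here is a toy stand-in written for this example (try each action once, then play the best empirical mean); it is not one of the library's algorithms, and reward_of and run are hypothetical helpers.

```ocaml
(* Toy stand-in for a mutable bandit with n = 2 actions. Hypothetical:
   it tries each action once, then plays the best empirical mean. *)
let n = 2
let sums = Array.make n 0.0
let counts = Array.make n 0
let last = ref (-1)   (* -1 means no action has been chosen yet *)

let getAction (reward : float) : int =
  (* Credit the reward to the previous action. On the first call there is
     no previous action, so the reward is discarded. *)
  if !last >= 0 then begin
    sums.(!last) <- sums.(!last) +. reward;
    counts.(!last) <- counts.(!last) + 1
  end;
  let mean i =
    if counts.(i) = 0 then infinity
    else sums.(i) /. float_of_int counts.(i)
  in
  let best = ref 0 in
  for i = 1 to n - 1 do
    if mean i > mean !best then best := i
  done;
  last := !best;
  !best

(* Hypothetical environment with fixed rewards in [0, 1]. *)
let reward_of action = if action = 1 then 0.9 else 0.2

let run rounds =
  let action = ref (getAction 0.0) in   (* dummy first reward, discarded *)
  let total = ref 0.0 in
  for _round = 1 to rounds do
    let r = reward_of !action in
    total := !total +. r;
    action := getAction r               (* report reward, get next action *)
  done;
  (!action, !total)

let final_action, total_reward = run 10
```

Each call both reports the reward of the previous action and returns the next one, so the environment and the bandit alternate inside a single loop.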
