package obandit

The $\alpha$-UCB bandit algorithm for stochastic regret minimization, described in [1].

Parameters

module P : AlphaUCBParam

Signature

type bandit = banditEstimates

The internal data structure of the bandit algorithm.

val initialBandit : bandit

The initial state of the bandit algorithm.

val step : bandit -> float -> int * bandit

step b r advances the bandit game by one step, where b is the current state of the bandit and r is the reward for the last action. The result is a pair: the next action, encoded as an integer in $\{0, \ldots, K-1\}$, and the new state of the bandit. The reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded, since no action has been played yet.
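The calling convention of step can be illustrated with a minimal, self-contained sketch. It does not use the obandit library: ToyAlphaUCB is a hypothetical stand-in that only mirrors the documented signature, and its record fields, the arm count k = 3, the value alpha = 2.0, the index formula (empirical mean plus $\sqrt{\alpha \ln t / (2 n)}$, assuming rewards in $[0, 1]$) and the rewards array are all assumptions made for illustration.

```ocaml
(* The documented signature, restated so the sketch is self-contained. *)
module type BANDIT = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* Hypothetical stand-in, NOT obandit's implementation. *)
module ToyAlphaUCB : BANDIT = struct
  let k = 3        (* number of arms (assumed) *)
  let alpha = 2.0  (* exploration parameter (assumed) *)

  type bandit = {
    t : int;              (* number of actions chosen so far *)
    last : int option;    (* last action, None before the first step *)
    counts : int array;   (* pulls per arm *)
    sums : float array;   (* cumulative reward per arm *)
  }

  let initialBandit =
    { t = 0; last = None; counts = Array.make k 0; sums = Array.make k 0.0 }

  let step b r =
    (* Copy the arrays so that the previous state stays valid. *)
    let counts = Array.copy b.counts and sums = Array.copy b.sums in
    (match b.last with
     | None -> ()  (* first reward is discarded: nothing was played yet *)
     | Some a ->
       counts.(a) <- counts.(a) + 1;
       sums.(a) <- sums.(a) +. r);
    let t = b.t + 1 in
    (* Upper confidence index; arms never pulled get priority. *)
    let index a =
      if counts.(a) = 0 then infinity
      else
        let n = float_of_int counts.(a) in
        sums.(a) /. n +. sqrt (alpha *. log (float_of_int t) /. (2.0 *. n))
    in
    let best = ref 0 in
    for a = 1 to k - 1 do
      if index a > index !best then best := a
    done;
    (!best, { t; last = Some !best; counts; sums })
end

(* Deterministic rewards per arm (assumed); length matches k above. *)
let rewards = [| 0.1; 0.9; 0.5 |]

(* Play [rounds] steps and return how often each arm was chosen. *)
let play rounds =
  let pulls = Array.make (Array.length rewards) 0 in
  let rec loop b r n =
    if n > 0 then begin
      let a, b' = ToyAlphaUCB.step b r in
      pulls.(a) <- pulls.(a) + 1;
      loop b' rewards.(a) (n - 1)
    end
  in
  (* The initial 0.0 is a dummy reward; [step] ignores it. *)
  loop ToyAlphaUCB.initialBandit 0.0 rounds;
  pulls
```

The driver threads the state returned by each call into the next one and feeds back the reward of the action it was just told to play, which is the usage pattern the signature suggests. With the real library, the module obtained from the documented functor would take the place of ToyAlphaUCB.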
