package obandit


The Exp3 Bandit for adversarial regret minimization with a decaying learning rate as per [1].
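For orientation, one common textbook form of Exp3 with a decaying learning rate is sketched below. Reference [1] is not reproduced on this page, so the loss-based formulation, the symbols ($\ell_{i,t}$ for the loss of arm $i$ at round $t$, $I_t$ for the arm played, $p_{i,t}$ for the sampling probabilities) and the exact rate $\eta_t$ are assumptions and may differ from what this module implements. A reward $r \in [0, 1]$ would correspond to a loss $\ell = 1 - r$.

```latex
\[
  p_{i,t} = \frac{\exp\bigl(-\eta_t \tilde{L}_{i,t-1}\bigr)}
                 {\sum_{j=0}^{K-1} \exp\bigl(-\eta_t \tilde{L}_{j,t-1}\bigr)},
  \qquad
  \tilde{L}_{i,t} = \sum_{s=1}^{t} \frac{\ell_{i,s}}{p_{i,s}} \, \mathbf{1}\{I_s = i\},
  \qquad
  \eta_t = \sqrt{\frac{\ln K}{t K}}
\]
```

Here $\tilde{L}_{i,t}$ is the importance-weighted cumulative loss estimate, and the learning rate $\eta_t$ shrinks with the round index $t$ instead of being fixed in advance from a known horizon.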

Parameters

module P : KBanditParam

Signature

type bandit = banditPolicy

The internal data structure of the bandit algorithm.

val initialBandit : bandit

The initial state of the bandit algorithm.

val step : bandit -> float -> int * bandit

step b r advances the bandit game one step, where b is the current state of the bandit and r is the reward for the last action. The result of this call is the next action, encoded as an integer in $\{0, \ldots, K-1\}$, and the new state of the bandit. The admissible reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded.
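The state-threading pattern implied by this signature can be sketched as follows. This is a minimal, self-contained illustration and does not use the library: the module type BANDIT, the RoundRobin stand-in policy (which is not Exp3 and ignores rewards), and the play helper are all hypothetical names introduced here. Only the shapes of bandit, initialBandit and step are taken from the signature above.

```ocaml
(* Hypothetical signature mirroring the three items documented above. *)
module type BANDIT = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* Stand-in policy (NOT Exp3): cycles through k arms and ignores rewards.
   It exists only so that the driver below runs without the library. *)
module RoundRobin : BANDIT = struct
  type bandit = int
  let k = 3
  let initialBandit = 0
  let step b _reward = (b mod k, b + 1)
end

(* Play [rounds] rounds, threading the immutable state through successive
   calls to [step]. Returns the actions in the order they were played. *)
let play (module B : BANDIT) ~rounds ~reward_of =
  let rec loop b reward n acc =
    if n = 0 then List.rev acc
    else
      let action, b' = B.step b reward in
      (* The reward observed for [action] is fed to the next call. *)
      loop b' (reward_of action) (n - 1) (action :: acc)
  in
  (* The first reward is discarded by the algorithm, so 0.0 is a placeholder. *)
  loop B.initialBandit 0.0 rounds []

(* Toy environment: only arm 1 pays a reward. *)
let actions =
  play (module RoundRobin) ~rounds:5
    ~reward_of:(fun a -> if a = 1 then 1.0 else 0.0)
```

Because step returns a new state rather than mutating the old one, the caller is responsible for passing the latest state to the next call. With a real instance of this module, the reward_of function would have to respect whatever reward range that algorithm expects.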
