package obandit

The Exp3 Bandit for adversarial regret minimization with a decaying learning rate as per [1].

Parameters

module P : FixedExp3Param

Signature

type bandit = banditPolicy

The internal data structure of the bandit algorithm.

val initialBandit : bandit

The initial state of the bandit algorithm.

val step : bandit -> float -> int * bandit

step b r advances the bandit game by one step, where b is the current state of the bandit and r is the reward for the last action. The result of this call is the next action, encoded as an integer in $ \{ 0, \ldots , K-1 \} $, together with the new state of the bandit. The reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded.
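
The calling pattern described above can be sketched as a small driver loop. This is only an illustration under stated assumptions: the module type name BANDIT, the RoundRobin stand-in, and the Play functor are all hypothetical names introduced here and are not part of obandit. RoundRobin is not Exp3; it ignores rewards and simply cycles through three actions so that the sketch runs without the library installed. The rewards 0.0 and 1.0 are likewise an assumption, since the valid reward range depends on the algorithm in use.

```ocaml
(* The signature shown on this page; BANDIT is a name chosen for this sketch. *)
module type BANDIT = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* Hypothetical stand-in, NOT the library's Exp3: it cycles through k = 3
   actions and ignores rewards, so the sketch runs without obandit. *)
module RoundRobin : BANDIT = struct
  type bandit = int
  let k = 3
  let initialBandit = 0
  (* The current state is the action to play; the next state is its successor. *)
  let step b _reward = (b, (b + 1) mod k)
end

(* Play [rounds] rounds against [reward_of], threading the state through
   [step]. Returns the actions chosen, in order. *)
module Play (B : BANDIT) = struct
  let run ~rounds ~reward_of =
    let rec loop b last_reward n acc =
      if n = 0 then List.rev acc
      else
        let action, b' = B.step b last_reward in
        loop b' (reward_of action) (n - 1) (action :: acc)
    in
    (* The first reward is discarded by the bandit, so any value will do. *)
    loop B.initialBandit 0.0 rounds []
end

module P = Play (RoundRobin)

(* Assumed toy environment: action 1 pays 1.0, every other action pays 0.0. *)
let actions = P.run ~rounds:5 ~reward_of:(fun a -> if a = 1 then 1.0 else 0.0)

let () =
  List.iter (Printf.printf "%d ") actions;
  print_newline ()
```

Because the state is an immutable value returned by each call, the caller has to pass the new state into the next call, as the loop does. Any module with this signature, such as the result of applying the functor documented here, could be passed to Play in place of RoundRobin.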