package obandit


UCB1 Bandit.

Parameters

module P : BanditParam

Signature

val getAction : float -> int

A mutable bandit.

Pass the reward (a positive number) obtained for the last action and receive the next action to take, encoded as an integer in the range 0 to n-1 for n actions. Rewards should lie between 0 and 1. For rewards larger than 1, use the WrapDoubling functor. The first reward is discarded, since no action precedes the first call.
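To illustrate the calling convention, here is a minimal, self-contained sketch of a mutable UCB1 bandit exposing the same `getAction : float -> int` signature. It is not obandit's implementation: the names `Param`, `n`, and `MakeUCB1Sketch` are hypothetical, the contents of `BanditParam` are assumed to include at least the number of actions, and the index used is the textbook UCB1 formula (empirical mean plus `sqrt (2 ln t / n_a)`), which may differ from what the library computes.

```ocaml
(* Hypothetical parameter signature; the fields of obandit's BanditParam
   are an assumption here. *)
module type Param = sig
  val n : int (* number of actions *)
end

(* Illustrative stand-in, not the library's functor. *)
module MakeUCB1Sketch (P : Param) : sig
  val getAction : float -> int
end = struct
  let counts = Array.make P.n 0  (* times each action was credited *)
  let sums = Array.make P.n 0.0  (* accumulated reward per action *)
  let last = ref (-1)            (* last action returned; -1 before any call *)
  let t = ref 0                  (* total number of credited rewards *)

  (* Textbook UCB1 index: empirical mean plus an exploration bonus.
     Actions never tried get an infinite index so they are tried first. *)
  let index a =
    if counts.(a) = 0 then infinity
    else
      let n_a = float_of_int counts.(a) in
      (sums.(a) /. n_a) +. sqrt (2.0 *. log (float_of_int !t) /. n_a)

  let getAction reward =
    (* On the first call there is no previous action, so the reward
       is discarded. *)
    if !last >= 0 then begin
      counts.(!last) <- counts.(!last) + 1;
      sums.(!last) <- sums.(!last) +. reward;
      incr t
    end;
    (* Pick the action with the largest index; ties go to the lowest
       action number. *)
    let best = ref 0 in
    for a = 1 to P.n - 1 do
      if index a > index !best then best := a
    done;
    last := !best;
    !best
end
```

In this sketch the state lives inside the module, so each functor application yields an independent bandit, and every call to `getAction` both credits the previous action and selects the next one.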
