Minimax $r$-Stage Strategy for the Multi-Armed Bandit Problem

Kolnogorov, Alexander

Root / Conference Proceedings / 9th IFAC Workshop "ADAPTATION AND LEARNING IN CONTROL AND SIGNAL PROCESSING" (ALCOSP'07) / Minimax $r$-Stage Strategy for the Multi-Armed Bandit Problem

Minimax $r$-Stage Strategy for the Multi-Armed Bandit Problem

Alexander Kolnogorov

The $r$-stage multi-armed bandit problem is considered in minimax setting on the finite sufficiently large time interval $T$. A sequential control procedure with a priori specified magnitudes of learning stages and thresholds is offered. The value of the minimax risk close to $T^{\alpha}$ with $\alpha=2^{r-1}/(2^r-1)$ is obtained. The applications to information transmission and medical treatments are discussed. Considered approach is especially valuable for systems with parallel processing in which the number of stages $r$ mainly influences the total duration of the process.

File: download