Minimax $r$-Stage Strategy for the Multi-Armed Bandit Problem
Alexander Kolnogorov
The $r$-stage multi-armed bandit problem is considered in minimax setting on the finite sufficiently large time interval $T$. A sequential control procedure with a priori specified magnitudes of learning stages and thresholds is offered. The value of the minimax risk close to $T^{\alpha}$ with $\alpha=2^{r-1}/(2^r-1)$ is obtained. The applications to information transmission and medical treatments are discussed. Considered approach is especially valuable for systems with parallel processing in which the number of stages $r$ mainly influences the total duration of the process.