This thesis lies at the intersection of artificial intelligence, sequential statistics, and optimization. We focus on the problem of identifying the arm with the highest expected reward (best arm identification) in unstructured multi-armed bandits. This problem can be tackled in two frameworks whose current levels of understanding differ greatly.
The fixed-confidence framework is the better understood of the two: asymptotically optimal strategies are known, and our goal is to obtain non-asymptotic guarantees for strategies that are, whenever possible, simple and natural. Working with Gaussian bandits, we propose a finite-time analysis of the risk of a new, asymptotically optimal strategy that exploits the regularity properties of this model. This strategy slightly modifies the sampling rule of the Track-and-Stop algorithm into a more conservative and more interpretable rule. In the more general setting of exponential family models, we provide a preliminary analysis of the asymptotic optimality of adaptive Top-Two algorithms, whose sampling rules are particularly simple.
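To make the notion of asymptotic optimality concrete, we recall the standard fixed-confidence lower bound from the literature, stated here for illustration only (the notation $\mu$, $\tau_\delta$, $T^*(\mu)$, $\Sigma_K$, $\mathrm{Alt}(\mu)$ and $d$ is introduced for this purpose and does not appear elsewhere in the abstract): for any $\delta$-correct strategy with stopping time $\tau_\delta$,
\[
\mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta,\,1-\delta),
\qquad
T^*(\mu)^{-1} \;=\; \sup_{w\in\Sigma_K}\;\inf_{\lambda\in\mathrm{Alt}(\mu)}\;\sum_{a=1}^{K} w_a\, d(\mu_a,\lambda_a),
\]
where $\Sigma_K$ is the simplex of sampling proportions over the $K$ arms, $\mathrm{Alt}(\mu)$ is the set of instances whose best arm differs from that of $\mu$, $d$ is the Kullback-Leibler divergence between arm distributions, and $\mathrm{kl}$ is the binary relative entropy. A strategy is then asymptotically optimal when $\mathbb{E}_\mu[\tau_\delta]/\log(1/\delta)$ tends to $T^*(\mu)$ as $\delta \to 0$. Track-and-Stop reaches this limit by tracking the maximizing proportions computed on the empirical means, whereas a Top-Two sampling rule simply plays a leader arm with some probability $\beta$ and a challenger arm otherwise.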
Independently, in the fixed-budget framework, where the very existence of a well-defined complexity measure remains to be established, we propose generalizations to non-parametric models of the existing upper and lower bounds, which were previously available only for very specific models. The bounds we obtain involve information-theoretic quantities that are finer than the gaps (differences between means) appearing in earlier work. These new quantities could be the key to measuring the complexity of fixed-budget best arm identification.
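For comparison, the gap-based quantities mentioned above are classically defined as follows (standard background rather than a contribution of this thesis; the notation $\Delta_a$ and $H(\mu)$ is introduced here for illustration):
\[
\Delta_a \;=\; \max_{b} \mu_b \;-\; \mu_a,
\qquad
H(\mu) \;=\; \sum_{a\,:\,\Delta_a>0} \frac{1}{\Delta_a^{2}},
\]
and classical fixed-budget guarantees, such as those for the Successive Rejects algorithm with rewards bounded in $[0,1]$, control the probability of misidentification after $T$ samples by a quantity decaying exponentially in $T/(\log(K)\,H(\mu))$, up to constants. The information-theoretic quantities obtained in this thesis are meant to refine such gap-based measures beyond these specific models.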