Abstract We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation("bandit") strategies.We provide sharo regret analysis of this algorithm in a standard stochastic noise setting,demo
Contextual bandit learning is a reinforcement learning problemwhere the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context