An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

Published: arXiv:2603.26647v1
Authors

Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff

Abstract

We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms that assume constant access, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms. We derive a theoretical upper bound on the regret of our policy, characterizing the impact of both the network structure and the activation probabilities. Finally, we demonstrate through numerical simulations that UCB-LP-A significantly outperforms existing heuristics that ignore either the side-information or the availability constraints.

Paper Summary

Problem
This paper addresses the stochastic Multi-Armed Bandit (MAB) problem, in which a decision-maker sequentially selects an action and observes a reward under uncertainty. In real-world systems, however, actions are often correlated through a network structure, so playing one action can yield side-observations about related actions, and the set of actions that can actually be played may change over time. The paper tackles this combination under stochastic availability, where the set of feasible actions (the activation set) varies dynamically in each round.
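The availability constraint alone can be sketched in a few lines. The toy simulation below (an illustrative model, not the paper's algorithm: it uses independent per-arm activation probabilities, Bernoulli rewards, and a standard UCB index, and ignores side-observations entirely) shows a learner that may only play an arm from the currently drawn activation set:

```python
import math
import random

def simulate(true_means, activation_probs, horizon, seed=0):
    """UCB restricted to a stochastically drawn activation set (toy model).

    Each round, arm i is independently active with probability
    activation_probs[i]; we play the active arm with the largest UCB
    index and observe a Bernoulli reward with mean true_means[i].
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # number of plays per arm
    means = [0.0] * n         # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        # Draw this round's activation set.
        active = [i for i in range(n) if rng.random() < activation_probs[i]]
        if not active:
            continue  # no feasible action this round

        def ucb(i):
            # Untried arms get infinite priority; otherwise mean + bonus.
            if counts[i] == 0:
                return float("inf")
            return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])

        arm = max(active, key=ucb)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        total += reward
    return counts, total

# Three arms with means 0.9, 0.5, 0.2; each available 80% of the time.
counts, total = simulate([0.9, 0.5, 0.2], [0.8, 0.8, 0.8], 2000)
```

Over 2000 rounds, the learner concentrates its plays on the best arm among those available, which is the baseline behavior that side-observations and the LP-based policy below improve upon.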
Key Innovation
The key innovation of this work is the development of a novel policy called UCB-LP-A, which leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms.
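To make the LP idea concrete, here is a toy formulation in the same spirit (the objective, constraints, and function name are illustrative assumptions, not the authors' exact LP): given the action-to-unknown incidence matrix of the bipartite graph and per-action availability probabilities, find nonnegative sampling weights that minimize total sampling effort while guaranteeing every unknown an expected observation rate of at least one:

```python
import numpy as np
from scipy.optimize import linprog

def sampling_weights(B, p):
    """Toy LP for availability-aware sampling (illustrative, hypothetical).

    B[a, u] = 1 if playing action a reveals an observation of unknown u.
    p[a]    = probability that action a is active in a given round.
    Minimize sum_a w_a subject to, for every unknown u,
        sum_a p[a] * w_a * B[a, u] >= 1,   w_a >= 0.
    """
    B = np.asarray(B, dtype=float)
    p = np.asarray(p, dtype=float)
    n_actions, n_unknowns = B.shape
    c = np.ones(n_actions)                 # minimize total sampling effort
    # Rewrite >= constraints as <= for linprog: -sum_a p_a w_a B[a,u] <= -1.
    A_ub = -(B * p[:, None]).T
    b_ub = -np.ones(n_unknowns)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n_actions)
    return res.x

# Two actions, three unknowns: action 0 observes unknowns {0, 1},
# action 1 observes unknowns {1, 2}; both are active with probability 0.5.
w = sampling_weights([[1, 1, 0], [0, 1, 1]], [0.5, 0.5])
```

In this small instance the rare-availability penalty is visible directly: each action must be weighted at 2 so that, at 50% availability, the unknowns it alone covers are still observed often enough in expectation.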
Practical Impact
This research has significant practical implications for various real-world systems, such as:
* Social networks: users provide side-information about their peers' preferences, yet are not always online to be queried.
* Communication networks: packets are sent over a path, and traversing one path reveals observations of the delays on each of its constituent links.
* Advertising in online social networks: promotional offers can be targeted to users and their friends.
The UCB-LP-A policy can be applied to these systems to optimize exploration-exploitation trade-offs under stochastic availability, leading to improved decision-making and reduced regret.
Analogy / Intuitive Explanation
Imagine you're at a restaurant with multiple menu items, each with an unknown quality. You want to try each item to find the best one, but you can only try one item at a time. However, if you try an item, you'll also get some information about the other items on the menu that are related to it. This is similar to the MAB problem, where choosing an action not only generates a reward for itself, but also reveals some useful side-information for a subset of the remaining actions. The UCB-LP-A policy is like a smart waiter who optimally selects the menu items to try, taking into account the relationships between the items and the availability of each item.
Paper Information
Categories:
cs.LG eess.SY
Published Date:

arXiv ID: 2603.26647v1
