Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

Under submission at ICML, 2025

Recommended citation: Cheol Woo Kim, Jai Moondra, Shresth Verma, Madeleine Pollack, Lingkai Kong, Milind Tambe, Swati Gupta. (2025). "Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning." arXiv preprint arXiv:2502.09724. https://arxiv.org/abs/2502.09724

Abstract: In many real-world applications of reinforcement learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-making. This class includes well-known welfare functions such as Egalitarian, Nash, and Utilitarian welfare. However, selecting the appropriate social welfare function is challenging for decision-makers, as the structure and outcomes of optimal policies can be highly sensitive to the choice of $p$. To address this challenge, we study the concept of an $\alpha$-approximate portfolio in RL, a set of policies that are approximately optimal across the family of generalized $p$-means for all $p \in [-\infty, 1]$. We propose algorithms to compute such portfolios and provide theoretical guarantees on the trade-offs among approximation factor, portfolio size, and computational efficiency. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in summarizing the policy space induced by varying $p$ values, empowering decision-makers to navigate this landscape more effectively.
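For readers unfamiliar with the welfare functions referenced in the abstract, below is a minimal Python sketch of the generalized $p$-mean and how its preferred policy shifts with $p$. This is an illustrative example only, not the paper's implementation; the function name `p_mean_welfare` and the utility vectors are hypothetical.

```python
import numpy as np

def p_mean_welfare(utilities, p):
    """Generalized p-mean of a nonnegative utility vector.

    p = 1      -> Utilitarian welfare (arithmetic mean)
    p -> 0     -> Nash welfare (geometric mean)
    p -> -inf  -> Egalitarian welfare (minimum utility)
    """
    u = np.asarray(utilities, dtype=float)
    if p == 1:
        return u.mean()
    if p == 0:
        # Geometric mean, computed in log-space for numerical stability
        return float(np.exp(np.mean(np.log(u))))
    if np.isneginf(p):
        return float(u.min())
    return float(np.mean(u ** p) ** (1.0 / p))

# Two hypothetical policies evaluated for two stakeholders:
policy_a = [9.0, 1.0]   # higher total utility, but unequal
policy_b = [4.0, 4.0]   # lower total utility, but equal
for p in [1, 0, -2, -np.inf]:
    print(p, p_mean_welfare(policy_a, p), p_mean_welfare(policy_b, p))
```

In this toy example, Utilitarian welfare ($p = 1$) prefers the unequal policy, while Nash ($p \to 0$) and Egalitarian ($p \to -\infty$) welfare prefer the equal one, illustrating why the optimal policy can change sharply with $p$ and why a small portfolio covering all $p \in [-\infty, 1]$ is useful.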

arXiv