(For a full list, see Google Scholar.)
We propose the first theoretical framework to handle the nonconvex and stochastic nature of within-task constrained Markov decision processes (CMDPs, i.e., safe RL) while exploiting inter-task dependency and intra-task geometries for meta-safe RL (Meta-SRL). Using gradient-based meta-learning, we obtain task-averaged regret guarantees for reward maximization and constraint violation, and show that the task-averaged optimality gap and constraint satisfaction improve with task similarity. Our meta-algorithm performs inexact online learning on upper bounds of the intra-task optimality gap and constraint violations, which are estimated by off-policy stationary distribution corrections. Furthermore, we allow the learning rates to be adapted for every task and extend our approach to settings with dynamically changing task environments.
Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin
ICLR (2023) (spotlight presentation) (openreview | pdf)
This is the first study on the expressibility and learnability of solution functions of convex optimization problems and their multi-layer architectural extension. Notable results include: 1) the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximator; 2) compositionality in the form of a deep architecture can achieve a substantial reduction in rate-distortion; and 3) the statistical bounds on the empirical covering numbers of LP/QP, as well as of a generic (possibly nonconvex) optimization problem, can be characterized by tame geometry.
Ming Jin, Vanshaj Khattar, Harshal Kaushik, Bilgehan Sel, Ruoxi Jia
AAAI (2023) (oral presentation) (arXiv | pdf)
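To make the term "solution function" concrete, here is a toy illustration (not the construction used in the paper). The parametric QP min_x 0.5*x^2 - theta*x subject to 0 <= x <= 1 has the solution function x*(theta) = clip(theta, 0, 1), which is piecewise linear in theta. The sketch below computes it numerically by projected gradient descent and then takes a difference of two such solution functions to obtain a "hat" function; hat functions of this kind are the building blocks of piecewise-linear interpolants, which hints at why such classes can approximate continuous functions. The function names `qp_solution` and `hat` are hypothetical.

```python
import numpy as np

def qp_solution(theta, lo=0.0, hi=1.0, steps=200, lr=0.5):
    """Solution function of the toy QP  min_x 0.5*x**2 - theta*x  s.t. lo <= x <= hi.

    Solved elementwise by projected gradient descent; the gradient is (x - theta),
    and each step is projected back onto the box [lo, hi].
    """
    theta = np.asarray(theta, dtype=float)
    x = np.zeros_like(theta)
    for _ in range(steps):
        x = np.clip(x - lr * (x - theta), lo, hi)
    return x

def hat(t):
    """Difference of two shifted QP solution functions: rises on [0, 1], falls on [1, 2]."""
    return qp_solution(t) - qp_solution(np.asarray(t, dtype=float) - 1.0)

# The numerical solution matches the closed form clip(theta, 0, 1).
print(qp_solution([-1.0, 0.3, 2.0]))
print(hat([0.5, 1.0, 1.5, 3.0]))
```

A closed form is available here only because the toy QP is separable with box constraints; for general LP/QP parameterizations the solution function must be computed by a solver.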
The CityLearn Challenge is an international competition for reinforcement learning (RL) solutions to grand challenges in power and energy systems. In this paper, we present our winning solution, which uses the solution functions of optimization problems as policies to compute actions for sequential decision-making, while notably adapting the parameters of the optimization model from online observations. Algorithmically, this is achieved by an evolutionary algorithm under a novel trajectory-based guidance scheme. We also formally establish its global convergence.
Vanshaj Khattar, Ming Jin
AAAI (2023) AI for Social Impact Track (arXiv | pdf)
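The idea of an "optimization solution function as a policy, with parameters adapted by an evolutionary algorithm" can be sketched on a hypothetical toy task. This is not the paper's method: the trajectory-based guidance scheme is not reproduced, and a plain truncation-selection Gaussian evolution strategy is used instead. The policy returns the minimizer of the QP min_a 0.5*a^2 - (theta @ state)*a subject to -1 <= a <= 1, whose closed form is clip(theta @ state, -1, 1). The environment (tracking 0.5 * demand), the state features, and all names (`qp_policy`, `rollout_return`, `evolve`) are assumptions for illustration.

```python
import numpy as np

def qp_policy(theta, state):
    """Action = argmin_a 0.5*a**2 - (theta @ state)*a  s.t. -1 <= a <= 1.

    For this toy QP the minimizer has the closed form clip(theta @ state, -1, 1).
    """
    return float(np.clip(theta @ state, -1.0, 1.0))

def rollout_return(theta, demands):
    """Hypothetical toy task: the action should track 0.5 * demand (clipped to [-1, 1])."""
    total = 0.0
    for d in demands:
        state = np.array([d, 1.0])            # features: demand and a bias term
        a = qp_policy(theta, state)
        target = np.clip(0.5 * d, -1.0, 1.0)
        total -= (a - target) ** 2            # reward = negative squared tracking error
    return total

def evolve(demands, generations=100, pop=20, elite=5, sigma=0.3, seed=0):
    """Truncation-selection Gaussian evolution strategy over the QP parameters theta."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(2)
    for _ in range(generations):
        cands = mean + sigma * rng.standard_normal((pop, 2))
        scores = np.array([rollout_return(c, demands) for c in cands])
        best = cands[np.argsort(scores)[-elite:]]  # keep the top-`elite` candidates
        mean = best.mean(axis=0)                   # recombine by averaging
        sigma *= 0.97                              # slowly shrink exploration noise
    return mean

demands = np.linspace(-2.0, 2.0, 41)
theta = evolve(demands)
print(theta, rollout_return(theta, demands))
```

Because the policy is the solution of a parameterized optimization problem, the search runs over a small number of interpretable model parameters rather than over the weights of a neural network.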