Splitting for planning

A divide-and-conquer approach to plan under constraints - by Panagiotis Grontas

Our lives are filled with (boring) planning tasks. Planning is the art of balancing the immediate and long-term effects of our actions, especially when the future is uncertain. Typically, this trade-off between the short term and the long term is made according to some notion of reward.

Imagine you are a student: studying brings you closer to graduation in the long term, yet strips away some of life’s joys in the short term. How do you weigh today’s sacrifice against tomorrow’s reward?

One mathematical tool for answering such questions is the Markov Decision Process, or MDP for short.

So, what are MDPs? MDPs look a lot like a video game:

We have an intelligent agent that reads the state of its environment (the screenshot of your video game). Based on the state, the agent takes an action (the buttons of your controller).
As a result, the state changes and the agent gets a reward (the screenshot updates and your score increases).

Once the environment, the actions, and the rewards are specified, MDPs can tell us how the agent should act in each state of the environment to maximize its cumulative reward over time. With this formalism, MDPs have been used to solve extremely difficult planning tasks: from games (like playing chess at a superhuman level), to real-world problems (like warehouse logistics), to futuristic applications (like the control of fusion reactors).
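How does an MDP "tell us" how to act? One classic answer is value iteration, which repeatedly applies the Bellman update until the value of each state stops changing, then acts greedily with respect to those values. The sketch below assumes a tiny, deterministic student MDP with made-up states, actions, and rewards. The names `TRANSITIONS`, `GAMMA`, and `value_iteration` are hypothetical, and this is an illustration of MDPs in general, not the method used in this work.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] = (next_state, reward).
TRANSITIONS = {
    "rested": {"study": ("tired", 2.0), "sleep": ("rested", 0.0)},
    "tired": {"study": ("tired", 0.5), "sleep": ("rested", 0.0)},
}
GAMMA = 0.9  # discount factor: how much tomorrow's reward counts today


def value_iteration(transitions, gamma, tol=1e-9):
    """Return optimal state values and a greedy policy for a deterministic MDP."""
    values = {s: 0.0 for s in transitions}
    while True:
        # Bellman update: best immediate reward plus discounted future value.
        new_values = {
            s: max(r + gamma * values[s2] for s2, r in actions.values())
            for s, actions in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    # In each state, pick the action that achieves the maximum above.
    policy = {
        s: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for s, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS, GAMMA)
print(policy)  # {'rested': 'study', 'tired': 'sleep'}
```

In this toy, the best plan alternates: study when rested, sleep when tired. Studying while tired still earns a little reward, but recovering first pays off more over time.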

A critical ingredient in the success of MDPs is correctly describing their end goal, that is, the reward. Yet defining a single reward is challenging when we must satisfy multiple, conflicting specifications:

In the previous example, our student tries to maximize academic progress while limiting both sleep deprivation and coffee consumption.
Closer to the real world, a warehouse manager makes monthly plans to maximize profit by ordering, storing, and selling products, without exceeding the capacity of her warehouse.

Notice that some specifications arise as rewards (progress, profit), while others appear as constraints (sleep, coffee, capacity). Our goal is to maximize reward while respecting constraints. Unfortunately, adding constraints to MDPs makes them drastically more complex and difficult to solve.
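To make this concrete, one standard way to write such a problem is as a constrained MDP. The symbols are our own notation, not necessarily the author's: a policy $\pi$ chooses actions, $r$ is the reward, each $c_i$ is a constraint cost (for example, hours of missed sleep) with budget $d_i$, and $\gamma \in [0, 1)$ discounts future outcomes.

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i,
\qquad i = 1, \dots, m.
```

For a finite, discounted MDP without constraints, some deterministic policy is always optimal; once constraints are added, the best policy may need to randomize between actions, which is one reason the constrained problem is harder.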

To address this problem, we apply a divide-and-conquer approach. In a nutshell, our method proceeds by alternating two simple steps:

we teach a greedy agent to obtain reward from the MDP;
we teach a safe agent to respect constraints.

Then, we get the best of both worlds by appropriately combining the behavior of the two agents. In our student’s example, this means finding the optimal proportion of studying, sleeping, and drinking coffee. Maybe answering this question is what motivated us in the first place?
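As a loose illustration of the splitting idea (not the author's algorithm), consider a much simpler problem than an MDP: a static split of a day into studying, sleeping, and coffee, with made-up rewards and limits. The sketch below uses ADMM, a standard operator-splitting method, which may or may not match the approach described here. A "greedy" step chases reward while keeping a valid allocation, a "safe" step enforces the limits, and a running dual variable reconciles the two. All names (`split_plan`, `project_simplex`, `project_box`) and numbers are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:  # holds for a prefix of k; keep the last such threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def project_box(v, lower, upper):
    """Clip each entry of v into [lower_i, upper_i]."""
    return [min(max(vi, lo), hi) for vi, lo, hi in zip(v, lower, upper)]


def split_plan(reward, lower, upper, rho=1.0, iters=500):
    """ADMM for: maximize reward . x  s.t. x on the simplex and lower <= x <= upper."""
    n = len(reward)
    z = [1.0 / n] * n  # "safe" plan: always within the limits
    u = [0.0] * n      # scaled dual variable: accumulated disagreement
    x = z
    for _ in range(iters):
        # Greedy step: maximize reward minus a penalty for straying from the
        # dual-corrected safe plan, over valid allocations (the simplex).
        x = project_simplex([zi - ui + ri / rho for zi, ui, ri in zip(z, u, reward)])
        # Safe step: pull the greedy plan (corrected by the dual) inside the limits.
        z = project_box([xi + ui for xi, ui in zip(x, u)], lower, upper)
        # Combine: update the dual with the remaining disagreement x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return x


# Hypothetical rewards for (study, sleep, coffee) and daily limits.
plan = split_plan(reward=[3.0, 1.0, 4.0], lower=[0.0, 0.3, 0.0], upper=[1.0, 1.0, 0.1])
print([round(p, 3) for p in plan])  # [0.6, 0.3, 0.1]
```

On its own, the greedy step would spend the whole day on coffee, the highest-reward activity, while the safe step ignores reward entirely. Alternating them, the plan settles at the coffee cap (0.1), the minimum sleep (0.3), and studying for the rest (0.6).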

Text by Panagiotis Grontas; illustration created with Microsoft PowerPoint and ChatGPT
