
A multi-task deep reinforcement learning-based recommender system for co-optimizing energy, comfort, and air quality in commercial buildings with humans-in-the-loop

Published online by Cambridge University Press: 04 November 2024

Stephen Xia
Affiliation:
Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA
Peter Wei
Affiliation:
Department of Electrical Engineering, Columbia University, New York, NY, USA
Yanchen Liu
Affiliation:
Department of Electrical Engineering, Columbia University, New York, NY, USA
Andrew Sonta
Affiliation:
School of Architecture, Civil and Environmental Engineering, EPFL, Lausanne, Vaud, Switzerland
Xiaofan Jiang*
Affiliation:
Department of Electrical Engineering, Columbia University, New York, NY, USA
Corresponding author: Xiaofan Jiang; Email: jiang@ee.columbia.edu

Abstract

We introduce a novel human-centric deep reinforcement learning recommender system designed to co-optimize energy consumption, thermal comfort, and air quality in commercial buildings. Existing approaches typically optimize these objectives separately or focus solely on controlling energy-consuming building resources without directly engaging occupants. We develop a deep reinforcement learning architecture based on multitask learning with humans-in-the-loop and demonstrate how it can jointly learn energy savings, comfort, and air quality improvements for different building and occupant actions. In addition to controlling typical building resources (e.g., thermostat setpoint), our system provides real-time actionable recommendations that occupants can take (e.g., move to a new location) to co-optimize energy, comfort, and air quality. Through real deployments across multiple commercial buildings, we show that our multitask deep reinforcement learning recommender system has the potential to reduce energy consumption by up to 8% in energy-focused optimization, improve all objectives by 5–10% in joint optimization, and improve thermal comfort by up to 21% in comfort and air quality-focused optimization compared to existing solutions.

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. RECA’s system architecture. To account for the challenge of cold-start, RECA leverages a simulation environment with statistical models to estimate future building states and generate more training examples from a past history of observed building states and recommendations.


Figure 2. The embedding layer in our network converts the physical representation of occupant locations into a feature representation through a learned embedding matrix.


Figure 3. Our deep Q-network architecture includes an embedding layer for learning occupant locations and has three output tasks for learning energy savings, comfort improvements, and air quality improvements for each action.
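The multi-task head structure described in this caption can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, feature names, and action count below are hypothetical, and the parameters are randomly initialized rather than learned. It shows only the forward-pass shape of the network, i.e. a location embedding concatenated with sensed state, a shared trunk, and one Q-value head per objective.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LOCATIONS = 16   # hypothetical number of discrete occupant locations
EMBED_DIM = 8        # hypothetical embedding width
STATE_DIM = 6        # hypothetical sensed features (temperature, CO2, ...)
HIDDEN = 32          # hypothetical shared-trunk width
NUM_ACTIONS = 10     # hypothetical action-set size

# "Learned" parameters, randomly initialized here purely for illustration.
embedding = rng.normal(size=(NUM_LOCATIONS, EMBED_DIM))
W_shared = rng.normal(size=(EMBED_DIM + STATE_DIM, HIDDEN))
# One output head per task: energy savings, comfort, and air quality.
heads = {task: rng.normal(size=(HIDDEN, NUM_ACTIONS))
         for task in ("energy", "comfort", "air_quality")}

def q_values(location_id, sensor_state):
    """Embed the occupant location, concatenate it with the sensed
    building state, and score every action under each task head."""
    x = np.concatenate([embedding[location_id], sensor_state])
    h = np.maximum(0.0, x @ W_shared)          # shared ReLU trunk
    return {task: h @ W for task, W in heads.items()}

q = q_values(3, np.zeros(STATE_DIM))

# Per-task Q-values can then be blended with objective weights
# (analogous to the weighting combinations compared in Table 1).
weights = {"energy": 0.5, "comfort": 0.25, "air_quality": 0.25}
combined = sum(w * q[t] for t, w in weights.items())
best_action = int(np.argmax(combined))
```

Changing the weights shifts which objective dominates action selection without retraining the shared trunk, which is the appeal of the multi-task formulation.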


Figure 4. The web interface for occupant feedback displays a list of recommendations. Estimated energy savings, comfort, and air quality improvements are shown with each recommendation.


Figure 5. The digital twin is constructed using a tripartite graph structure, from Wei et al. (2017a) and Chen et al. (2018), to store relations between energy resources, spaces, and occupants. The building state can be extracted from the graph structure to build visualizations.


Figure 6. To alleviate cold start, we create a simulation environment based on the digital twin, which takes as input the current building state and simulates the next state, energy savings, comfort improvement, and air quality improvement based on an action.


Figure 7. Comparison of EnergyPlus and random forest (RF) prediction for a single room in our deployments.


Figure 8. For a building state, the reinforcement learning agent provides an action to the simulation environment. The next state, energy savings, comfort, and air quality improvements are returned to the agent to tune the policy.
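The agent-environment interaction in this caption can be sketched as a short Python loop. This is a toy stand-in, not the paper's code: the environment dynamics, reward values, and three-action policy below are invented for illustration. It shows only the interface, i.e. the agent sends an action, the simulation returns the next state plus one reward per objective, and the transitions are collected as experience for tuning the policy.

```python
import random

random.seed(0)

class SimEnv:
    """Toy stand-in for the digital-twin simulation environment: maps a
    (state, action) pair to a next state plus per-objective rewards."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # Invented dynamics for illustration only.
        self.state = (self.state + action) % 5
        rewards = {"energy": random.random(),      # simulated energy savings
                   "comfort": random.random(),     # simulated comfort improvement
                   "air_quality": random.random()} # simulated air quality improvement
        return self.state, rewards

env = SimEnv()
history = []
for _ in range(100):
    action = random.randrange(3)   # placeholder for the agent's policy
    state, rewards = env.step(action)
    # Each transition becomes a training example for the Q-network.
    history.append((state, action, rewards))
```

In the real system the replay of simulated transitions like these is what lets the agent train before enough live building data exists, which is the cold-start mitigation the earlier captions describe.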


Table 1. Comparison of our deep Q-network architecture with and without an embedding layer, against existing strategies, on simulated building episodes with four different weighting combinations to emphasize different optimizations


Figure 9. Acceptance rate during the first two weeks (control) and second two weeks after retraining (adapted) for the setpoint (left) and move (right) recommendations.


Figure 10. Location optimization: At the red line, the occupant moves to location B, and HVAC service is reduced in location A. Due to differences in environment, the occupant’s thermal comfort and air quality are improved.


Figure 11. Group consolidation: At the blue and red lines, occupants 1 and 2 move to location C (not shown). Locations A and B reduce HVAC and lighting service, leading to energy savings. Comfort and air quality for both occupants change due to environmental differences.


Figure 12. Group disbanding: At the red line, occupant 2 moves from location A to location B. Since only occupant 1 remains, HVAC service is reduced in location A at the blue line, leading to a comfort improvement for occupant 1.


Figure 13. Energy savings, comfort improvements, and air quality improvements when emphasizing energy savings (top), balanced improvements (middle), and comfort and air quality (bottom) in two deployments (A and B).


Table 2. Execution time for each component of RECA as the number of people and spaces increases
