
Can time-series foundation models perform building energy management tasks?

Published online by Cambridge University Press:  02 March 2026

Ozan Baris Mulayim*
Affiliation:
Department of Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Pengrui Quan*
Affiliation:
Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
Liying Han
Affiliation:
Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
Xiaomin Ouyang
Affiliation:
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong
Dezhi Hong
Affiliation:
Amazon.com Inc., Seattle, WA, USA
Mario Bergés
Affiliation:
Department of Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Mani Srivastava
Affiliation:
Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
*
Corresponding authors: Ozan Baris Mulayim and Pengrui Quan; Emails: omulayim@andrew.cmu.edu; prquan@g.ucla.edu

Abstract

Building energy management (BEM) tasks require processing and learning from a variety of time-series data. Existing solutions rely on bespoke task- and data-specific models to perform these tasks, limiting their broader applicability. Inspired by the transformative success of Large Language Models (LLMs), Time-Series Foundation Models (TSFMs), trained on diverse datasets, have the potential to change this. Were TSFMs to achieve a level of generalizability across tasks and contexts akin to LLMs, they could fundamentally address the scalability challenges pervasive in BEM. To understand where they stand today, we evaluate TSFMs across four dimensions: (1) generalizability in zero-shot univariate forecasting, (2) forecasting with covariates for thermal behavior modeling, (3) zero-shot representation learning for classification tasks, and (4) robustness to performance metrics and varying operational conditions. Our results reveal that TSFMs exhibit limited generalizability, performing only marginally better than statistical models on unseen datasets and modalities for univariate forecasting. Similarly, the inclusion of covariates in TSFMs does not yield performance improvements, and their performance remains inferior to that of conventional models that utilize covariates. While TSFMs generate effective zero-shot representations for downstream classification tasks, they may remain inferior to statistical models in forecasting when the statistical models perform test-time fitting. Moreover, TSFMs' forecasting performance is sensitive to the choice of evaluation metric, and they struggle in more complex building environments where statistical models do not. These findings underscore the need for targeted advancements in TSFM design, particularly in handling covariates and in incorporating context and temporal dynamics into prediction mechanisms, to develop more adaptable and scalable solutions for BEM.
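The forecasting comparisons above are reported in RMSE, and Figure 3 summarizes the covariate effect as a delta between univariate and covariate-informed errors. As a minimal illustration (not the paper's evaluation code, and using hypothetical temperature readings), both quantities can be computed as:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical indoor-temperature readings (in degrees F) and two forecasts:
actual     = [72.0, 72.5, 73.1, 73.0, 72.4]
univariate = [71.5, 72.0, 73.5, 73.4, 72.0]  # forecast from history alone
covariate  = [71.8, 72.3, 73.3, 73.2, 72.2]  # forecast given extra covariates

rmse_uni = rmse(actual, univariate)
rmse_cov = rmse(actual, covariate)

# Positive delta means adding covariates lowered the error.
delta = rmse_uni - rmse_cov
```

In the paper's aggregate metric, these per-series RMSEs would additionally be averaged over the duration-horizon pairs before taking the difference.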

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2026. Published by Cambridge University Press

Figure 1. Overview of different TSFM architectures. The task-specific heads enable adaptation to different downstream tasks. The encoder generates latent representations, while the decoder autoregressively predicts future tokens. The encoder and decoder architectures studied in this work are transformer-based but can be generalized to other model architectures.


Table 1. Comparison of TSFM attributes


Table 2. Data familiarity and model structures


Table 3. RMSE values for general analysis. Bold and underscored values indicate the best and second-best performance, respectively


Figure 2. Distribution of RMSE values for the three datasets, averaged across varying duration-horizon pairs.


Table 4. Results (in °F) across different noise levels where $ D=448 $ and $ P=64 $. Bold and underscored values indicate the lowest and second-lowest error, respectively


Figure 3. Performance impact of adding covariates, measured as the change in overall RMSEall ($ \Delta $RMSEall = univariate RMSEall − covariate RMSEall).


Table 5. Results across different metrics and datasets. IBL, instance-based learning; E2E indicates that the model is trained directly with a classification loss in an end-to-end manner; SS-RL, self-supervised representation learning; PT-RL, pretrained representations; SVM/NN, an SVM or a Neural Network (NN) classification head is trained on top of a frozen representation; $ N $, number of training samples. Bold and underscored values indicate the best and second-best performance on each metric, respectively


Figure 4. Visualization of new empirical data across conditions.


Figure 5. RMSE with varying prediction horizon ($ P $). The context durations ($ D $) are averaged.


Figure 6. RMSE with varying context duration ($ D $). The prediction horizons ($ P $) are averaged.


Figure 7. Temperature and electricity predictions during heat-occupied and unoccupied conditions, illustrated with detailed views of key periods.


Figure 8. Model rankings across various metrics for temperature (a) and electricity (b) data. The green triangle denotes the mean, while the black line is the median.


Figure 9. RMSE on temperature and electricity data of various building conditions.
