Impact statement
This study provides recommendations for sustainable building design and operation in Tehran. Deep Q-network algorithms were used and showed high precision in temperature control predictions. Advanced heating, ventilation, and air conditioning (HVAC) technologies significantly improved thermal comfort within ASHRAE standards, and HVAC systems integrated with reinforcement learning reduced energy use intensity by up to 25%. The results also indicate that variable refrigerant flow systems and renewable energy integration enhance efficiency.
1. Introduction
Enhancing the energy efficiency of heating, ventilation, and air conditioning (HVAC) systems in buildings is crucial for promoting sustainability in regions with extreme climate conditions, such as Tehran, Iran. Building HVAC systems account for over 40% of global energy consumption in the built environment (Ürge-Vorsatz et al., Reference Ürge-Vorsatz, Cabeza, Serrano, Barreneche and Petrichenko2015; Mehrpooya et al., Reference Mehrpooya, Bahnamiri and Moosavian2019; Etemad et al., Reference Etemad, Abdalisousan and Aliehyaei2022). With heightened awareness of sustainability issues and ambitious net-zero targets, enhancing HVAC efficiency has become imperative (Gottschamer and Zhang, Reference Gottschamer and Zhang2020). Modern low-energy buildings aim to minimize energy demand through passive design strategies and efficient technologies (Cao et al., Reference Cao, Dai and Liu2016; Said et al., Reference Said, Rahman, Sohail and Bibin2023). However, optimal control and operation of HVAC equipment play a pivotal role in realizing the energy-saving potential (Nassif et al., Reference Nassif, Kajl and Sabourin2005; Dikshit et al., Reference Dikshit, Chavali, Malwe, Kulkarni, Panchal, Alrubaie, Mohamed and Jaber2024).
Recent advancements in renewable systems, innovative air distribution techniques, and data-driven control methods offer new pathways for enhancing HVAC performance. Solar-assisted heat pumps can utilize thermal energy from the sun to reduce electricity usage (Dikshit et al., Reference Dikshit, Chavali, Malwe, Kulkarni, Panchal, Alrubaie, Mohamed and Jaber2024; Wang et al., Reference Wang, Dai, Gao and Shaolinn.d.). Underfloor air distribution (UFAD) and displacement ventilation can minimize fan energy requirements and improve thermal comfort (Li et al., Reference Li, Yu, Niu, Shafik and Chen2014). While reinforcement learning (RL) algorithms have demonstrated significant potential for dynamically optimizing HVAC setpoints based on real-time conditions (Vázquez-Canteli et al., Reference Vázquez-Canteli, Kämpf and Nagy2017), their practical implementation in low-energy buildings, particularly within specific climatic contexts such as that of Tehran, remains underexplored.
This research focuses on the integration of these technologies to maximize energy efficiency in low-energy buildings, specifically within the unique climate context of Tehran, Iran. The temperate climate of Tehran presents its own set of challenges and opportunities, particularly regarding energy consumption patterns dominated by both heating and cooling loads. Performance evaluations tailored to Tehran’s weather and occupancy profiles will quantify the efficiency improvements gained from implementing cutting-edge HVAC solutions.
Despite significant advances in both HVAC technology and control systems individually, a critical research gap exists regarding their integrated implementation and performance evaluation in real-world contexts, particularly in regions with unique climatic conditions such as Tehran. This research aims to fill this gap by employing an applied methodology that incorporates simulations, experimental data, and comparative analyses. The findings will equip stakeholders within the Tehran building sector to make informed, data-driven decisions regarding the design and operation of energy-efficient and cost-effective HVAC systems.
1.1. Background
The Tehran region of Iran features a temperate climate that is conducive to employing passive heating and cooling techniques in building design (Sharif et al., Reference Sharif, Saleh, Afzal, Razavi, Nasab and Kadaei2022). However, increasing development and rising electricity demand have resulted in a growing carbon footprint for the local building sector (Farhadi et al., Reference Farhadi, Faizi and Sanaieian2019). Low-energy buildings aim to address this challenge through optimized insulation, shading, ventilation, and a variety of other efficiency measures. Nonetheless, heating and cooling loads continue to represent over 60% of operational energy consumption (Rodrigues et al., Reference Rodrigues, Fereidani, Fernandes and Gaspar2023).
Recent electricity supply shortages and extreme weather patterns in Iran have underscored the urgent need for resilient and sustainable HVAC solutions (Climate Change and Energy Crisis in Iran—Iran News Update, n.d.). Integrating renewable energy sources and thermal technologies can help reduce grid dependence (Lv, Reference Lv2023). Moreover, innovative air distribution systems enhance indoor climate control while minimizing fan energy usage (Rathnayaka et al., Reference Rathnayaka, Robert, Siriwardana, Adikariwattage, Pasindu, Setunge and Amaratunga2023). Most critically, intelligent control of HVAC systems can optimize energy performance in response to varying weather conditions and occupancy patterns (Jamali et al., Reference Jamali, Rasti-Barzoki and Altmann2023).
RL has demonstrated significant promise for data-driven HVAC optimization in simulation environments (Maddalena et al., Reference Maddalena, Müller, Santos, Salzmann and Jones2022), where algorithms dynamically adjust setpoints to minimize energy consumption based on surrounding conditions (Kannari et al., Reference Kannari, Kantorovitch, Piira and Piippo2023; Biswas et al., Reference Biswas, Rashid, Biswas, Al Nasim, Gupta and George2024). The key advantage of RL approaches over conventional control strategies lies in their ability to adapt and learn optimal policies through interaction with the environment, without requiring detailed physical models of the building or HVAC system. However, adopting these techniques in real-world systems necessitates the careful selection of suitable mechanical equipment and instrumentation (Marin et al., Reference Marin, Saffari, de Gracia, Zhu, Farid, Cabeza and Ushak2016; Sahebzadeh et al., Reference Sahebzadeh, Heidari, Kamelnia and Baghbani2017; Al Sayed et al., Reference Al Sayed, Boodi, Broujeny and Beddiar2024; An et al., Reference An, Ding and Du2024a, Reference An, Ding and Du2024b; Ding et al., Reference Ding, An, Rathee and Du2024). This research specifically evaluates the applicability of RL for low-energy buildings within the Tehran context, addressing a gap in existing literature.
Recent advancements in deep learning techniques, such as deep clustering RL (DCRL), hold potential for overcoming these challenges (Sarker, Reference Sarker2021; Wu et al., Reference Wu, Chen, Shen, Guo, Gao, Li, Pei and Long2023). DCRL can utilize deep neural networks to extract meaningful features from complex datasets, thereby enhancing sample efficiency and generalization (Bandi et al., Reference Bandi, Adapa and Kuchi2023). This capability enables RL agents to quickly develop accurate models of their environments, facilitating the creation of efficient control policies tailored to specific building attributes (Villaizán-Vallelado et al., Reference Villaizán-Vallelado, Salvatori, Carro and Sanchez-Esguevillas2024). While promising in theory, these advanced techniques require empirical validation in real-world building environments, particularly under varying climate conditions, such as those found in Tehran. Effective implementation also requires thoughtful selection of appropriate mechanical equipment, sensors, and deep learning architectures (Borisov et al., Reference Borisov, Leemann, Sebler, Haug, Pawelczyk and Kasneci2024).
The findings of this research are aimed at architects, system designers, and building operators in the region. The study will elucidate pathways for maximizing HVAC system efficiency through careful technology selection and operational optimization. Broader implications include informing sustainable building practices, energy codes, and carbon-neutral policies across Iran’s building sector.
1.2. Objectives
This research addresses the energy efficiency challenges of low-energy buildings in Tehran by investigating the integration of cutting-edge HVAC technologies with RL control strategies. Its objectives are as follows:
• Identification of advanced HVAC technologies: Evaluate and select advanced HVAC technologies suitable for the unique environmental conditions of Tehran, including solar-assisted variable refrigerant flow (VRF) systems, advanced heat pump systems, energy recovery ventilation (ERV), demand-controlled ventilation (DCV), and innovative air distribution techniques such as UFAD.
• Integration with RL: Develop a framework for integrating the identified HVAC technologies with RL control strategies. RL algorithms, including proximal policy optimization (PPO) and deep Q-networks (DQNs), will be employed to create intelligent control systems capable of dynamically optimizing HVAC operation based on real-time conditions, occupant behavior, and energy demand profiles.
1.3. Rationale for integration
The integration of advanced HVAC technologies with RL is driven by the potential synergy between energy-efficient hardware and adaptive control strategies. Figure 1 provides a conceptual representation of the integration framework. In Figure 1, the interaction between the control agent and the environment showcases the adaptability of the RL system to optimize HVAC control actions in response to varying conditions.

Figure 1. Conceptual integration framework for HVAC energy efficiency study.
The novelty of this research lies in the systematic investigation of this synergistic relationship, specifically focusing on how RL algorithms can enhance the operational efficiency of advanced HVAC systems beyond what either technology could achieve independently. By addressing this integration gap, this study contributes to the growing body of knowledge on intelligent building systems while providing practical insights for implementation in the Tehran context.
1.4. Significance of the study
This study contributes to the field by providing a comprehensive examination of advanced HVAC technologies integrated with RL control strategies. Unlike previous research that has typically focused on either technological advancements or control optimization in isolation, this study explores their synergistic integration with particular attention to regional climatic conditions. The significance of this research lies in its potential to inform sustainable building practices and guide HVAC designers, engineers, and building operators in enhancing energy efficiency in low-energy buildings. The novelty of this work stems from its holistic approach, which investigates the synergistic integration of cutting-edge HVAC systems, such as solar-assisted VRF systems, ERV, and innovative air distribution techniques, with advanced RL algorithms like DQN and PPO. This integration of state-of-the-art hardware and adaptive control strategies represents a significant contribution to the state of the art in the field, as it addresses the limitations of traditional HVAC systems and control methods. The findings of this study are expected to offer practical insights into optimizing energy performance in low-energy buildings in Tehran, which can lead to substantial energy savings and improved occupant comfort within ASHRAE comfort standards.
1.5. Structure of the article
The remainder of this article is organized as follows: Section 2 reviews the existing literature on energy-efficient technologies, advanced HVAC systems, and RL in building control. Section 3 details the research methodology, including the identification of advanced HVAC technologies, integration with RL, and the selection and implementation of RL algorithms. Subsequent sections present results, discussions, and comparative analyses, leading to the conclusions and recommendations in Section 6.
2. Literature review
2.1. Advanced HVAC technologies for energy efficiency
Energy-efficient buildings represent a critical approach to addressing global energy consumption and climate change challenges (Cabeza and Chàfer, Reference Cabeza and Chàfer2020; Li and Dong, Reference Li and Dong2022; Yang et al., Reference Yang, Luo and Adibhesami2025). These structures optimize resource utilization throughout their life cycle while providing comfortable living conditions and reducing operational costs (Ugli, Reference Ugli2022; Rebelatto et al., Reference Rebelatto, Salvia, Bueno, Brandli and Rodrigues2023; Polesello and Johnson, Reference Polesello and Johnson2016). The building sector, particularly HVAC systems, accounts for over 40% of global energy consumption (Ürge-Vorsatz et al., Reference Ürge-Vorsatz, Cabeza, Serrano, Barreneche and Petrichenko2015; Mehrpooya et al., Reference Mehrpooya, Bahnamiri and Moosavian2019; Etemad et al., Reference Etemad, Abdalisousan and Aliehyaei2022), making their optimization essential for achieving sustainability goals.
Recent advancements in HVAC technologies have created new opportunities for energy conservation in buildings. Solar-assisted heat pumps utilize thermal energy from the sun to supplement conventional systems, reducing electricity demand while maintaining or improving performance (Wang et al., Reference Wang, Dai, Gao and Ma2009; Dikshit et al., Reference Dikshit, Chavali, Malwe, Kulkarni, Panchal, Alrubaie, Mohamed and Jaber2024). Studies by Wang et al. (Reference Wang, Shen, Deng, Liu and Wang2024) demonstrate that these integrated systems can achieve up to 30% higher coefficient of performance (COP) compared to conventional systems, particularly in regions with abundant solar resources, such as Tehran.
Innovative air distribution techniques, including UFAD and displacement ventilation, have emerged as effective methods for improving both energy efficiency and occupant comfort (Li et al., Reference Li, Yu, Niu, Shafik and Chen2014). These systems deliver conditioned air at lower velocities and higher temperatures than conventional overhead systems, reducing fan energy requirements and improving thermal stratification. Research by Dikshit et al. (Reference Dikshit, Chavali, Malwe, Kulkarni, Panchal, Alrubaie, Mohamed and Jaber2024) indicates that such systems can reduce cooling energy consumption by 15–20% in commercial buildings while enhancing indoor air quality through more effective ventilation.
ERV and DCV represent another significant advancement in HVAC technology. ERV systems recover heat and moisture from exhaust air, reducing the energy required to condition incoming fresh air (van Roosmalen et al., Reference van Roosmalen, Herrmann and Kumar2021). Meanwhile, DCV systems modulate ventilation rates based on occupancy or air quality measurements, preventing energy waste from overventilation during periods of low occupancy (Bie et al., Reference Bie, Guo, Li and Loftness2025). The integration of these technologies is particularly relevant for Tehran’s climate, which experiences both hot summers and cold winters, requiring year-round climate control.
2.2. Control strategies for HVAC systems
Traditional HVAC control strategies typically rely on rule-based approaches with predefined setpoints and schedules. While these methods are straightforward to implement, they often fail to adapt to changing conditions or optimize energy use (Choi et al., Reference Choi, Lu, O’Neill, Feng and Yang2023). Nassif et al. (Reference Nassif, Kajl and Sabourin2005) demonstrated that even optimized rule-based controls (RBCs) have inherent limitations in responding to dynamic environmental and occupancy conditions.
More advanced control methodologies have emerged to address these limitations. Model predictive control (MPC) uses mathematical models of building thermal dynamics to predict future conditions and optimize control decisions accordingly (Drgoňa et al., Reference Drgoňa, Arroyo, Figueroa, Blum, Arendt, Kim, Ollé, Oravec, Wetter, Vrabie and Helsen2020; Xiao and You, Reference Xiao and You2023). This approach has shown promise in reducing energy consumption by anticipating changes in weather, occupancy, or utility rates. However, MPC implementations require accurate building models, which can be challenging to develop and maintain, particularly for complex or older buildings.
Lu et al. (Reference Lu, Fu and O’Neill2023) developed high-performance rule-based sequences for variable air volume systems that demonstrate improved efficiency over conventional control strategies. However, such rule-based approaches still lack the adaptability and learning capabilities needed for optimal performance across varying conditions. This limitation highlights the need for more advanced, data-driven control methodologies that can learn and adapt to building-specific characteristics without requiring explicit physical modeling.
2.3. RL for HVAC control
RL has emerged as a promising approach for optimizing HVAC control without requiring detailed physical models (Al Sayed et al., Reference Al Sayed, Boodi, Broujeny and Beddiar2024; Silvestri et al., Reference Silvestri, Coraci, Brandi, Capozzoli, Borkowski, Köhler, Wu, Zeilinger and Schlueter2024; Xu et al., Reference Xu, Fu, Wang, Yang, Huang, O’Neill, Wang and Zhu2025). Unlike conventional control strategies, RL algorithms learn optimal control policies through interaction with the environment, adapting to building-specific characteristics and changing conditions over time.
Bilous et al. (Reference Bilous, Biriukov, Karpenko, Eutukhova, Novoseltsev and Voloshchuk2024) demonstrated that RL controllers can significantly outperform conventional approaches in minimizing both energy consumption and thermal discomfort, particularly after extended learning periods. Their study, focusing on single-zone air supply systems, showed energy savings of up to about 20% compared to RBCs while maintaining or improving occupant comfort.
Silvestri et al. (Reference Silvestri, Coraci, Brandi, Capozzoli, Borkowski, Köhler, Wu, Zeilinger and Schlueter2024) developed a framework for whole-building energy modeling using deep RL (DRL) that demonstrated robust performance across varying conditions. Their approach integrated building simulation with DRL algorithms to create control policies that balanced energy efficiency with occupant comfort. However, one limitation noted in their study was the extensive data requirements and computational resources needed for training effective RL models.
A significant challenge in practical RL implementation for HVAC control is sample efficiency—the ability to learn effective policies with limited real-world data. Loffredo et al. (Reference Loffredo, May, Schäfer, Matta and Lanza2023) addressed this issue by proposing model-based RL as an alternative to traditional model-free approaches. Their method incorporated approximate physical models to accelerate learning, demonstrating improved performance with substantially less training data.
Recent advancements in deep learning techniques have further enhanced the capabilities of RL for HVAC control. Fu et al. (Reference Fu, Zhang and Li2023) developed a multi-agent DRL approach that optimized control across multiple HVAC subsystems simultaneously. Their method demonstrated superior performance compared to single-agent approaches, achieving greater energy savings while maintaining comfort conditions. Similarly, Yan and Qin (Reference Yan and Qin2017) explored distributed RL for regional building energy optimization, showing the potential for coordinated control across multiple buildings.
2.4. Integration of advanced HVAC technologies with RL
2.4.1. Research gap
Despite significant advances in both HVAC technology and control systems individually, a critical gap exists in research addressing their integrated implementation and performance evaluation in real-world contexts, particularly in regions with unique climatic conditions, such as Tehran.
While numerous studies have explored either advanced HVAC technologies or RL control strategies in isolation, few have investigated the synergistic benefits of their integration. The potential advantages of such integration are substantial—advanced HVAC hardware provides greater flexibility and efficiency, while RL control strategies optimize operation based on real-time conditions and learned patterns.
The limited research on integrated approaches has typically focused on simulation environments rather than real-world implementations. Zhang et al. (Reference Zhang, Yu, Yuan, Tang and Gao2025) developed a comprehensive simulation framework for RL-controlled HVAC systems, but acknowledged limitations in transferring simulated policies to real buildings. Similarly, Al Sayed et al. (Reference Al Sayed, Boodi, Broujeny and Beddiar2024) demonstrated promising results for multi-agent RL control of advanced HVAC systems in simulation, but did not address practical implementation challenges.
Furthermore, existing research has rarely considered regional climate contexts, particularly for areas with distinct seasonal variations, such as Tehran. Ghiai et al. (Reference Ghiai, Arjmand, Mohammadi, Ahmadi and Assad2021) investigated energy consumption patterns in tall office buildings in Iran’s hot-arid and cold climate conditions, highlighting the need for climate-specific approaches to energy optimization. However, their study did not explore advanced control strategies, such as RL.
This integration gap is particularly significant for regions such as Tehran, where the climate conditions (hot, dry summers and cold winters) present both challenges and opportunities for energy-efficient building operation (Karimi et al., Reference Karimi, Adibhesami, Hosseinzadeh, Salehi, Groppi and Garcia2024). The successful integration of advanced HVAC technologies with RL control strategies in this context could provide valuable insights for similar climate regions globally.
2.5. Health and well-being considerations
Energy efficiency in buildings extends beyond mere energy savings to impact occupant health and well-being. Wallner et al. (Reference Wallner, Tappler, Munoz, Damberger, Wanka, Kundi and Hutter2017) found that residents of energy-efficient homes with mechanical ventilation reported significantly higher indoor air quality and better health outcomes compared to those in conventional buildings. Similarly, Symonds et al. (Reference Symonds, Verschoor, Chalabi, Taylor and Davies2021) demonstrated that energy efficiency measures in homes provide co-benefits to occupant health, provided that adequate ventilation is maintained.
However, potential risks must also be considered. Carpino et al. (Reference Carpino, Loukou, Austin, Andersen, Mora and Arcuri2023) identified increased risk of fungal growth in nearly zero-energy buildings due to airtight construction, highlighting the importance of balanced ventilation strategies. These health considerations underscore the need for holistic approaches to building energy efficiency that address both environmental and human factors.
2.6. Summary of research gaps and study objectives
Table 1 presents a comprehensive literature review matrix that highlights key findings and research gaps across the topics relevant to this study. The matrix demonstrates several critical gaps that this research aims to address.
Table 1. Literature review matrix

This study addresses these gaps by investigating the integration of advanced HVAC technologies with RL control strategies specifically tailored to Tehran’s climate conditions. By evaluating real-world performance, identifying implementation challenges, and quantifying both energy savings and comfort improvements, this research aims to provide practical insights for sustainable building design and operation in similar climate regions.
3. Methodology
This study investigates the integration of RL algorithms with HVAC systems to enhance energy efficiency and occupant comfort in low-energy buildings under Tehran climate conditions. Comprehensive building energy simulations were conducted using EnergyPlus v9.5 coupled with Python-based RL frameworks to test the hypothesis that RL control can reduce energy use intensity (EUI), improve thermal comfort within ASHRAE standards, and lower operational costs compared to conventional control strategies. The methodology involved collecting historical meteorological data for Tehran (2015–2023), modeling typical occupancy patterns in commercial buildings, and establishing simulation environments that reflect realistic scenarios with Tehran-specific building materials and construction standards. Performance metrics included EUI, thermal comfort indices (predicted mean vote [PMV]/predicted percentage of dissatisfied [PPD]), prediction accuracy (mean absolute error [MAE]/root mean square error [RMSE]), and operational costs.
3.1. Data collection and preprocessing
We assembled three streams of input data to drive both simulation and RL training:
• Weather data. Historical hourly weather files (2000–2020) for Tehran (latitude 35.7°N and longitude 51.4°E) were downloaded from the Australian Bureau of Meteorology and EnergyPlus Weather archives (see Figure 2).
• Building and HVAC specifications. A single-zone prototype (floor area = 350 m², three stories) was modeled with typical Tehran construction: reinforced concrete walls (U = 0.52 W/m²·K), double-glazed low-e windows (U = 2.8 W/m²·K), and a VRF heat pump with a COP of 3.5 and a seasonal energy efficiency ratio of 18 (see Table 2).
• Occupancy and plug-load profiles. These were derived from local utility audits, Wi-Fi-tracking occupancy logs, and nameplate data; missing values were linearly interpolated, and outliers (beyond 1.5 × the interquartile range, IQR) were clipped to the 5th–95th percentile range.
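The cleaning rules for the occupancy and plug-load profiles can be illustrated with a minimal, dependency-free sketch. It is not the authors' pandas/NumPy pipeline. The function names are hypothetical, and the sketch assumes that a value counts as an outlier when it lies more than 1.5 × IQR outside the quartiles and that only such values are clipped to the 5th–95th percentile range.

```python
from statistics import quantiles


def interpolate_missing(values):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest value."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i in range(len(out)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def clip_outliers(values):
    """Clip values beyond the 1.5 x IQR fences to the 5th-95th percentile range."""
    # pct[k - 1] is the k-th percentile of the data.
    pct = quantiles(values, n=100, method="inclusive")
    p5, q1, q3, p95 = pct[4], pct[24], pct[74], pct[94]
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [
        min(max(v, p5), p95) if (v < low_fence or v > high_fence) else v
        for v in values
    ]


# Example: one missing reading and one implausible spike in a plug-load series.
series = interpolate_missing([1.0, None, 3.0])
cleaned = clip_outliers([float(v) for v in range(1, 21)] + [1000.0])
```

In this example the gap is filled with the midpoint of its neighbours, and the spike is pulled back to the 95th percentile of the series while in-range values are left unchanged.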

Figure 2. Tehran residential HVAC energy usage zones.
Table 2. Summary of collected HVAC energy usage data (Tehran)

All inputs were synchronized to 15 min timesteps. Preprocessing and validation (split 80% train/20% test on a year-by-year holdout) were implemented in Python (pandas and NumPy).
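A year-by-year holdout keeps whole calendar years together so that no year contributes to both training and testing. The sketch below is one possible reading of the 80%/20% split. The record layout (a dict with a `year` key) and the choice to hold out the most recent years are assumptions, not details given in the text.

```python
def year_holdout_split(records, test_fraction=0.2):
    """Split records by calendar year, holding out the latest years for testing."""
    years = sorted({r["year"] for r in records})
    n_test = max(1, round(len(years) * test_fraction))
    test_years = set(years[-n_test:])  # ASSUMPTION: most recent years are held out
    train = [r for r in records if r["year"] not in test_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test


# Example with ten hypothetical years of data, one record per year.
records = [{"year": y, "value": 0.0} for y in range(2011, 2021)]
train, test = year_holdout_split(records)
```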
Table 2 summarizes the raw data obtained from the buildings, categorized by building size, type, and existing HVAC system specifications.
3.2. Simulation environment setup
We used two established building-energy tools:
• EnergyPlus v9.5 for detailed thermal modeling of the prototype (conduction, ventilation, and solar gains).
• TRNSYS 18 to simulate renewable integration (solar-assisted VRF modules).
Both engines were linked via HoneybeeGym (Python API) to expose the HVAC setpoint as an RL control variable. The key assumptions are as follows:
• Single-zone air-node with well-mixed conditions.
• Constant internal gains scaled to occupancy (0.1 kW/person sensible load).
• Fixed thermostat deadband ±0.5°C around setpoint.
• Energy tariff: time-of-use (peak 0.15/kWh and off-peak 0.08/kWh).
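The occupancy-scaled gains, thermostat deadband, and time-of-use tariff listed above can be expressed as a short sketch. The rates, deadband, and per-person gain come from the text. The peak window (12:00–20:00) and all function names are assumptions, since the text does not define the peak hours.

```python
PEAK_RATE = 0.15            # tariff per kWh during peak hours (from the text)
OFF_PEAK_RATE = 0.08        # tariff per kWh off-peak (from the text)
PEAK_HOURS = range(12, 20)  # ASSUMPTION: the peak window is not stated in the text
GAIN_PER_PERSON_KW = 0.1    # sensible internal gain per occupant (from the text)
DEADBAND_C = 0.5            # thermostat deadband around the setpoint (from the text)


def internal_gain_kw(occupants):
    """Sensible internal gain scaled linearly with occupancy."""
    return GAIN_PER_PERSON_KW * occupants


def step_cost(energy_kwh, hour):
    """Energy cost of one timestep under the two-rate time-of-use tariff."""
    rate = PEAK_RATE if hour in PEAK_HOURS else OFF_PEAK_RATE
    return energy_kwh * rate


def heating_call(t_zone, setpoint, heating_on):
    """Heating-mode thermostat: switch only when the zone leaves the deadband."""
    if t_zone < setpoint - DEADBAND_C:
        return True
    if t_zone > setpoint + DEADBAND_C:
        return False
    return heating_on  # inside the deadband, keep the previous state
```

Holding the previous state inside the deadband is what prevents rapid on/off cycling around the setpoint.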
Figure 3 outlines the feedback loop: At each step, the RL agent issues a new setpoint to EnergyPlus/TRNSYS, observes zone air temperature, power use, and comfort metrics, then logs the transition.

Figure 3. Framework of simulation of this study.
3.3. RL model specifications
We developed two agents in PyTorch:
1. Model A (DQN)
○ Network: Input layer (state vector: [T_indoor, T_outdoor, occupancy, historic energy use]), 3 hidden layers of 128 rectified linear units (ReLUs), and an output layer with one unit per discrete setpoint {20°C, 21°C, …, 26°C}.
○ Optimizer: Adam, learning rate = 1e-2, discount factor γ = 0.99, ε-greedy decay from 1.0 → 0.1 over 10,000 steps.
2. Model B (PPO)
○ Actor–Critic: Two-branch network, each with 2 hidden layers of 64 tanh units.
○ Optimizer: Adam, learning rate = 3e-4, γ = 0.95, Generalized Advantage Estimation (GAE) λ = 0.95, clip ratio = 0.2.
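The ε-greedy exploration used by Model A can be sketched as follows. The setpoint set and the 1.0 → 0.1 schedule over 10,000 steps come from the specification above. The linear shape of the decay and the function names are assumptions, because the decay form is not stated.

```python
import random

SETPOINTS_C = [20, 21, 22, 23, 24, 25, 26]  # discrete actions from the text


def epsilon_at(step, eps_start=1.0, eps_end=0.1, decay_steps=10_000):
    """Anneal epsilon linearly (ASSUMPTION: the decay shape is not specified)."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_setpoint(q_values, step, rng=random):
    """Explore with probability epsilon, otherwise pick the highest Q-value."""
    if rng.random() < epsilon_at(step):
        return rng.choice(SETPOINTS_C)
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return SETPOINTS_C[best]
```

Here `q_values` is the seven-element output of the Q-network for the current state. Early in training almost every action is random, and after 10,000 steps roughly one action in ten still is.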
Reward function:
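At each timestep, the agent receives a reward that penalizes energy use, setpoint deviation, and thermal discomfort:
$ {r}_t=-\alpha \times {E}_t-\beta \times \mid {T}_{indoor}-{T}_{setpoint0}\mid -\gamma \times comfort\_ penalty $
where $ {E}_t $ is the energy use in kWh over the timestep, comfort_penalty = max[0, |PMV| − 0.5], and the weights α = 1.0, β = 5.0, γ = 10.0 were tuned via grid search on a validation season. The reward weight γ is distinct from the discount factor γ listed in the optimizer settings.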
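A minimal sketch of this reward is given below, assuming that the three weighted terms are costs subtracted from the reward. The weights and the comfort penalty follow the text. The function names are hypothetical, and the reward weight is named `GAMMA_W` to keep it apart from the discount factor.

```python
ALPHA = 1.0     # weight on energy use (from the text)
BETA = 5.0      # weight on setpoint deviation (from the text)
GAMMA_W = 10.0  # weight on the comfort penalty (not the discount factor)


def comfort_penalty(pmv):
    """Zero inside the |PMV| <= 0.5 band, growing linearly outside it."""
    return max(0.0, abs(pmv) - 0.5)


def reward(energy_kwh, t_indoor, t_setpoint0, pmv):
    """Negative weighted sum of energy use, tracking error, and discomfort."""
    cost = (
        ALPHA * energy_kwh
        + BETA * abs(t_indoor - t_setpoint0)
        + GAMMA_W * comfort_penalty(pmv)
    )
    return -cost


# Example: 2 kWh used, 1 degree C off the reference setpoint, PMV of 0.8.
example_reward = reward(2.0, 23.0, 22.0, 0.8)
```

In the example, the tracking term (5.0) outweighs both the energy term (2.0) and the comfort term (3.0), which shows how strongly the chosen weights favor setpoint adherence.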
Agents were trained over 5,000 episodes (each 24 h), with convergence assessed by plateauing cumulative reward (see Figure 4).

Figure 4. Reinforcement learning implementation process.
3.4. Performance metrics
We evaluate control performance using:
• EUI: kWh/m²·yr, to capture overall savings.
• Thermal comfort: PMV and PPD per ASHRAE 55.
• MAE and RMSE of indoor-temperature tracking: chosen for their interpretability in degrees Celsius and standard use in forecasting studies. Lower MAE indicates tighter setpoint adherence; RMSE penalizes large deviations more heavily.
• Operational cost: $/m²·yr, based on the tariff schedule.
Statistical significance of EUI and comfort improvements was verified via paired t-tests (α = 0.05), and confidence intervals (CIs) reported at 95%.
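The tracking metrics and the paired test statistic can be computed as in the following sketch, which uses only the Python standard library. The function names are hypothetical. The p-value is not computed here; in practice it would be obtained from Student's t distribution, for example with `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev


def mae(measured, target):
    """Mean absolute error between measured temperatures and setpoints."""
    return mean([abs(m - t) for m, t in zip(measured, target)])


def rmse(measured, target):
    """Root mean square error; large deviations weigh more than in MAE."""
    return sqrt(mean([(m - t) ** 2 for m, t in zip(measured, target)]))


def paired_t(baseline, treated):
    """Paired t statistic and degrees of freedom for matched observations."""
    diffs = [b - t for b, t in zip(baseline, treated)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Example: three temperature readings tracked against a 22 degree C setpoint.
example_mae = mae([22.0, 23.0, 21.0], [22.0, 22.0, 22.0])
example_rmse = rmse([22.0, 23.0, 21.0], [22.0, 22.0, 22.0])
t_stat, dof = paired_t([10.0, 12.0, 14.0], [9.0, 10.0, 11.0])
```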
3.5. Sensitivity analysis
To assess robustness, we conducted one-factor-at-a-time sensitivity sweeps:
1. Building envelope resistance (R-values) varied ±20%.
2. Learning rate for each agent ±50%.
3. Reward-weight ratios (α:β:γ) across {1:5:10, 1:1:5, and 2:5:10}.
4. Occupancy schedules (±2 h shift in peak occupancy).
For each variation, we reran 1,000 episodes and recorded relative changes in EUI, MAE, and PPD. This allowed identification of critical parameters (e.g., reward–weight balance) that most affect performance.
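The one-factor-at-a-time design can be sketched as a generator of run configurations in which exactly one parameter departs from the baseline. The dictionary keys and function names are hypothetical. The sketch also assumes that only the endpoints of each range are tested and that 1:5:10 is the baseline weight ratio.

```python
BASELINE = {
    "r_value_scale": 1.0,
    "lr_scale": 1.0,
    "reward_weights": (1, 5, 10),
    "occupancy_shift_h": 0,
}

# ASSUMPTION: only the endpoints of each stated range are swept.
SWEEPS = {
    "r_value_scale": [0.8, 1.2],                 # envelope R-values, +/-20%
    "lr_scale": [0.5, 1.5],                      # learning rate, +/-50%
    "reward_weights": [(1, 1, 5), (2, 5, 10)],   # alternative weight ratios
    "occupancy_shift_h": [-2, 2],                # peak occupancy shift, +/-2 h
}


def oat_configs(baseline, sweeps):
    """Build configurations that change one parameter at a time."""
    configs = []
    for name, values in sweeps.items():
        for value in values:
            cfg = dict(baseline)
            cfg[name] = value
            configs.append((name, cfg))
    return configs


def relative_change(value, reference):
    """Relative change of a metric (EUI, MAE, or PPD) against its reference."""
    return (value - reference) / reference


configs = oat_configs(BASELINE, SWEEPS)
```

Each configuration would then be rerun for 1,000 episodes, and `relative_change` applied to the resulting EUI, MAE, and PPD values.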
4. Results
This study evaluated RL algorithms integrated with HVAC systems in low-energy buildings under Tehran climate conditions. The simulations, based on detailed building models calibrated to Tehran’s climate and typical occupancy patterns, were designed to test the hypothesis that incorporating RL improves energy efficiency and occupant comfort. To test this hypothesis, we measured EUI, thermal comfort indices, and operational cost differentials across multiple building models under varied conditions.
During the simulations, several parameters played pivotal roles, including:
Temperature setpoints ($ {T}_{set} $)
Occupancy schedules ($ {Occ}_{sched} $)
HVAC operation sequences ($ {HVAC}_{opseq} $)
Outdoor air temperature and humidity
Solar radiation levels
Internal heat gains
These parameters were systematically varied to simulate a wide range of conditions, from typical daily operations to extreme scenarios that test the resilience and adaptability of the RL algorithms (see Table 3 and Figure 5).
Table 3. Simulation parameters


Figure 5. Methodology framework for results analysis.
4.1. Data presentation
In this section, we present the results obtained from the simulation of energy consumption, thermal comfort indices, and operational cost savings. The simulation was carried out using a virtual model of a building designed to reflect various control strategies and their impact on the overall efficacy of energy usage and comfort levels.
4.2. Simulation outcomes
4.2.1. EUI results
The EUI was measured as the building’s total annual energy use per unit floor area (kWh/m²). The baseline EUI was 275 kWh/m² per annum. The RL-controlled system decreased the EUI to 220 kWh/m² per annum, a 20% reduction (Table 4). Figure 6 presents a side-by-side comparison of monthly EUI values between the baseline and RL-controlled scenarios.
Table 4. Annual EUI values before and after RL-controlled system implementation


Figure 6. Monthly EUI comparison.
A paired t-test confirmed the statistical significance of this reduction (t (29) = 4.87, p < 0.001, 95% CI = [42.8, 67.2]).
The energy savings were consistent across different building types, with commercial buildings showing the highest reduction (23.5%), followed by office buildings (21.2%) and residential buildings (19.4%). This variation can be attributed to differences in occupancy patterns and operational requirements.
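The reduction quoted in this section follows directly from the two annual EUI values. As a minimal sketch (Python, standard library only), the helpers below reproduce that arithmetic and show how a paired t statistic is formed from matched baseline and RL results. The function names and the four sample pairs are hypothetical placeholders for illustration; they are not the study's code or data.

```python
import math
from statistics import mean, stdev


def percent_reduction(baseline, controlled):
    """Relative reduction of `controlled` versus `baseline`, in percent."""
    return 100.0 * (baseline - controlled) / baseline


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    std_err = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / std_err, len(diffs) - 1


# Annual EUI values reported in this section (kWh/m2 per annum).
eui_reduction = percent_reduction(275.0, 220.0)
print(round(eui_reduction, 1))  # 20.0

# Hypothetical paired samples, for illustration only (not the study's data).
baseline_runs = [280.0, 270.0, 276.0, 274.0]
rl_runs = [226.0, 214.0, 222.0, 218.0]
t_stat, dof = paired_t_statistic(baseline_runs, rl_runs)
print(round(t_stat, 2), dof)
```

With 30 matched pairs, as implied by the reported 29 degrees of freedom, the same function would apply unchanged.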
4.2.2. Thermal comfort indices results
The thermal comfort was assessed using the PMV and PPD indices, as defined by ASHRAE Standard 55. These metrics evaluate occupant comfort by considering temperature, humidity, air velocity, metabolic rate, and clothing insulation (see Figure 7 and Table 5).

Figure 7. Thermal comfort distribution.
Table 5. Thermal comfort metrics

Figure 7 shows the distribution of comfort levels in the baseline and RL-controlled scenarios.
The results show that the RL-controlled system maintained the PMV closer to the ideal value of 0 (PMV = 0.25) compared to the baseline system (PMV = 0.80). This translates to a reduction in PPD from 40% to 10%, indicating a significant improvement in occupant comfort. Statistical analysis using Wilcoxon signed-rank tests confirmed the significance of these improvements (p < 0.001).
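For readers unfamiliar with the two indices, the standard Fanger relation used in ASHRAE Standard 55 and ISO 7730 maps a PMV value to a PPD value. The sketch below evaluates that relation at a single PMV value. It is not a reconstruction of how the study aggregated PPD over the simulation period, so its outputs need not match the PPD values reported above; the function name is an assumption made for illustration.

```python
import math


def ppd_from_pmv(pmv):
    """Predicted percentage of dissatisfied (%) for one PMV value (Fanger relation)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv**4 - 0.2179 * pmv**2)


# Evaluate at thermal neutrality and at the two mean PMV values quoted in the text.
for pmv in (0.0, 0.25, 0.80):
    print(f"PMV={pmv:+.2f} -> PPD={ppd_from_pmv(pmv):.1f}%")
```

The relation has its minimum of 5% at PMV = 0 and is symmetric in the sign of PMV, which is why control strategies aim to keep PMV close to zero.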
4.2.3. Operational cost analysis
The implementation of RL-controlled HVAC systems resulted in significant operational cost savings. Based on the current electricity tariffs in Tehran, the annual energy cost reduction was calculated for different building types (see Figure 8 and Table 6).

Figure 8. Monthly operational cost comparison.
Table 6. Annual operational cost comparison

The cost analysis includes electricity consumption, maintenance, and system operational costs. The payback period for implementing the RL control system ranges from 2.5 to 3.2 years, depending on the building type, making it an economically viable option for building owners and operators.
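A simple payback period is the initial investment divided by the annual savings. The sketch below illustrates that calculation for the energy component only, using the EUI values from Section 4.2.1. The floor area, tariff, and investment figures are hypothetical placeholders (the study's own cost analysis also covers maintenance and system operational costs), so the result is illustrative rather than a reproduction of the reported range.

```python
def annual_energy_cost(eui_kwh_per_m2, floor_area_m2, tariff_per_kwh):
    """Annual energy cost from EUI, floor area, and a flat tariff."""
    return eui_kwh_per_m2 * floor_area_m2 * tariff_per_kwh


def simple_payback_years(initial_investment, annual_savings):
    """Years needed for cumulative savings to equal the initial investment."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return initial_investment / annual_savings


# Hypothetical inputs, for illustration only.
floor_area_m2 = 1000.0   # assumed building size
tariff_per_kwh = 0.05    # assumed flat tariff, arbitrary currency unit
investment = 8000.0      # assumed cost of the RL control upgrade

savings = (annual_energy_cost(275.0, floor_area_m2, tariff_per_kwh)
           - annual_energy_cost(220.0, floor_area_m2, tariff_per_kwh))
payback = simple_payback_years(investment, savings)
print(f"annual savings: {savings:.0f}, payback: {payback:.1f} years")
```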
4.3. Control strategy development
This section outlines the design of the RL control strategy, which was developed to balance two primary objectives: optimizing thermal comfort according to ASHRAE Standard 55 and minimizing energy consumption. The strategy employs a reward function that penalizes both discomfort and excessive energy use (Table 7).
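One common way to express such a two-objective reward is a weighted sum of a comfort penalty and an energy penalty. The sketch below is a minimal illustration under that assumption. The linear penalty shape, the weights `w_comfort` and `w_energy`, and the function names are hypothetical and are not taken from the study; only the ±0.5 PMV comfort band comes from ASHRAE Standard 55 as cited in this article.

```python
def comfort_penalty(pmv, comfort_band=0.5):
    """Zero while |PMV| stays inside the comfort band, growing linearly outside it."""
    return max(0.0, abs(pmv) - comfort_band)


def reward(pmv, energy_kwh, w_comfort=1.0, w_energy=0.1):
    """Negative weighted sum of discomfort and energy use (weights are assumed)."""
    return -(w_comfort * comfort_penalty(pmv) + w_energy * energy_kwh)


# A comfortable, low-energy step scores higher than a warm, high-energy step.
print(reward(pmv=0.25, energy_kwh=1.0))
print(reward(pmv=0.80, energy_kwh=2.0))
```

Raising `w_comfort` relative to `w_energy` shifts the learned policy toward tighter comfort control at the expense of energy use, and vice versa.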
Table 7. Summary of reinforcement learning model parameters

Figure 9 shows the learning progress of the two RL models (Model A, Q-learning; Model B, DRL) during training. Model B shows faster convergence and higher final reward values, indicating superior learning efficiency and performance potential.

Figure 9. Episode reward over time.
Figure 10 illustrates how the RL agent adapts its control actions based on environmental conditions. During high-occupancy periods, the agent prioritizes thermal comfort by maintaining temperature setpoints closer to the comfort zone, while during low-occupancy periods, it prioritizes energy efficiency by allowing wider temperature ranges.

Figure 10. Actions selected by the RL agent. Top left: Heatmap showing the probability distribution of all actions by temperature range. Top right: Stacked bar chart of the temperature control actions showing transitions from heating to cooling. Bottom left: Dominant action selected for each temperature range.
Figure 11 demonstrates the RL agent’s ability to maintain temperature and humidity within the desired ranges despite variations in external conditions and internal loads. Model B exhibits superior control precision, with temperature deviations averaging ±0.5°C from setpoint compared to Model A’s ±1.2°C.

Figure 11. Temperature humidity control performance. Top left: Temperature control showing the RL system maintaining target values with minimal deviation despite outdoor temperature variations. Middle left: Humidity control showing the improved stability with the RL system. Right: Statistical comparison of the temperature and humidity control performance between baseline and RL-controlled systems. Bottom: Detailed analysis of the system response to temperature setpoint change, demonstrating a faster response time and better stability with RL control.
4.4. Performance metrics analysis
Both Model A (Q-learning) and Model B (DRL) were evaluated using standard performance metrics to quantify their control precision and efficiency. The MAE measures the average magnitude of errors without considering their direction, while the RMSE gives higher weight to larger errors (see Figure 12 and Table 8).
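The two error metrics can be computed as follows. This is a minimal sketch using only the Python standard library; the setpoint and zone-temperature values are hypothetical and serve only to show how a single large error affects RMSE more than MAE.

```python
import math


def mae(targets, measured):
    """Mean absolute error: average magnitude of the errors, ignoring sign."""
    return sum(abs(t - m) for t, m in zip(targets, measured)) / len(targets)


def rmse(targets, measured):
    """Root mean square error: squaring gives larger errors more weight."""
    mean_sq = sum((t - m) ** 2 for t, m in zip(targets, measured)) / len(targets)
    return math.sqrt(mean_sq)


# Hypothetical setpoints and measured zone temperatures (degrees C).
setpoints = [22.0, 22.0, 22.0, 22.0]
zone_temps = [22.5, 21.5, 22.0, 24.0]

print(f"MAE  = {mae(setpoints, zone_temps):.3f}")
print(f"RMSE = {rmse(setpoints, zone_temps):.3f}")
```

In this example the single 2°C excursion pulls the RMSE above the MAE, which is the behavior described in the paragraph above.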

Figure 12. Performance metrics visualization.
Table 8. Comprehensive performance metrics comparison

Statistical analysis using paired t-tests confirmed that the differences between Model A and Model B were statistically significant for all metrics (p < 0.01).
The analysis reveals that Model B consistently outperforms Model A across all metrics. While Model B requires more computational resources, its superior performance justifies the additional computational cost for applications where precise temperature control and energy efficiency are critical.
4.5. Sensitivity analysis
To evaluate the robustness of both models, a sensitivity analysis was conducted by varying key parameters and measuring their impact on performance metrics (see Figure 13 and Table 9).

Figure 13. Sensitivity analysis of HVAC RL model parameters.
Table 9. Sensitivity analysis results

The sensitivity analysis demonstrates that Model B is more robust to variations in input parameters, maintaining consistent performance even under challenging conditions. This robustness can be attributed to its deeper neural network architecture and more sophisticated learning algorithm, which enable better generalization across diverse scenarios.
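A one-at-a-time perturbation is one simple way to carry out a sensitivity analysis of this kind. The sketch below is an illustration under that assumption and does not claim to be the procedure used in the study. `toy_metric` is a hypothetical stand-in for a full simulation run, and the parameter names and values are placeholders.

```python
def one_at_a_time_sensitivity(metric, nominal, rel_step=0.1):
    """Relative change in `metric` when each parameter is raised by `rel_step`."""
    base = metric(nominal)
    changes = {}
    for name, value in nominal.items():
        perturbed = dict(nominal)
        perturbed[name] = value * (1.0 + rel_step)  # perturb one parameter only
        changes[name] = (metric(perturbed) - base) / base
    return changes


def toy_metric(params):
    """Hypothetical stand-in for a simulation that returns a control-error metric."""
    return 0.5 * params["outdoor_temp_c"] + 2.0 * params["internal_gain_kw"]


nominal = {"outdoor_temp_c": 30.0, "internal_gain_kw": 5.0}
sensitivity = one_at_a_time_sensitivity(toy_metric, nominal)
for name, change in sensitivity.items():
    print(f"{name}: {100 * change:+.1f}% change in metric")
```

A model that is robust in the sense described above would show small relative changes for every perturbed parameter.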
5. Discussion
The present study evaluated the effectiveness of RL algorithms in enhancing energy efficiency and occupant comfort within low-energy buildings under Tehran’s unique climate conditions. Through a series of simulations incorporating realistic building scenarios and local weather patterns, we tested the hypothesis that RL incorporation can significantly improve both energy efficiency and occupant comfort compared to conventional HVAC control systems. The results provide compelling evidence supporting this hypothesis, with specific implications for sustainable building practices in high-temperature, arid regions.
5.1. EUI reduction
Our simulations demonstrated that implementing the RL-controlled system decreased the EUI from 275 to 220 kWh/m² per annum, a 20% reduction in energy use (Section 4.2.1). A paired t-test confirmed the statistical significance of this reduction (p < 0.001), indicating that the improvement is unlikely to be due to chance and can be attributed to the RL control strategy. This level of energy savings is consistent with findings from similar studies in different climate regions (Sohani et al., Reference Sohani, Rezapour and Sayyaadi2021; Stoffel et al., Reference Stoffel, Maier, Kümpel, Schreiber and Müller2023), but represents a particularly important achievement given Tehran’s extreme seasonal temperature variations and growing energy demands.
In comparison to conventional RBC strategies commonly used in the region, our RL approach demonstrates superior adaptability to both predictable daily patterns and unexpected weather events. This adaptability is particularly valuable in Tehran’s climate, where summer temperatures regularly exceed 35°C and winter temperatures can drop below freezing, creating significant HVAC operational challenges.
5.2. Thermal comfort improvement
Beyond energy savings, our analysis revealed substantial improvements in thermal comfort parameters. The PMV improved from 0.80 in the baseline scenario to 0.25 in the RL-controlled scenario, bringing it well within the ASHRAE Standard 55 comfort range of −0.5 to +0.5. This improvement represents a shift from a slightly warm condition (0.8) to a near-neutral thermal sensation (0.25), significantly enhancing occupant comfort. Similarly, the PPD decreased from 40% to 10%, indicating a substantial increase in occupant satisfaction with the thermal environment.
These comfort improvements are particularly noteworthy when considering typical building conditions in Tehran, where traditional HVAC systems often struggle to maintain consistent comfort levels throughout the year. Our findings suggest that the RL system’s ability to anticipate and respond to changing conditions provides more stable and comfortable indoor environments, which could have additional benefits for occupant productivity and well-being (Al Sayed et al., Reference Al Sayed, Boodi, Broujeny and Beddiar2024; Silvestri et al., Reference Silvestri, Coraci, Brandi, Capozzoli, Borkowski, Köhler, Wu, Zeilinger and Schlueter2024).
5.3. Operational cost analysis
While energy efficiency was the primary focus of our study, we also conducted an operational cost analysis to assess the economic implications of implementing RL-controlled HVAC systems. Based on current energy prices in Tehran, the reduction in EUI translates to a ~15–20% reduction in operational costs, depending on the building size and specific utility rates. This cost reduction must be weighed against the initial investment required for upgrading conventional HVAC systems to incorporate RL capabilities.
Our analysis indicates that the payback period for typical commercial buildings in Tehran would range from 3.5 to 5 years, making this an economically viable solution for both new construction and retrofitting projects. Furthermore, as energy prices continue to rise and computational costs decrease, the economic case for RL-controlled HVAC systems will likely strengthen over time.
5.4. RL model parameters
The performance of the RL agent was evaluated using various metrics, including the Q-value function convergence and prediction accuracy metrics. The Q-value function analysis showed that the RL agent successfully learned optimal control actions for different environmental conditions, indicating its ability to capture the underlying dynamics of the HVAC system.
We implemented two distinct RL architectures: Model A using traditional Q-learning and Model B employing DRL with neural networks. The specific architecture for Model B consisted of a four-layer neural network with 256 neurons in each hidden layer, utilizing ReLU activation functions and Adam optimization with a learning rate of 0.001. This network architecture was selected after systematic testing of multiple configurations.
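For context, the tabular Q-learning update that underlies an approach like Model A can be sketched as follows. This is a generic textbook form, not the study's implementation. The state labels, the action set, and the hyperparameters `alpha` (learning rate) and `gamma` (discount factor) are hypothetical values chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """Apply one Q-learning step and return the updated Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)  # greedy value of next state
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical discretized states and HVAC actions.
actions = ("heat", "idle", "cool")
q_table = defaultdict(float)  # unseen state-action pairs start at 0.0

new_value = q_learning_update(q_table, state="warm", action="cool",
                              reward=-0.5, next_state="neutral", actions=actions)
print(new_value)
```

A deep RL variant such as Model B replaces the lookup table with a neural network that approximates the same Q-values, which is what allows it to handle continuous temperature and humidity inputs.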
The prediction accuracy metrics, such as MAE and RMSE, further confirmed the efficacy of the RL agent in achieving desired thermal conditions. For Model B, we achieved an MAE of 0.579366 and RMSE of 0.689770, representing ~50% improvement over Model A’s performance (MAE: 1.140008 and RMSE: 1.408069). These findings highlight the potential of advanced RL algorithms in optimizing HVAC temperature control in low-energy buildings under Tehran’s specific climate conditions.
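The "~50%" figure can be checked directly from the four reported values. The short calculation below is a verification sketch; the function name is an assumption made for illustration.

```python
def relative_improvement(reference, improved):
    """Percentage by which `improved` is lower than `reference`."""
    return 100.0 * (reference - improved) / reference


# Values reported in the text: Model A (reference) versus Model B (improved).
mae_gain = relative_improvement(1.140008, 0.579366)
rmse_gain = relative_improvement(1.408069, 0.689770)
print(f"MAE improvement:  {mae_gain:.1f}%")   # 49.2%
print(f"RMSE improvement: {rmse_gain:.1f}%")  # 51.0%
```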
5.5. Comparative analysis with existing control methods
To better contextualize our findings, we conducted additional comparisons between our RL approach and other established HVAC control strategies, including RBC and MPC. While RBC systems are common in Tehran’s building stock due to their simplicity and low implementation cost, they demonstrated significantly lower performance in both energy efficiency (12–15% higher EUI) and thermal comfort (PMV averaging 0.75) compared to our RL approach.
MPC systems showed more competitive performance, achieving energy savings of ~15% relative to baseline (compared with the 20% achieved by our RL approach; Section 4.2.1), while maintaining acceptable comfort levels (PMV averaging 0.40). However, MPC systems require detailed building models that are often difficult and expensive to develop for existing structures, particularly in the Tehran context where building documentation may be incomplete. Our RL approach offers the advantage of model-free operation, learning optimal control strategies directly from environmental interactions.
Additionally, we compared our models with existing studies that implemented similar RL approaches in different climate contexts (Maddalena et al., Reference Maddalena, Müller, Santos, Salzmann and Jones2022; Kannari et al., Reference Kannari, Kantorovitch, Piira and Piippo2023; Stoffel et al., Reference Stoffel, Maier, Kümpel, Schreiber and Müller2023). Our results demonstrated that the performance benefits of RL control may be even more pronounced in regions with extreme climate conditions, such as Tehran, suggesting that buildings in such environments have the most to gain from advanced control strategies.
5.6. Broader implications for urban sustainability
The findings from this study have significant implications for urban sustainability in Tehran and similar climate regions. With buildings accounting for ~40% of energy consumption in Iran (Ghiai et al., Reference Ghiai, Arjmand, Mohammadi, Ahmadi and Assad2021), a widespread adoption of RL-controlled HVAC systems could contribute substantially to national energy conservation goals and emission reduction targets.
The scalability of our approach is promising, as the core RL algorithms can be adapted to various building types and sizes with minimal modification. While our study focused on commercial low-energy buildings, similar principles could be applied to residential buildings, educational facilities, and healthcare institutions, each with their specific occupancy patterns and comfort requirements.
Furthermore, the integration of RL-controlled HVAC systems with smart grid technologies could enable demand–response capabilities, allowing buildings to adjust their energy consumption based on grid conditions. This would not only improve grid stability but could also provide additional cost savings through participation in demand–response programs, which are beginning to emerge in Tehran’s energy market.
5.7. Limitations and future work
Several limitations of the current study should be acknowledged. Although our simulation environment incorporated realistic building parameters and historical weather data from Tehran, the study was limited to simulated environments and did not include extensive real-world testing and validation of the RL algorithms. Real-world implementation may face additional challenges not captured in our models, including sensor inaccuracies, mechanical system degradation over time, unpredictable occupant behavior and weather patterns, and integration with existing building management systems.
Additionally, the study did not extensively explore the impact of different deep learning architectures and hyperparameters on the performance of the RL algorithms.
Future research should address these limitations through pilot implementations in actual buildings across Tehran, allowing for the collection of real-world performance data and refinement of the RL algorithms based on operational experience. Additionally, extending the research to incorporate multi-objective optimization—simultaneously addressing energy efficiency, thermal comfort, indoor air quality, and operational costs—would provide a more comprehensive approach to sustainable building management.
Further work should also investigate the effectiveness of combining RL approaches with transfer learning techniques to reduce the initial learning period when deploying to new buildings, potentially accelerating the adoption of these technologies across the building stock.
In summary, our findings demonstrate that RL offers a promising approach to HVAC control optimization in low-energy buildings under Tehran climate conditions, with significant potential for energy savings, improved thermal comfort, and reduced operational costs. As computational capabilities continue to advance and climate challenges intensify, such intelligent control strategies will likely become increasingly essential components of sustainable building design and operation in urban environments worldwide.
6. Conclusion and recommendations
This study demonstrates the significant potential of RL strategies to enhance HVAC energy efficiency in low-energy buildings under Tehran’s unique climate conditions. Our comprehensive simulation-based analysis has yielded several important findings that contribute to advancing sustainable building practices in arid regions with extreme seasonal temperature variations.
The comparative evaluation of two RL models reveals that Model B (DRL) consistently outperforms Model A (Q-learning) in temperature control precision, with ~50% improvement in prediction accuracy metrics (MAE: 0.579366 vs. 1.140008 and RMSE: 0.689770 vs. 1.408069). This superior performance translated into a 20% reduction in EUI (from 275 to 220 kWh/m² per annum; Section 4.2.1) and substantial improvements in thermal comfort parameters, bringing the PMV within the ASHRAE Standard 55 comfort range (from 0.80 to 0.25) and reducing the PPD from 40% to 10%.
These energy efficiency gains are particularly significant in Tehran’s context, where buildings account for ~40% of the total energy consumption (Ghiai et al., Reference Ghiai, Arjmand, Mohammadi, Ahmadi and Assad2021) and face unique challenges due to extreme seasonal temperature variations. When compared with conventional control strategies commonly employed in the region, our RL approach demonstrates superior adaptability and performance, with potential annual operational cost savings of 15–20% and a projected payback period of 3.5–5 years, depending on building characteristics and utility rates.
While Model B’s implementation does require greater computational resources than conventional control systems or simpler RL approaches like Model A, the economic analysis indicates that the additional investment is justified by the significant energy savings and improved occupant comfort. This aligns with findings from similar studies in different climatic regions (Choi et al., Reference Choi, Lu, O’Neill, Feng and Yang2023; Al Sayed et al., Reference Al Sayed, Boodi, Broujeny and Beddiar2024), although our results suggest that the performance benefits of RL control are even more pronounced in regions with extreme climate conditions, such as Tehran.
However, it is essential to acknowledge the limitations of the current study. The research was confined to simulated environments and did not include extensive real-world testing and validation of the RL algorithms. Factors, such as occupant behavior, sensor inaccuracies, and unpredictable weather patterns, may impact the performance of the RL models in practical scenarios. Additionally, the study did not extensively explore the impact of different deep learning architectures and hyperparameters on the performance of the RL algorithms.
Furthermore, integration challenges with the existing building management systems and the need for specialized expertise in maintaining RL-controlled HVAC systems represent practical barriers to the widespread adoption that must be addressed through further research and industry collaboration. Despite these limitations, the potential benefits in terms of energy conservation, reduced operational costs, and improved occupant comfort present a compelling case for the continued development and implementation of RL-controlled HVAC systems.
6.1. Recommendations
6.1.1. Adoption of Model B
Given its superior performance, Model B should be considered for deployment in HVAC temperature control systems, particularly in environments where precision temperature control is critical for energy conservation and occupant comfort.
6.1.2. Cost–benefit analysis
While our economic analysis indicates a favorable payback period of 3.5–5 years for Model B implementation, we recommend that building owners and facility managers conduct site-specific cost–benefit analyses before full-scale implementation. These analyses should account for the following:
• Initial capital investment for hardware and software upgrades
• Projected energy savings based on building-specific characteristics
• Potential utility rebates or incentives for energy efficiency improvements
• Maintenance requirements and associated costs
• Staff training needed for system operation and monitoring
6.1.3. Further research and development
This study highlights several promising directions for future research to advance the practical implementation of RL-controlled HVAC systems:
1. Real-world pilot implementations in diverse building types across Tehran to validate simulation results and identify practical implementation challenges.
2. Integration with broader urban sustainability initiatives, including smart grid applications and demand–response programs that are beginning to emerge in Tehran’s energy market.
3. Development of transfer learning techniques to reduce the initial learning period when deploying to new buildings, potentially accelerating widespread adoption.
4. Incorporation of multi-objective optimization approaches that simultaneously address energy efficiency, thermal comfort, indoor air quality, and operational costs.
5. Investigation of hybrid control strategies that combine the adaptability of RL with the predictability of MPC to leverage the strengths of both approaches.
6. Exploration of lightweight RL models that maintain performance while reducing computational requirements, making implementation more feasible in existing buildings with limited computational infrastructure.
The findings from this study provide valuable insights for various stakeholders in the building sector, including designers, operators, policymakers, and researchers working toward sustainable built environments. By demonstrating the significant potential of RL-controlled HVAC systems to improve energy efficiency while maintaining occupant comfort in Tehran’s challenging climate, this work contributes to the broader goal of reducing the buildings’ environmental impact while enhancing their functionality and user experience (Wallner et al., Reference Wallner, Tappler, Munoz, Damberger, Wanka, Kundi and Hutter2017; Symonds et al., Reference Symonds, Verschoor, Chalabi, Taylor and Davies2021).
As climate change intensifies and urbanization continues to accelerate, particularly in regions with extreme climates, such as Tehran, intelligent building control strategies will become increasingly essential components of sustainable development. This study represents an important step toward realizing the potential of data-centric engineering approaches to address these critical challenges.
Nomenclature
Abbreviations and definitions
COP: coefficient of performance
DCRL: deep clustering reinforcement learning
DCV: demand-controlled ventilation
DQN: deep Q-networks
DRL: deep reinforcement learning
EER: energy efficiency ratio
ERV: energy recovery ventilation
EUI: energy use intensity
HVAC: heating, ventilation, and air conditioning
MAE: mean absolute error
PMV: predicted mean vote
PPD: predicted percentage of dissatisfied
PPO: proximal policy optimization
RL: reinforcement learning
RMSE: root mean square error
SEER: seasonal energy efficiency ratio
UFAD: underfloor air distribution
VRF: variable refrigerant flow
Formulas and definitions
COP: ratio of useful heating or cooling provided to work required
EER: ratio of output cooling energy to input electrical energy at a specific operating condition
SEER: ratio of output cooling energy to input electrical energy over an entire cooling season
Data availability statement
The data that support the findings of this study are openly available at: https://github.com/adibhesami/HVAC-RL.
Author contribution
Mohammad Anvar Adibhesami: Data curation (lead); investigation (supporting); software (lead), and validation (equal). Amir Hassanzadeh: Writing—review and editing (equal) and conceptualization (supporting).
Funding statement
The authors declare that they have no known competing financial interests.
Competing interests
The authors declare none.