An explainable multi-agent recommendation system for energy-efficient decision support in smart homes

Abstract Transparent, understandable, and persuasive recommendations support electricity consumers’ behavioral change in tackling the energy efficiency problem. This paper proposes an explainable multi-agent recommendation system for load shifting for household appliances. First, we extend a novel multi-agent approach by designing and implementing an Explainability Agent that provides explainable recommendations for optimal appliance scheduling in a textual and visual manner. Second, we enhance the predictive capacity of the other agents by including weather data and applying state-of-the-art models (i.e., k-nearest neighbors, extreme gradient boosting, adaptive boosting, Random Forest, logistic regression, and explainable boosting machines). Since we want to help the user understand a single recommendation, we focus on local explainability approaches. In particular, we apply the post-model approaches Local Interpretable Model-agnostic Explanations and SHapley Additive exPlanations as model-agnostic tools that can explain the predictions of the chosen classifiers. We further provide an overview of the predictive and explainability performance. Our results show a substantial improvement in the performance of the multi-agent system while at the same time opening up the “black box” of recommendations.


Introduction
Europe faces a double urgency to increase energy efficiency: on the one hand, caused by the war in Ukraine, and on the other hand, due to the continuous rise in electricity consumption (European Commission, 2022). Tackling the energy efficiency problem through consumers' behavioral change is an obvious, yet challenging, solution. People often need guidance and sometimes a soft nudge to put their intentions into action (Frederiks et al., 2015), for instance, to change the timing of appliance usage. Recommender systems can suggest energy-efficient actions to facilitate such behavioral changes. To increase trust in the recommendation system, and thus the acceptance rate of recommendations, users need to understand why and how the model makes its predictions (Luo et al., 2021; Sayed et al., 2022). Hence, the recommendation system should be explainable.
Existing recommendation systems for enhancing energy efficiency in residential buildings vary in approach and implementation. Pinto et al. (2019) introduce a multi-agent case-based reasoning system that focuses on load curtailing. Ran and Leng (2019) propose a load-shifting strategy for optimizing energy costs. Jimenez-Bravo et al. (2019) present a multi-agent system that offers load-shifting recommendations. Sinha and De (2016) develop a load-shifting algorithm that prioritizes time-of-day tariffs, offering potential cost savings and scalability. Machorro-Cano et al. (2020) introduce a home energy management system that generates recommendations based on behavioral patterns.
However, the existing research on explainability in recommender systems for energy-efficient smart homes is very scarce (Himeur et al., 2021). Zhang and Chen (2020) provide a thorough literature review on explainability in recommender systems for other application domains. Yet most existing approaches are not applicable to the smart home area because of the missing data structures. Sardianos et al. (2021) design an explainable context-aware recommendation system for a smart home ecosystem. They show that displaying the advantages and the reasoning behind recommendations leads to a 20% increase in the acceptance rate. To the best of our knowledge, the issue of explainability in multi-agent recommendation systems for energy-efficient smart homes has not been studied yet.
Our contributions are twofold. First, we suggest an explainable multi-agent recommendation system for energy efficiency in private households. In particular, we extend the multi-agent approach of Riabchuk et al. (2022) by designing and implementing an Explainability Agent that provides explainable recommendations for optimal appliance scheduling in a textual and visual manner. Second, we enhance the predictive capacity of the other agents by including weather data and applying state-of-the-art models. We also provide an overview of the predictive and explainability performance.

Riabchuk et al. (2022) introduce a utility-based context-aware multi-agent recommendation system that provides load-shifting recommendations for household devices for the next 24 h. Their system includes six agents: the Price Agent (prepares external hourly electricity prices), the Preparation Agent (prepares data for the other agents), the Availability Agent (predicts the hourly user availability for the next 24 h), the Usage Agent (calculates the devices' usage probabilities for the prediction day), the Load Agent (extracts the typical device loads), and the Recommendation Agent (collects the inputs from the other agents and provides recommendations). The multi-agent architecture is flexible and can easily be integrated into existing smart home systems. However, the simplicity of the approach (logistic regression is used for both the availability and usage predictions) comes at the cost of relatively low prediction accuracy.

Explainable multi-agent recommendation system
We address the limitations of Riabchuk et al. (2022) by enhancing the performance of the Availability and the Usage Agents. In particular, we apply k-nearest neighbors (KNN), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and Random Forest to predict the availability and usage probabilities in the smart home environment. Furthermore, we use logistic regression (Logit) and explainable boosting machines (EBM) as inherently explainable models designed for interpretability (Nori et al., 2019).
We justify our choice of advanced algorithms for the following reasons. First, the size of the dataset, the complex feature space, and the nonlinear energy consumption patterns make it difficult for a simple linear model to effectively capture this complexity. Second, our goal goes beyond predictive accuracy: we aim to provide explainable recommendations to consumers, thereby encouraging behavioral changes that enhance energy efficiency. Advanced models such as Random Forest, XGBoost, and EBM offer a balance between predictive power and interpretability that is crucial for our application.
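To make this trade-off concrete, the following sketch (synthetic data, not the REFIT dataset; all settings are illustrative) contrasts a linear baseline with a Random Forest on a target that contains a feature interaction a linear classifier cannot represent:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for hourly features (e.g., time lags, weather):
X = rng.normal(size=(2000, 8))
# The target mixes a linear effect (X[:, 2]) with an interaction
# (X[:, 0] * X[:, 1]) that a linear model cannot capture:
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logit": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}
# AUC on held-out data, the same metric used for the agents:
aucs = {name: roc_auc_score(y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, model in models.items()}
```

On data of this kind, the nonlinear model typically achieves a clearly higher AUC than the linear baseline, which mirrors the motivation above.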
The explainability models are divided into local and global, depending on their capability to explain a particular instance or the entire model. Since we want to help the user understand a single recommendation, we focus on local explainers aiming at uncovering the reasons for the decision of a black-box model for a specific instance. We apply the post-model approaches Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016) and SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017) as model-agnostic tools that can explain the predictions of the chosen classifiers. In particular, LIME uses simple linear models to explain the black-box model and focuses on explaining the prediction of a single data instance in an interpretable and faithful manner. SHAP explains the prediction of a black-box model by computing the contribution of each feature to the prediction.
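The attribution idea behind SHAP can be illustrated with an exact, brute-force Shapley computation for a toy model (the SHAP library uses efficient approximations; the model, instance, and baseline below are illustrative, not taken from our system):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for the prediction f(x): each feature's
    average marginal contribution over all feature coalitions, relative
    to a baseline input (brute force; fine for a handful of features)."""
    n = len(x)

    def value(subset):
        # Evaluate f with the subset's features taken from x,
        # the rest from the baseline.
        z = list(baseline)
        for i in subset:
            z[i] = x[i]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Toy "usage probability" model over (hour, temperature, hours since last use):
f = lambda z: 0.1 + 0.5 * (z[0] > 17) + 0.3 * (z[1] < 10) * (z[2] > 24)
x = [19, 5, 30]         # instance to explain
baseline = [12, 15, 0]  # reference input
phi = shapley_values(f, x, baseline)
```

By the efficiency property, the attributions sum to the difference between the model output for the explained instance and for the baseline; here the first feature receives its full independent effect of 0.5, while the two interacting features split the remaining 0.3 evenly.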
We propose an Explainability Agent that is called within the Recommendation Agent (see Figure 1). To create an explanation for a recommendation, the Explainability Agent extracts feature importances from the explainability models and provides them to the Recommendation Agent. To inform the user about what causes a specific recommendation besides price, we design the explanation for a single device recommendation to include two parts: a usage and an availability explanation. The usage explanation shows which features lead to the specific device usage prediction for the day, whereas the availability explanation describes which features drive the user availability prediction for the hour. We do not include an explanation for the Load Agent since we do not consider the extracted typical load profile of every shiftable device informative to the users. As a result, the Recommendation Agent suggests the cheapest starting hour within the hours of user availability for the shiftable devices that are likely to be used on the prediction day, with an explanation in textual and visual form. The system provides no recommendation if the predictions for the availability and usage probabilities are below the thresholds.
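A minimal sketch of this selection step (the function name, signature, and the 0.5 thresholds are our assumptions, not the paper's implementation):

```python
def recommend(prices, availability_prob, usage_prob,
              avail_threshold=0.5, usage_threshold=0.5):
    """Suggest the cheapest starting hour within the hours of predicted
    user availability for each device likely to be used; return an empty
    dict (no recommendation) if no hour or device passes its threshold.
    prices and availability_prob are 24-element lists (one per hour);
    usage_prob maps device name -> usage probability for the day."""
    available_hours = [h for h in range(24)
                       if availability_prob[h] >= avail_threshold]
    if not available_hours:
        return {}
    best_hour = min(available_hours, key=lambda h: prices[h])
    return {device: best_hour
            for device, p in usage_prob.items() if p >= usage_threshold}

# Toy inputs: cheap night tariff, but the user is only home in the evening.
prices = [0.05 if h < 6 else (0.07 if h == 20 else 0.10) for h in range(24)]
availability = [0.9 if 18 <= h <= 22 else 0.1 for h in range(24)]
usage = {"washing machine": 0.8, "dishwasher": 0.3}
rec = recommend(prices, availability, usage)
```

In this toy setting, the sketch recommends the cheapest hour within the evening availability window and drops the device whose usage probability falls below the threshold.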

Experimental design
We use the REFIT electrical load measurement dataset (Murray et al., 2017). It contains the energy consumption in Watts of nine appliances used in 20 households in the United Kingdom, as well as the aggregate energy consumption in each household, over the period 2013-2015. Energy consumption readings are recorded at 8-s intervals. We perform a number of preparation steps, including data aggregation; data preparation (i.e., examining the data for skewness, identifying and handling outliers, scaling the data); feature creation; and weather data integration. The energy consumption data have skewed distributions, likely due to periods when appliances are switched off and not using energy. However, this skewness is considered acceptable for the analysis since the focus is on periods when appliances are actively in use. We handle the outliers in the energy consumption data through an outlier truncation method to ensure that extreme values do not influence the analysis.
To capture hidden patterns, improve predictions, and reveal insights beyond the raw data, we create several types of features from the energy consumption dataset. The device usage features are binary variables that indicate whether a device has been used in a given time period. These features are based on the scaled energy consumption of each device and involve setting a threshold to account for noise in the consumption data. The last usage feature calculates how many time periods have passed since the last usage of a device, providing insight into the frequency and recency of device use. The user availability feature is a binary variable that indicates whether the user was available during a given time period. It is based on the use of appliances that require user interaction (e.g., tumble dryer, washing machine, dishwasher). The time features include information such as the hour of the day, the day of the week (index and name), the month (index and name), and whether the period falls on a weekend. The time lag features represent time shifts for specific periods and features. For example, the availability lag and hour lag features provide insight into how usage and time-related variables change over time intervals (e.g., lags of 1 h, 2 h, and 5 h). These additional features enhance the analysis of energy consumption and user behavior, facilitating the exploration of device usage, availability, and temporal patterns to gain insight into the factors that affect household energy consumption. Furthermore, the weather may be crucial for the use of some appliances in a household. Therefore, we extend our dataset with specific weather features from Meteostat (2022). In particular, we use the following features: dew point (dwpt), relative humidity (rhum), air temperature (temp), average wind direction (wdir), and average wind speed (wspd). The missing values are imputed using the KNN algorithm.

Figure 1. Architecture of the explainable multi-agent recommendation system.
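The feature construction described above can be sketched in pandas (the threshold, timestamps, and column names are illustrative, not the paper's exact pipeline):

```python
import numpy as np
import pandas as pd

# Hourly consumption for one device (illustrative values in Watts):
# the device runs at hour 19 of each day, otherwise draws standby power.
idx = pd.date_range("2014-01-01", periods=48, freq="h")
df = pd.DataFrame(
    {"consumption": np.where(np.arange(48) % 24 == 19, 1800, 2)}, index=idx)

threshold = 10  # Watts; treats standby noise as "not used"
df["used"] = (df["consumption"] > threshold).astype(int)

# Time features:
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek
df["weekend"] = (df["dayofweek"] >= 5).astype(int)

# Lag features, e.g. usage 1 h and 24 h earlier:
df["used_lag_1h"] = df["used"].shift(1)
df["used_lag_24h"] = df["used"].shift(24)

# Hours since the device was last used (NaN before the first usage):
last_on = df.index.to_series().where(df["used"] == 1).ffill()
df["hours_since_last_use"] = (
    df.index.to_series() - last_on).dt.total_seconds() / 3600
```

The weather columns (dwpt, rhum, temp, wdir, wspd) would then be joined on the same hourly index before the KNN imputation of missing values.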
We apply an exhaustive grid search over the algorithms mentioned in the previous section, excluding the EBM, which is computationally expensive. Overall, 87 parameter combinations are tested twice (with and without weather data) to quantify the benefit of including the additional data. We use a KernelExplainer to explain Logit, AdaBoost, KNN, and Random Forest. For XGBoost, we use the fastest version of the TreeExplainer, interventional feature perturbation (Lundberg et al., 2020).
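Such a search can be sketched with scikit-learn's GridSearchCV (the grid and synthetic data below are illustrative; the paper's 87 parameter combinations are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
y = ((X[:, 0] + X[:, 1] * X[:, 2]) > 0).astype(int)

# A small illustrative grid; AUC is the selection criterion,
# matching the evaluation metric of the Availability and Usage Agents.
grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      grid, scoring="roc_auc", cv=3)
search.fit(X, y)
```

Running the same search once with and once without the weather columns, as done in the paper, isolates the contribution of the additional data.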
To evaluate the performance of the multi-agent model, we apply multiple metrics depending on the task. The Usage and the Availability Agents perform a classification task; therefore, we evaluate them using the area under the ROC curve (AUC). We evaluate each day by predicting the probabilities of availability and usage and comparing them to the true values. The Load Agent is evaluated via the mean squared error (MSE) between the predicted load profile and the real usage. We refer to Riabchuk et al. (2022) for details. We do not make any changes to the prediction approach of the Load Agent and, therefore, do not report its performance evaluation.
With the evaluation of the explainability approaches, we aim at a quantitative comparison of the individual explanations that the models offer. For this purpose, we use three metrics within the Explainability Agent to reflect how well the explainability approaches work in giving accurate local explanations (Carvalho et al., 2019): accuracy, fidelity, and efficiency. Accuracy shows how well the explainable model predicts unseen instances compared to the real outcome. We use the AUC as a measure to compare the true labels with the predictions from the explainers. Fidelity determines how close the prediction of the explainable model is to the black-box model's prediction; in other words, it describes how well the explainability model can imitate the prediction of the black-box model. Additionally, we calculate the mean absolute explainability error (MAEE) for every approach to measure how close the decision of the black-box model is to that of the explainability approach. The MAEE represents how well the explainability approaches are calibrated. The efficiency metric describes the algorithmic complexity of the explainability approaches when calculating the local explanations. For this purpose, we measure the time that each method needs to calculate all the local explanations for a day and average the values over all calculations.
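The text does not spell out closed-form definitions, so the sketch below shows one plausible way to compute accuracy, fidelity, and the MAEE (the 0.5 decision cutoff for fidelity is our assumption); efficiency would simply be the averaged wall-clock time of the explanation calls:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def explainability_metrics(y_true, p_blackbox, p_explainer):
    """Sketch of the three criteria (exact formulas are our assumption):
    accuracy - AUC of the explainer's predictions vs. the true labels;
    fidelity - AUC of the explainer's predictions vs. the black-box
               decisions (cutoff 0.5), i.e. how well it imitates the model;
    MAEE     - mean absolute gap between the two predicted probabilities."""
    return {
        "accuracy": roc_auc_score(y_true, p_explainer),
        "fidelity": roc_auc_score((p_blackbox >= 0.5).astype(int), p_explainer),
        "maee": float(np.mean(np.abs(p_blackbox - p_explainer))),
    }

y = np.array([0, 1, 0, 1, 1, 0])
p_model = np.array([0.20, 0.80, 0.40, 0.90, 0.60, 0.10])  # black-box output
p_expl = np.array([0.25, 0.75, 0.35, 0.85, 0.65, 0.15])   # explainer output
m = explainability_metrics(y, p_model, p_expl)
```

A well-calibrated explainer has a fidelity close to 1 and an MAEE close to 0, even when its accuracy against the true labels is imperfect.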

Results
Table 1 provides the performance evaluation results for the Availability and Usage Agents without weather data. The models show relatively stable performance for the Availability Agent, with AdaBoost slightly outperforming the others. For the Usage Agent, we observe a much greater variation in performance across devices, with the highest AUC achieved by the KNN model. The inclusion of weather data (see Table 2) leads to slight performance improvements for the Availability Agent, but a significant performance increase for the Usage Agent. Without the additional data, most models achieve an AUC of around 0.7 for the Usage Agent. However, the inclusion of the weather data allows for a substantial increase in performance toward 0.9. Complex models in particular benefit considerably from the inclusion of the data, outperforming the approach of Riabchuk et al. (2022).
Specifically, the Random Forest model profits the most from the additional data and consistently outperforms the other models across both agents and devices. We conclude that it is not sufficient to use either a more complex model or weather data alone; instead, it is the combination of both that leads to significant performance improvements. To assess the stability of the Random Forest model, we further analyze the AUC across 10 households in Table 3 (see Table A1 in the Appendix for performance evaluation results with logistic regression). The results indicate consistent performance for both the Availability and the Usage Agents, with an average AUC of 0.812 and 0.918, respectively, across all households. Based on these findings, we utilize the Random Forest model with the tuned hyperparameters obtained from the grid search for the prediction tasks of both agents.

Explainability
We examine the results of the explainability evaluation by using the predictions from LIME and SHAP (see Tables 4 and 5). SHAP generally performs better, with higher accuracy and fidelity compared to LIME. Most strikingly, since the forecasts of the prediction algorithms and SHAP are so similar, the fidelity is almost perfect. In other words, SHAP works very well for mimicking the local behavior of the prediction model. The accuracy of LIME for the Usage Agent is not as high. A possible reason is that the predictions, for example for AdaBoost, are very close to the chosen cutoff of 0.5 used to create the target values: LIME is off by an average of only 0.0311 AUC points, sometimes exceeding the cutoff and assigning values to the wrong class. For the Availability Agent, the poorer calibration of LIME does not have much of an impact, as the prediction values are more extreme. Furthermore, SHAP produces predictions faster than LIME in most cases. Therefore, we choose SHAP over LIME for our explainability task because of its higher performance on these metrics.
Since there is no significant difference in the fidelity of SHAP across the models, we apply SHAP with one of the most consistent and stable algorithms. The AdaBoost model shows the highest performance for the Availability Agent, followed by the Random Forest with slightly lower performance. For the Usage Agent, the best model is XGBoost, closely followed by the Random Forest. Thus, the Random Forest model performs consistently well across agents and is therefore selected as the final model for the prediction task.
The reported SHAP runtimes within the Random Forest model for the Availability and Usage Agents are 64.853 and 0.957, respectively (see Tables 4 and 5). To expedite the explainability evaluation (i.e., decrease the duration) while retaining accuracy, we conduct further runs with the TreeExplainer. This leads to a significant reduction in the SHAP runtimes for the Availability and Usage Agents to 0.558 and 0.0357, respectively. Notably, this change in runtime has minimal impact on the other metrics. Consequently, we utilize SHAP and Random Forest with the TreeExplainer as the final configuration.

Explainable recommendation
The recommendation for the next 24 h is provided once a day at a time specified by the user. To create an explanation for the recommendation, we build a feature importance ranking using SHAP values. We provide two different explanations, one for the Availability and one for the Usage Agent. For each agent, we separate the features into two categories: features based on weather data and non-weather features. The Recommendation Agent embeds the two most important features of each group into an explanation sentence. The usage explanation is provided for each device (if its usage is recommended) since the predictions differ across devices. Additionally, we adapt the plots provided by SHAP to inform the user about the specific impact of the features. We only display the most important features to shorten the explanation. We show an exemplary explainable recommendation in Figure 2.
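The sentence-assembly step might look as follows (the wording, the feature names, and the grouping rule are illustrative assumptions, not the system's actual output):

```python
def explanation_sentence(attributions, weather_features, top_k=2):
    """Assemble a short textual explanation from per-feature attribution
    scores (e.g. SHAP values): the top-k non-weather and top-k weather
    features by absolute impact. Feature names are illustrative."""
    weather = {f: v for f, v in attributions.items() if f in weather_features}
    other = {f: v for f, v in attributions.items() if f not in weather_features}
    top = lambda d: sorted(d, key=lambda f: abs(d[f]), reverse=True)[:top_k]
    return ("This recommendation is mainly driven by "
            + " and ".join(top(other))
            + ", together with the weather features "
            + " and ".join(top(weather)) + ".")

# Hypothetical attribution scores for one device recommendation:
shap_values = {"hour": 0.40, "used_lag_24h": 0.20, "weekend": 0.01,
               "temp": 0.30, "rhum": -0.05, "wspd": 0.02}
weather = {"temp", "rhum", "wspd", "dwpt", "wdir"}
sentence = explanation_sentence(shap_values, weather)
```

Sorting by absolute value keeps features with a strong negative impact in the explanation as well; the final system would additionally map feature codes to user-friendly names.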

Future work
In our future work, we aim to integrate the explainable multi-agent recommendation system into the existing open-source smart home platform Home Assistant. With around 330,000 active users worldwide (Home Assistant, 2024), it provides a viable real-world environment for testing our solution within selected households. We plan to modify the recommendation system to work with real-time data streams from the smart home and to include a user feedback option. This work on an explainable multi-agent recommendation system lays a solid foundation for further evaluation of the interpretability, comprehensibility, and understandability of the recommendations using user feedback.

Conclusions
This paper presents an explainable multi-agent recommendation system aimed at enhancing energy efficiency in private households. Our empirical results show significant performance improvements from incorporating weather data into the recommendation system. In particular, we observe notable improvements in AUC values, especially for the Usage Agent, where performance increases from around 0.7 to approximately 0.9. This highlights the importance of leveraging weather data to improve energy efficiency recommendations.
In the context of explainability, our evaluation highlights the superiority of SHAP over LIME, with SHAP exhibiting higher accuracy and fidelity. In particular, SHAP's ability to closely mimic the local behavior of the prediction model contributes to its selection as the preferred explainability method. Furthermore, our findings consistently identify the Random Forest model as a strong performer across different agents and devices. The ability of this model to consistently achieve high performance underscores its suitability for the prediction tasks within the recommendation system.
To conclude, our study demonstrates that making load-shifting recommendations for household appliances explainable enhances the transparency and trustworthiness of the system. The comprehensible and persuasive nature of these recommendations facilitates consumers' behavioral change, making them more inclined to address the energy efficiency problem.

Table 1.
Performance evaluation results (in AUC) for the Availability and the Usage Agents with tuned hyperparameters, excluding weather data.

Table 2.
Performance evaluation results (in AUC) for the Availability and the Usage Agents with tuned hyperparameters, including weather data. Abbreviations: TD, tumble dryer; WM, washing machine; DW, dishwasher. The best-performing model is in bold.

Table 3.
Performance evaluation results (in AUC) for Random Forest for the Availability and the Usage Agents for 10 households. Abbreviations: DW, dishwasher; TD, tumble dryer; WD, washdryer; WM, washing machine; WM2, second washing machine.

Table 4.
Explainability evaluation results for LIME and SHAP for the Availability Agent for 10 households. *The duration is only comparable within the model since the evaluation was run on different machines with different background tasks.

Table 5.
Explainability evaluation results for LIME and SHAP for the Usage Agent for 10 households. *The duration is only comparable within the model since the evaluation was run on different machines with different background tasks.