Time series analysis of GSS bonds

Abstract
The market for green bonds, and for environmentally aligned investment solutions more generally, is growing. As of 2022, cumulative green bond issuance exceeded USD 2 trillion, with India, for example, having issued its first-ever sovereign green bonds totalling INR 80bn (c. USD 1bn) in January 2023. This paper lays the foundation for future papers and summarises the initial stages of our analysis, in which we try to replicate the S&P Green Bond Index over a period of time (a time series problem) using non-traditional techniques. The models we use include neural networks such as CNNs, LSTMs and GRUs. We extend our analysis with XGBoost, an open-source gradient-boosted decision tree library. For the purposes of this paper, we use the prior day's index value to predict today's value and repeat this over a period of time. We do not consider, for example, stationarity or extensions to the input window/output horizon, as these will be discussed in future papers. The paper explains the methodology used in our analysis and gives general background on the model architectures (CNNs, LSTMs, GRUs and XGBoost), on regularisation techniques (specifically L2 regularisation), on loss curves and on hyperparameter optimisation, in particular the open-source library Optuna.

Disclaimer: The views expressed in this publication are those of invited contributors and not necessarily those of the Institute and Faculty of Actuaries (IFoA). The Institute and Faculty of Actuaries do not endorse any of the views stated, nor any claims or representations made in this publication and accept no responsibility or liability to any person for loss or damage suffered as a consequence of their placing reliance upon any view, claim or representation made in this publication. The information and expressions of opinion contained in this publication are not intended to be a comprehensive study, nor to provide actuarial advice or advice of any nature and should not be treated as a substitute for specific advice concerning individual situations. On no account may any part of this publication be reproduced without the written permission of the Institute and Faculty of Actuaries. This paper expresses the views of the individual author and not necessarily those of their employers.

Executive summary
We are pleased to publish our first paper as a Working Party using data science techniques to look at sustainability and climate change-related issues. In this paper, we summarise the first stage of our analysis, where we introduce data science techniques to construct a time series analysis of the Standard & Poor's (S&P) Green Bond Index.

Scope of this Paper
The aim of this paper is to lay out the foundations for a time series analysis of green, social and sustainability (GSS) bond indices, and is not intended to be a definitive guide. We have deliberately excluded stationarity considerations and restricted this paper to a univariate analysis. We will include stationarity considerations and expand our examination to a multivariate analysis in subsequent papers.
For the purposes of this paper, we have focussed on the S&P Green Bond Index and performed various univariate time series analyses using a range of models, which include neural networks. This paper focusses on using a rolling window approach of one prior day's index value to predict today's index value.
In particular, this paper discusses the following (arranged by Section):
• Section 2: Introduction
  ○ Background to GSS bonds and a brief explanation of the analysis covered in this paper.
• Section 3: Data
  ○ Insight into the data used in our analysis along with summary information on the training/validation/test splits.
• Section 4: Summary of models used
  ○ A high-level summary of model architectures used in our analysis (i.e. neural networks and a decision tree) with supplemental background information, grouped into five model categories.
• Section 5: Training the models
  ○ Background information on the loss history, Adam optimiser, regularisation techniques, and hyperparameter optimisation techniques used in our analysis.
• Section 6: Results
  ○ Summary tables and graphs for the best-performing model per model category.
• Section 7: Conclusions and next steps
  ○ Summary of conclusions from our analysis and potential areas of analysis for subsequent papers.
Please note that Sections 4 and 5 have been included in this paper to assist with the general understanding of underlying model architecture, and the training process of neural networks.

Aim of the analysis
This paper focusses on the initial stages of our time series analysis of GSS bonds, specifically on the daily values of the S&P Green Bond Index and whether we can create accurate prediction models using, for example, neural networks. This paper is the foundation for future analysis, where we hope to develop a model that can assist with GSS-bond index prediction, which will have wider applications such as index price modelling and investment portfolio analyses for actuaries and non-actuaries alike.
For the purpose of this paper, we are looking to predict a rolling 1-day value of the index, based on the prior day's index value, over the period 2013 to 2023 inclusive. A rolling window approach is a typical approach for developing a time series model, where we assume that prior history can be used to infer a future index value. For example, we assume that the values over the prior x days influence the values over the following y days. The simplest approach of this type is to use the value 1 day prior to predict today's value (i.e. a rolling 1-day window) and to repeat this process over a period of time. We discuss this in more detail in Section 3.2.
We set the baseline model such that today's value equals yesterday's value over the full date range of 31 January 2013 to 17 February 2023 (referred to as the 'Baseline model' in this paper). Applying a similar approach, we aimed to see whether we could create an accurate time series model with non-traditional methods such as neural networks and a decision tree model (XGBoost). Please see Section 4 for further details.
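As a minimal sketch of the Baseline model described above, the following Python snippet predicts each day's value as the previous day's value and scores the result with MAE and MAPE. The index values are made-up illustrative numbers rather than S&P data, and the function names are hypothetical.

```python
def persistence_forecast(values):
    """Baseline: the prediction for day t is the observed value on day t-1."""
    return values[:-1]


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative index values only (not actual S&P Green Bond Index data).
index_values = [100.0, 101.0, 100.5, 102.0, 101.0]

predicted = persistence_forecast(index_values)  # yesterday's values
actual = index_values[1:]                       # today's values

baseline_mae = mae(actual, predicted)
baseline_mape = mape(actual, predicted)
print(f"Baseline MAE: {baseline_mae:.3f}, MAPE: {baseline_mape:.3f}%")
```

Any more complex model would need to beat these two figures on the same data to justify its additional complexity.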
Analyses using stationarity and multivariate techniques have been deferred to later papers. Similarly, we will extend our analysis in subsequent papers by using the prior x index values (window) to predict the next y days (horizon).

Data and method
We analysed the S&P Green Bond Index values between 31 January 2013 and 17 February 2023, splitting the data using 70%/20%/10% splits for training/validation/test data.
The machine learning models analysed can be categorised into the following model categories: Deep Neural Network (DNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural network architectures. These architectures and their distinctions are covered further in Section 4. A decision tree model, i.e. XGBoost, was also included as the final category in our analysis. Again, we discuss this in further detail in Section 4.
A prior, simplified examination was performed during the preliminary stages of our analysis, where we adjusted, for example, the number of hidden layers up to five hidden layers, to see if this produced material improvements in the model outputs. We do not address this part of the analysis further in this report. Following this stage, we narrowed down the model architecture per model category for further analysis, which we present in this paper. Please see Appendix C for more details.
Though we have analysed different model architectures as described in Appendix C in our analysis, using the techniques described in Section 5, we have presented the best-performing models per category when discussing the results and conclusions in this paper (Sections 6 and 7, respectively).
The loss function used during model training was set to Mean Absolute Error (MAE) for all models, with an Adam optimiser used to update the weights in the neural network and L2 regularisation to reduce any overfitting across all models. Hyperparameter tuning of all models was completed via the open-source library Optuna, using the Bayesian optimisation algorithm Tree-structured Parzen Estimator (TPE). These are discussed in more detail in Section 5.

Results and conclusions
Although we analysed a range of models per model category (see Appendix C for more details) e.g., by varying the number and type of hidden layers, we have presented the best-performing model per model category in this paper.
The results of our analysis were inconclusive: the models from each category produced results comparable to the Baseline model, with differences in Mean Absolute Percentage Error (MAPE) of up to approximately +/− 0.1% and hence with no material outperformance. The DNN and LSTM models marginally outperformed the Baseline model, based on the data range and parameters in our analysis. Given the initial approach of a 1-day rolling window, we are potentially not supplying the models with sufficient historical information, or correlated information (e.g. as per a multivariate analysis), for them to learn material underlying patterns in the data and hence to materially outperform the Baseline model. We aim to address this in future papers (see Sections 1.2.4 and 7 for more details).

Next steps
For future papers, we will expand our analysis to include the following:
1. Stationarity considerations by introducing, for example, an autoregressive integrated moving average model, and widening the data input window and output horizon. A variation of this has been explored in, for example, Peters et al. (2022).
As mentioned, this paper lays the foundation for later papers and introduces a range of model architectures. We expect specific features such as convolutions for a CNN model, and iterations via LSTM and GRU cells, to have a larger influence on results once we start expanding the input window and output horizon size in later papers.

Additional disclaimers
Please note the following:
a. Information within this paper is valid up to 30 September 2023. Hence, there may be updates beyond this date that are not reflected in this paper, e.g. changes to any legislation mentioned or updates to any open-source code/programming libraries used.
b. This paper is not intended to be a comprehensive audit of models. Neither is this paper recommending or promoting one approach over another, nor promoting any of the sources or references stated in this paper. Any user of this paper should still reference the underlying legislation, reference any standard mentioned in this paper, and should there be any conflict, the underlying information in the relevant standard, reference or legislation supersedes any information presented in this paper.
c. Though the work in this paper does not fall under the Financial Reporting Council's Technical Actuarial Standards, this paper has been reviewed both within the Working Party and by the Institute and Faculty of Actuaries' Data Science Practice Board.

Background to Green Bonds
A green bond is a type of fixed-income instrument that is specifically earmarked to raise money for climate and environmental projects (Segal, 2022). The key difference between green bonds and conventional bonds is that green bonds are issued to finance projects that have a positive impact on the environment. Examples of such projects include renewable energy and clean transportation. For more background on green bonds, please see for example Investopedia's post (Segal, 2022) along with recent posts in The Actuary (Mitchell, 2022), which discuss the concept of a greenium. The first ever green bonds were issued in 2007 by the European Investment Bank (EIB), totalling approximately USD 807 million (CFI Team, n.d.). Since then, the market for green bonds has increased, with global cumulative green bond issuance passing the USD 1 trillion mark in 2020 (Climate Bonds Initiative, 2020) and standing at just over USD 2 trillion in 2022 (Statistica Research Department, 2023).
Please see Figure 1, taken from the Climate Bonds Initiative website along with related information (Climate Bonds Initiative, 2021), for details of this trend.
Please note that Figure 1 ignores any bonds issued prior to 2014. Currently, issuers of green bonds include supranational institutions, public entities and private companies (Iberdrola, n.d.).
In 2022, although Europe continued to lead the way in terms of issuing green bonds and represented approximately 47% of the green bonds issued that year (Climate Bonds Initiative, n.d.), China and the US accounted for approximately 18% and 13% respectively of the green bond market (Michetti et al., 2023).
The number of countries issuing green bonds continues to expand. Recently, for example, India issued its first-ever sovereign green bonds totalling INR 80 billion (approximately USD 1 billion) in January 2023 (Reed, 2023). Similarly, Israel issued its first-ever sovereign green bonds at the start of 2023, totalling approximately USD 2 billion (Wrobel, 2023).
Please note that green bonds are part of the wider GSS-bond universe, which also covers social- and sustainability-aligned investments. For more details on GSS bonds, please see, for example, JP Morgan Asset Management (2023) and The World Bank (2023).

British Actuarial Journal
With the proposed legislative EU Green Bond Standard, we expect the trend of increased green bond issuance to continue. The Standard was originally proposed by the European Commission in 2021 (European Commission, 2021). The legislation received agreement between the EU Parliament and the EU Council on 28 February 2023 (European Commission, 2023a). Currently, this will be a voluntary regime, intended to be the "gold standard" for green bonds (Latham & Watkins, 2023), and will be subject to supervision by competent authorities (as set out in the European Securities and Markets Authority 2017 Prospectus Regulation). Requirements include issuers publishing a prospectus and a green bond fact sheet.
See European Commission (2023b) for further details on the Standard, and Latham & Watkins (2023) for more general commentary.

Introduction to the S&P Green Bond Index and Our Time Series Analysis
There are several green bond indices in existence, including the Bloomberg Morgan Stanley Capital International (MSCI) Green Bond Indices, the Financial Times Stock Exchange Green Impact Bond Index Series and the S&P Green Bond Index, which is designed to track the global green bond market.
For the purposes of this paper, we have focussed on the S&P Green Bond Index and aim to replicate the index using various models over a certain time period. We include data from the end of January 2013 to mid-February 2023 in our analysis.
The aim of this paper is to introduce some of the ideas from the initial stages of our analysis. Hence, we focus on a univariate time series model, where our model uses the prior day's value to predict the index value one day ahead (i.e. today's value). We aim to address more complex univariate time series and build on models discussed in papers such as Peters et al. (2022) (where a hybrid Seasonal Auto-Regressive Integrated Moving Average with Exogenous factors, SARIMAX, and LSTM model is used) in future papers. Similarly, we aim to analyse any interaction between green bonds and the general stock market (i.e. multivariate time series) in a future paper, widening our analysis to cover other GSS bonds as well.
For the purposes of this paper, we will aim to produce a sufficiently accurate predictive model, where the majority of the models we analyse are based on a neural network architecture. Data from 31 January 2013 to around mid-February 2022 will be used to train and validate the models. These models are then used to predict daily index values on unseen data from mid-February 2022 to mid-February 2023 (the test data set). The difference between the predicted values from our models and the actual daily index figures will be used to gauge the accuracy of the proposed models.

Background to S&P Green Bond Index Data Used
For the purposes of this paper, we analyse the S&P Green Bond Index (Total Performance, in US dollars, from 31 January 2013 to 17 February 2023 inclusive). Part of the driver for this decision was to use an index that is freely available. For details of this index, please see the S&P Green Bond Index methodology paper published in February 2023, available via the main S&P website (S&P, 2023).
Figure 2 shows the index value over this period.
The underlying data is daily data taken from the main S&P website: https://www.spglobal.com/spdji/en/indices/esg/sp-green-bond-index/. Please note that the data is based on work days. We downloaded the data from the main S&P website and performed basic checks such as checking for blanks in the data, ensuring that there were no duplicate date entries, and comparing the chart against the S&P website charts, other websites that displayed this index, and comparable charts from other research papers which analysed this index. No adjustments were made to the data.
One interesting item of note is how the index behaves after early 2020: there is a general upward trend in the index, rising to approximately 160 on 5 January 2021, followed by a decrease to approximately 110 on 21 October 2022. This may reflect the impact of COVID on general markets from the start of this period, though this will be explored in more detail in subsequent papers.

Overview of Time Series Methodology
For the purposes of this paper, we have based our time series analysis on a simple sliding window technique, where a subperiod of historic data (window) is input into the model and used to predict the future period (horizon), where both the window and horizon are of duration 1 day. We then slide this window along the data by 1 day to predict the following day, and so on. Please see Brownlee (2020) for the background to this approach. Alternative methods such as a cross-validation window approach (Shrivastava, 2020), which varies the window input length, are not analysed in this paper. We aim to look at more complex windowing techniques in future papers.
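The sliding window technique can be sketched as follows. This is an illustrative helper only: the function name `make_windows` and the sample values are assumptions, not code or data from our analysis. With `window=1` and `horizon=1` it reproduces the 1-day-in, 1-day-out set-up used in this paper, and larger values show how the same idea generalises.

```python
def make_windows(values, window=1, horizon=1):
    """Split a series into (input window, output horizon) pairs,
    sliding forward by one observation at a time."""
    inputs, targets = [], []
    last_start = len(values) - window - horizon
    for start in range(last_start + 1):
        inputs.append(values[start:start + window])
        targets.append(values[start + window:start + window + horizon])
    return inputs, targets


# Illustrative index values only.
series = [100.0, 101.0, 100.5, 102.0, 101.0]

# 1 day in, 1 day out: each input is yesterday's value, each target today's.
X, y = make_windows(series, window=1, horizon=1)

# A wider window, for comparison: 2 days in, 1 day out.
X2, y2 = make_windows(series, window=2, horizon=1)
```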
For the purposes of our analysis, the data set was split 70%/20%/10% between training/validation/test data sets. The training and validation splits are used to train and tune our models. We then tested each model against unseen data (the test split) and compared the predicted outputs against actual observed data to gauge the model's accuracy.

Summary of Data
The splits mentioned above are shown in Figure 3. Given the nature of the analysis (a time series analysis), we have allocated these splits in chronological order, rather than randomly across the full data range, so that we can build models to infer some form of prior/time dependency from the underlying data. The training, validation and test splits run up to 16 February 2020, 16 February 2022 and 17 February 2023 respectively, inclusive. Table 1 gives further details of each data split.
In summary, there is greater volatility in the test data set range compared to the training and validation data sets.Hence, it will be interesting to see how our models cope given that they will be built on less volatile training and validation data.
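A chronological 70%/20%/10% split of this kind can be sketched as below. The helper name `chronological_split` and the placeholder observations are assumptions for illustration; the key point is that the data is sliced in time order and never shuffled.

```python
def chronological_split(values, train_frac=0.7, val_frac=0.2):
    """Split a time-ordered series into train/validation/test segments
    without shuffling, so each segment follows the previous one in time."""
    n = len(values)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return values[:train_end], values[train_end:val_end], values[val_end:]


# Placeholder observations standing in for time-ordered daily index values.
observations = list(range(100))

train, val, test = chronological_split(observations)
print(len(train), len(val), len(test))
```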
Please note that, for the purposes of our analysis, we have not adjusted the data further, i.e. no normalisation of the index (fixing data to lie within a scale from 0 to 1, a technique typically used to result in quicker convergence to a solution for a model) and no log transformation (which can be used to potentially dampen any impact of seasonality). Such techniques may be discussed in later papers.

In this section, we give an overview of the underlying model architecture used in our analysis. For the purposes of this paper, we have mainly used models based on neural networks. In summary, these can be grouped as follows:
1. The Baseline model, where we assume today's value is consistent with yesterday's, and then move our projection along by 1 day. The aim of a baseline model is to start our analysis off with something simplistic and act as a reference marker for other models to (ideally) beat (and hence justify any additional complexity in model design).
2. A Deep Neural Network (DNN) model, which is a feedforward artificial neural network with 1, 2 or 3 hidden layers. (Please note that strictly speaking, a deep neural network has 2 or more hidden layers. For consistency and simplicity, we have retained the same labelling and categorisation approach between the models in this group.)
3. A Convolutional Neural Network (CNN), where a CNN has a convolutional layer that effectively filters down information, stripping out noise to find an underlying pattern in the data.
4. A Long Short-Term Memory (LSTM) model, which is a type of Recurrent Neural Network (RNN) aimed at resolving the vanishing gradient problem.
5. A Gated Recurrent Unit (GRU) model, which is a type of RNN and may be seen as a simplified version of LSTM.
6. A decision tree/ensemble gradient-boosting library, i.e. XGBoost.
We go into more detail on the architecture for the above models in Section 4.2. Please note that in our analysis we have used variations of models within each category above, for example by varying the number of hidden layers. For details of the models analysed, see Appendix C. However, for the purposes of this paper, we have presented the best-performing models per category in the results section (Section 6).

General Background to Neural Networks
The inspiration for the underlying design of neural networks is the design of the human brain.See IBM (n.d.) for a basic overview.
A traditional approach to tackling a problem such as modelling time series data is prescriptive: we consider a mathematical formula and fit it to observed data, having analysed and assumed some understanding of the underlying mechanics of the problem. The appeal of neural networks is their potential to learn a wide range of relationships directly from data. Given sufficient data, a neural network can train itself once the model's architecture has been decided upon. One advantage of this approach is that we do not need a fully specified mathematical model, or even a true understanding of the underlying mechanics, to tackle a problem.

Summary Comparison of Models Used
Below is a summary of the model architectures used in our analysis, with some explanatory comments on the underlying nature of each type of architecture. We have grouped our models into five categories in addition to the Baseline model. All models predict one day's value of the index, based on the prior day's observed index value over a time period.
The underlying mechanics of a neural network are as follows: different weights and biases are applied to a value from a prior layer before an activation function is applied and then this value is passed on to the next layer. This is shown further in Figure 5, where we show the various components of a neuron for a hidden layer, where $x_i$ represents the inputs, $w_i$ the weights and $b$ the bias.
The aim of an activation function in a neural network is to introduce non-linearity to a regression model, such as a time series analysis. Please see Section 5.8 for more information on activation functions.
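To make the mechanics of a single neuron concrete, the sketch below computes a weighted sum of inputs plus a bias and passes it through an activation function. The inputs, weights and bias are arbitrary illustrative numbers, and ReLU and tanh are used only as two common examples of activation functions; they are not necessarily those selected in our analysis.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Arbitrary illustrative inputs x_i, weights w_i and bias b.
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.1

out_relu = neuron_output(x, w, b, relu)       # z = 0.5 - 0.5 + 0.1 = 0.1
out_tanh = neuron_output(x, w, b, math.tanh)  # tanh squashes z into (-1, 1)
```

Without the activation, stacking such neurons would only ever produce a linear function of the inputs, which is why the activation is what introduces non-linearity.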

Deeper Dive into Model Architectures - CNN
CNNs are a type of neural network architecture that can work with two-dimensional (2D) data, for example, to classify images, though this can be extended to one-dimensional (1D) data such as time series and to three-dimensional (3D) data such as that used for video classification or medical image segmentation.
With a CNN, the underlying idea and aim of the architecture is to effectively summarise input information and extract underlying features or patterns in the data before performing further analysis. Typically, the volume of information becomes too large to handle easily and hence additional layers are introduced in the neural network to reduce this information whilst retaining important characteristics of the underlying data. CNNs are typically made up of:
• A convolutional layer, which uses filters and kernels to extract features (i.e. underlying patterns) in the data. For the purposes of our analysis, the kernel size can be viewed as the length of the input window of our time series, whilst the filter size represents the number of features in the data (i.e. the number of filters), with a single filter responsible for learning a single underlying pattern in the data.
• A pooling layer, which further reduces the information produced by prior convolutional layers whilst still retaining important information.
We have used a 1D CNN in this paper to deal with time series, as the kernel in effect moves in one direction (i.e. forwards in time). Further, for the purposes of this paper, we have not included a pooling layer given the initial nature of our analysis.
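The way a single 1D filter moves along a time series can be sketched as follows. This is an illustration with a fixed, hand-picked kernel and made-up values; in a trained CNN the kernel weights are learned, and the function name `conv1d_valid` is hypothetical. As in most deep learning libraries, the operation shown is strictly a cross-correlation with stride 1 and no padding.

```python
def conv1d_valid(series, kernel, bias=0.0):
    """Slide one kernel along the series in the time direction
    (stride 1, no padding) and return the resulting feature map."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[i + j] for j in range(k)) + bias
        for i in range(len(series) - k + 1)
    ]


# Illustrative values only.
series = [1.0, 2.0, 4.0, 7.0, 11.0]

# A kernel of size 2 that responds to the day-on-day change.
features = conv1d_valid(series, [-1.0, 1.0])

# A kernel of size 1 (a 1-day window) can only rescale each value.
scaled = conv1d_valid(series, [2.0])
```

The second call hints at why convolutions are expected to matter more once the input window is widened: with a kernel of size 1 there is no neighbouring value for the filter to compare against.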

Deeper Dive into Model Architectures - LSTM
LSTM was introduced in 1997 and builds on RNNs (where information is passed through the same layer multiple times before moving on to the next layer), with the aim of introducing some form of longer-term memory compared to an RNN, as well as addressing the vanishing gradient problem present in RNNs. An LSTM layer is a neural network layer that contains an LSTM cell. These cells have additional structures called gates to control the flow of information. Figure 7 shows the internal structure of an LSTM cell, where the three gates (forget, input and output) are represented by the shaded, grey areas, though please note that there are many variants of an LSTM cell.

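The components of an LSTM cell include:
• The cell state $C_t$ at time t, which represents the long-term memory component: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, where $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ and $C_{t-1}$ is the cell state at time t-1.
• The forget gate $f_t$ at time t, which decides which long-term memory component in the cell state is no longer needed and hence can be removed: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$.
• The input gate $i_t$ at time t, which decides which new information to add to the long-term component in the cell state: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$.
• The output gate $o_t$ at time t, which determines the value of the next hidden state: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, with $h_t = o_t \odot \tanh(C_t)$.
In the above, W and b represent the weights and biases for each respective component, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, $x_t$ is the input vector at time t, and $h_t$ and $h_{t-1}$ are the hidden state vectors at times t and t-1 respectively.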
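A minimal sketch of one LSTM time step is given below, assuming the widely used standard formulation (forget, input and output gates with a tanh candidate cell state). For readability it uses a hidden size and input size of 1, so that every weight is a single number; the parameter layout `(w_h, w_x, b)` and the zero-valued parameters are illustrative assumptions, and library implementations such as Keras use matrices and learned values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each of 'f', 'i', 'c', 'o' to a tuple (w_h, w_x, b):
    the weight on the previous hidden state, the weight on the input,
    and the bias for that component.
    """
    def affine(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(affine("f"))          # forget gate
    i_t = sigmoid(affine("i"))          # input gate
    c_tilde = math.tanh(affine("c"))    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # updated long-term memory
    o_t = sigmoid(affine("o"))          # output gate
    h_t = o_t * math.tanh(c_t)          # updated hidden state
    return h_t, c_t


# With all parameters set to zero, every gate equals sigmoid(0) = 0.5.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```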

Deeper Dive into Model Architectures - GRU
Introduced in 2014, GRUs are similar to LSTMs in that the flow of information is controlled via internal gates. The internal GRU cell architecture is, however, simpler than that of an LSTM cell, with only two internal gates. This makes it potentially quicker to train than an LSTM model. Figure 8 shows the internal structure of a GRU cell, showing these two gates (though please note that there are many variants of a GRU cell).
The components of a GRU unit include:
• The reset gate $r_t$ at time t, which decides which past information to forget: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$.
• The update gate $z_t$ at time t, which acts similarly to a combined forget and input gate of an LSTM unit, i.e. it decides which information to delete and which new information to add: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$.
In the above, W and b represent the weights and biases for each respective component, $\sigma$ is the sigmoid function, $x_t$ is the input vector at time t, and $h_t$ and $h_{t-1}$ are the hidden state vectors at times t and t-1 respectively, with (in one common formulation) $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$.
This uses the candidate activation vector $\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$.
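A minimal sketch of one GRU time step follows, again with a hidden size and input size of 1 so that every weight is a single number. It assumes one common convention for the hidden state update, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; some formulations swap the roles of $z_t$ and $1 - z_t$. The parameter layout and zero-valued parameters are illustrative assumptions only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x_t, h_prev, params):
    """One time step of a scalar GRU cell.

    params maps each of 'r', 'z', 'h' to a tuple (w_h, w_x, b).
    """
    w_h, w_x, b = params["r"]
    r_t = sigmoid(w_h * h_prev + w_x * x_t + b)   # reset gate

    w_h, w_x, b = params["z"]
    z_t = sigmoid(w_h * h_prev + w_x * x_t + b)   # update gate

    # Candidate activation: the reset gate scales how much of the
    # previous hidden state is allowed to influence the candidate.
    w_h, w_x, b = params["h"]
    h_tilde = math.tanh(w_h * (r_t * h_prev) + w_x * x_t + b)

    # Blend the old hidden state and the candidate (assumed convention).
    return (1.0 - z_t) * h_prev + z_t * h_tilde


# With all parameters set to zero, both gates equal 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("r", "z", "h")}
h_t = gru_step(x_t=1.0, h_prev=0.8, params=zero_params)
```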

Deeper Dive into Model Architectures - XGBoost
XGBoost uses a gradient-boosting algorithm that uses weak learners as building blocks. Weak learners can be viewed as simplified models that are improved upon during the iterative training process (see below). In the case of a time series, the weak learners are regression trees, i.e. decision trees that output continuous variables.
XGBoost is an ensemble method, which takes the aggregate of results from multiple smaller weak learner models.Boosting refers to the fact that the model is built sequentially i.e. a model using values from prior model iterations to produce improved subsequent model iterations.Gradient refers to the fact that a gradient descent algorithm is used to reduce errors in sequential models.
Broadly, the process underlying a gradient-boosting algorithm is as follows (Analytics Vidhya, 2018), and is summarised in Figure 9:
i. An initial model, $F_1(X)$, is defined to predict a target variable, which will result in associated residuals, $r_1 = y - \hat{y}$, i.e. the actual value (y) less the predicted value (ŷ).
ii. A new model, $F'_1(X)$, is fit to the residuals from the prior step.
iii. The models from steps i and ii are combined to produce an improved model, $F_2(X)$.
iv. We repeat this process (steps i to iii) but using model $F_2(X)$ as the starting model, and so on.
The final model prediction is $F_m(X) = F_{m-1}(X) + \alpha_m h_m(X; r_{m-1})$, where $\alpha_i$ and $r_i$ are the regularisation parameters and residuals for the i-th tree, respectively, and $h_i$ is a function to predict residuals. XGBoost also incorporates parallel processing, tree pruning, handling of missing values and regularisation to avoid overfitting (Morde, 2019).
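The residual-fitting loop in steps i to iv can be sketched in plain Python as below. This is a simplified illustration of gradient boosting with a squared-error objective and depth-1 regression trees ("stumps") as weak learners; it is not XGBoost itself, which adds second-order gradient information, regularisation and pruning. The data, the function names and the constant `learning_rate` (standing in for $\alpha_m$) are all assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: pick the split threshold that
    minimises squared error and predict the mean residual on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build a predictor by repeatedly fitting stumps to the residuals."""
    initial = sum(ys) / len(ys)  # step i: a constant initial model
    stumps = []

    def predict(x):
        return initial + learning_rate * sum(h(x) for h in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))  # steps ii and iii
    return predict


# Illustrative data only: a step-like relationship with some noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
initial_mae = sum(abs(y - mean_y) for y in ys) / len(ys)
boosted_mae = sum(abs(y - model(x)) for x, y in zip(xs, ys)) / len(ys)
```

Each round shrinks the remaining residuals a little further, which is why the boosted model fits the training data more closely than the constant starting model.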
XGBoost was launched in 2014 and rose to prominence through the Higgs Machine Learning Challenge (ATLAS Collaboration, 2014). Its popularity has since increased, with implementations of the approach having won several Kaggle competitions (Kaggle, n.d.). Though XGBoost is predominantly used for classification problems, it can also be applied to regression problems, including time series analysis.

Input Windows & Output Horizons
As mentioned earlier, for the purposes of this paper, we will use a window of 1 day to predict a horizon of 1 day i.e., 1 day prior to predict today's value and repeat this method over the data set (broadly 2013 to 2023 inclusive).In subsequent papers, we will adjust both the window and horizon size, as well as introduce stationarity.

Loss History
The underlying variables for a neural network model architecture can be grouped into parameters and hyperparameters. Parameters are adjusted by the model itself during the training process, while hyperparameters are set/adjustable by the modeller. We go into further detail for both below.
We have used the Keras default Glorot Uniform Initialisation method to initialise the weights and biases in the neural network. Glorot and Bengio (2010) provide further details. These weights and biases are then updated during the training process via a combination of backpropagation and optimisation, with the overall aim of minimising the loss function and hence producing a better-fitting model. Backpropagation calculates the gradient of the loss for each weight and bias in the network, and the optimiser subsequently uses these gradients to update any weights and biases.
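For reference, the Glorot uniform scheme draws each weight from a uniform distribution whose range depends on the number of inputs and outputs of the layer. The expression below follows the Keras documentation; the symbols $n_{\text{in}}$ and $n_{\text{out}}$ (the layer's fan-in and fan-out) are notation introduced here, and the layer size in the worked example is purely illustrative rather than taken from the models in this paper.

```latex
W_{ij} \sim U(-\ell, \ell), \qquad \ell = \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}
% Illustrative example: n_in = 1 and n_out = 64 give \ell = \sqrt{6/65} \approx 0.30
```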
When training models, we aim to reduce the value of the loss function, which measures how accurate or inaccurate the model outputs are for, in our case, a batch of data. For a neural network, the loss information is fed back into the model via backpropagation, and the weights and biases are updated via an Adam optimiser (Brownlee, 2021a); for the purposes of this paper, we used batch sizes of 128 consecutive index values. Any updates to parameters are made without intervention from the modeller. This process is then repeated a number of times, with the aim of producing an improved model with an overall reduced loss.
The loss function used in our analysis was set to MAE. The MAE takes the average of the absolute differences between actual and predicted values for each batch of data, i.e. $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and n is the number of values in the batch.
For hyperparameter optimisation, we used Optuna (see https://optuna.org/).The hyperparameters optimised include: • The number of units (or neurons) in the neural network hidden layers.
• The number of iterations which information is passed through an LSTM cell or a GRU cell before moving on to the next layer.• The activation function used in each layer.
• The learning rate applied to the Adam optimiser.
• Filter size for CNN models.
• Return sequence for LSTM and GRU, which indicates whether a single value or sequential information is output to the next layer. Please note that given the analysis in this paper (effectively 1 day in, 1 day out) we do not expect this hyperparameter to have any material impact on the results.
For XGBoost, the following hyperparameters were included in our search space whilst tuning via Optuna:
• Eta, which represents the learning rate.
• Gamma, which is used to prune the branches.
• Max depth, which represents the maximum depth of each decision tree within the ensemble.
• L2 regularisation applied via a parameter called reg_lambda to leaf weights of individual decision trees.
Regularisation, Adam optimiser, Optuna and activation functions will be discussed in further detail in Sections 5.5-5.8.

Adam Optimisation in More Detail
Optimisers are algorithms that are used to update the parameters of a neural network (weights and biases) during the training process, with the overall aim of minimising a set loss function, utilising gradients calculated via the process called backpropagation. Optimisers include stochastic gradient descent, adaptive moment estimation (Adam), root mean square propagation and the adaptive gradient algorithm, Adagrad (Duchi et al., 2011).
For our analysis, we have used an Adam optimiser as it potentially converges to a solution more efficiently than other methods such as stochastic gradient descent.
Introduced in 2014 (Kingma and Ba, 2014), the Adam optimiser is an algorithm for first-order gradient-based optimisation of stochastic objective functions, based on adaptive estimates of lower-order moments.
The Adam optimiser combines the advantages of two other gradient descent methods (GeeksforGeeks, 2018; Musstafa, 2021):
1. Momentum: storing an exponentially weighted moving average of past gradients, to help overcome local minima and speed up convergence; and
2. Root mean square propagation: storing the exponentially weighted moving average of the past squared gradients, which helps in adapting the learning rates for each parameter individually.
The moving averages are used to calculate specific (adaptive) learning rates for each parameter (weight and bias).
Using similar notation to the original paper published in 2014:
$$\Theta_{n+1} = \Theta_n - \alpha_t \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon},$$
where $\Theta_n$ and $\Theta_{n+1}$ are the parameters (a weight or bias in the neural network) at iteration n and n+1 respectively; $\hat{m}_t$ is the bias-corrected first moment estimate; $\hat{v}_t$ is the bias-corrected second raw moment estimate; $\alpha_t$ is the learning rate; and $\varepsilon$ is a small value used to prevent division by zero.
Expanding the terms in the formula above:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t,$$
where $\beta_1$ is a forgetting factor used for decaying the running average of the gradient; $g_t$ is the gradient at time t along the parameter $\Theta$; and $m_t$ is the exponential average of gradients along the parameter $\Theta$.
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2,$$
where $\beta_2$ is a forgetting factor used for decaying the running average of the squared gradient; $g_t$ is the gradient at time t along the parameter $\Theta$; and $v_t$ is the exponential average of squares of gradients along the parameter $\Theta$. $\hat{v}_t$ is the bias-corrected second raw moment estimate.
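To make the update rule more concrete, the following is a minimal sketch of a single-parameter Adam step in plain Python, applied to a toy quadratic loss. The function names, the toy loss and the chosen learning rate are assumptions made for illustration; the default values of beta1, beta2 and eps follow those suggested by Kingma and Ba (2014). In practice the update is handled by the optimiser in the deep learning library.

```python
import math


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a single parameter and return the new state."""
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second raw moment
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimise_quadratic(steps=2000, alpha=0.05):
    """Minimise the toy loss (theta - 3)^2, whose gradient is 2 * (theta - 3)."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 3.0)
        theta, m, v = adam_step(theta, grad, m, v, t, alpha=alpha)
    return theta


first_theta, _, _ = adam_step(0.0, -6.0, 0.0, 0.0, t=1, alpha=0.1)
final_theta = minimise_quadratic()
print(first_theta, final_theta)
```

On the first step the bias correction makes the update size close to the learning rate itself, regardless of the size of the gradient, which is one reason Adam is relatively insensitive to the scale of the loss.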

L2 Regularisation in More Detail
We have to be conscious of overfitting when training a neural network, i.e. when the neural network in effect memorises or closely matches the training data set to the point that it is unable to predict effectively on unseen test data. To avoid this, we can use regularisation techniques to dampen this effect. Figure 11 shows different categories of regularisation techniques which we can apply to reduce any overfitting when training our models.
In effect, L1 and L2 regularisation add in some form of penalty when training the model.For the purposes of our analysis, we have used L2 regularisation (also known as Ridge Regression) where we have allowed for adjustments to the weights in all non-baseline models via Keras's inbuilt regularisation methods (Keras, n.d.).
Given that the number of input features is minimal, and we do not have a requirement to minimise the number of input features, we have used L2 regularisation.
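With L2 regularisation, a squared penalty term is added to the loss function. For example, if we adjust the MAE formula $\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$ to allow for L2 regularisation, we obtain:
$$\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| + \lambda \sum_{j} \beta_j^2,$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value, $\sum_{j} \beta_j^2$ is the penalty term (taken here over the weights $\beta_j$) and $\lambda \in [0,1]$ is a regularisation parameter.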
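A small numerical sketch of an L2-penalised MAE is given below. It assumes the penalty is the regularisation parameter multiplied by the sum of squared weights; the function name, the weights and the value of lam are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def l2_regularised_mae(actual, predicted, weights, lam=0.01):
    """MAE plus an L2 penalty of lam times the sum of squared weights."""
    n = len(actual)
    mae = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return mae + penalty


# Hypothetical values: MAE = 0.75 and penalty = 0.1 * (0.25 + 1.0 + 4.0) = 0.525.
loss = l2_regularised_mae(
    actual=[124.0, 125.0],
    predicted=[124.5, 124.0],
    weights=[0.5, -1.0, 2.0],
    lam=0.1,
)
print(loss)
```

Larger weights increase the penalty, so the optimiser is encouraged to keep the weights small unless a larger weight produces a worthwhile reduction in the MAE.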

Optuna Hyperparameter Optimisation in More Detail
Hyperparameters, such as the number of neurons per layer, the activation function and learning rate, are set by the modeller and hence influence the overall architecture of a neural network. Parameters such as weights and biases are then adjusted by the network during the training process without intervention from the modeller. Various methods to find optimal hyperparameters exist, including:
1. Manual tuning by the modeller.
2. Grid search and random search approaches, which loop across a search space and try various candidates (combinations of hyperparameters), though search results do not feed into future searches.
3. Bayesian optimisation, where the aim is to produce a probability distribution of the objective function (i.e. loss function). Unlike grid and random searches, Bayesian optimisation techniques keep track of prior evaluations and use these values in future runs.
Hence, Bayesian optimisation techniques should produce solutions that converge more efficiently for more complex problems when compared to manual tuning and grid search approaches.
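To illustrate the difference between grid and random search, the sketch below generates candidates from a small search space in plain Python. The hyperparameter names and values are hypothetical and do not reflect the search ranges used in our analysis (see Appendix C for those). Neither approach uses the results of earlier candidates when choosing the next one.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
search_space = {
    "units": [16, 32, 64],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "activation": ["relu", "tanh"],
}


def grid_candidates(space):
    """Enumerate every combination of hyperparameter values."""
    keys = list(space)
    combos = itertools.product(*(space[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def random_candidates(space, n_trials, seed=0):
    """Sample each hyperparameter independently for a fixed number of trials."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()}
            for _ in range(n_trials)]


grid = grid_candidates(search_space)
sampled = random_candidates(search_space, n_trials=5)
print(len(grid), len(sampled))
```

The grid grows multiplicatively with each additional hyperparameter (3 x 3 x 2 = 18 candidates here), whereas the number of random trials is fixed in advance.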
For our analysis, we used the open-source library Optuna to vary (tune) the hyperparameters. Optuna is an automatic hyperparameter optimisation software framework, particularly designed for machine learning (Optuna, n.d.). We used the Bayesian optimisation algorithm called the Tree-structured Parzen Estimator (TPE), which was introduced in 2011. Please see Bergstra et al. (2011) for more details on this.
Bayesian optimisation techniques involve:
1. Constructing a surrogate probability model of the objective function, which is a simplified, and hence computationally less expensive, version of the actual probability distribution of the objective function.
2. Using an acquisition function to choose the next set of candidates to evaluate.
For the purposes of our analysis, we used Sequential Model-Based Optimisation which is an iterative process that builds the surrogate probability model of the objective function using an acquisition function of TPE (Bergstra et al., 2011).
Introduced in 2011, TPE uses a mixture of two Gaussian distributions to set parameter values: (i) l(x), for successful candidates; and (ii) g(x), for unsuccessful candidates:
$$P(x \mid y) = \begin{cases} l(x) & \text{if } y < y^* \\ g(x) & \text{if } y \ge y^* \end{cases}$$
where x is the single hyperparameter, y is the loss, y* is a threshold and P(x|y) is the probability of observing a single hyperparameter given a certain loss value.
The aim is to optimise an acquisition function, Expected Improvement (EI), which quantifies any potential gain in the objective function value from sampling a particular point in the parameter space. The aim of EI is to balance exploration (sampling points with uncertain outcomes) and exploitation (sampling points that are likely to improve the current best results). Using the same notation as the original paper (Bergstra et al., 2011):
$$EI_{y^*}(x) \propto \left( \gamma + \frac{g(x)}{l(x)} (1 - \gamma) \right)^{-1},$$
where x, g(x), l(x), y and y* are as per above, and γ is some quantile of the observed y values. Hence, EI is largest for candidates x with a high density under l(x) and a low density under g(x).
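The idea behind TPE can be sketched for a single hyperparameter in plain Python. This is a simplified illustration under several assumptions: the densities l(x) and g(x) are modelled with Gaussian kernel density estimates of a fixed, hypothetical bandwidth; candidates are drawn from l(x); and the candidate with the largest ratio l(x)/g(x) is returned. It is not Optuna's implementation, and the toy objective is for illustration only.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from the given points."""
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return density


def tpe_suggest(observations, gamma=0.25, bandwidth=0.5, n_candidates=50, seed=0):
    """Suggest the next value of x from (x, loss) pairs using the l(x)/g(x) ratio."""
    rng = random.Random(seed)
    ranked = sorted(observations, key=lambda pair: pair[1])
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]      # losses below the threshold y*
    bad = [x for x, _ in ranked[n_good:]]       # losses at or above y*
    l_density, g_density = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates from l(x), then keep the one with the largest l(x)/g(x).
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l_density(x) / (g_density(x) + 1e-12))


# Toy objective with its minimum at x = 2 (illustrative only).
observations = [(float(x), (x - 2.0) ** 2) for x in range(-4, 8)]
suggestion = tpe_suggest(observations)
print(suggestion)
```

Because the ratio l(x)/g(x) is highest where past successful candidates cluster and unsuccessful ones are sparse, the suggestion lands near the region of low loss.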
For an introductory background to Optuna, please see Lim (2022) and Akiba et al. (2019). For more background to TPE, please see Watanabe (2023). For more details on algorithms for hyperparameter optimisation, please see Bergstra et al. (2011).

Activation Functions in More Detail
As mentioned earlier, the overall aim of activation functions is to introduce non-linearity to a neural network in a regression analysis such as a time series prediction model. Table 3 shows the set of activation functions used during hyperparameter optimisation in our analysis.

British Actuarial Journal
For more background on activation functions, please see GeeksforGeeks (2018) and Nwankpa et al. (2018).
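As a brief illustration, three commonly used activation functions are sketched below in plain Python. These are assumed examples and may not match the exact set listed in Table 3.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values and zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: squashes any real value into the range (-1, 1)."""
    return math.tanh(x)


for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), sigmoid(value), tanh(value))
```

None of these functions is a straight line through its inputs, which is what allows stacked layers to represent non-linear relationships.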

Summary
In addition to the Baseline model, we ran multiple models from each of the five distinct categories, where we varied the model architecture within the same category, e.g. by varying the number of hidden layers. Please see Section 3.3 for more details on the model categories used in our analysis as well as Appendix C.
In this section, we summarise the best-performing model from each model category and use the subscript best or suffix of _best to represent this.
Figure 12 shows the predicted value against expected value, over the test data set range from the Baseline model.
As can be seen, the overall Baseline model is a fairly good fit to the actual values over the test date range for this index, reflective of the fact that the index has a low daily volatility. Diving deeper, Table 4 summarises the cost functions, i.e. MAE and MAPE, for the best-performing model from each category analysed. Please note that, for ease of comparison, we have not listed the results from all models in each category. Also, Table 4 excludes XGBoost, which is discussed in Section 6.2.
Figure 13 gives a graphical comparison of the results in order of performance, showing the best performing model from each model category excluding XGBoost.
We have adjusted the y-axis accordingly to highlight the differences in performance.
On both MAE and MAPE measures, the DNN and LSTM models marginally outperform the Baseline model. The Baseline model in turn marginally outperforms the CNN model, whilst the GRU model performs worst when compared to the other models. The best-performing models from each model category perform within approximately +/-0.1% of the Baseline model based on a MAPE measure. Please note that these results are based on the data set examined along with the hyperparameter search ranges and model architecture (as described in Appendix C).
Different results may be obtained should we use, for example, different data ranges. If we exclude data from 2022 and rerun the analysis (assuming a similar training/validation/test split of 70%/20%/10%, so that the date ranges now differ), we obtain similar conclusions: the models are close to the Baseline model, with differences up to approximately +/-0.1% (based on a MAPE measure). However, the order of the best-performing models, and hence those which beat the Baseline model, differs from that above, with DNN, CNN and GRU now beating the Baseline model and the LSTM model performing worse. Again, given the magnitude of difference in performance between the Baseline model and each other model, the conclusions are not definitive.
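For reference, the MAPE measure used in the comparisons above can be sketched as follows. This assumes the standard definition (the average absolute error expressed as a percentage of the actual value); the function name and sample values are hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical values: errors of 0.8% and 1.0% average to 0.9%.
mape = mean_absolute_percentage_error([125.0, 100.0], [124.0, 101.0])
print(mape)
```

Unlike the MAE, the MAPE is scale-free, which makes it easier to compare performance across indices with different levels.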

A Note on XGBoost
Although XGBoost is typically used for categorisation problems, it can be extended to time series analysis. For example, please see Brownlee (2021b) and Kaggle (2019).
Using XGBoost on our data set produced unexpected results. As can be seen in Figure 14, the model gives a relatively poor fit compared with other models as it was unable to deal with the "troughs" in the test data set range. The predictions flat-lined and did not go below a level of approximately 123. Similarly, if we look at the MAE and MAPE, we see that the XGBoost model is further off the Baseline model predictions when compared with the models earlier (Table 5).
It turns out that, although decision tree models such as XGBoost can be applied to time series problems, they struggle to extrapolate values and predict outside of the original training (and validation) data set range (where the minimum in the training data set is approximately 122 in this case). Please see Mavuduru (2020) for further details.
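The extrapolation issue can be demonstrated with a small regression tree written in plain Python. This is a single, simplified tree used as a stand-in, not XGBoost itself, and the series is hypothetical. Because each leaf predicts the average of the training targets that fall into it, the prediction can never lie outside the range of targets seen in training.

```python
def mean(values):
    return sum(values) / len(values)


def squared_error(values):
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def fit_tree(xs, ys, depth=3):
    """Fit a small one-feature regression tree by minimising squared error."""
    candidates = sorted(set(xs))[1:]            # thresholds leaving both sides non-empty
    if depth == 0 or not candidates:
        return mean(ys)                         # leaf: average of the training targets

    def split_error(threshold):
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        return squared_error(left) + squared_error(right)

    threshold = min(candidates, key=split_error)
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return (
        threshold,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


# Hypothetical series: yesterday's value (input) versus today's value (target).
series = [122.0 + i for i in range(20)]
xs, ys = series[:-1], series[1:]
tree = fit_tree(xs, ys)
low = predict(tree, 110.0)    # input well below the training range
high = predict(tree, 200.0)   # input well above the training range
print(low, high)
```

Even though the input of 110 is far below anything seen in training, the prediction stays at or above the smallest training target, which mirrors the flat-lining seen in Figure 14.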

Conclusions
Based on our analysis, given the data range and training/validation/test splits of the S&P Green Bond Index, we can draw the following conclusions:
1. The Baseline model is fairly accurate. The forecasts produced by the Baseline model result in an error of approximately 0.5% MAPE over the test data set, based on the date ranges used in the body of the paper.
2. The neural network models analysed do not materially (if at all) beat the Baseline model.
3. In saying this, the best-performing models came from the DNN and LSTM categories based on our analysis of this data set. The best-performing models from the CNN and GRU model categories performed worse than the Baseline model.
4. All models differed by up to approximately +/-0.1% from the Baseline model when analysing their MAPE over the test data set range.
5. The XGBoost model was a poor fit compared to the other models analysed, especially over the period July 2022 to February 2023. This seems to be due to decision tree models being unable to predict values outside of the training (and validation) data values.
6. Although the overall performance of the neural networks was not a material improvement on the Baseline model, we hope that the reader can see that there is potential merit in using neural networks for tackling a time series problem in general, i.e. beyond green bond applications. For example, for more complex scenarios, neural networks may provide a viable supplementary/alternative view to traditional techniques. We hope to explore and evidence this further in future papers.

Future Considerations
The aim of this paper was to introduce the initial stages of our analysis using non-traditional techniques including neural networks. Although we have not considered the points below in this paper, further areas we aim to explore in future papers include:
1. Extending our analysis to a wider data set beyond the S&P Green Bond Index, for example to Bloomberg Barclays MSCI's Green Bond Index as well as other GSS bonds. By widening the index analysed, it would be good to understand if similar results are produced as per Section 6 across different data sets and indices, and if techniques such as neural networks can successfully be applied to the wider GSS-bond universe.
2. Extending our analysis to a multi-step projection model by increasing the input window and/or output horizon, where we compare results by producing predictions over the next week or month, say. Similarly, we aim to extend our analysis to use alternative windowing techniques such as cross-validation windowing, and pre-processing the data, for example via normalisation or lognormal techniques. In doing so, we can introduce the idea of stationarity into our analysis and build on ideas discussed in Peters et al. (2022), where a hybrid SARIMAX-LSTM model is used. We hope that this will improve the model and model outputs, as we are providing additional data to the model for it to learn from and hopefully interpret underlying patterns, which is not possible if only one day's prior data is fed into the model at a time.
3. Exploring relationships with other indices and hence performing a multivariate analysis. For example, the research paper by Gao et al. (2023) suggests that green bond indices are correlated with commodities such as oil. Hence, expanding the above analysis to include this could improve the model outputs. We would expect some form of wider influence and relationship with the general market. By exploring a univariate time series analysis, as we have done in this paper, we are ignoring any such potential relationships.
4. Exploring the concept of greenium further to see how this relates to other indices and varies over time, for example pre- and post-COVID. For example, Cui et al. (2022) examine how green bonds have reacted to the COVID pandemic.
5. Expanding the neural network models used to more exotic models such as a CEEMDAN-LSTM architecture or N-BEATS architecture. For example, the research paper by Wang et al. (2022) suggests that a CEEMDAN-LSTM model is more effective than an LSTM model for time series analysis on green bonds. The N-BEATS model is explored in Oreshkin et al. (2019) and has been designed with the specific aim of tackling time series problems. For example, it outperformed the Makridakis time series M4 competition model winner by 3% (Oreshkin et al., 2019).
Hence, in exploring the above, we hope that such models will provide improved results compared to the initial models chosen in this report as well as wider insight into GSS bonds and other time series problems in general.

Figure 3. S&P Green Bond Index data with training/validation/test splits highlighted.
Figure 4 is an example of a deep neural network, with an input layer (in green), fully connected hidden layers (in blue) and an output layer (in red). Each individual unit (circle below) is a neuron.

Figure 6 shows an example structure of a CNN architecture for a time series.

Figure 10 is an example of the training outputs from one of the models, showing the training and validation loss of MAE after each epoch. An epoch represents the period when the entire data set has been passed through a neural network during the training/validation process. As a model trains and converges to a solution, we would expect the loss to reduce after each epoch and hence the curve to decrease, as shown in Figure 10.

Figure 10. Sample loss history curve from training one of the models.

Figure 12. Predictions from the Baseline model over the test range versus actual data.

Figure 14. Outputs from XGBoost model runs over the test range versus actual data.
, which explores time series and green bonds.
2. Use of more complex models, e.g. a variation of an LSTM model known as a Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-LSTM (CEEMDAN-LSTM) model and the Neural Basis Expansion Analysis for Interpretable Time Series (N-BEATS) model. The CEEMDAN-LSTM model is explored by Wang et al. (2022) on green bonds (see 4 below). The N-BEATS model is a time series model and is explored by Oreshkin et al. (2019). It outperformed the Makridakis time series M4 competition model winner by 3%. Both models are discussed further in Section 7 of this paper.
3. Expanding the analysis to general GSS bonds. The analysis in this paper is based on a single green bond index. We will look to expand our analysis to the wider GSS-bond universe and over differing date ranges for the data to see if there are general underlying patterns.
4. Expanding the analysis to include any potential relationships with the general market such as stock market and oil prices, i.e. move to a multivariate analysis in subsequent papers.

Table 1. Summary information of the full data used, as well as the training, validation and test data sets

Table 2 highlights typical problems we would normally associate with each category of model. However, each model category can be extended to other areas such as time series, subject to the model producing sufficiently accurate results. Sections 4.4-4.8 detail further general model architectures for each category in Table 2.

Table 2. Summary Table Showing Categories of Models Used in this Paper

Table 3. The Set of Activation Functions Used During Hyperparameter Optimisation

Table 4. Comparison of Performance of the Best Performing Model from Each Model Category against the Baseline Model
Figure 13. Comparison of best-performing models using MAPE from each model category excluding XGBoost.

Table 5. Comparison of Performance of XGBoost against the Baseline Model