Relation analysis of ship speed & environmental conditions: Can historic AIS data form a baseline for autonomous determination of safe speed?

Abstract As no internationally agreed-upon method for determining safe speed values currently exists, collecting vast amounts of information on conventional ship behaviour could be used to train autonomous ship intelligence in determining safe speeds in different conditions. This requires speed data collected from conventional ships to resemble what can be described as safe speeds. To test this, the Automatic Identification System (AIS) and environmental data – namely visibility, mean wind speed and significant wave height – were collected and merged for two study areas in Norway in the period between 27 March 2014 and 1 January 2021. Regression analyses based on 47,490 unique vessel transits were conducted and supplemented by two graphical methods for revealing relationships between variables. Contrary to the contemporary understanding of safe speed, reduced visibility did not lead to significantly reduced transit speeds. Wind and waves caused a reduction in speed in the open ocean, but not in coastal waters. Transit speeds were lower in coastal waters than in the open ocean.


Introduction
Autonomous shipping has been one of the hot topics in shipping for the past few years. The topic has received widespread attention from academia, regulatory bodies, and private companies alike. With projects such as the Yara Birkeland, we now have actual cargo ships in operation that are on track to operate fully autonomously by the year 2024 (Raza, 2022). The International Maritime Organization (IMO) – the United Nations specialised agency with responsibility for the safety and security of shipping – has responded to the push for autonomy by conducting a regulatory scoping exercise on Maritime Autonomous Surface Ships (MASS), which was finalised in May 2021. With so much development happening in the field of autonomous shipping, the need for research in the area is as vital as ever.
A systematic review of the safety challenges for MASS published in 2019 (Dreyer and Oltedal, 2019) highlighted a number of areas that needed further research, among them the development of smart methods and criteria that support MASS compliance with the International Regulations for Preventing Collisions at Sea (COLREGs), which state the basis of agreed practices for avoiding collisions at sea. The need for smart methods and criteria lies in the nature of the COLREGs, which rely on a large number of qualitative terms [such as 'early' and 'substantial' (Porathe, 2019)], thereby delegating much of the rule-system to the interpretation of the navigator. This constant requirement to interpret qualitative terms included in the rules is exemplified by the requirement for all vessels to proceed at a safe speed at all times (IMO, 1972). The rules do not provide any quantification as to what speeds could be considered 'safe', and while attempts have been made at quantification, the IMO has not agreed upon an acceptable method for determining what value of speed could be considered 'safe' (Cockcroft and Lameijer, 2012). It is unlikely that the rules will be amended in a way that removes these qualitative terms in the near future, for two reasons. Firstly, ambiguity is said to be the necessary price of applicability, as a completely prescriptive and rigid rule-system would be infinitely complicated (Taylor, 1990). Secondly, the IMO has stated in the recently published outcome of the regulatory scoping exercise on the use of MASS 'that COLREG in its current form is still the reference point and should retain as much of its current content as possible' (IMO, 2021, p. 86).
As collision avoidance is seen as a game of coordination where navigators on different vessels must independently choose mutually compatible strategies (Cannell, 1981), it is of utmost importance to ensure that MASS behave in a way that is coherent to human navigators. Already today, the interaction between traditional ships is seen as problematic (Porathe, 2019), and collisions do still occur. It is warned that autonomous ships following a machine interpretation of the COLREGs may lead to even more uncertainty in the future, possibly causing more navigational problems (Porathe, 2019).
A proposed solution to this problem is the utilisation of deep-learning in autonomous ship system intelligence. Under this approach, vast amounts of information on conventional ship behaviour – including vessel speed and external environmental conditions – are collected as big data sets that are used for training autonomous ship intelligence. Humans essentially train the autonomous vessels, causing them to exhibit similar behaviour in similar circumstances (Perera, 2018). The deep-learning solution is seen as promising, as a similar approach in driverless cars has achieved promising results in terms of navigating with the required safety levels (Liu et al., 2017). Note that the deep-learning approach – which essentially envisions MASS mimicking conventional ship behaviour – hinges on conventional ship behaviour being both safe and legal. However, contemporary research on the application of deep-learning in autonomous ship intelligence commonly ignores this requirement. Instead, historic data is regularly utilised to build models of normalcy (Yan et al., 2020), where adherence to the model is seen as a sign of safety (Xu et al., 2019) and deviation is seen as a sign of high-risk behaviour (Yan et al., 2020). This paper therefore explores whether vessel speed data collected from conventional ships in various external environmental conditions actually resemble safe speeds, and can therefore be used for deep-learning purposes in MASS. This is done by comparing the data with accepted interpretations of what constitutes a safe speed.
The research questions this paper aims to answer are as follows:
1. What are the relationships between vessel speeds and visibility, and wind and waves in coastal waters and in the open ocean?
2. Do the observed speeds qualify as safe speeds under the contemporary theoretical understanding of safe speed?

Safe speed determination
As mentioned in the Introduction, rule 6 of the COLREGs requires that 'every vessel shall at all times proceed at a safe speed', without ever quantifying what speeds could be considered 'safe' in different conditions (IMO, 1972). Neither is there an internationally agreed-upon method for determining safe speed values. So, what constitutes a safe speed? The COLREGs themselves define it as a speed where a vessel 'can take proper and effective action to avoid collision and be stopped within a distance appropriate to the prevailing circumstances and conditions' (IMO, 1972). Examples of factors that shall be taken into account when evaluating the prevailing conditions include visibility, traffic density, manoeuvrability, background light and proximity of navigational hazards, as well as the state of wind, sea and current. Visibility is listed first among the factors to be taken into account (IMO, 1972). This apparent importance of visibility is echoed in various available guides and commentary to the COLREGs. In his inquiry into safe speed, Kavanagh (2001) notes that there is a general rule of thumb where vessels are proceeding at a safe speed when they can be stopped within half the distance of the visibility. While he does not agree that this 'half-visibility' rule should be adopted as a starting point for assessing a safe speed, he does conclude with the statement that visibility is the primary consideration in determining safe speed. In their guide to the collision avoidance rules, Cockcroft and Lameijer (2012) assert that 'visibility is obviously of major importance' (Cockcroft and Lameijer, 2012, p. 20), and that it is 'in restricted visibility that the need to moderate the speed generally applies' (Cockcroft and Lameijer, 2012, pp. 17-18). Rutkowski (2016) simply declared that it is dangerous to go fast when visibility is poor.
To get an understanding of what it means for visibility to be poor, the visibility classification of the national meteorological service of the United Kingdom – the Met Office – can be utilised. The definitions included in their marine forecasts glossary can be accessed in the Appendix, Table A1.
When it comes to other environmental factors – such as wind and waves – less guidance is available. In their comments to rule 6 of the COLREGs, Cockcroft and Lameijer (2012) do not mention wind at all and sea state only in combination with visibility, as high waves may hinder the detection of other vessels by radar. Kavanagh (2001) sees the state of wind and sea as an important consideration in the determination of safe speed, but also couples these factors to visibility. In his legal inquiry, Kavanagh noted that precedent requires a reduction of speed in a hurricane, where waves reach up to 15 metres in height and visibility is reduced to zero due to spray and foam in the air (Kavanagh, 2001).
When looking at the contemporary guides, commentary and research on the COLREGs and safe speed, our current understanding of safe speed requires vessel speeds to adhere to the following general pattern: Safe vessel speeds have a strong correlation with the prevailing visibility conditions, and generally require a reduction of speed when visibility is restricted. The association between safe vessel speeds and the state of wind and sea is less transparent – while the importance of the state of wind and sea is said to be less than that of the state of visibility, vessel speeds should be reduced in conditions of strong winds and high seas to remain safe.

Description of research approach, study area and collected data
This section first discusses the research approach of this paper, then introduces the reader to the geographical areas for which data was collected, and finally provides an overview of the data collected.

Research approach
The wide availability of historic Automatic Identification System (AIS) data has meant that these data have been used as the big data basis in research projects on MASS autonomous navigation (Gao et al., 2022). AIS is a communications system that provides automatic reporting between ships and to shore by exchanging information such as identity, position, time, course and speed (IALA, 2016). However, if speed data collected from conventional ships in various external environments are to be used to teach MASS how safe speed is determined, it must first be verified that the data themselves represent both safe and legal speeds. By analysing vessel speed data received from AIS with respect to data on the external environmental conditions, this paper looks closer at whether vessel speed data collected from AIS would contemporarily be considered safe speeds.
Dreyer (2021) collected AIS and visibility data in open waters off the Norwegian coast, and looked at whether the AIS and visibility data show a strong relationship between visibility and vessel speeds, and whether the AIS data show a trend of vessels proceeding at a reduced speed in restricted visibility. In this paper, the visibility data collected offshore are supplemented by wind and wave data. Additionally, AIS, wind, and visibility data were collected for an additional location in a Norwegian sound, allowing for comparison of vessel speed behaviour in locations with different traffic densities and proximity to navigational hazards. This inclusion of additional data advances the previous research, as more factors that the COLREGs command to be considered are included in the analysis. More information on the data collected, and where they were collected, is given in Sections 3.2 and 3.3.
The research data were handled in Microsoft Excel, and the tools available within the program were used to analyse the data. Analysis included both visual means in the form of graphs, and simple linear regression analyses for predicting vessel speeds based on different variables. Regression analysis is the study of relationships between two or more variables and is usually conducted when we either want to know whether any relationship between two or more variables exists or when we are interested in understanding the nature of the relationship between two or more variables (McIntosh et al., 2010). The result is a regression equation:
$$Y = \beta_0 + \beta_1 X \tag{1}$$
where $Y$ is the dependent variable, $X$ is the independent variable, $\beta_0$ is the $Y$ intercept, and $\beta_1$ is the slope coefficient. A regression equation was deemed to be significant when the calculated p-value was less than 0.05. The data analysis is presented in Section 4, the results highlighted in Section 5 and a discussion follows in Section 6. In the discussion, the focus will be on determining whether our contemporary understanding of safe speed would consider the data to represent safe vessel speeds.
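As an illustration of the simple linear regression described above, the following minimal sketch fits an intercept and a slope by ordinary least squares and reports the coefficient of determination and a p-value for the slope. It is not the Excel workflow used in this paper: the function name, the sample values and the large-sample normal approximation used for the p-value are all assumptions made for the example.

```python
import math
from statistics import NormalDist, fmean


def simple_linear_regression(x, y):
    """Fit Y = b0 + b1 * X by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and coefficient of determination.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - sse / syy

    # Standard error of the slope and a two-sided p-value.
    # Assumption: the normal distribution stands in for the t distribution,
    # which is only reasonable for large samples.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    z = slope / se_slope
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {"intercept": intercept, "slope": slope,
            "r_squared": r_squared, "p_value": p_value}


# Hypothetical transits: visibility in km, average speed in knots.
visibility_km = [1.0, 5.0, 10.0, 20.0, 30.0, 40.0]
speed_knots = [10.8, 11.3, 10.9, 11.6, 11.2, 11.9]
fit = simple_linear_regression(visibility_km, speed_knots)
print(fit)
```

Under the significance criterion stated above, the fitted equation would be treated as significant when `fit["p_value"]` is below 0.05.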

Study areas
This section introduces the two study areas in which AIS and external environmental data were collected.

Gjøa A
The first area, which is identical to the study area described in the previous research conducted by Dreyer (2021), is located approximately 18 nautical miles off the coast of Western Norway. This area was chosen due to its location in open sea close to normal shipping routes, combined with the availability of historic AIS and external environmental data. Due to its proximity to the 'Gjøa A' platform – where the historic external environmental data were measured – the area will be called the Gjøa A study area in this paper. Figure 1 depicts the location of the Gjøa A study area.
The Gjøa A study area is approximately 4.2 by 4.2 nautical miles in size, located to the east of the Gjøa A platform between the traffic separation scheme (TSS) Off Stad in the north and TSS Off Sotra in the south. As can be seen in Figure 1, the measuring station for external environmental data is located outside the Gjøa A study area. While this may have the negative consequence of the external environmental data measured at the measuring station differing slightly from the actual external environmental data within the Gjøa A study area, the decision to place the study area to the east of the platform was taken to ensure two things. First, the Gjøa A study area was chosen due to its location in open sea, and having a large platform located within the study area may cause disturbing effects that are difficult to control. Second, moving the study area to the east of the external environmental data measuring station ensures that the location of the Gjøa A study area is within a normal shipping lane. As can be seen from the AIS density plot overlay in Figure 1, the study area covers traffic transiting southbound along the Norwegian west coast, while avoiding most of the non-transit traffic around the Gjøa A platform. The water depth in the study area is approximately 350 metres. The dangerous waves that might be encountered at Vaerøygrunnen, which is approximately 10 nautical miles east of the Gjøa A study area, are unlikely to affect vessels navigating in the Gjøa A study area. This is because while the water depth at Vaerøygrunnen is rapidly decreasing to shallow waters, water depths in the Gjøa A study area are uniform and deep.

Sotra Bridge
The second area for which data were collected in this paper is an area centred around the Sotra Bridge, a suspension bridge that crosses the Knarreviksund in Western Norway. It was chosen because it covers normal shipping routes in coastal waters, with readily available AIS and external environmental data. As the area is centred around the Sotra Bridge, it will be called the Sotra Bridge study area in this paper. Figure 2 depicts the location of the Sotra Bridge study area. The Sotra Bridge study area is approximately 1 by 2 nautical miles in size, covering the 'Y-junction' between the Byfjord, Hjeltefjord and Raunefjord. As such, the area is crossed by vessels navigating between Bergen to the east, the Hjeltefjord to the north and the Raunefjord to the south. The measuring station for the external environmental data is on the Sotra Bridge, located in the centre of the study area. The AIS density plot overlay in Figure 2 shows that the traffic pattern in the Sotra Bridge study area is more complex than that of the Gjøa A study area. Water depths in the Sotra Bridge study area vary depending on the distance from shore; in the middle of the fairway, they are approximately 80 metres south of the bridge and 140 metres north of the bridge. Tidal currents in the area are described as not very strong (Kartverket Sjødivisjonen, 2018).

Collected data
This section introduces the type of data collected for the research in this paper. This includes AIS data providing the speeds of vessels transiting the study areas, as well as environmental data – including data on visibility, wind and waves – for the period from 27 March 2014 to 01 January 2021.

AIS data
The Norwegian national AIS network consists of both shore- and satellite-based AIS, where the shore-based AIS network consists of about 90 base stations that monitor coastal traffic up until approximately 40 to 60 nautical miles from the coast (Norwegian Coastal Administration, 2022). The AIS data collected by the Norwegian Coastal Administration (NCA) include three types of information, namely dynamic (position, course, speed), static (identity, vessel type, dimensions) and voyage related (destination, estimated time of arrival, cargo, draught) and can be universally accessed via the NCA's Kystdatahuset service. Any data accessible here have been 'cleaned', meaning that datapoints that almost certainly are erroneous have automatically been removed (Kystdatahuset, 2022).

Figure 2. Location of study area: West of Bergen, in coastal waters of Western Norway. AIS density plot overlay (in orange) shows common shipping routes.
Even though the NCA automatically removes datapoints that almost certainly are erroneous, it must be noted that since its inception, AIS data have become more accurate: Erroneous transmissions from vessels have decreased from 10.4% in 2004 to only 3.5% in 2007 (Harati-Mokhtari et al., 2007; Bailey et al., 2008; Shu et al., 2017). Of the three types of information conveyed via AIS, dynamic vessel data were the most accurate, with errors in the transmission of speed over ground only making up 0.8% of the errors (Shu et al., 2017).
Two independent AIS datasets were collected from the Kystdatahuset service: One for the Gjøa A study area and one for the Sotra Bridge study area. The AIS dataset for the Gjøa A study area included a total of 38,820 datapoints between 27 March 2014 and 30 December 2020. The AIS dataset for the Sotra Bridge study area included a total of 187,581 datapoints between 15 March 2016 and 01 January 2021.
The AIS data were provided by the Kystdatahuset service of the NCA in a Microsoft Excel sheet, and included the following information for the timeframe in which each vessel was within the study area: Start and end time, Maritime Mobile Service Identity Number (MMSI), IMO Number, ship name, ship type, gross tonnage (GT), length and draft, plus minimum, average, and maximum speed, and number of transmissions received.
The researcher scanned the dataset manually for any datapoints including missing/erroneous data, which were removed from the dataset. Furthermore, the ship type information was utilised to filter the dataset to include only cargo ships, such as bulk carriers, tankers, containerships, general cargo ships and ro-ro vessels. This resulted in the removal of other types of vessels, such as anchor handling vessels, cable layers, diving support ships, fishing vessels, dredgers and standby safety vessels, as these vessels are expected to be constrained more by the nature of their assignment than by external conditions, such as visibility. For example, an increase in visibility is not expected to result in a standby safety vessel increasing its speed while standing by next to a platform.
While most vessels had one datapoint for each time they passed the study area, this was not always the case: In some instances, a single passing would result in several datapoints being created. To prevent a skewed dataset, datapoints were merged in these instances, resulting in a dataset with a single datapoint for each unique transit of the study area. In practice, this meant that all AIS transmissions received from a vessel transiting the study area within a period of five hours were combined to give a single datapoint for the entire transit. This datapoint included information about the vessel and the average transit speed, as well as the times of when the transit started and ended. The final dataset included a total of 14,498 unique vessel transits by 3,475 unique cargo ships through the Gjøa A study area, and a total of 32,992 unique vessel transits by 1,004 unique cargo ships through the Sotra Bridge study area.
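The merging step described above can be sketched as follows. The record layout, the hypothetical MMSI values, the choice to measure the five-hour window from the start of a transit, and the weighting of average speeds by the number of transmissions are assumptions made for illustration; the text does not specify how the combined average speed was calculated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    mmsi: int
    start: datetime
    end: datetime
    avg_speed: float      # knots
    transmissions: int    # number of AIS messages behind avg_speed


def merge_transits(records, window=timedelta(hours=5)):
    """Merge records of the same vessel that start within `window` of the
    start of that vessel's current transit into a single transit record."""
    merged = []
    current = {}  # mmsi -> index in `merged` of that vessel's latest transit
    for rec in sorted(records, key=lambda r: (r.mmsi, r.start)):
        idx = current.get(rec.mmsi)
        if idx is not None and rec.start - merged[idx].start <= window:
            t = merged[idx]
            total = t.transmissions + rec.transmissions
            # Assumption: combine average speeds weighted by message count.
            speed = (t.avg_speed * t.transmissions
                     + rec.avg_speed * rec.transmissions) / total
            merged[idx] = Record(t.mmsi, t.start, max(t.end, rec.end),
                                 speed, total)
        else:
            current[rec.mmsi] = len(merged)
            merged.append(Record(rec.mmsi, rec.start, rec.end,
                                 rec.avg_speed, rec.transmissions))
    return merged


# Hypothetical datapoints: the first two belong to the same passage.
day = datetime(2020, 1, 1)
records = [
    Record(257000001, day.replace(hour=8), day.replace(hour=8, minute=20), 10.0, 10),
    Record(257000001, day.replace(hour=8, minute=25), day.replace(hour=8, minute=40), 12.0, 30),
    Record(257000001, day + timedelta(days=1), day + timedelta(days=1, minutes=30), 11.0, 20),
    Record(258000002, day.replace(hour=9), day.replace(hour=9, minute=30), 9.0, 15),
]
transits = merge_transits(records)
print(len(transits), transits[0].avg_speed)
```

In this example the four datapoints reduce to three transits, and the first transit carries the transmission-weighted average speed of its two source datapoints.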

Environmental data
The Norwegian Centre for Climate Services (NCCS) provides historic data of observations and measurements from Norway's weather stations. Environmental data utilised in this study were collected at station number SN76954 (Gjøafeltet) for the Gjøa A study area and at station number SN50526 (RV555 Sotrabrua VInd) for the Sotra Bridge study area. More information on the weather stations is detailed in Table 1. Data for the following weather elements were collected in 10-minute intervals between 27 March 2014 and 31 December 2020 at both weather stations: Meteorological Optical Range (MOR) visibility 1 min and mean wind speed. In addition, data for significant wave height were collected in 10-minute intervals in the same timeframe only at station number SN76954 (Gjøafeltet), as this weather element was not recorded at station number SN50526 (RV555 Sotrabrua VInd). The final database of environmental data was made up of 354,563 datapoints collected from station number SN76954 (Gjøafeltet) and 206,733 datapoints collected from station number SN50526 (RV555 Sotrabrua VInd).

Merging of research data
As each AIS datapoint was provided with both a start and end time, it was possible to look up the average environmental conditions for each vessel transit through the study areas from the environmental dataset. This allowed for the AIS dataset and the environmental dataset to be merged into one dataset. To ensure a smooth dataset, any vessel transits for which no or faulty environmental data were available were removed from the final dataset.
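A minimal sketch of this merging step is given below. It averages every weather observation whose timestamp falls between the start and end time of a transit, and drops transits for which no usable observation exists. The dictionary keys, the sample values and the treatment of a missing value as 'faulty' are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import fmean


def mean_conditions(observations, start, end):
    """Average each weather element over observations with start <= time <= end.

    `observations` is a list of (timestamp, {element: value}) tuples.
    Returns None when no usable observation falls inside the transit.
    """
    in_window = [values for ts, values in observations if start <= ts <= end]
    if not in_window:
        return None
    elements = in_window[0].keys()
    # Assumption: a missing (None) value counts as faulty data.
    if any(v.get(e) is None for v in in_window for e in elements):
        return None
    return {e: fmean(v[e] for v in in_window) for e in elements}


def merge_transits_with_weather(transits, observations):
    """Attach mean conditions to each transit; drop transits without them."""
    merged = []
    for transit in transits:
        cond = mean_conditions(observations, transit["start"], transit["end"])
        if cond is not None:
            merged.append({**transit, **cond})
    return merged


# Hypothetical 10-minute observations and two hypothetical transits.
t0 = datetime(2020, 1, 1, 8, 0)
observations = [
    (t0, {"visibility_km": 10.0, "wind_ms": 4.0}),
    (t0 + timedelta(minutes=10), {"visibility_km": 20.0, "wind_ms": 6.0}),
    (t0 + timedelta(minutes=20), {"visibility_km": 30.0, "wind_ms": 8.0}),
]
ais_transits = [
    {"mmsi": 257000001, "start": t0 + timedelta(minutes=5),
     "end": t0 + timedelta(minutes=25), "avg_speed": 11.2},
    {"mmsi": 258000002, "start": t0 + timedelta(hours=4),
     "end": t0 + timedelta(hours=4, minutes=30), "avg_speed": 9.8},
]
dataset = merge_transits_with_weather(ais_transits, observations)
print(dataset)
```

Note that a transit shorter than the 10-minute observation interval may contain no observation at all under this rule; falling back to the nearest observation would be one alternative, but whether such cases arose in the dataset is not stated here.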
The final dataset included 14,498 vessel transits with available environmental data through the Gjøa A study area, and 32,992 transits with available environmental data through the Sotra Bridge study area.

Data analysis
This section presents the data analysis of this study, initially providing an overview of the dataset in Table 2.
Histograms representing the distribution of gross tonnage, visibility, mean wind speed, significant wave height and transit speed can be accessed in the Appendix, Figures A1-A9. It is noteworthy that the average transit speed histograms for both study areas seem to be close to normally distributed.
The analysis of the effect of environmental factors on average transit speeds will be presented by utilising visual means and statistical analysis. For the visual means, scatterplots are employed and supplemented by a red-line graph showing the average transit speeds in different environmental conditions. To achieve this, the dataset was divided into different groups based on the environmental conditions present during transit. Numerical data, including information on the total number of transits and quartiles in each environmental range, can be accessed in the Appendix, Tables A4-A8 and Figures A10-A14. In this regard, note that the number of datapoints used to calculate the average transit speeds varies. Where average transit speeds are based on a larger sample size, greater precision can be expected. The calculated regression equations are illustrated as a dashed-green line in the scatterplots, and more detailed information on the results of the regression analyses can be accessed in the Appendix, Tables A9-A12.
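The grouping behind the red-line graphs can be sketched as below: transits are assigned to ranges of an environmental variable, and the mean transit speed and the number of transits are reported per range. The fixed range width and the sample values are assumptions; the ranges actually used are those given in the Appendix.

```python
from collections import defaultdict
from statistics import fmean


def mean_speed_by_range(values, speeds, bin_width):
    """Group transits into ranges of width `bin_width` on the environmental
    variable and return {range_start: (mean_speed, number_of_transits)}."""
    groups = defaultdict(list)
    for value, speed in zip(values, speeds):
        range_start = (value // bin_width) * bin_width
        groups[range_start].append(speed)
    return {start: (fmean(group), len(group))
            for start, group in sorted(groups.items())}


# Hypothetical transits: mean wind speed in m/s, average speed in knots.
wind_ms = [1.0, 3.0, 6.0, 7.5, 12.0]
speed_knots = [12.0, 11.0, 11.5, 10.5, 9.0]
by_range = mean_speed_by_range(wind_ms, speed_knots, bin_width=5.0)
for start, (mean_speed, count) in by_range.items():
    print(f"{start:4.1f}-{start + 5.0:4.1f} m/s: {mean_speed:.2f} knots (n={count})")
```

Returning the count alongside the mean keeps the sample size of each range visible, which matters because ranges with few transits give less precise averages.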

Visibility
This section presents the analysis of the relationship between visibility and average transit speeds. A simple linear regression analysis, with average speed as the dependent and visibility as the independent variable, was conducted for both study areas. The significant regression equation with an $R^2$ value of 3.3% for the Gjøa A study area is provided in Equation (2), while the significant regression equation with an $R^2$ value of 0.0% for the Sotra Bridge study area is provided in Equation (3), where $Y$ is average speed estimated in knots, and $X_1$ is meteorological optical range measured in kilometres. The Pearson correlation coefficient was calculated to be 0.18 for the Gjøa A study area, and 0.02 for the Sotra Bridge study area (Figures 3 and 4).
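For reference, the Pearson correlation coefficient can be computed as in the minimal sketch below (the function names and sample values are illustrative assumptions). In a simple linear regression the coefficient of determination equals the squared correlation coefficient, so a coefficient of 0.18 implies an explained variation of roughly 3.2%, which agrees with the reported 3.3% once rounding of the coefficient is allowed for.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def implied_r_squared_percent(r):
    """R^2 (in percent) implied by r in a simple linear regression."""
    return r ** 2 * 100.0


# Hypothetical transits: visibility in km, average speed in knots.
visibility_km = [1.0, 5.0, 10.0, 20.0, 30.0, 40.0]
speed_knots = [10.8, 11.3, 10.9, 11.6, 11.2, 11.9]
print(pearson_r(visibility_km, speed_knots))

# Coefficients reported for the two study areas.
for r in (0.18, 0.02):
    print(r, round(implied_r_squared_percent(r), 1))
```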

Mean wind speed
This section presents the analysis of the relationship between mean wind speed and average transit speeds. A simple linear regression, with average speed as the dependent and mean wind speed as the independent variable, was conducted for both study areas. The significant regression equation with an $R^2$ value of 9.7% for the Gjøa A study area is provided in Equation (4), while the significant regression equation with an $R^2$ value of 0.0% for the Sotra Bridge study area is provided in Equation (5), where $Y$ is average speed estimated in knots and $X_2$ is mean wind speed measured in metres/second. The Pearson correlation coefficient was calculated to be −0.31 for the Gjøa A study area, and −0.01 for the Sotra Bridge study area (Figures 5 and 6).

Significant wave height
This section presents the analysis of the relationship between significant wave height and average transit speeds. A simple linear regression, with average speed as the dependent and significant wave height as the independent variable, was conducted only for the Gjøa A study area, as no data on significant wave height were available for the Sotra Bridge study area. The significant regression equation with an $R^2$ value of 9.5% is provided in Equation (6), where $Y$ is average speed estimated in knots, and $X_3$ is significant wave height measured in metres. The Pearson correlation coefficient was calculated to be −0.31 (Figure 7).

Combination of different environmental factors
In addition to the simple linear regressions reported, multiple linear regressions were used to test whether the different environmental factors can be combined to predict average transit speeds through the study areas. For the Gjøa A study area, the multiple linear regression included visibility, mean wind speed and significant wave height, while the multiple linear regression for the Sotra Bridge study area only included visibility and mean wind speed. The resulting significant regression equations with an $R^2$ value of 13.1% for the Gjøa A study area, and an $R^2$ value of 0.0% for the Sotra Bridge study area are provided in Equations (7) and (8), respectively:
$$Y = 12.13 + 0.04 X_1 - 0.10 X_2 - 0.33 X_3 \tag{7}$$
$$Y = 10.37 + 0.01 X_1 - 0.01 X_2 \tag{8}$$
where $Y$ is average speed estimated in knots; $X_1$ is meteorological optical range measured in kilometres; $X_2$ is mean wind speed measured in metres/second; and $X_3$ is significant wave height measured in metres. It was found that for the Gjøa A study area, all three independent variables (visibility, mean wind speed and significant wave height) significantly predicted average transit speed when presented in the same combined model. However, when presenting visibility and mean wind speed in the same combined model for the Sotra Bridge study area, only visibility was found to significantly predict average transit speed. Mean wind speed on the other hand was found to not significantly predict average transit speed.
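To make the magnitudes in Equations (7) and (8) concrete, the sketch below evaluates both equations with the coefficients reported above. The function names and the two sets of conditions are hypothetical and chosen only for illustration.

```python
def predict_gjoa_a(visibility_km, wind_ms, wave_height_m):
    """Equation (7): predicted average transit speed in knots, Gjøa A."""
    return 12.13 + 0.04 * visibility_km - 0.10 * wind_ms - 0.33 * wave_height_m


def predict_sotra_bridge(visibility_km, wind_ms):
    """Equation (8): predicted average transit speed in knots, Sotra Bridge."""
    return 10.37 + 0.01 * visibility_km - 0.01 * wind_ms


# Hypothetical conditions: a clear, calm day versus a foggy, rough day.
fair = predict_gjoa_a(visibility_km=20.0, wind_ms=5.0, wave_height_m=1.0)
rough = predict_gjoa_a(visibility_km=1.0, wind_ms=15.0, wave_height_m=4.0)
print(f"Gjøa A: {fair:.2f} knots (fair) vs {rough:.2f} knots (rough)")

fair_sotra = predict_sotra_bridge(visibility_km=20.0, wind_ms=5.0)
rough_sotra = predict_sotra_bridge(visibility_km=1.0, wind_ms=15.0)
print(f"Sotra Bridge: {fair_sotra:.2f} knots (fair) vs {rough_sotra:.2f} knots (rough)")
```

For these illustrative inputs, the Gjøa A equation predicts a difference of several knots between the two sets of conditions, whereas the Sotra Bridge equation predicts a difference of only a few tenths of a knot.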

Comparison of the Gjøa A and Sotra Bridge study areas
The average transit speeds through the Gjøa A and Sotra Bridge study areas were recorded to be 11.18 knots (standard deviation: 2.4 knots) and 10.50 knots (standard deviation: 2.4 knots), respectively, a difference of 0.68 knots. A two-sample t-test was performed to compare the average transit speeds through the Gjøa A and Sotra Bridge study areas. There was a significant difference in average transit speeds between the Gjøa A study area and the Sotra Bridge study area; t(47,488) = 1.960, p < 0.0001.
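A two-sample t statistic can be computed directly from summary statistics, as in the sketch below. It assumes a pooled-variance (equal-variance) test, which is what degrees of freedom of n1 + n2 − 2 correspond to, and it uses the rounded means, standard deviations and transit counts reported in this paper; the function name is hypothetical.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics.

    Returns the statistic and its degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Rounded summary statistics for Gjøa A and Sotra Bridge.
t_stat, df = pooled_t_statistic(11.18, 2.4, 14498, 10.50, 2.4, 32992)
print(f"t({df}) = {t_stat:.2f}")
```

With these rounded inputs the statistic lies well above the critical value of roughly 1.96 for a two-sided test at the 5% level, in line with the conclusion that the difference in average transit speeds is statistically significant.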

Results
This section briefly summarises the results from the data analysis presented in Section 4. Visibility does not have a large influence on vessel speeds. When looked at in isolation, visibility explains only 3.3% and virtually nothing (0.0%) of the variation in speed in the Gjøa A and Sotra Bridge study areas, respectively. While significant linear regression equations were found in both areas, these regression equations predict a reduction in vessel speeds of only 0.08 and 0.01 knots for each kilometre visibility deteriorates in the Gjøa A and Sotra Bridge study areas, respectively. The graphical representation of the relationship between visibility and average transit speeds shows that average transit speeds do not decrease significantly in restricted visibility.
The influence of mean wind speed on vessel speeds was vastly different in the two study areas. When considered in isolation, the mean wind speed explains 9.7% of the variation in speed in the Gjøa A study area, but virtually nothing (0.0%) in the Sotra Bridge study area. Significant linear regression equations were found in both study areas, but the magnitude of the slope coefficient differed considerably. An increase in mean wind speed of 1 metre/second is predicted to decrease transit speeds by 0.19 knots in the Gjøa A study area, but only 0.01 knots in the Sotra Bridge study area. This difference in the effect of mean wind speed on average transit speeds is also apparent in the graphical representations of the relationship between mean wind speed and average transit speeds in the two study areas. In the Gjøa A study area, an increase in mean wind speed shows a clear reduction in average transit speeds, but in the Sotra Bridge study area, the average transit speed remains virtually unchanged throughout all mean wind speed ranges. Common for both study areas is the large variation in transit speeds in the same wind conditions. For example, the scatter plot shows that transit speeds at mean wind speeds of approximately 7 metres/second were between roughly 6 and 19 knots in the Gjøa A study area, and 4 to 17 knots in the Sotra Bridge study area.
Like mean wind speed in the Gjøa A study area, significant wave height had a clear influence on average transit speeds. When looked at in isolation, significant wave height explains 9.5% of the variation in average transit speed. The significant linear regression equation predicts a decrease of 0.51 knots in average transit speed for each metre increase in significant wave height. This clear reduction in average transit speeds in higher wave conditions can also be seen on the graphical representation of the relationship between significant wave height and average transit speeds. However, as for mean wind speed, the variation in transit speeds in the same wave conditions is quite high: the scatter plot shows that transit speeds at significant wave heights of approximately 3 metres were roughly between 6 and 18 knots.
When combining the different influencing variables together, visibility, wind and waves explain 13.1% of the variation in vessel speeds through the Gjøa A study area. For the Sotra Bridge study area, visibility and wind combined have virtually no (0.0%) explanatory power for the variation in vessel speeds through the area.
Finally, it was found that the average transit speed through the coastal Sotra Bridge study area was 0.68 knots lower than the average transit speed through the Gjøa A study area in open waters. This difference was statistically significant.

Discussion
This paper set out to explore whether vessel speed data collected from conventional ships in various external environmental conditions actually resemble safe speeds by comparing the data with accepted interpretations of what constitutes a safe speed. As was highlighted in Section 2, the COLREGs list visibility, traffic density, manoeuvrability, background light and proximity of navigational hazards as well as the state of wind, sea and current as factors to be taken into account when determining safe speed. Contemporary guides and commentary to the COLREGs highlight visibility as being the most important factor when it comes to safe speed. The data analysis and results presented in Sections 4 and 5 show the average transit speeds of conventional vessels in different visibility, wind and wave conditions. More indirectly, the effect of traffic density and proximity of navigational hazards on average transit speeds can be seen in the difference in average transit speeds between the Gjøa A study area in the open ocean, with less traffic in a more structured traffic pattern, and the Sotra Bridge study area in coastal waters, with more traffic in a less structured pattern.

Scatterplots
Various scatterplots visualising the relationship between average transit speeds and external environmental conditions were presented for both the Gjøa A and the Sotra Bridge study areas. None of these scatterplots showed a precise relationship between the factor and average transit speed through the study area. While the scatterplots for wave height and mean wind speed in the Gjøa A study area show a reduction of spread in the average transit speeds from approximately 2-20 knots in the lower ranges to 2-15 knots in the higher ranges, these ranges are still too large to be used by a MASS to indicate an acceptable safe speed range. The scatterplots for visibility in both study areas and the scatterplots for mean wind speed in the Sotra Bridge study area showed no clear pattern at all. This interpretation is supported by the calculated Pearson's correlation coefficients shown in Table 3. While a positive value of Pearson's correlation coefficient generally indicates a positive correlation between the two variables, and a negative value of Pearson's correlation coefficient generally indicates a negative correlation between the two variables, the strength of the relationship is generally judged to be nonexistent or very weak when it is below 0 · 3, and weak when between 0 · 3 and 0 · 5 (Moore et al., 2021).
While the correlation coefficients in the Gjøa A study area are low and imply very weak relationships, the correlation coefficients in the Sotra Bridge study area are virtually zero. After presenting their paper on Safe Speed for Maritime Autonomous Surface Ships at ESReL 2021, Dreyer (2021) received the feedback that the very weak relationship between visibility and speed in the Gjøa A study area may be due to the well-structured traffic pattern in the area combined with the low likelihood of a close encounter with another ship, and that this very weak relationship may be stronger in coastal waters where the traffic pattern is confused. However, the results of this paper show that in the Sotra Bridge study area – an area in coastal waters with confused traffic patterns and high likelihood of close quarter encounters with both commercial and leisure vessels – there is virtually no correlation between visibility and average transit speeds.
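The interpretation of correlation strength used above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for this paper: the function names are hypothetical, the example values are invented, and the label for coefficients at or above the upper threshold is an assumption, since the text only describes the two lower bands.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strength_label(r):
    """Strength bands as summarised in the text (after Moore et al., 2021)."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "nonexistent or very weak"
    if magnitude < 0.5:
        return "weak"
    # Assumption: the text does not describe this band in detail.
    return "moderate or stronger"


# Invented illustrative values: mean wind speed (m/s) and transit speed (knots).
wind = [2.0, 5.0, 7.0, 10.0, 14.0, 18.0]
speed = [12.0, 9.5, 13.0, 8.0, 11.5, 9.0]
r = pearson_r(wind, speed)
print(f"r = {r:.2f} ({strength_label(r)})")
```

The sign of the coefficient gives the direction of the relationship, while the label depends only on its magnitude.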

Regression analyses
The following two subsections discuss the results of the conducted simple and multiple linear regression analyses.

Simple linear regressions
In contrast to the ambiguous scatterplots and Pearson's correlation coefficients, significant regression equations were found for the simple linear regressions calculated for each of the environmental factors in both study areas. It must be noted, however, that the R² values of these regression equations are quite small, as can be seen in Table 4. R² is the fraction by which the variance of the errors in the model is less than the variance of the dependent variable, meaning that it indicates the percentage of variance explained by the model (Nau, 2020). This means that only 3 · 3%, 9 · 5% and 9 · 7% of the variation in average speed in the Gjøa A study area can be explained by the variation in visibility, significant wave height and mean wind speed, respectively. More surprisingly, variations in visibility and mean wind speed explain 0 · 0% of the variation in average speed in the Sotra Bridge area.
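The definition of R² given above can be made concrete with a short sketch. It fits a simple least-squares line and computes R² as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and data points are hypothetical and chosen only for illustration; they are not taken from the AIS dataset.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented illustrative values, not data from the study.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]
slope, intercept = simple_linear_regression(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"R^2={r_squared(xs, ys, slope, intercept):.2f}")
```

A regression can be statistically significant and still have a small R²: significance concerns whether the slope differs from zero, whereas R² measures how much of the scatter the line accounts for.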

Multiple linear regressions
The simple linear regressions discussed are only useful for estimating the relationship between a dependent variable and a single explanatory variable in isolation. Multiple linear regressions, on the other hand, are carried out to analyse the relationship between a dependent variable and multiple explanatory variables. As average transit speed depends on more than one factor, multiple linear regressions were calculated for both study areas. The final multiple linear regression for the Gjøa A study area was a statistically significant regression where visibility, mean wind speed and significant wave height all significantly predicted average transit speed. However, the R² value indicates that only 13 · 1% of the variation in average speed can be explained by the variation of these three factors. For the Sotra Bridge study area, the multiple linear regression analysis highlighted that only visibility significantly predicted average transit speeds, although the R² value indicates that virtually none of the variation in average speed in the Sotra Bridge study area can be explained by variations in visibility or mean wind speed.
In other words, there must be other, more influential factors influencing the speeds of vessels. These could be other factors related to the goal of achieving a safe speed, but it could also be that other factors unrelated to the goal of proceeding at a safe speed have a large influence.
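For readers unfamiliar with the technique, a multiple linear regression of this kind can be sketched as an ordinary least-squares fit via the normal equations. The sketch below is an illustration under stated assumptions and not the software used for this paper: the function names are hypothetical, the transits are synthetic and noise-free, and the intercept of 12 knots is an assumed value. Only the three slope values are the Gjøa A coefficients reported in this paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] for predictor rows."""
    design = [[1.0, *row] for row in rows]  # leading 1.0 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic transits: (visibility in km, mean wind speed in m/s, wave height in m).
rows = [(1.0, 2.0, 0.5), (5.0, 10.0, 2.0), (10.0, 4.0, 1.0),
        (3.0, 15.0, 4.0), (8.0, 7.0, 3.0)]
# Noise-free speeds built from the reported slopes and an ASSUMED intercept.
ys = [12.0 + 0.04 * v - 0.10 * w - 0.33 * h for v, w, h in rows]

coeffs = fit_multiple_regression(rows, ys)
print([round(c, 3) for c in coeffs])
```

Because the synthetic speeds contain no scatter, the fit recovers the coefficients it was built from. In the real data the scatter around the fitted plane is large, which is what the low R² value expresses.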
From research into road safety, we know that almost all drivers want to drive faster than the speed that they themselves consider to be a safe speed (Goldenbeld and van Schagen, 2007). Reasons for speeding in a road context are diverse and include – among others – temporary motives (such as being in a hurry or adapting the speed to the general traffic stream) and permanent personality characteristics (such as proneness to risk taking or general enjoyment of driving fast) (European Commission, 2018). Human perceptual skills and limitations play a role as well, with some situations making it easy to underestimate one's own driving speed. These include situations when a high speed has been maintained for a long period, as well as situations where there is little peripheral visual information (ETSC, 1995; Martens et al., 1997; Elliott et al., 2003). It is easy to find maritime examples for situations that provide little peripheral information, such as navigating in the open sea, at night or in fog.
Additionally, we have learned from Rasmussen (1997) that 'human behavior in any work system is shaped by objectives and constraints which must be respected by the actors for work performance to be successful'. The navigators setting the speed on the different vessels are not only bound by safety-related constraints, but by administrative and functional constraints, as well. The decision at which speed a vessel will proceed is therefore not only influenced by factors relating to safety, but by factors relating to efficiency and reduction of effort as well. Speed decisions made by navigators onboard a vessel can be seen as being under immense outside pressure, with standard ocean shipping contracts requiring vessels to proceed at 'utmost dispatch', and first-come, first-served berthing policies adding additional incentives for navigators to proceed at full speed (Alvarez et al., 2010).
When looking at the coefficients of the final multiple linear regression in the Gjøa A study area, we see that vessel speed is predicted to increase by 0 · 04 knots for each kilometre of increased visibility, decrease by 0 · 10 knots for each metre/second increase in mean wind speed, and decrease by 0 · 33 knots for each metre increase in significant wave height.
With the difference between what the Met Office describes as good and very poor visibility being 8 · 26 kilometres (Met Office, 2021b), this means that the regression equation predicts a vessel experiencing a deterioration of visibility from good to very poor to reduce its speed by only approximately 0 · 3 knots (0 · 04 × 8 · 26).
Likewise, a change from calm to gale force winds of 17 metres/second is predicted to decrease vessel speeds by approximately 1 · 7 knots (0 · 10 × 17), and a change from what the Met Office (Met Office, 2021b) describes as a smooth sea state of waves less than 0 · 5 metres to a very rough sea state of waves of 4 to 6 metres is predicted to decrease vessel speeds by approximately 1 · 1 knots (0 · 33 × 3 · 5).
The regression equation of the only statistically significant predictor for average transit speeds in the Sotra Bridge study area – visibility – had a coefficient which predicts an increase of 0 · 01 knots for each kilometre of increased visibility. This converts to a predicted reduction of speed of less than 0 · 1 knots (0 · 01 × 8 · 26) by a vessel experiencing a degradation of visibility from good to very poor in the Sotra Bridge study area.
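The arithmetic behind these predicted speed changes can be reproduced with a short sketch. The coefficients and condition changes are those quoted in the text; the variable and function names are hypothetical. Because the sketch multiplies the rounded coefficients, its results may differ slightly in the last digit from the approximate figures quoted above.

```python
# Regression coefficients reported in the text (knots per unit of the variable).
GJOA_A = {"visibility_km": 0.04, "wind_m_per_s": -0.10, "wave_height_m": -0.33}
SOTRA_BRIDGE = {"visibility_km": 0.01}

# Changes in conditions used in the text.
VISIBILITY_GOOD_TO_VERY_POOR_KM = -8.26  # visibility falls by 8.26 km
CALM_TO_GALE_M_PER_S = 17.0              # wind rises by 17 m/s
SMOOTH_TO_VERY_ROUGH_M = 3.5             # waves rise from 0.5 m to 4 m


def predicted_speed_change(coefficient, change_in_condition):
    """Predicted change in speed (knots) when one variable changes and the rest are held fixed."""
    return coefficient * change_in_condition


changes = {
    "Gjøa A, visibility good to very poor": predicted_speed_change(
        GJOA_A["visibility_km"], VISIBILITY_GOOD_TO_VERY_POOR_KM),
    "Gjøa A, calm to gale": predicted_speed_change(
        GJOA_A["wind_m_per_s"], CALM_TO_GALE_M_PER_S),
    "Gjøa A, smooth to very rough sea": predicted_speed_change(
        GJOA_A["wave_height_m"], SMOOTH_TO_VERY_ROUGH_M),
    "Sotra Bridge, visibility good to very poor": predicted_speed_change(
        SOTRA_BRIDGE["visibility_km"], VISIBILITY_GOOD_TO_VERY_POOR_KM),
}
for label, value in changes.items():
    print(f"{label}: {value:+.2f} knots")
```

A negative value denotes a predicted reduction in speed.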
To relate these results to our current understanding of safe speed, they are now compared with a specific example from commentary on safe speed. Cockcroft and Lameijer (2012), whose Guide to the Collision Avoidance Rules is described as the essential reference to safe operation of all vessels at sea, provide an example on safe speed in restricted visibility from the legal case of the collision between the ships Hagen and Boulgaria. Here it was stated that a radar-equipped vessel normally capable of proceeding at 13 · 5 knots would be expected to reduce its speed to about 8 to 9 knots when proceeding in visibility of approximately 1 · 1 kilometres. Note that this expected speed reduction was stated for a vessel equipped with radar (i.e. a vessel that was not solely reliant on human senses, such as sight and hearing, but could instead utilise technology to perceive its environment). This example is therefore well-suited for application to MASS, which will also rely on technology – and not on human senses – to perceive their surroundings. When comparing this expected speed reduction of 4 · 5 to 5 · 5 knots with the 0 · 3 and 0 · 1 knots predicted by the regression equations of the AIS dataset for the Gjøa A and Sotra Bridge study areas, respectively, it becomes clear that the reduction of speed in reduced visibility observed in the AIS data is not nearly enough to be classified as sufficient by our current understanding of safe speed.
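The size of this gap can be expressed as a share of the expected reduction. The sketch below uses only figures quoted in the text; the names are hypothetical, and comparing against the lower end of the expected range is a choice made here to give the most favourable reading of the AIS data.

```python
NORMAL_SPEED_KN = 13.5                # speed in good visibility, from the cited example
EXPECTED_SPEED_RANGE_KN = (8.0, 9.0)  # expected speed in visibility of about 1.1 km
VISIBILITY_CHANGE_KM = 8.26           # good to very poor visibility
VISIBILITY_COEFFS = {"Gjøa A": 0.04, "Sotra Bridge": 0.01}  # knots per km


def expected_reduction_range(normal_speed, reduced_speed_range):
    """Smallest and largest speed reduction implied by a range of reduced speeds."""
    slowest, fastest = reduced_speed_range
    return normal_speed - fastest, normal_speed - slowest


expected_low, expected_high = expected_reduction_range(
    NORMAL_SPEED_KN, EXPECTED_SPEED_RANGE_KN)

shares = {}
for area, coefficient in VISIBILITY_COEFFS.items():
    predicted = coefficient * VISIBILITY_CHANGE_KM
    # Compare with the lower end of the expected range (most favourable case).
    shares[area] = predicted / expected_low
    print(f"{area}: predicted {predicted:.2f} kn, "
          f"{shares[area]:.0%} of the smallest expected reduction")
```

Even on this favourable reading, the predicted reductions remain below one tenth of what the cited commentary would expect.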

Average speeds in different environmental condition ranges
The graphs illustrating average transit speeds in different environmental condition ranges are markedly different in each study area. While the graphs for the Sotra Bridge study area are virtually flat and indicate similar average transit speeds in the different environmental condition ranges, the graphs for the Gjøa A study area show changes in average speeds in different environmental conditions.
Commentary on the COLREGs states that the need to moderate speed generally applies in restricted visibility and that it is dangerous to go fast when visibility is poor. The results of this paper show that conventional ships do not behave that way. Figure 4 shows that there is no decrease in average transit speeds of vessels passing through the Sotra Bridge study area in poor visibility, and, curiously, Figure 3 shows that average transit speeds of vessels passing through the Gjøa A study area in very poor visibility conditions were higher than those in any other visibility range. Indeed, when MOR is less than 4 kilometres, average transit speeds seem to increase as visibility deteriorates. This might be explained by the sharply reduced mean wind speeds and significant wave heights experienced by vessels transiting the study area in low visibilities. As can be seen in the Appendix, Figure A15 and Table A13, the average mean wind speeds and average significant wave heights for transits that occurred in visibilities of 0 to 1 kilometres were 6 · 0 metres/second and 1 · 5 metres, respectively. This is a reduction of approximately 50% when compared to the average mean wind speeds and average significant wave heights of 11 · 2 metres/second and 3 · 1 metres, respectively, for transits that occurred in visibilities of 2 to 3 kilometres.
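The reduction of approximately 50% can be checked directly from the averages quoted above. The function name in this short sketch is hypothetical.

```python
def relative_reduction(reference, value):
    """Reduction of value relative to reference, as a fraction of reference."""
    return (reference - value) / reference


# Averages quoted in the text: visibility of 2 to 3 km versus 0 to 1 km.
wind_reduction = relative_reduction(11.2, 6.0)  # mean wind speed, m/s
wave_reduction = relative_reduction(3.1, 1.5)   # significant wave height, m

print(f"mean wind speed: {wind_reduction:.0%} lower")
print(f"significant wave height: {wave_reduction:.0%} lower")
```

Both reductions lie within a few percentage points of one half, which is consistent with the approximate figure given in the text.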
When it comes to average transit speeds in different wave and mean wind speed conditions in the Gjøa A study area, the results do not seem surprising. The data shows that average transit speeds decrease as waves get larger and winds pick up. At first glance, the sharp increase in average transit speeds at extremely high wave and wind conditions is surprising. However, the increased average transit speeds at extremely high wave and wind conditions are based on a very low number of transits and are, therefore, considered to be erratic outliers.
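The caution about sparsely populated condition ranges can be illustrated with a sketch that reports the number of transits alongside each range average. The function name, the bin width, the minimum count and the sample values are all hypothetical; the text does not state which threshold, if any, was applied in the analysis.

```python
from collections import defaultdict


def binned_mean_speeds(conditions, speeds, bin_width, min_count):
    """Average speed per condition range, flagging ranges with few transits."""
    bins = defaultdict(list)
    for condition, speed in zip(conditions, speeds):
        lower = int(condition // bin_width) * bin_width  # lower edge of the range
        bins[lower].append(speed)
    summary = []
    for lower in sorted(bins):
        values = bins[lower]
        summary.append({
            "range": (lower, lower + bin_width),
            "n": len(values),
            "mean_speed": sum(values) / len(values),
            "reliable": len(values) >= min_count,
        })
    return summary


# Invented values: significant wave height (m) and transit speed (knots).
waves = [0.4, 0.8, 1.2, 1.7, 2.3, 2.9, 7.5]
speeds = [12.0, 11.5, 11.0, 10.5, 10.0, 9.5, 13.0]

for row in binned_mean_speeds(waves, speeds, bin_width=1, min_count=2):
    flag = "" if row["reliable"] else "  (few transits, treat with caution)"
    print(f"{row['range']}: n={row['n']}, mean={row['mean_speed']:.2f} kn{flag}")
```

In this invented sample, the single transit in the highest range produces a high average that says little about typical behaviour in such conditions.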
The same observation was not made in the Sotra Bridge study area, where average transit speeds remained stable throughout all wind ranges. A possible explanation for this may be the sheltered nature of the study area. When in the open ocean, added resistance due to waves is one of the major components that affect ship performance. The magnitude of added resistance is about 15-30% of calm-water resistance, meaning that a ship's forward speed decreases, compared to that in calm sea, when encountering waves (Seo et al., 2013). Wave development is significantly affected by not only wind speed but also fetch, the distance that wind travels over open water. As the Sotra Bridge study area is located in coastal waters sheltered from the open ocean, strong winds likely do not cause the same high waves in the Sotra Bridge study area as they would in the open Gjøa A study area. This, in turn, would mean less added resistance, and less speed reduction, for ships passing through the Sotra Bridge study area in stronger winds. However, due to the absence of wave height data for the Sotra Bridge study area, this hypothesis was not tested in this paper.

Difference in transit speed through the Gjøa A and Sotra Bridge study areas
There was a significant difference in average transit speeds between the Gjøa A and Sotra Bridge study areas. At 10 · 50 knots, the average transit speed through the Sotra Bridge study area was 0 · 68 knots lower than the 11 · 18 knots average transit speed through the Gjøa A study area.
As mentioned in the descriptions of the study areas, the Gjøa A study area is characterised by its location in open ocean, in an area of structured traffic. The Sotra Bridge area, on the other hand, is located in coastal waters, completely encircled by shoreline. There is more traffic in this area, which is also less structured. One could, therefore, argue that of the factors to be taken into account when determining safe speed mentioned in the COLREGs, the factors of traffic density, background light at night and proximity to navigational hazards are more pronounced in the Sotra Bridge study area. These differences may explain the 6% difference in average transit speeds through the two study areas.
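The 6% figure follows from the two averages when the Gjøa A average is taken as the reference, as the short sketch below shows. The variable names are hypothetical.

```python
GJOA_A_MEAN_KN = 11.18        # average transit speed, open ocean
SOTRA_BRIDGE_MEAN_KN = 10.50  # average transit speed, coastal waters

difference_kn = GJOA_A_MEAN_KN - SOTRA_BRIDGE_MEAN_KN
# Relative difference with the Gjøa A average as the reference value.
relative_difference = difference_kn / GJOA_A_MEAN_KN

print(f"difference: {difference_kn:.2f} knots ({relative_difference:.1%})")
```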

Conclusion
This paper investigated whether vessel speed data collected from conventional ships in various external environmental conditions actually resembles safe speeds, and can therefore be used for deep-learning purposes in MASS. This was done by comparing the data with accepted interpretations of what constitutes a safe speed.
Contemporary commentary on the COLREGs considers visibility the dominant factor when determining safe speed and acknowledges that poor visibility demands reduced vessel speeds. However, the analysis of the AIS data shows that ships do not actually behave as anticipated. While the regression analyses, with speed as the dependent and visibility as the independent variable, found significant regression equations, both the coefficients and R² values were small to negligible. The regression equations predict a speed reduction of 0 · 1 to 0 · 3 knots when visibility deteriorates from good to very poor, and the low R² values mean that only 0 to 3 · 3% of the variation in speed can be explained by the variation in visibility. Note that the effect of visibility on transit speeds was even less pronounced in the coastal waters study area of the Sotra Bridge, a finding that directly contradicts the expectations of some experts in the field.
While the speed reductions observed in higher wind and wave conditions in the Gjøa A study area fall into what may be expected, these speed reductions were not observed in the Sotra Bridge study area. Again, this seems to indicate that there are combination effects that are not fully understood yet.
It can, therefore, be concluded that there is a difference between the predicted changes in vessel speeds that are based on our contemporary theoretical understanding of safe speed, and the actual differences in vessel speeds observed in different environmental conditions. Contrary to contemporary understanding of safe speed, reduced visibility did not lead to significantly reduced transit speeds.
This difference may be due either to our contemporary understanding of safe speed being flawed, or to speed data taken from AIS not representing safe speeds in all conditions. The latter is plausible because the speed of vessels is influenced not only by factors relating to safety, but also by factors relating to efficiency and reduction of effort.
The problem of quantifying the safe speed of a vessel in different conditions, therefore, does not seem to be easily solvable by simply using historic AIS data to create a model of normalcy which a MASS can follow. More research in this area is necessary to gain a deeper understanding of what a safe speed constitutes and how this knowledge can be transferred to any MASS sailing the seas in the future.

Limitations and further research
The data collected and analysed in this paper shows that vessels behave markedly differently in similar conditions. Since all vessel data collected in this study was combined for the analysis, a limitation of this research is that differences between vessel types and sizes were not considered. Further research is warranted to investigate whether vessel type and size influence vessel speeds in different environmental conditions. Furthermore, the possibility of smaller vessels choosing different paths when the weather is unfavourable should also be explored.
The analysis of the effect of wind and waves on vessel speeds conducted in this paper did not consider the direction of wind and waves relative to the vessels. Since different hazards are posed to the vessel depending on the angle at which waves interact with the vessel, further research that includes the relative wind and wave directions in the analysis is encouraged.
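As a starting point for such further research, the relative direction could be derived from a vessel's course and the wave direction as sketched below. This is only an illustration under stated assumptions: the function names are hypothetical, the wave direction is assumed to be given as the direction the waves come from, and the sector boundaries of 45 and 135 degrees are assumed values rather than thresholds taken from this paper.

```python
def relative_direction(course_deg, wave_from_deg):
    """Angle between the bow and the direction the waves come from, 0 to 180 degrees."""
    difference = (wave_from_deg - course_deg) % 360.0
    return difference if difference <= 180.0 else 360.0 - difference


def encounter_sector(relative_deg):
    """Coarse encounter sector; the 45 and 135 degree boundaries are assumptions."""
    if relative_deg <= 45.0:
        return "head seas"
    if relative_deg <= 135.0:
        return "beam seas"
    return "following seas"


# A vessel steering 350 degrees while the waves come from 010 degrees.
angle = relative_direction(350.0, 10.0)
print(angle, encounter_sector(angle))
```

Adding such a sector as a categorical variable would allow the speed response to head, beam and following seas to be estimated separately.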

Funding. Norwegian Ministry of Education and Research.
A. Appendix

Table A1. Qualitative description of visibility (Met Office, 2021b).

Term        Meaning
Very poor   Visibility less than 1,000 metres
Poor        Visibility between 1,000 metres and 2 nautical miles (3,704 metres)
Moderate    Visibility between 2 and 5 nautical miles (3,704 metres and 9,260 metres)
Good        Visibility more than 5 nautical miles (9,260 metres)
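The terms in Table A1 can be expressed as a simple lookup, sketched below. The function name is hypothetical, and the assignment of values lying exactly on a boundary to the lower category is an assumption, as the table does not specify it. The sketch also reproduces the difference between the good and very poor thresholds used in the Discussion.

```python
VERY_POOR_LIMIT_M = 1000  # upper limit of "very poor"
POOR_LIMIT_M = 3704       # 2 nautical miles
MODERATE_LIMIT_M = 9260   # 5 nautical miles


def visibility_term(visibility_m):
    """Qualitative visibility term for a visibility given in metres (Table A1)."""
    if visibility_m < VERY_POOR_LIMIT_M:
        return "Very poor"
    if visibility_m <= POOR_LIMIT_M:      # assumption: boundary counts as "Poor"
        return "Poor"
    if visibility_m <= MODERATE_LIMIT_M:  # assumption: boundary counts as "Moderate"
        return "Moderate"
    return "Good"


# Difference between the "good" and "very poor" thresholds, in kilometres.
good_to_very_poor_km = (MODERATE_LIMIT_M - VERY_POOR_LIMIT_M) / 1000

print(visibility_term(1100), good_to_very_poor_km)
```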