AIS-based near-collision database generation and analysis of real collision avoidance manoeuvres

Abstract Economic and technological development has increased the amount, density and complexity of maritime traffic, which has resulted in new challenges. One challenge is conforming to the distinct evasion manoeuvres required by vessels entering into near-collision situations (NCSs). Existing rules are vague and do not precisely dictate which, when and how collision avoidance manoeuvres (CAMs) should be executed. The automatic identification system (AIS) is widely used for vessel monitoring and traffic control. This paper presents an efficient, scalable method for processing large-scale raw AIS data using the closest point of approach (CPA) framework. NCSs are identified to create a database of historical traffic data. Important features describing CAMs are defined, estimated and analysed. Applications on a high-quality real-world data set show promising results for a subset of the identified situations. Future applications may play a significant role in the maritime regulatory framework, navigation protocol compliance evaluation, risk assessment, automatic collision avoidance, and algorithm design and testing for autonomous vessels.


Introduction
Maritime safety is of utmost importance, but real-world collision data are scarce. This prompts the researcher to ask what can be learned from situations where collision is imminent, but an adverse outcome avoided. Large-scale automatic identification system (AIS) data sets keep track of many such situations, and may be used to create a database of such near-collision situations (NCSs). This may further enable analysis and new insights for researchers and maritime safety practitioners.
With the rising importance of maritime traffic, collision avoidance has developed into one of the most important concerns for maritime safety (Ozturk and Cicek, 2019). With the emergence of autonomous surface vessels, there is also a growing need for technical standards and methods for autonomous collision avoidance. The pre-eminent role of maritime transport, as well as the continuous striving for safety improvements, has recently led to the development of several theoretical and practical approaches to ensuring higher levels of safety and efficiency in maritime navigation. These methods allow for extensions to AIS applications.
One of these directions focuses on the concept of smart navigation, such as decision support systems based on automatic radar plotting aids as presented by Ożoga and Montewka (2018), AIS-based maritime spatial planning (Le Tixerant et al., 2018) and route planning (Jeong et al., 2019). Furthermore, new methodology has recently been introduced for autonomous navigation (Naeem et al., 2016). There also exists a body of literature on NCS detection, which in particular relates to real-time applications with the goal of providing increased warning time for sailors for evasion manoeuvres at sea. One method for vessel collision-candidate detection was proposed by Chen et al. (2018), and uses a temporally discrete nonlinear velocity obstacle algorithm, modelling the vessel encounter as a process. Another method, proposed by Zhang et al. (2015), describes near-miss collision detection through the use of vessel conflict ranking operators. This method was further improved by Zhang et al. (2016, 2017). The literature also encompasses methods based on simulation models, such as those described by Fang et al. (2018).
The primary drawbacks of the existing NCS detection methods are the lack of automation and the fact that they are not computationally efficient enough to be extended to large-scale data sets. We introduce a framework for handling large AIS data in a computationally feasible way, which allows both for large-scale extension and automation.
An NCS is defined by the vessel pair, which includes the position, pose and speed of the vessels. The method further indicates the risk of collision measured at each timestamp, and allows for inclusion of further characteristics in the analysis, such as vessel type, size and dimensions.
The aggregate analysis at first proceeds by filtering the identified situations in terms of severity. The observed AIS transmissions from the filtered situations form the basis for analysing various aspects of CAM patterns, as presented in Table 1. Second, CAMs must be distinguished from normal route-following and track-keeping actions imposed by traffic separation schemes and natural obstacles. The proposed framework uses speed and course patterns, as well as vessel position, to incorporate the closest point of approach (CPA) algorithm, as described by Sang et al. (2016). This determines an estimated time to CPA and an estimated distance at CPA for every vessel at every timestamp.
The third challenge is detecting critical timestamps during a particular NCS that correspond to when the CAM is initiated and when the NCS is resolved. The change-point detection is difficult owing to the noisy environment and the presence of other route-following actions, which cause the vessel speed and course to seldom be in a steady state. Other aspects are used to analyse how CAMs are conducted in a practical sense in light of COLREGs rules, vessel characteristics (dimensions, type and pose), which vessel executed a manoeuvre and what manoeuvre was executed. For situations where CAMs are detected, the manoeuvres are described by the magnitude of the steady-state speed and course alteration, relative distance, speed and course at manoeuvre start, as well as the actual passing distance.
Finally, aspects are estimated for all situations in the NCS database, and presented with empirical distributions under various COLREGs conditions, for the subset of vessels in unrestricted waters. The obtained statistics are sensitive to noise in the NCS database as well as the accuracy of the aspect estimators. However, the obtained results have potential applications in future work, such as informing selection of safety limits for risk analysis, ensuring proper tuning of parameters in automatic collision avoidance rule design and as reference benchmarks for comparing and evaluating protocol compliance and safety performance of autonomous vessels.
The rest of this paper is structured as follows. Section 2 presents an overview of the high-resolution AIS data source and presents the raw data that are used in this work. Section 3 presents the suggested framework for identifying NCSs, and discusses situation filtering and manoeuvre identification, as well as outlining various CAM aspects. Section 4 presents examples of situations retrieved by the framework and descriptive statistics for a subset of the identified situations, and discusses challenges and possible applications. The paper concludes in Section 5 with relevant areas for future research and extensions.

Data
The present paper uses a high-quality traffic data-source consisting of AIS transmissions from 13 days covering the Norwegian exclusive economic zone (EEZ). AIS data dimensionality varies, but most sources provide traffic data for speed over ground (SOG), course over ground (COG), position (latitude, longitude), as well as unique vessel identifiers. The present data source additionally provides information on vessel type and dimensions. A typical AIS data set of the present quality and geographical coverage contains 18·5 million registered AIS transmissions per day, with a mean temporal resolution of 9·6 s. Rates of transmission vary with vessel speed and degree of course change, allowing for particularly high resolution when vessels are in manoeuvre. This temporal resolution further allows for detailed analysis of the aspects describing manoeuvring patterns, whereas the spatial coverage allows for analysing the manoeuvres over various traffic zones.
The data source has low rates of erroneous or missing records, at below 0·1% for SOG and COG. Approximately 6·0% of records are erroneously registered outside the area of analysis. To focus analysis on commercial vessels, records for which vessel category was not identified, or where vessels were identified as non-commercial, were not considered. In addition, tugs and pilot vessels were not considered, as such vessels by nature engage in close-quarters situations which are difficult to discern from NCSs. The ensuing typical filtered data set consists of 10 million daily records, which entails a sizeable number of vessel-to-vessel interactions to be analysed at each timestamp.
Using AIS data for analysis with uniquely identified vessels further allows for combination with other data sets, such as weather or vessel characteristics. The present paper focuses on suggesting a framework and highlighting some selected manoeuvre aspects, but the versatility of AIS data lends itself to numerous extensions.

A note on computational resources
The algorithm which identifies NCSs, as presented in Section 3, involves calculation of identifying parameters for all close-vessel interactions. Some parameters in the data allow for tuning the balance between temporal resolution and computational efficiency.
The present data source recorded approximately 90% of vessels at least once within each 20 s window, with increased resolution during manoeuvres. The NCS identification algorithm starts by observing all candidate situations once within a moving time window of 20 s, as a substitute for computing on the full-resolution data. In terms of the NCS identification algorithm, this reduces the number of observations by 50%, greatly reducing computational cost, while retaining a satisfactory temporal resolution. The decrease in temporal resolution thus significantly increases computational efficiency without having a negative effect on the process of identification. The algorithm for pattern analysis of CAMs, which is based on the list of identified situations, uses the full-resolution data.
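The reduction to one observation per vessel per 20 s window can be sketched as follows. This is a minimal illustration under stated assumptions: it uses fixed (not moving) windows aligned to multiples of 20 s, keeps the first record in each window, and the function name `first_record_per_window` is hypothetical.

```python
import numpy as np

def first_record_per_window(vessel_ids, timestamps, window_s=20):
    """Return sorted row indices keeping the first record per (vessel, window).

    Assumption: fixed windows of `window_s` seconds, not a moving window.
    """
    vessel_ids = np.asarray(vessel_ids)
    timestamps = np.asarray(timestamps, dtype=float)
    window = np.floor(timestamps / window_s).astype(np.int64)
    # Sort by vessel, then window, then time (last key is the primary key).
    order = np.lexsort((timestamps, window, vessel_ids))
    v, w = vessel_ids[order], window[order]
    is_first = np.ones(len(order), dtype=bool)
    is_first[1:] = (v[1:] != v[:-1]) | (w[1:] != w[:-1])
    return np.sort(order[is_first])
```

The full-resolution records remain untouched, so they can still be retrieved later for the manoeuvre analysis.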
Furthermore, as NCSs by definition are confined to vessels within the range of sight of one another, computations are restricted to pairs of vessels within a 2 × 2 km² square. Computing distances by such a (box) metric is computationally more efficient than calculating the Euclidean or spherical distance, and as it is a filtering mechanism for further computations, it does not skew the database, as long as the threshold is set sufficiently high.
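A box filter of this kind might look like the sketch below. The names are hypothetical, and two choices are assumptions rather than details from the paper: the box is read as a maximum offset of `box_m` metres in each direction, and longitudes are corrected with the cosine of the mean latitude of the pair. The all-pairs enumeration is for illustration only and would need spatial indexing to scale.

```python
import numpy as np

METRES_PER_DEGREE = 111_319.9  # decimal degree-to-metre conversion rate

def candidate_pairs(lon, lat, box_m=2_000.0):
    """Return index pairs (i, j), i < j, whose offsets fit inside the box."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    i, j = np.triu_indices(len(lon), k=1)  # every unordered pair once
    dy = np.abs(lat[j] - lat[i]) * METRES_PER_DEGREE
    # Assumed curvature correction: scale by cos of the pair's mean latitude.
    scale = np.cos(np.radians(0.5 * (lat[i] + lat[j])))
    dx = np.abs(lon[j] - lon[i]) * METRES_PER_DEGREE * scale
    keep = (dx <= box_m) & (dy <= box_m)
    return np.column_stack((i[keep], j[keep]))
```

Only comparisons and absolute differences are needed, so no square roots or trigonometric distance formulas are evaluated per pair.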
The suggested framework is implemented in Python, using the Pandas, Numpy and Numba libraries. Computations scale linearly in time and may be computed in parallel. The computations for the present paper have been run on a MacBook Pro (13-inch, 2016) with a 2 GHz Intel Core i5 processor and 8 GB memory.

Methodology
The following discusses the identification of real NCSs for an NCS database based on historical traffic data from AIS. The CPA framework is established by Sang et al. (2016) as a method for analysing the collision behaviour of two objects in motion. CPA is defined as the closest point two objects will arrive at if speed and course are unaltered. Distance between vessels at CPA (DCPA) indicates the severity of the hypothetical situation. Time to CPA (TCPA) is the remaining time for the two objects to reach CPA at constant speed and course. Negative TCPA indicates objects moving away from each other. An NCS is defined as a situation where two vessels will, in the near future, come within an unsafe distance of each other. NCSs are identified for each candidate situation by an unsafe DCPA (below a set threshold) and a limited positive TCPA.
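AIS data provide positions, speed and course for a vessel $i$ at timestamp $t_0$, and hence provide the information

$$\left(\mathrm{lon}_i(t_0),\ \mathrm{lat}_i(t_0)\right),\quad \mathrm{SOG}_i(t_0),\quad \mathrm{COG}_i(t_0),$$

where $(\mathrm{lon}_i(t_0), \mathrm{lat}_i(t_0))$ is given in decimal degrees, $\mathrm{SOG}_i(t_0)$ in knots and $\mathrm{COG}_i(t_0)$ in degrees. The velocity vector is given by

$$\mathbf{v}_i = \mathrm{SOG}_i \begin{bmatrix} \sin(\mathrm{COG}_i) \\ \cos(\mathrm{COG}_i) \end{bmatrix},$$

where all variables are evaluated at $t_0$. The $\mathrm{TCPA}_{i,j}(t_0)$ can be determined for a pair of close vessels $(i, j)$ at time $t_0$ based on their relative velocity $\Delta\mathbf{v}_{i,j}(t_0)$ and recorded relative distance $\Delta\mathbf{p}_{i,j}(t_0)$. The $\mathrm{TCPA}_{i,j}(t_0)$ is determined as $t - t_0$ for a $t$ such that the future distance $|\Delta\mathbf{p}_{i,j}(t)|$ is minimised, for $t \geq t_0$. The relative distances are calculated as

$$\Delta\mathbf{p}_{i,j} = c \begin{bmatrix} \widetilde{\mathrm{lon}}_j - \widetilde{\mathrm{lon}}_i \\ \mathrm{lat}_j - \mathrm{lat}_i \end{bmatrix}, \tag{2}$$

where $c$ = 111,319·9 is the decimal degree-to-metre conversion rate, $\widetilde{\mathrm{lon}}$ denotes the curvature-corrected longitude and all expressions are evaluated at $t_0$. Relative velocity is calculated as

$$\Delta\mathbf{v}_{i,j} = \mathbf{v}_j - \mathbf{v}_i.$$

This gives $\mathrm{TCPA}_{i,j}$ as

$$\mathrm{TCPA}_{i,j}(t_0) = -\frac{\Delta\mathbf{p}_{i,j}(t_0) \cdot \Delta\mathbf{v}_{i,j}(t_0)}{|\Delta\mathbf{v}_{i,j}(t_0)|^2}.$$

If $0 \leq \mathrm{TCPA}_{i,j}(t_0) \leq \mathrm{TMAX}$, where the present paper lets TMAX = 1,200 s, we say vessels $(i, j)$ will be closest at $t = t_0 + \mathrm{TCPA}_{i,j}(t_0)$ given present speed and course, and have

$$\Delta\mathbf{p}_{i,j}(t) = \Delta\mathbf{p}_{i,j}(t_0) + \mathrm{TCPA}_{i,j}(t_0)\,\Delta\mathbf{v}_{i,j}(t_0)$$

and, further, the DCPA as

$$\mathrm{DCPA}_{i,j}(t_0) = |\Delta\mathbf{p}_{i,j}(t)|.$$

Otherwise, we define $\mathrm{DCPA}_{i,j} = \infty$.

A version of a function to determine $\mathrm{TCPA}_{i,j}$ for two vessels may be implemented as in Listing 1 (a simple function to determine TCPA for two vessels). For the vessels observed at time $t_0$, represented as explained in Section 2, the algorithm applies the function above to every pair of vessels using vectors instead of scalars. The function looks the same, and may be implemented as in Listing 2 (a vectorised function to determine TCPA for vessels), where capital letters denote vectors.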
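The TCPA and DCPA expressions above can be sketched in Python as follows. This is a minimal illustration and not the paper's own listing: the function names, the knots-to-metres-per-second conversion and the treatment of zero relative velocity are assumptions. Because NumPy broadcasts, the same function accepts scalars for a single pair or arrays for many pairs.

```python
import numpy as np

KNOT_TO_MPS = 0.514444  # assumption: SOG converted from knots to m/s
TMAX = 1200.0           # upper TCPA limit in seconds, as in the text

def velocity(sog_knots, cog_deg):
    """East and north velocity components (m/s) from SOG and COG."""
    speed = np.asarray(sog_knots, dtype=float) * KNOT_TO_MPS
    cog = np.radians(cog_deg)
    return speed * np.sin(cog), speed * np.cos(cog)

def tcpa_dcpa(dx, dy, dvx, dvy, tmax=TMAX):
    """TCPA (s) and DCPA (m) from relative position and relative velocity.

    dx, dy: position of vessel j minus position of vessel i, in metres.
    dvx, dvy: velocity of vessel j minus velocity of vessel i, in m/s.
    """
    dx, dy, dvx, dvy = (np.asarray(a, dtype=float) for a in (dx, dy, dvx, dvy))
    speed_sq = dvx ** 2 + dvy ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumption: identical velocities give an infinite TCPA.
        tcpa = np.where(speed_sq > 0, -(dx * dvx + dy * dvy) / speed_sq, np.inf)
    valid = (tcpa >= 0) & (tcpa <= tmax)
    t = np.where(valid, tcpa, 0.0)
    # DCPA is infinite whenever TCPA falls outside [0, tmax].
    dcpa = np.where(valid, np.hypot(dx + t * dvx, dy + t * dvy), np.inf)
    return tcpa, dcpa
```

For example, two vessels 1,000 m apart on reciprocal courses with a closing speed of 10 m/s give a TCPA of 100 s, and a lateral offset of 100 m gives a DCPA of 100 m.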
The algorithm determines $\mathrm{TCPA}_{i,j}$ and $\mathrm{DCPA}_{i,j}$ for all vessel pairs $(i, j)$. An NCS is identified when $\mathrm{DCPA}_{i,j}$ is found to be below a set (unsafe) threshold. The present paper identifies situations where the minimum DCPA is lower than three times the sum of the vessel lengths, i.e.

$$\min_{t_0} \mathrm{DCPA}_{i,j}(t_0) < 3\,(l_i + l_j),$$

where $l_i$, $l_j$ are the vessel lengths. The algorithm identifies this on the record level for each vessel pair. A database can be constructed in several ways. The present paper lets each NCS between a pair of vessels be identified by the record denoting the lowest $\mathrm{DCPA}_{i,j}(t_0)$ for the vessels.
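The record-level identification described above can be illustrated with a short sketch. It is a plain-Python illustration under assumptions: the inputs are parallel sequences with one entry per candidate record, and the name `identify_ncs` and the pair-key representation are hypothetical.

```python
def identify_ncs(pair_ids, dcpa, len_i, len_j, factor=3.0):
    """Map each vessel pair to the record index of its lowest unsafe DCPA.

    A record is unsafe when DCPA < factor * (len_i + len_j). Pairs with no
    unsafe record are absent from the result.
    """
    best = {}
    for k, pair in enumerate(pair_ids):
        if dcpa[k] < factor * (len_i[k] + len_j[k]):
            # Keep only the record with the lowest DCPA for this pair.
            if pair not in best or dcpa[k] < dcpa[best[pair]]:
                best[pair] = k
    return best
```

Each dictionary entry then identifies one NCS by the record of minimal DCPA for that pair.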
For every identified situation, full-resolution traffic data are retrieved from the original data set for both vessels at all timestamps preceding and succeeding the timestamp of minimal $\mathrm{DCPA}_{i,j}$ by a sufficient time span, to further analyse the CPA and manoeuvre patterns during the NCS. Choice of time span must be sufficiently long to encompass the situation, and must be informed by the size and type of the vessels. The present paper retrieved records 60 min prior to and following the point of minimal $\mathrm{DCPA}_{i,j}$.
Transmission times differ, and hence traffic information must be synchronised for each pair of vessels participating in a situation to analyse CAMs. Records are synchronised by a process where each record, with timestamp $t_{0,i,k}$, is coupled with an equivalent $t_{0,j,l}$ in the other vessel's timestamp vector such that $|t_{0,i,k} - t_{0,j,l}| < 5$ and $t_{0,j,l} = \arg\min_{t_{0,j} \in T_{0j}} |t_{0,j} - t_{0,i,k}|$, where $T_{0j}$ is the vector of timestamps for vessel $j$. Records which have no unique equivalent are discarded. The traffic information synchronisation is implemented with the records organised such that $T_{0j}$ is the longer timestamp vector of the two.
The synchronised time for each record is taken to be the mean timestamp of the coupled records. This introduces imprecision, but the common time scale is not used for computations. At this stage, both vessels are observed at approximately identical timestamps. The synchronised traffic data for each situation are then used to analyse the traffic patterns during CAMs. First, the timestamp when the CAM is initiated is detected as $t_{1,i,j}$, and the timestamp when the NCS is resolved is detected as $t_{e,i,j}$. The evacuation time window is defined as $[t_{1,i,j}, t_{e,i,j}]$ for each situation, according to the following rules.
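The coupling of timestamps might be implemented along the lines of the sketch below. It is not the paper's listing. It assumes both timestamp vectors are sorted, reads the tolerance of 5 as seconds, and interprets "no unique equivalent" as either no neighbour within the tolerance or several records claiming the same neighbour. The function name `synchronise` is hypothetical.

```python
import numpy as np

def synchronise(t_i, t_j, tol=5.0):
    """Couple each timestamp in t_i with its nearest timestamp in t_j.

    Returns (idx_i, idx_j, t_sync) where t_sync is the mean of each coupled
    pair of timestamps. Both inputs must be sorted in ascending order.
    """
    t_i = np.asarray(t_i, dtype=float)
    t_j = np.asarray(t_j, dtype=float)
    pos = np.searchsorted(t_j, t_i)
    left = np.clip(pos - 1, 0, len(t_j) - 1)
    right = np.clip(pos, 0, len(t_j) - 1)
    # Pick whichever neighbour in t_j is closer to each entry of t_i.
    use_left = np.abs(t_j[left] - t_i) <= np.abs(t_j[right] - t_i)
    nearest = np.where(use_left, left, right)
    idx_i = np.flatnonzero(np.abs(t_j[nearest] - t_i) < tol)
    idx_j = nearest[idx_i]
    # Discard couplings where several records share the same neighbour.
    counts = np.bincount(idx_j, minlength=len(t_j))
    unique = counts[idx_j] == 1
    idx_i, idx_j = idx_i[unique], idx_j[unique]
    return idx_i, idx_j, 0.5 * (t_i[idx_i] + t_j[idx_j])
```

Using `np.searchsorted` keeps the matching at logarithmic cost per record instead of comparing every timestamp pair.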
1. If $\mathrm{DCPA}_{i,j}$ does not pass below 10 m, the time window is initiated at the minimum $\mathrm{DCPA}_{i,j}$. This is the normal situation where a CAM is initiated to ensure a safe passing distance by increasing the (predicted) DCPA.
2. If $\mathrm{DCPA}_{i,j}$ passes below 10 m, the time window starts at the initial point where $\mathrm{DCPA}_{i,j}$ passes below 10 m for which $\mathrm{TCPA}_{i,j}$ has not turned negative. For such NCSs, the calculated values are assumed to be subject to noise, and the first crossing of 10 m is considered the global minimum.
3. If $\mathrm{DCPA}_{i,j}$ passes below 10 m, but $\mathrm{TCPA}_{i,j}$ is negative at this point, the time window starts when $\mathrm{DCPA}_{i,j}$ is at its minimum while $\mathrm{TCPA}_{i,j}$ has not turned negative.
4. If $\mathrm{DCPA}_{i,j}$ never passes below a threshold DMAX metres (in the present paper set as three times the sum of the vessel lengths), as explained previously, the situation is discarded from the NCS database.

If the situation is confirmed as an NCS, it is taken to be resolved at $t_{e,i,j}$, the first point after $t_1$ when $\mathrm{TCPA}_{i,j}$ turns negative. This implies that the two vessels involved in the NCS are no longer approaching one another. The time interval $t_{e,i,j} - t_{1,i,j}$ is an important manoeuvre aspect as it determines how 'early' (as emphasised in COLREGs) CAMs were initiated, and how the linguistic variable 'early' from COLREGs is interpreted in real situations.
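The rules above can be condensed into a short sketch operating on the synchronised time series of one situation. Several points are assumptions rather than details from the paper: records with negative TCPA are ignored when searching for the window start, a situation that is never resolved within the retrieved records returns `None`, and the function name `evacuation_window` is hypothetical.

```python
import numpy as np

def evacuation_window(t, dcpa, tcpa, dmax, dmin=10.0):
    """Return (start, end) times of the evacuation window, or None."""
    t, dcpa, tcpa = (np.asarray(a, dtype=float) for a in (t, dcpa, tcpa))
    approaching = tcpa >= 0
    # Rule 4: discard if DCPA never passes below DMAX while approaching.
    if not approaching.any() or dcpa[approaching].min() >= dmax:
        return None
    below = approaching & (dcpa < dmin)
    if below.any():
        # Rule 2: first crossing below dmin with non-negative TCPA.
        start = int(np.argmax(below))
    else:
        # Rules 1 and 3: minimum DCPA among records with non-negative TCPA.
        start = int(np.argmin(np.where(approaching, dcpa, np.inf)))
    later_negative = np.flatnonzero(tcpa[start + 1:] < 0)
    if later_negative.size == 0:
        return None  # assumption: unresolved situations are skipped
    end = start + 1 + int(later_negative[0])
    return float(t[start]), float(t[end])
```

The difference between the two returned times is the evacuation time used in the later analysis.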
An observed CAM is defined for a vessel if, in a situation, the vessel alters her course or speed by a significant (observable) amount during the evacuation time window. For each situation, we observe either no vessel, one vessel or both vessels engaged in a CAM. In an ideal application, this determines which vessel should take responsibility to give-way and which vessel practised the right-of-way and remained in a stand-on state, as per COLREGs. Furthermore, for each vessel, we may observe no CAM, a course alteration, a speed alteration or both a speed and course alteration. If one of the latter three scenarios is recorded, the vessel is defined as having implemented a CAM. Each of the latter three scenarios defines a type of CAM. Situations where one or more vessels engage in a CAM are the NCSs of interest. Some speed change is also observed during course alteration manoeuvres, as shown in Figure 1. In practice, large vessels more often engage in course-change manoeuvres rather than incremental speed-change manoeuvres. Combining this assumption with Figure 1, one may deduce that a course-change CAM will often entail a detectable speed change and, hence, be registered as a combined manoeuvre. This understanding should enter into the researcher's analysis of the results.
Classification of manoeuvre type is done vessel-by-vessel, by analysing the individual patterns of changes in COG and SOG over the entire CAM time window, as shown in Figure 2. A course-change CAM is classified as such if the time derivative of the COG deviates significantly from the measurement noise inherent in the time series. The same method is used for classifying speed-change CAMs, by exchanging COG with SOG. In this regard, the threshold for 'measurement noise' is defined as 25% above the maximum absolute value of the COG time derivative for the 60 s prior to the start of the evacuation window. If the COG time derivative passes above this measurement noise threshold, the vessel is classified as having engaged in a course-change CAM. The application is illustrated for a representative vessel in Figure 2(a). The figure shows how a threshold is defined based on the pre-$t_1$ period, and that a course-change CAM is defined for this vessel, as the within-window COG time derivative passes above this threshold. In addition to being useful for analysing the practice of the COLREGs rules, these aspects allow for the analysis of which and how many types of manoeuvres are executed to avoid collision in each situation.
This method is not without its limitations. Measurement noise is amplified when calculating derivatives, and the choice of 25% and 60 s as tuning parameters should be subject to refinement. As seen in Figure 2(b), if the time derivative is consistently and sufficiently close to zero for the duration of the situation, the measurement noise threshold may be set too low, and a CAM may incorrectly be recorded. In particular, misclassifications should be expected for vessels where the total speed change is close to zero. To correct for such misclassifications, vessels for which no observable steady-state course change is recorded are classified as not having performed a course-change CAM.

Figure 2. For both vessels, a course-change manoeuvre is initially registered. In the first example, this is also the final classification. The grey-shaded field indicates the time window within which the noise threshold is defined. In the second example, our methodology recognises that the total registered course change is too small to indicate an actual course-change manoeuvre, and the final classification is corrected to 'No manoeuvre'. The time series is taken from the NCS example presented in Figure 7.
The steady-state course change for a vessel is defined as the difference between the mean COG for 11 observations surrounding the start ($t_1$) and end ($t_e$) of the evacuation time window. We define the observed course change as being observationally zero if it is below 1·5°. This roughly conforms to the lower 30th percentile of the distribution of observed steady-state course change for all vessels. The same approach is undertaken with respect to speed change, but where the threshold for what is considered observationally zero is set at below 0·15 knots of change, which conforms approximately to the lower 30th percentile of the distribution. These corrections result in a change in the course-change CAM status for 6·2% of all NCS-involved vessels, and for 9·6% of all vessels with regard to the detection of a speed-change CAM. The distribution of situation types, differentiated by manoeuvre status, is presented in Figure 3. The observed steady-state change in COG and SOG are also important indicators of how 'substantial' the course and/or speed alterations are in real situations. This allows for evaluating the real interpretation of 'substantial' in terms of COLREGs compliance for each situation.
The full comprehension of CAMs further requires analysis of the conditions at which the manoeuvres are initiated and their effect on solving the NCS. Several manoeuvre aspects may be of interest. The present paper estimates the relative approach speed, the distance between vessels at $t_1$ and passing distance, in addition to the evacuation time, and the steady-state course and speed change for different situations in aggregate. The aspects are defined as follows.
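The two-stage classification of a course-change CAM (derivative threshold, then steady-state correction) can be sketched as follows. The sketch makes assumptions that are not stated in the paper: COG is unwrapped to avoid jumps at 0°/360°, the derivative is taken with `np.gradient`, and the 11 observations are the nearest record plus five on either side. The function name is hypothetical, and at least one record is assumed to exist in the 60 s before the window.

```python
import numpy as np

def classify_course_change(t, cog_deg, t1, te, margin=0.25, pre_s=60.0,
                           n_obs=11, min_change_deg=1.5):
    """Return (is_cam, steady_state_change_deg) for one vessel."""
    t = np.asarray(t, dtype=float)
    # Assumption: unwrap so that 359 -> 1 degrees is a 2 degree change.
    cog = np.degrees(np.unwrap(np.radians(cog_deg)))
    rate = np.abs(np.gradient(cog, t))
    pre = (t >= t1 - pre_s) & (t < t1)
    window = (t >= t1) & (t <= te)
    # Noise threshold: 25% above the maximum pre-window derivative.
    threshold = (1.0 + margin) * rate[pre].max()
    initial = bool(rate[window].max() > threshold)
    half = n_obs // 2

    def local_mean(t_ref):
        k = int(np.argmin(np.abs(t - t_ref)))
        return cog[max(k - half, 0):k + half + 1].mean()

    change = float(abs(local_mean(te) - local_mean(t1)))
    # Correction: an observationally zero change overrides the detection.
    return bool(initial and change >= min_change_deg), change
```

The same function could be applied to SOG by skipping the unwrapping step and using the 0·15 knot threshold.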
• Relative approach speed is computed as the absolute value of the relative velocity, $|\mathbf{v}_i - \mathbf{v}_j|$, between the two vessels in the first half of the evacuation time window, $[t_1, t_m]$ (where $t_m = (t_1 + t_e)/2$). This indicates the speed with which the vessels are approaching each other.
• Vessel distance at $t_1$, $|\mathbf{p}_i(t_1) - \mathbf{p}_j(t_1)|$, is computed as the actual distance between the two vessels at the start of the CAM, $t_1$, and it measures how close vessels are before initiating a CAM.
• Passing distance, $|\mathbf{p}_i(t_e) - \mathbf{p}_j(t_e)|$, is computed as the actual distance between vessels at the end of the evacuation time window, $t_e$. This measures the effect of the CAM in avoiding close-quarters situations.
• Evacuation time, $|t_e - t_1|$, is calculated as the duration from the start of the CAM to the end of the NCS. This indicates how early the manoeuvre was initiated.
• Steady-state course (respectively speed) change is calculated for each vessel as the difference between the mean COG (respectively SOG) for 11 observations around the start and end of the evacuation time window.
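The first four aspects in the list above can be computed from synchronised tracks as in this sketch. It assumes positions and velocities are already expressed in a local metric frame (metres and metres per second, one row per synchronised record), and that the approach speed is summarised by its mean over the first half of the window; both choices, and the function name, are assumptions.

```python
import numpy as np

def manoeuvre_aspects(t, pos_i, pos_j, vel_i, vel_j, t1, te):
    """Return a dict of manoeuvre aspects for one situation."""
    t = np.asarray(t, dtype=float)
    pos_i, pos_j, vel_i, vel_j = (
        np.asarray(a, dtype=float) for a in (pos_i, pos_j, vel_i, vel_j)
    )
    k1 = int(np.argmin(np.abs(t - t1)))  # record nearest window start
    ke = int(np.argmin(np.abs(t - te)))  # record nearest window end
    first_half = (t >= t1) & (t <= 0.5 * (t1 + te))
    rel_speed = np.linalg.norm(vel_i - vel_j, axis=1)
    return {
        "approach_speed": float(rel_speed[first_half].mean()),
        "distance_at_t1": float(np.linalg.norm(pos_i[k1] - pos_j[k1])),
        "passing_distance": float(np.linalg.norm(pos_i[ke] - pos_j[ke])),
        "evacuation_time": float(abs(te - t1)),
    }
```

The steady-state course and speed changes would be added per vessel, using the 11-observation means described in the last bullet.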
To analyse situations with respect to practised CAMs, all NCSs are classified based on which COLREGs rule applies for the vessels. In particular, COLREGs rules 13, 14 and 15 stipulate actions to be taken by the participating vessels in situations where a vessel is overtaking, crossing or in a head-on encounter (Ventura, 2005).
COLREGs rule 13(b) specifies that 'A vessel shall be deemed to be overtaking when coming up with another vessel from a direction more than 22·5° abaft her beam (...)', whereas COLREGs rule 14(a) specifies that a head-on situation takes place 'When two power-driven vessels are meeting on reciprocal or nearly reciprocal courses so as to involve risk of collision (...)' and 14(b) that 'Such a situation shall be deemed to exist when a vessel sees the other ahead or nearly ahead (...)'. Conditions for when vessels are in a crossing situation (with actions as required by rule 15) are not specified, but it is generally understood to be situations which do not fulfil the criteria of rules 13 and 14.
To classify NCSs in light of COLREGs, it is necessary to mathematically define the terms used in the different rules. As any subsequent alteration of course or bearing should not change the class of a situation, a situation is classified based on its traffic features prior to $t_1$, i.e. before the vessels initiate any CAMs. Data on vessel heading are often recorded as part of AIS, but are in the present data considered to involve too many erroneous and missing records to be used as part of a robust measure. Hence, the heading of a vessel at $t_1$ is approximated as the mean course in the observations preceding $t_1$,

$$\overline{\mathrm{COG}}_i = \mathrm{mean}\left(\mathrm{COG}_i(t)\right) \quad \text{s.t.} \quad t_1 - 20 \leq t \leq t_1. \tag{7}$$

Taking the mean over several records makes the measure more robust against noise and contributes to the accuracy of the classification. In any encounter between two vessels $i$ and $j$, the relative angle of approach, henceforth denoted $\Delta\mathrm{COG}$, is determined as the difference, translated to the interval $[0^\circ, 360^\circ)$,

$$\Delta\mathrm{COG} = \left(\overline{\mathrm{COG}}_i - \overline{\mathrm{COG}}_j\right) \bmod 360^\circ. \tag{8}$$

The relative bearing between vessels $i$ and $j$ with respect to $i$ is calculated as

$$\beta_i = \left(\mathrm{angle}(\Delta\mathbf{p}_{i,j}(t_1)) - \overline{\mathrm{COG}}_i\right) \bmod 360^\circ, \tag{9}$$

where $\mathrm{angle}(\Delta\mathbf{p}_{i,j}(t_1))$ is the angle of the relative distance vector starting from the position of vessel $i$ and ending at the position of vessel $j$ at $t_1$, as defined in Equation (2). This angle is the absolute bearing and is measured with respect to the north in a clockwise direction.
Correspondingly, $\beta_j$ is the relative bearing with respect to vessel $j$ and is calculated by switching the indices $i$ and $j$ in Equation (9). If vessel $i$ is own ship and vessel $j$ is target ship, $\beta_i$ and $\beta_j$ are generally understood to be the bearing angle and contact angle, respectively; see Woerner et al. (2019).
NCSs are classified based on $\beta_i$, $\beta_j$ and $\Delta\mathrm{COG}$, calculated at $t_1$. Starting with rule 13(b), a vessel $i$ is overtaking vessel $j$ if $i$ has a relative bearing of more than $\theta_o$ behind the beam of vessel $j$, equivalently $90^\circ + \theta_o$ off her centreline, where $\theta_o$ = 22·5° is clearly defined in COLREGs. Rule 14(a) specifies that two vessels are head-on if $\Delta\mathrm{COG}$ is in the interval $180^\circ \pm \theta_h$. Here $\Delta\mathrm{COG} = 180^\circ$ indicates reciprocal courses, and $\theta_h$ is an added tolerance for 'nearly' reciprocal. The $\theta_h$ further accounts for the qualification in rule 14(c) that 'When a vessel is in any doubt as to whether such a situation exists, she shall assume that it does exist and act accordingly'.
A head-on situation might equivalently be classified using $\beta_i$, $\beta_j$ and rule 14(b). In this case, both relative bearings should be close to 0°, such that the vessels see each other 'ahead or nearly ahead'. The former approach is chosen in the following analysis.
Neither part of rule 14 specifies numerically the magnitude of $\theta_h$. Court rulings indicate a convergence towards understanding 'nearly reciprocal' as meaning $\theta_h \in [5^\circ, 6^\circ]$. Recent additions to the literature are however not unanimous as to how $\theta_h$ should be interpreted. Woerner et al. (2019) use $\theta_h$ = 13° as their default tolerance angle for 'reciprocal or nearly reciprocal courses', whereas Wang et al. (2018) apply $\theta_h$ = 15° when studying obstacle avoidance. At the upper end, Li et al. (2020) and Cho et al. (2020) apply $\theta_h$ = 22·5° in their analysis. It is outside of the scope of this paper to conclude as to the true value of $\theta_h$. However, the noisy properties of AIS data argue against opting for the narrowest definition. In the following, $\theta_h$ = 10° is applied in the classification algorithm. A situation is mathematically defined by conditions on $\beta_i$, $\beta_j$ and $\Delta\mathrm{COG}$ at $t_1$. If a situation is an NCS (where collision risk exists), the applicable rule follows from these conditions: rule 13 (overtaking) applies if one vessel has a relative bearing of more than $90^\circ + \theta_o$ off the centreline of the other, rule 14 (head-on) applies if $\Delta\mathrm{COG}$ lies within $180^\circ \pm \theta_h$, and rule 15 (crossing) applies otherwise.
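The classification logic described in this section can be illustrated with a small sketch. It is one possible reading of the text, not the paper's algorithm: the function names and the symbols `beta_i`, `beta_j`, `theta_o` and `theta_h` are assumptions, the overtaking test is evaluated before the head-on test, and positions are taken as east and north offsets in metres.

```python
import math

THETA_O = 22.5  # degrees abaft the beam, from COLREGs rule 13(b)
THETA_H = 10.0  # head-on tolerance applied in the present paper

def relative_bearing(dx_east, dy_north, heading_deg):
    """Bearing of the other vessel relative to own heading, in [0, 360)."""
    absolute = math.degrees(math.atan2(dx_east, dy_north))  # clockwise from north
    return (absolute - heading_deg) % 360.0

def classify_encounter(beta_i, beta_j, delta_cog,
                       theta_o=THETA_O, theta_h=THETA_H):
    """Return 'overtaking', 'head-on' or 'crossing' for an NCS."""
    def abaft_beam(beta):
        # Other vessel lies more than theta_o behind the beam.
        return 90.0 + theta_o < beta < 270.0 - theta_o

    if abaft_beam(beta_i) or abaft_beam(beta_j):
        return "overtaking"
    if abs(delta_cog % 360.0 - 180.0) <= theta_h:
        return "head-on"
    return "crossing"
```

For instance, a vessel directly astern of another on the same course is seen at a relative bearing of 180° and is classified as overtaking, whereas two vessels on reciprocal courses that see each other dead ahead are classified as head-on.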

Results
This section presents some identified NCSs, and discusses characteristics of such situations, as well as strengths and flaws of the presented framework. Summary statistics for the different classifications are presented, as well as the empirical distributions and mean and median estimates for important manoeuvre aspects. The analysis is focused on situations outside of the Norwegian baseline (NB), which are not subject to fixed obstacles, traffic separation schemes, and shallow and congested waters. The sample of NCSs, as shown in Figure 4, still mostly takes place in the Norwegian EEZ, which is more congested than other parts of the open sea. Application on the described data sample identified a total of 9,180 situations of which 1,055 took place outside the NB. There are 2,110 separate vessels involved in such situations. Of these, CAMs were detected in 645 situations.
Figure 5 shows a correctly classified situation where two tankers meet in an overtaking-type encounter. As shown by Figure 5(a), 5(c) and 5(f), vessel A and vessel B are on the same course, with vessel B closing in on vessel A from behind at a higher SOG. During the evacuation time window, vessel B engages in a course-change CAM. As shown by Figure 5(d), this increases the distance at CPA, and the vessels clear the NCS. TCPA becomes negative when the situation is resolved. The manoeuvre aspects presented in Figure 5(g) underline these observations. Vessels pass within one kilometre of each other when the situation is resolved.
Figure 6 shows a situation where a cargo vessel (A) and an offshore vessel (B) meet under the crossing rule. Vessel A makes a course-change manoeuvre, leaves the NCS and returns to its prior course. As shown by Figure 6(d), the course alteration by B increases DCPA substantially during the CAM, while, as Figure 6(e) shows, TCPA decreases. As seen in Figure 6(g), the detected course change for vessel A is somewhat smaller than what the track plot indicated. This is likely owing to the vessel correcting back to its initial course directly after having cleared the situation. Vessel B makes a small course adjustment during the evacuation window, which is recorded by the algorithm.
Figure 7 depicts a situation where a cargo vessel (A) and a tanker (B) meet under head-on rules. Figure 7(b) shows that there is substantial difference in speed between the vessels. Vessel A engages in a course-change manoeuvre, increasing DCPA and decreasing TCPA. As Figure 7(d) shows, DCPA varies substantially when the vessels are still far apart. In this instance, it may be argued that both vessels engage in a manoeuvre, although the manoeuvre of vessel A is more pronounced. This likely stems from the fact that the measurement of the course change is smaller than what the track plot indicates, as the vessels correct back to their original course directly after clearing the NCS.

Manoeuvre aspects
The suggested CAM aspects encompass time, space, situational and navigational aspects. The spatial and temporal aspects may be classified as either pre-or post-manoeuvre aspects. In the following, these aspects, as well as navigational aspects, are presented for the subset of situations where a CAM was detected in an NCS outside of the NB. Figure 8 presents the empirical distribution of relative approach speed and Figure 9 presents the vessel distance at 1 , which are both pre-manoeuvre aspects, for situations governed by different COLREGs rules. The relative approach speed estimates are as expected, relative to both the COLREGs rule and vessel type, although there is some particular clustering of passenger vessels in head-on situations.
The average vessel distance at 1 is different in NCSs of different encounter types, as Figure 9 shows. Overtaking-type NCSs on average have a lower vessel distance at 1 than crossing situations, which again on average have a lower vessel distance at 1 than head-on-type situations. Figure 9 further shows that the distributions of vessel distance at 1 differ between the different vessel categories, conceivably owing to the vessel length. Figure 10 presents the empirical distributions for the passing distance and Figure 11 for the evacuation time, which are both post-manoeuvre aspects, for various NCSs. As Figure 10 shows, there are only minor differences in the mean and median when comparing crossing-type situations to overtaking and head-on situations. It is, however, evident that the distributions vary with vessel type, and that passenger vessels in particular, and to some extent fishing vessels, skew lower than cargo vessels in terms of the observed passing distance, as is expected.
Evacuation time is an aspect that indicates how early the CAM was initiated prior to the situation being resolved. Figure 11 shows that although there is little difference in the mean and median evacuation time between overtaking and crossing situations, both statistics are substantially lower for head-on situations. For all NCS types, the distributions of evacuation time have long upper tails, which is related to the vessel speed and available space. Figures 12 and 13 present the empirical distributions of steady-state course change and steady-state speed change, respectively, which are navigational aspects of CAMs. These aspects indicate the magnitude of implemented manoeuvres, that is, how 'large' and 'substantial' (as per COLREGs) the alterations are. The aspect distributions are presented for different vessel types, and only for vessels which engage in the respective manoeuvre types. Figure 12 shows that the observed course-change CAM is on average equally large for vessels in overtaking and crossing situations, and in both cases larger than in head-on situations. The distribution of the magnitude of course-change CAMs has a considerable spread in overtaking and crossing situations. Figure 13 shows the distribution of the absolute value of speed change during CAMs in various NCSs. The observed speed change is on average larger in overtaking situations than in crossing situations, which in turn is larger than in head-on situations. There are some differences with respect to the tails of the distributions when disaggregating by vessel type.
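One way to obtain a course-change magnitude of the kind summarised in Figure 12 is to compare the mean COG in a window before and after a reference time. The sketch below assumes this windowed definition and a circular mean, so that headings on either side of north are averaged correctly; the exact estimator behind the reported statistics is not restated in this section, and all function and parameter names are hypothetical.

```python
import math


def circular_mean_deg(angles):
    """Circular mean of compass angles in degrees (wrapped with % 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def signed_diff_deg(a, b):
    """Smallest signed rotation from a to b, in degrees, in [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def delta_cog(cog, ref_index, window):
    """Mean outgoing COG minus mean incoming COG around ref_index.

    `cog` is a list of COG samples in degrees; `ref_index` and `window`
    (in samples) are assumed inputs of this sketch.
    """
    incoming = cog[max(0, ref_index - window):ref_index]
    outgoing = cog[ref_index:ref_index + window]
    if not incoming or not outgoing:
        raise ValueError("window is empty on one side of the reference index")
    return signed_diff_deg(circular_mean_deg(incoming),
                           circular_mean_deg(outgoing))
```

The circular mean matters for vessels heading close to north: samples of 358° and 2° average to a heading of 0°, whereas an arithmetic mean would give 180° and a spurious course change.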

Effect of pre-manoeuvre aspects on navigational outcomes
Estimating manoeuvre aspects allows for presenting the bivariate distributions of the aspects. In particular, the effects of pre-manoeuvre aspects, such as approach speed and vessel distance at 1, on navigational aspects are of interest, as these may conceivably affect the intended actions on the bridge and, hence, also the observed aspects in the data. Figures 14 and 15 show the distribution of steady-state course and speed change, respectively, contingent on the distribution of approach speeds. Figure 14 shows that vessels undertaking a course-change CAM undertake course changes of different magnitude contingent on the approach speed involved in the situation. Figure 15 shows that the same result applies for the speed change. Figures 16 and 17 show the distribution of steady-state course and speed change, respectively, contingent on the distribution of vessel distance at 1. The distribution of steady-state course change increases with vessel distance at 1 for all vessels undertaking a course-change manoeuvre. The same take-away message applies with regard to steady-state speed change.
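A simple way to inspect such a conditional relationship is to bin the pre-manoeuvre aspect and summarise the navigational aspect within each bin. The sketch below is an illustration under assumptions: the function name `median_by_bin`, the choice of the median as the summary, and the bin edges passed by the caller are all hypothetical and are not taken from the figures.

```python
from bisect import bisect_right
from statistics import median


def median_by_bin(x, y, edges):
    """Median of y within bins of x.

    Bin i covers edges[i] <= x < edges[i + 1]; observations outside the
    edges are dropped. Returns a dict mapping (low, high) to the median of
    y in that bin, omitting empty bins. `edges` must be sorted ascending.
    """
    groups = {}
    for xi, yi in zip(x, y):
        i = bisect_right(edges, xi) - 1
        if 0 <= i < len(edges) - 1:
            groups.setdefault((edges[i], edges[i + 1]), []).append(yi)
    return {k: median(v) for k, v in sorted(groups.items())}
```

Here `x` could hold approach speed or vessel distance per situation and `y` the steady-state course or speed change. Replacing the median with several quantiles per bin would give a coarse picture of how the whole conditional distribution shifts, not just its centre.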

Misclassification and the importance of tuning parameters
Identifying NCSs and classifying CAMs is not straightforward and, in particular, both the NCS identification algorithm and the CAM classification algorithm depend on tuning parameters.
The main tuning parameters for the CAM classification algorithm are the padding of the measurement noise threshold (in the present paper, taken at 25% above the observed pre-1 maximum observation) and the length of the window in which the pre-1 maximum observation is estimated. With regard to the NCS identification algorithm, the choice of methodology for pinpointing 1 is crucial, as is the size of the window for estimating incoming and outgoing COG and SOG.
A wrong choice with regard to the former may lead to a later misclassification of the CAM. A wrong choice with regard to the latter may lead to situations being misclassified or defined as being under different COLREGs rules if the vessel undertakes a manoeuvre just prior to 1, which will produce wrong estimates for the , and ΔCOG, and possibly incorrect COLREGs rule classification.
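The role of the two CAM classification parameters, the padding and the window length, can be made concrete with a small sketch. It assumes that the thresholded signal is a series of absolute deviations (for example the absolute change in COG per sample) and that the reference time is given as an index into that series; which quantity is thresholded in practice is not restated here, and the function names are hypothetical.

```python
def noise_threshold(signal, ref_index, window, padding=0.25):
    """Padded noise threshold from the samples just before ref_index.

    The threshold is (1 + padding) times the maximum of the last `window`
    samples preceding `ref_index`. The default padding of 0.25 mirrors the
    25% value quoted in the text.
    """
    start = max(0, ref_index - window)
    pre = signal[start:ref_index]
    if not pre:
        raise ValueError("no samples before the reference index")
    return (1.0 + padding) * max(pre)


def exceeds_threshold(signal, ref_index, window, padding=0.25):
    """Indices at or after ref_index whose value exceeds the threshold."""
    limit = noise_threshold(signal, ref_index, window, padding)
    return [i for i in range(ref_index, len(signal)) if signal[i] > limit]
```

A longer window or a larger padding raises the threshold and suppresses small manoeuvres, while a shorter window or smaller padding makes the classification more sensitive to measurement noise. This trade-off is the reason both values need to be tuned.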
One example of a misidentification of 1 is given in Figure 18, which shows a situation where two vessels approach each other under a crossing regime (in this case, correctly classified). Vessel B undertakes a course-change manoeuvre, clears the situation and returns to its former course. Instead of correctly identifying 1, the algorithm wrongly places 1 just after the CAM is concluded and therefore classifies B as not having undertaken any CAM. The situation still enters the data because the algorithm misclassifies a course-following manoeuvre by vessel A as a CAM. Misclassifications such as this may be avoided by further developing the algorithm for defining 1.

Difference between observed and intended CAMs
The ideal framework for NCS and CAM analysis would allow the analyst to record the actions and intentions of the crew and captain on the bridge of every vessel. Such a framework would allow for the analysis of choices and could incorporate situational analysis. The AIS-based researcher is restricted to what may be observed from the data. AIS data are a strong source of information, in that they are abundant and omnipresent. The proposed framework suggests algorithms which may enable the analyst to learn from these data. However, the data cannot reveal which actions are taken on the bridge. Furthermore, there may be other restricting factors at play. The present framework does not take account of sailing patterns and intended destination and route.
[Figure 18 caption, partially recovered: ... (f) vessel distance; (g) manoeuvre aspects. In this situation, the algorithm defines 1 too late in the situation, which leads the algorithm to overlook the actual manoeuvre (taken by vessel B prior to the recorded 1), and instead designates a course-following manoeuvre taken by vessel A as the CAM. The table gives the mis-estimated CAM aspects.]
An example where it is difficult to ascertain whether a situation is in fact an NCS is presented in Figure 19. In this situation, a cargo vessel (A) and a fishing vessel (B) approach each other under crossing rules. Vessel B adjusts its speed and course to clear the situation. Nevertheless, vessel B does not heed the direction under crossing rules of passing due starboard of vessel A. This may be a finding in and of itself, but truly understanding the factors which drive such actions necessitates extending the framework further to incorporate weather, water depth, sailing patterns and sea lanes.

AIS NCS framework as a platform for CAM analysis and further applications
The NCS identification algorithm for creating a reference NCS database provides a useful platform for intelligent and safe maritime transport applications and analysis.
The presented results are useful with respect to the regulatory framework, as they allow analysis of how navigation rules, such as COLREGs, are practised in real situations, e.g. whether rules are followed and which manoeuvres are taken by vessels involved in an NCS. The descriptive statistics allow for evaluating the interpretation of linguistic variables in navigation rules such as 'early', 'substantial', 'far', 'large' and 'safe distance'. Providing technical standards or amendments to such rules is crucially dependent on fully comprehending the degree of compliance with current rules and which actions are taken by vessels in situations governed by different sets of rules. In risk analysis, risky manoeuvres can be detected more easily by comparing them against the identified NCS database, and the evaluation of collision risk in real time can be improved using the provided statistics for common CAMs. For example, the presented statistics for evacuation time and passing distance may inform the selection of safety limits, with confidence intervals, for time to CPA and distance at CPA quantities, respectively.
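As an illustration of how a safety limit with a confidence interval might be derived from such statistics, the sketch below takes an empirical quantile of an observed aspect (for example a low quantile of passing distance) and attaches a percentile-bootstrap interval. Both the use of a quantile as the limit and the bootstrap are assumptions of this sketch, not a procedure described in the paper, and the function names are hypothetical.

```python
import random


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_quantile_ci(values, q, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for the q-quantile.

    Resamples `values` with replacement `n_boot` times and returns the
    lower and upper bounds of the central `level` interval.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = [quantile([values[rng.randrange(n)] for _ in range(n)], q)
             for _ in range(n_boot)]
    alpha = (1.0 - level) / 2.0
    return quantile(stats, alpha), quantile(stats, 1.0 - alpha)
```

Applied separately per encounter type and vessel category, this would yield limits that reflect the differences between the distributions reported above. The width of the interval indicates how much weight a limit based on a small subgroup can bear.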
The presented statistics provide useful inputs for the design of automatic collision avoidance algorithms, which are based on fixed rules designed for decision making to determine which actions should be taken, as well as when and how they should be executed in a particular situation. This has applications with regard to RCOs and safety assessment. In addition to risk evaluation, applications in performance evaluation of maritime autonomous surface vessels, which considers protocol compliance and human-robot interactions between unmanned and manned vessels, are relevant. Both challenges require explicit descriptions of 'standard' collision avoidance protocols and the expected behaviour of manned vessels in NCSs.

Conclusion
This paper has presented a framework for identifying NCSs using widely available AIS traffic data and the CPA algorithm.
The paper shows that the framework may be implemented in a simple, scalable and computationally efficient way, and that identification of such situations and creation of an NCS database may be used to analyse the execution of CAMs.
Future work will consider further refinements necessary for implementation in real-world applications. In particular, the framework needs to be adapted for use in restricted waters to accommodate several vessels and stationary obstacles.
By the nature of geographically identified data, the framework lends itself to a range of further extensions, such as classification by weather aspects, in particular wave height, wind and sea surface currents, as well as the presence of restricted traffic zones, natural obstacles, recommended routes and traffic separation schemes. The framework may also be extended to include more detailed vessel characteristics.