Neural Network Analysis of Dynamic Fracture in a Layered Material

Dynamic fracture of a two-dimensional MoWSe 2 membrane is studied with molecular dynamics (MD) simulation. The system consists of a random distribution of WSe 2 patches in a pre-cracked matrix of MoSe 2 . Under strain, the system shows toughening due to crack branching, crack closure and strain-induced structural phase transformation from 2H to 1T crystal structures. Different structures generated during MD simulation are analyzed using a three-layer, feed-forward neural network (NN) model. A training data set of 36,000 atoms is created where each atom is represented by a 50-dimension feature vector consisting of radial and angular symmetry functions. Hyper parameters of the symmetry functions and network architecture are tuned to minimize model complexity with high predictive power using feature learning, which shows an increase in model accuracy from 67% to 95%. The NN model classifies each atom in one of the six phases which are either as transition metal or chalcogen atoms in 2H phase, 1T phase and defects. Further t-SNE analyses of learned representation of these phases in the hidden layers of the NN model show that separation of all phases become clearer in the third layer than in layers 1 and 2.


INTRODUCTION
Molecular dynamics (MD) simulation of various physical and chemical phenomena of materials requires complex data analysis of the simulation results so as to identify different phases, chemical reaction and defects generated during the simulation. For example, during nanoindentation simulation, plastic deformation occurs inside the material due to dislocation nucleation. [1] Similarly, under stress, materials like SiC, AlN show phase transformation. [2,3] Identification of different phases and defects generated during MD simulation requires complex structural analysis. These analyses range from calculation of nearest neighbours to shortest circuit analysis of atoms. In general, there is no single evaluation technique that can be applied to all simulation data since these analyses are system dependent. However, we can observe that structures generated in MD simulations are complex functional forms of their local environment. Hence, we can create a machine learning (ML) model that can automatically learn this functional form directly from the simulation data.
Recently ML methods in the material science domain have shown phenomenal success for high-throughput screening and property prediction. [4][5][6][7][8] For example, statistical models built using a smaller fraction of the crystal structures (called training data) can accurately predict a wide range of material properties like band gap, [9,10] dielectric breakdown strength [11,12] and melting point [10] for remaining crystal structure without doing any quantum molecular dynamics (QMD) simulation for them. ML methods like support vector regression [9,10] , neural network [13] and kernel ridge regression [14] are shown to accurately predict the bandgap of double perovskites, polymers [15,16] and defects in FCC and disordered solid. [17] Neural network and gaussian processes model (GP) are also used to create data driven force field for materials, where parameters of the model are learned using the QMD simulation trajectory. [18][19][20] In this work, we have studied mode 1 fracture of MoWSe 2 monolayer using MD simulation. At the onset of crack propagation, the system shows strain-induced phase transformation. To identify these phases, we have trained a three-layer feed-forward neural network (NN) model. The training data set consists of 36,000 atomic data examples, where each atom is represented by a 50-dimension feature vector consisting of radial and angular symmetry functions. The model classifies each atom in one of the six phases, which are either transition metal or chalcogen atoms in 2H phase, 1T phase and defects.
Traditional approach for structural analysis like common neighborhood analysis and centro-symmetry parameter calculation only work for mono-atomic systems like FCC, BCC and HCP crystals, do not distinguish 2H and 1T crystalline phases in transition-metal dichalcogenide layers considered here. Similarly, both 2H and 1T structures in our simulation have the same number of nearest neighbors (nn) -each Mo atoms has 6 Se nn and Se atom as 3 Mo nn -due to which nearest-number analysis is not able to distinguish these structures either. Compared to these traditional approaches, our NN model is able to define an unique order parameter that identifies these phases with high accuracy, is highly scalable and can used to analyse large data set quickly.

A. Mode1 fracture of MoWSe 2 hetero-structure
Molecular dynamics (MD) simulations are performed to study mode 1 fracture of MoSe 2 monolayer consisting of a pre-crack and a single triangular patch of WSe 2 in front of the pre-crack. The system is first relaxed at 100 K and then subjected to tensile force perpendicular to the crack front. The system fractures at a strain of 2.16 %, which matches with the experimental value [21] At the onset of crack propagation, we observe strain induced 2H to 1T phase transformation as shown in figure 1. [21] The transformed 1T region also contains defects, which is shown in green colour. Movie S1 in supplementary material shows this phase transformation during crack propagation MoWSe 2 monolayer.

B. Feature Vector for Neural Network model for structural analysis
To build a NN model, we need to define a mathematical representation for each atom (feature vector) that captures its local environment. There are many choices for feature vector such as electronegativity (EN), ionization potential (IP), atomic displacement, atomic mass, local stress, radial and angular symmetry functions. In general, the choice of correct feature vector is highly problem specific. For example, EN, IP and atomic mass are suitable for build regression models for band gap for different materials, [9,10] whereas radial and annular symmetry functions are suitable to create NN and GP force fields. [18][19][20] To build a NN model for structural analysis, we have represented each atom by 436-dimension feature vector, which consists of combinations of radial and angular symmetry functions. The equations for radial and angular feature are shown in equations 1 and 2 respectively.
In equations 1-3, η 1 , η 2 ζ, λ and R cut are tunable parameters, which are also known as hyperparameters of the model. For each atom i, contribution due to all its N neighbors that are within the cutoff distance R cut (8Å) is computed for a specific value of radial (η 1 ) and angular (η 2 , ζ, λ) features, respectively. Effect of neighbor atoms that are further away from the central atom is

C. Neural Network model for structural analysis
A neural network model with three hidden layers is trained to identify the different phases of MoWSe 2 monolayer (see figure 2). Output layer of the NN model consists of 6 classes, which are Mo in 2H phase, Mo in 1T phase, Mo as defect, Se in 1H phase, Se in 1T phase and Se as defect. Here, input to NN model is the feature vector of the atoms, and the output is their probabilities in these six classes, and we choses the class with maximum probability as the phase label for the atoms. The first, second and third hidden layer has 350, 100 and 50 hidden units. In the first and second hidden layer, we have used Relu   Figure 3 shows the learned representation of the data set by the first and third layer of the neural network and the original feature vector using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). [22] Both PCA and t-SNE are dimensionality reduction technique and represents high-dimensional data in lower dimension. However, PCA preserves the global information like the distance between different clusters, nearest neighbor of clusters but in doing so loses the local variation among data. For example, in figure 3a-b, we observe that PCA is able to distinguish the two major phases, which are transition metal (Mo, W) and chalcogen atoms (Se) but it fails to capture the local variation in their structural representation, which are 2H crystal, 1T crystal and defects. The separation between 2H and 1T crystal structures become clearer in the third hidden layer as it contains four clusters compared to hidden layer 1,2 and the original feature space who contains only two clusters. Here, the axis of figure 3a-c corresponds to the largest two eigenvalues of the feature vector of the training data set. Figure 3d-f shows the t-SNE representation of training data set. We observe that separation between all the six phases become clearer in the third hidden layer as compared to in hidden layer 1 and the original feature space. In the original feature vector representation of atoms, each atom is represented as some combination of radial and angular symmetry functions, which does not consider the non-linear interaction between these symmetry functions. Due to the absence of this nonlinear interaction among symmetry functions, there is no clear separation between all phases in the original feature vector (see figure 3d). Hidden layer 1 considers this nonlinear interaction between the symmetry function, as result of which it is able to understand the representation of 2H and 1T crystal structure (figure 3e). However, it is still not able to distinguish between defects and 1T crystal structure. Addition of more layers alleviate this problem, and the third layer of the NN model is able to distinguish all the six phases (figure 3f).

Neural Network training and Feature learning
The training accuracy of the model after 200 epoch is 93%, where each atom is represented by a 436-dimension vector. Since the hyper-parameters of the symmetry function are randomly chosen, many attributes of the feature vector will be highly correlated or irrelevant. Removing these irrelevant features will not only make the computation faster but also it will make the model more compact and increase its interpretability.
For multi-layer neural network, relevance of each feature can be computed by calculating its gradient with respect to the input feature. Figure 4a-b shows the gradient of the 10 features whose gradient either changed the most (figure 4a) or the least (figure 4b). We have re-trained another model using only a 50-dimension feature, which consists of only the most significant features with maximum gradient change or the least significant ones. The training accuracy of these two models is shown in figure 4c-d, where we can see that model constructed using only the 50 most significant features have an accuracy of 91% which is just 2% less than model accuracy created using entire feature vector. On the other hand, model constructed using less significant features have only a training accuracy of 58%.  Figure 5a visualizes the fracture data set where structure type for each atom is predicted by the NN model. We observe that it classifies the 1T and 2H crystal structure accurately; however, it mislabels some of the defects as 1T and vice versa. As we mentioned earlier, in the t-SNE visualization of the hidden layer (figure 3f), clusters of 2H and 1T crystal structures are completely separated whereas the boundary between the 1T phase and defects is not clear. Namely, for some atoms, their local neighbor environment is similar in 1T phase and defects, which causes this misclassification. For example, for some defects Mo-Se-Mo bond angle (BA) does not deviate much from the mean BA value of crystals as Mo-Se bond length is slightly above bond length of crystals, which is 2.9 Å. Since, in this model a cutoff distance of 8 Å is used for symmetry function calculation, and hence damping factor will be very small for these defects. This creates noise or ambiguity in the angular symmetry function value as it will exactly same in the 1T phase and the defects. Figure 5b visualizes the same fracture data but using a reduced cutoff distance (3 Å) for angular symmetry function. This tuning of the cutoff distance increased the model accuracy up to 95% as we can observe that misclassification between 1T phase and defects has significantly been reduced.

CONCLUSIONS
In summary, we observed that neural network-based analysis code can automatically learn the relevant information from the simulation data. Multiple hidden layers help better separate the phases of MoWSe 2 compared to shallow network. A careful tuning of various model hyperparameters (e.g. number of radial and angular symmetry functions and cutoff distance for symmetry-function calculation) increases the model performance up to 95% while keeping the model complexity very small. The analysis of the learned features of the network gives us further information about the parameters of the symmetry functions that are important to separate different phases.