## Universal fragment descriptor predicts materials properties

Luck as a means for scientific discovery is highly inefficient given that the possible number of materials is estimated to be around 10¹⁰⁰. In addition, enormous amounts of data are currently stored in vast repositories with no meaningful connections to each other. A group of researchers from The University of North Carolina at Chapel Hill (UNC) and Duke University has taken a significant step toward realizing a knowledge-based structure–property relationship that can predict properties given a few fundamental parameters of a material. They do this by applying machine learning techniques to such data, as reported in a recent issue of *Nature Communications*.

At the heart of their approach is what they call a “universal fragment descriptor,” which is essentially a labeled graphical representation of the unit cell of an inorganic material. For a given crystal, all the nearest neighbors are identified and a graph is constructed with the atoms as nodes and the bonds as edges. This infinite graph is then broken down into the simplest fragments that capture the local topology in a matrix. Combined with several chemical and physical properties of each atom, these graphs form property-labeled materials fragments (PLMFs), or “colored graphs” in graph theory terminology. The schematic for this construction is given in the Figure.
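The idea of a property-labeled graph can be illustrated with a minimal Python sketch. This is not the researchers' implementation: the four-atom cluster `ATOMS`, the distance `cutoff` used as a bond rule, the choice of electronegativity as the only node label, and the helper names `build_graph` and `bonded_pair_fragments` are all assumptions made for illustration, and a real crystal would also require periodic boundary conditions.

```python
from itertools import combinations
from math import dist

# Hypothetical toy cluster: (element, Cartesian position in angstroms).
ATOMS = [
    ("Na", (0.0, 0.0, 0.0)),
    ("Cl", (2.8, 0.0, 0.0)),
    ("Na", (2.8, 2.8, 0.0)),
    ("Cl", (0.0, 2.8, 0.0)),
]

# One illustrative atomic property used as the node label
# (Pauling electronegativity).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16}


def build_graph(atoms, cutoff):
    """Connect every pair of atoms within `cutoff` (an assumed bond rule)."""
    graph = {i: set() for i in range(len(atoms))}
    for i, j in combinations(range(len(atoms)), 2):
        if dist(atoms[i][1], atoms[j][1]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def bonded_pair_fragments(atoms, graph, labels):
    """Group two-atom fragments by element pair, storing label differences."""
    fragments = {}
    for i, neighbors in graph.items():
        for j in neighbors:
            if i < j:  # visit each bond only once
                pair = tuple(sorted((atoms[i][0], atoms[j][0])))
                delta = abs(labels[atoms[i][0]] - labels[atoms[j][0]])
                fragments.setdefault(pair, []).append(delta)
    return fragments


graph = build_graph(ATOMS, cutoff=3.0)
fragments = bonded_pair_fragments(ATOMS, graph, ELECTRONEGATIVITY)
print(fragments)
```

In this toy case every atom has two neighbors, so the graph yields four Na–Cl fragments, each carrying the same electronegativity difference. Larger fragments and additional atomic properties would extend the same pattern.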

“Methodologically, we could apply this technique to any material, even amorphous solids,” Olexandr Isayev of UNC, one of the researchers, told *MRS Bulletin*. “So far we are cautious to limit this method to stoichiometric materials. We are working now to extend this to include vacancies and doping, for example.”

To test the predictive power of this new approach, the researchers used rigorous fivefold cross-validation as well as prospective predictions confirmed by density functional theory calculations and experiments. In cross-validation, a data set is randomly partitioned into five groups. Four of these groups are subjected to machine learning techniques to learn the best relationship, or “rule,” that will predict a set of properties (e.g., Debye temperature, metal/nonmetal character, and specific heat) from the corresponding fingerprint. This rule is tested on the fifth group to see how well the predictions match the observations. By rotating which group is held out, every data point is used for both training and testing.
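The fivefold procedure can be sketched in a few lines of standard-library Python. The nearest-neighbor predictor below is only a stand-in for whatever learning method is actually used, and the function names (`k_fold_indices`, `predict_nearest`, `cross_validate`) and the toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def predict_nearest(train_x, train_y, query):
    """Stand-in 'rule': return the target of the closest training fingerprint."""
    best = min(
        range(len(train_x)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], query)),
    )
    return train_y[best]


def cross_validate(fingerprints, targets, k=5, seed=0):
    """Return the mean absolute error on each held-out fold."""
    folds = k_fold_indices(len(targets), k, seed)
    errors = []
    for held_out in range(k):
        test_idx = folds[held_out]
        # Train on the other k - 1 folds.
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        train_x = [fingerprints[i] for i in train_idx]
        train_y = [targets[i] for i in train_idx]
        fold_error = sum(
            abs(targets[i] - predict_nearest(train_x, train_y, fingerprints[i]))
            for i in test_idx
        ) / len(test_idx)
        errors.append(fold_error)
    return errors


# Toy data: one-dimensional "fingerprints" with a linear target.
toy_x = [[float(i)] for i in range(20)]
toy_y = [2.0 * i for i in range(20)]
print(cross_validate(toy_x, toy_y))
```

Each sample lands in exactly one held-out fold, so no prediction is ever scored on data the rule was trained on.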

PLMFs were tested in eight predictive models: a binary classification model that predicts whether a material is a metal or a nonmetal, and seven regression models that predict the band gap energy, bulk and shear moduli, Debye temperature, heat capacities, and coefficient of thermal expansion. What distinguishes the team’s work from other machine learning approaches is the high accuracy of these predictions. For example, the values for bulk and shear moduli are 99% accurate for the data set. The metal/nonmetal classification is 86% accurate for a sample set of 26,674 materials. In other words, only 3,621 materials were misclassified in this case.
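As a quick check, the quoted classification accuracy follows directly from the two counts given above. The variable names in this short Python calculation are illustrative only.

```python
total_materials = 26_674
misclassified = 3_621

# Accuracy is the fraction of materials assigned to the correct class.
accuracy = (total_materials - misclassified) / total_materials
print(f"{accuracy:.1%}")  # prints 86.4%
```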

Keith Butler of the University of Bath complimented the team on the novel fingerprint technique: “The work of Isayev et al. presents one of the most convincing approaches for turning the structure and composition of a crystal into a form that is sensible to a machine learning algorithm. It’s a big advance and it opens the door for future applications of machine learning for materials design.”

Johannes Hachmann of the University at Buffalo, The State University of New York concurs: “One of the key challenges to making machine learning a viable proposition in this application domain is the availability of a suitable numerical representation of compounds in materials space. The proposed universal fragment descriptors offer an exciting new direction on this issue and promise to support easier interpretation of the resulting models and rational design based on these insights.”

The research team plans to continue developing better algorithms and perhaps a unified model that will one day work for any materials system.

Originally published in the August 2017 issue of *MRS Bulletin.*