
New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships

Published online by Cambridge University Press: 01 April 2016

Anubhav Jain*
Affiliation:
Energy and Environmental Technologies Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
Geoffroy Hautier
Affiliation:
Institute of Condensed Matter and Nanosciences (IMCN), Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Shyue Ping Ong
Affiliation:
Department of NanoEngineering, University of California San Diego, La Jolla, California 92093, USA
Kristin Persson
Affiliation:
Energy and Environmental Technologies Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; and Materials Science and Engineering, University of California Berkeley, Berkeley, California 94720, USA
* Address all correspondence to this author. e-mail: ajain@lbl.edu

Abstract

Data mining has revolutionized sectors as diverse as pharmaceutical drug discovery, finance, medicine, and marketing, and has the potential to similarly advance materials science. In this paper, we describe advances in simulation-based materials databases, open-source software tools, and machine learning algorithms that are converging to create new opportunities for materials informatics. We discuss the data mining techniques of exploratory data analysis, clustering, linear models, kernel ridge regression, tree-based regression, and recommendation engines. We present these techniques in the context of several materials application areas, including compound prediction, Li-ion battery design, piezoelectric materials, photocatalysts, and thermoelectric materials. Finally, we demonstrate how new data and tools are making it easier and more accessible than ever to perform data mining through a new analysis that learns trends in the valence and conduction band character of compounds in the Materials Project database using data on over 2500 compounds.

Type: Articles
Copyright © Materials Research Society 2016

FIG. 1. An example of a structure map for the A1B1 composition. Each symbol indicates a specific crystal structure prototype. The axes refer to a “chemical scale” that assigns a number to each element based on its position in the periodic table (Mendeleev number). Image from Ref. 20. ©IOP publishing, all rights reserved. Reprinted with permission.

FIG. 2. An example of the URL structure of the MAPI. Figure reprinted from Ref. 59 with permission from Elsevier.

FIG. 3. Predicted versus experimental melting temperature over a data set of 248 compounds for two models: one including DFT descriptors and one without. Image from Ref. 73. Reprinted with permission, ©American Physical Society.

FIG. 4. Temperature of O2 release versus voltage for a large set of cathode materials. Higher temperatures are associated with greater “safety” of the cathode material. A clear correlation between higher voltage and lower temperatures for releasing oxygen gas is observed. The figure on the right depicts a linear least-squares regression fit to the data for different chemistries (oxides, sulfates, borates, etc.). While all cathode materials have a similar tendency to be less safe for higher voltage, there is a clear difference between different (poly)anions. Image from Ref. 77. Reproduced by permission of the PCCP Owner Societies.
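The per-chemistry linear least-squares fit described in this caption can be sketched as follows. This is a minimal illustration only, assuming one straight line is fitted per anion group; the function names `fit_line` and `fit_by_group` and all numeric values are hypothetical placeholders and are not taken from Ref. 77.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_group(records):
    """Fit one line per chemistry label from (label, voltage, temperature) rows."""
    groups = defaultdict(list)
    for label, voltage, temperature in records:
        groups[label].append((voltage, temperature))
    return {label: fit_line(points) for label, points in groups.items()}


# Hypothetical placeholder rows (label, voltage, O2 release temperature).
# These are NOT values from the paper; they only exercise the code.
records = [
    ("oxide", 3.0, 500.0), ("oxide", 4.0, 400.0), ("oxide", 5.0, 300.0),
    ("borate", 3.0, 800.0), ("borate", 4.0, 700.0), ("borate", 5.0, 600.0),
]

fits = fit_by_group(records)
for label, (slope, intercept) in fits.items():
    print(f"{label}: slope={slope:.1f}, intercept={intercept:.1f}")
```

In this toy input both groups share the same negative slope but have different intercepts, which mirrors the qualitative pattern the caption describes: a common trend with voltage, offset by (poly)anion type.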

FIG. 5. Elements forming stable oxide perovskites in the A and B sites; the gap is represented by the color, and elements are ordered by size of gap and grouped by similarity in gap. The dendrogram trees for A and B sites are plotted at the right and top of the image, respectively. Image from Ref. 83. ©IOP publishing, all rights reserved. Reprinted with permission.

FIG. 6. Predicted versus experimental Curie temperature for a series of piezoelectric materials. Blue points are the training set, green triangles are the test set, and red crosses are the predicted new piezoelectrics. Image from Ref. 95. ©The Royal Society, reprinted with permission.

FIG. 7. Results from a decision tree algorithm on a data set of 75 half-Heusler compounds; labels above the arrows represent decisions, and each node lists the number of compounds remaining and the fraction of those compounds that possess high ZT (the thermoelectric figure-of-merit). The decision tree highlights the most important factors leading to high ZT materials for two different operating temperatures (300 and 1000 K). Image from Ref. 103. ©Wiley-VCH Verlag GmbH & Co., reprinted with permission.

FIG. 8. Map of the pair correlation for two ions to substitute within the ICSD database. Positive values indicate a tendency to substitute, whereas negative values indicate a tendency to not substitute. The symmetry of the pair correlation (g_ab = g_ba) is reflected in the symmetry of the matrix. Image from Ref. 66. ©The American Chemical Society, reprinted with permission.

FIG. 9. Example of projected DOS used in the study (MoO3, entry mp-18856 in the Materials Project, Ref. 38). The states near the valence band are dominated by O2−:p, whereas those near the conduction band are dominated by Mo6+:d. Over 2500 such materials are used to assess statistics on valence and conduction band character in this study.

FIG. 10. Pairwise probability for the ionic orbital listed to the left to have a greater contribution to a band edge than the ionic orbital listed at the bottom. The VBM edge data are listed to the left, and the CBM data to the right. Only a selected set of ionic orbitals are depicted; the full data set is in the Supplementary Information.
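One way such a pairwise probability could be tabulated is sketched below. It assumes, as an interpretation of the caption rather than the authors' confirmed procedure, that each material supplies a fractional contribution per ionic orbital at a band edge, and that the probability for the pair (a, b) is the fraction of materials containing both orbitals in which a contributes more than b. The function name `pairwise_dominance` and all numeric values are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def pairwise_dominance(materials):
    """Return {(a, b): P(a contributes more than b)} over materials with both."""
    wins = defaultdict(int)    # times orbital a exceeded orbital b
    counts = defaultdict(int)  # times a and b appeared in the same material
    for contributions in materials:
        for a, b in permutations(contributions, 2):
            counts[(a, b)] += 1
            if contributions[a] > contributions[b]:
                wins[(a, b)] += 1
    return {pair: wins[pair] / counts[pair] for pair in counts}


# Hypothetical placeholder contributions at one band edge for three materials.
# These are NOT values from the study; they only exercise the code.
materials = [
    {"O2-:p": 0.7, "Mo6+:d": 0.2},
    {"O2-:p": 0.6, "Mo6+:d": 0.3},
    {"O2-:p": 0.4, "Ti4+:d": 0.5},
]

probs = pairwise_dominance(materials)
for (a, b), p in sorted(probs.items()):
    print(f"P({a} > {b}) = {p:.2f}")
```

Pairs that never occur together in a material (here Mo6+:d and Ti4+:d) are simply absent from the result, so no probability is reported where there is no evidence.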

Supplementary material: Jain supplementary material (File, 181.5 KB)