Abstract
In the data-driven discovery of high-performance thermoelectric (TE) materials, the lack of high-quality data remains a key bottleneck. To address this issue, we introduce the Systematically Verified Thermoelectric (sysTEm) dataset. Leveraging the physical relationships between transport properties, we curated and validated over 8,400 experimental data points, spanning more than 1,400 unique TE materials and 70 elements. Each entry includes the composition, temperature, and up to seven key transport properties: the figure of merit (zT), Seebeck coefficient (S), electrical conductivity (σ), power factor (PF), and the total (κ), electronic (κe), and lattice (κl) thermal conductivities. The dataset is formatted as a data table that needs little preprocessing for machine learning. Initial analysis shows that doped materials, defined via a compositional threshold, exhibit more high-performing zT outliers than undoped materials, highlighting that doping can improve TE performance. Overall, the publicly available sysTEm dataset and its accompanying code are intended to accelerate data-driven TE research and benchmarking.
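As an illustration of how physical relationships between transport properties can be used to check an entry, the sketch below tests three standard textbook identities: PF = S²σ, zT = S²σT/κ, and κ = κe + κl. It is not the validation procedure used for sysTEm. The function name `check_entry`, the SI units, and the 5% relative tolerance are assumptions chosen for this example.

```python
import math


def check_entry(T, S, sigma, kappa, PF=None, zT=None,
                kappa_e=None, kappa_l=None, rel_tol=0.05):
    """Check one entry against standard thermoelectric identities.

    Assumed SI units (an assumption of this sketch, not of the dataset):
    T in K, S in V/K, sigma in S/m, kappa in W/(m K), PF in W/(m K^2).
    rel_tol is a hypothetical tolerance. Only relations whose inputs
    are all present are checked.
    """
    results = {}
    if PF is not None:
        # Power factor: PF = S^2 * sigma
        results["PF"] = math.isclose(PF, S**2 * sigma, rel_tol=rel_tol)
    if zT is not None:
        # Figure of merit: zT = S^2 * sigma * T / kappa
        results["zT"] = math.isclose(zT, S**2 * sigma * T / kappa,
                                     rel_tol=rel_tol)
    if kappa_e is not None and kappa_l is not None:
        # Total thermal conductivity: kappa = kappa_e + kappa_l
        results["kappa"] = math.isclose(kappa, kappa_e + kappa_l,
                                        rel_tol=rel_tol)
    return results


# Made-up illustrative values, not taken from the dataset.
consistent = check_entry(T=300.0, S=200e-6, sigma=1e5, kappa=1.5,
                         PF=4e-3, zT=0.8, kappa_e=0.5, kappa_l=1.0)
inconsistent = check_entry(T=300.0, S=200e-6, sigma=1e5, kappa=1.5, zT=1.2)
```

With these made-up values, every relation holds for the first entry, while the second is flagged because its reported zT differs from S²σT/κ by more than the tolerance.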
Supplementary weblinks
Title
sysTEm Dataset and Accompanying Code
Description
Link to the sysTEm dataset and the code used to clean and format the data. The code used to generate the figures in this work is shared as well.