Natural language access point to digital metal–organic polyhedra chemistry in The World Avatar

Published online by Cambridge University Press: 11 March 2025

Simon D. Rihm
Affiliation:
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
Dan N. Tran
Affiliation:
CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
Aleksandar Kondinski
Affiliation:
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
Laura Pascazio
Affiliation:
CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
Fabio Saluz
Affiliation:
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland
Xinhong Deng
Affiliation:
CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
Sebastian Mosbach
Affiliation:
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore; CMCL, Cambridge, UK
Jethro Akroyd
Affiliation:
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore; CMCL, Cambridge, UK
Markus Kraft*
Affiliation:
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore; CMCL, Cambridge, UK
*Corresponding author: Markus Kraft; Email: mk306@cam.ac.uk

Abstract

Metal–organic polyhedra (MOPs) are discrete, porous metal–organic assemblies known for their wide-ranging applications in separation, drug delivery, and catalysis. As part of The World Avatar (TWA) project—a universal and interoperable knowledge model—we have previously systematized known MOPs and expanded the explorable MOP space with novel targets. Although these data are available via a complex query language, a more user-friendly interface is desirable to enhance accessibility. To address a similar challenge in other chemistry domains, the natural language question-answering system “Marie” has been developed; however, its scalability is limited due to its reliance on supervised fine-tuning, which hinders its adaptability to new knowledge domains. In this article, we introduce an enhanced database of MOPs and a first-of-its-kind question-answering system tailored for MOP chemistry. By augmenting TWA’s MOP database with geometry data, we enable the visualization of not just empirically verified MOP structures but also machine-predicted ones. In addition, we renovated Marie’s semantic parser to adopt in-context few-shot learning, allowing seamless interaction with TWA’s extensive MOP repository. These advancements significantly improve the accessibility and versatility of TWA, marking an important step toward accelerating and automating the development of reticular materials with the aid of digital assistants.
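As a rough illustration of what in-context few-shot learning means for a semantic parser, the sketch below assembles a prompt from a small pool of question/query pairs, picking the pairs most similar to the incoming question. It is not Marie's implementation: the `Example` class, the token-overlap selection, the prompt wording, and the placeholder queries with their `ex:` predicates are all assumptions made for this example, and the call to a language model is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str
    query: str  # hypothetical structured query; predicates are made up


def tokenize(text: str) -> set[str]:
    """Lowercase word set with trailing punctuation stripped."""
    tokens = (tok.strip("?.,!").lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def select_examples(question: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Rank pool entries by Jaccard token overlap with the question."""
    q_tokens = tokenize(question)

    def score(example: Example) -> float:
        e_tokens = tokenize(example.question)
        union = q_tokens | e_tokens
        return len(q_tokens & e_tokens) / len(union) if union else 0.0

    return sorted(pool, key=score, reverse=True)[:k]


def build_prompt(question: str, pool: list[Example], k: int = 2) -> str:
    """Assemble instruction, k demonstrations, and the open question."""
    parts = ["Translate the question into a structured query."]
    for example in select_examples(question, pool, k):
        parts.append(f"Question: {example.question}\nQuery: {example.query}")
    parts.append(f"Question: {question}\nQuery:")
    return "\n\n".join(parts)


# Hypothetical demonstration pool; none of these queries come from the article.
POOL = [
    Example("Which MOPs contain a given metal?",
            "SELECT ?mop WHERE { ?mop ex:hasMetal ?metal }"),
    Example("What is the molecular weight of a given MOP?",
            "SELECT ?w WHERE { ?mop ex:hasMolecularWeight ?w }"),
    Example("Show the geometry of a given MOP",
            "SELECT ?g WHERE { ?mop ex:hasGeometry ?g }"),
]

if __name__ == "__main__":
    print(build_prompt("Which MOPs contain copper?", POOL, k=1))
```

Because the demonstrations are chosen at request time rather than baked in by fine-tuning, covering a new knowledge domain in a setup like this would mainly mean adding pairs to the pool.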

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Illustration of TWA’s digital infrastructure that enables the retrieval of structured and validated MOP data via natural language requests.

Figure 2. Illustration of the terminological component (TBox) of the MOP chemistry domain in TWA and its related ontologies. Core concepts are shown in bold.

Figure 3. Architecture of “Marie,” comprising one offline indexing stage and three online stages, namely input rewriting, semantic parsing, and response generation.

Figure 4. Processing steps to respond to a natural language question in the MOP chemistry domain as implemented in Marie. These steps are displayed on the Marie page and can be retraced for every question.

Figure 5. Example of a multilayered response by Marie, combining a natural language summary of data retrieved from the knowledge graph with 3D visualization of chemical structures.

Figure 6. Example of a conversation with Marie via chained questions.

Supplementary material: File

Rihm et al. supplementary material

File 305.2 KB