
Language models for the analysis of and interaction with climate change documents

Published online by Cambridge University Press:  12 December 2025

Elena Volkanovska*
Affiliation:
Institute of Linguistics and Literary Studies, TU Darmstadt, Germany

Abstract

Language models (LMs) have attracted the attention of researchers from the natural language processing (NLP) and machine learning (ML) communities working in specialized domains, including climate change. NLP and ML practitioners have been making efforts to reap the benefits of LMs of various sizes, including large language models, in order to both simplify and accelerate the processing of large collections of text data. In doing so, they help climate change stakeholders gain a better understanding of past and current climate-related developments, and thereby stay on top of both ongoing changes and growing amounts of data. This paper presents a brief history of language models and ties LMs’ beginnings to their emergence as a technology for analysing and interacting with texts in the specialized domain of climate change. The paper reviews existing domain-specific LMs and systems based on general-purpose large language models for analysing climate change data, with special attention paid to the LMs’ and LM-based systems’ functionalities, intended use and audience, architecture, the data used in their development, the applied evaluation methods, and their accessibility. The paper concludes with a brief overview of potential avenues for future research vis-à-vis the advantages and disadvantages of deploying LMs and LM-based solutions in a high-stakes scenario such as climate change research. For the convenience of readers, explanations of specialized terms used in NLP and ML are provided.

Information

Type
Survey Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Classification of LMs and LM-based systems described in this survey. LMs and LM-based systems that are not given specific names by the paper authors are referred to descriptively.


Table 1. Performance comparison of different ClimateGPT models on climate-specific and general benchmarks (weighted averages of accuracy)


Table 2. ClimateBERT baselines and performance (loss / F1). Validation means that loss has been calculated for the validation dataset


Table 3. Paragraph classification tasks: baseline models and F1 scores versus ClimateBERT_F


Table 4. Sentence classification tasks: baseline models and F1 scores versus fine-tuned ClimateBERT_F, evaluated on the test data split


Table 5. Number of all-time downloads for 13 models of the ClimateBERT family, in descending order for each LM type


Table 6. Performance comparison of different models on sentiment analysis (F1) and fact-checking (macro F1)


Table 7. Model, model size, GPU type and time (in hours or days) for model training


Table 8. Average response accuracy score for each system on a total of 11 questions (responses to questions 1 and 2 are not available)


Table 9. Relative performance of question-answering (QA) systems (LLM+RAG)


Table A1. Summary table for families of language models


Table A2. Summary table for single domain-specific LMs


Table A3. Summary table for systems using generic LMs

Author comment: Language models for the analysis of and interaction with climate change documents — R0/PR1

Comments

Elena Volkanovska, M.A.

Corresponding Author

elena.volkanovska@tu-darmstadt.de

Institute of Linguistics and Literary Studies

Technical University of Darmstadt

Date: 22.11.2024

Attn.: Claire Monteleoni, Editor-in-Chief of Environmental Data Science

Dear Claire Monteleoni,

I would like to submit my survey article entitled “Language Models for the Analysis and Interaction with Climate Change Documents” for consideration in the special collection entitled “Tackling Climate Change with Machine Learning” by the journal Environmental Data Science. I confirm that this work is original and has neither been published elsewhere, nor is currently under consideration for publication elsewhere. The final paragraph of this letter contains three submission notes that I am kindly asking you to consider.

In this paper, I review 22 language models developed using climate-specific text data and two systems for climate change question-answering built by using a generic large language model and external climate-relevant resources. I believe that this manuscript is appropriate for publication in the special collection by the journal Environmental Data Science, because it presents an overview of how language models and natural language processing methods are being used to process or interact with climate change texts, which is within the scope of interests of the special collection.

To my mind, an overview of existing works in this field is needed given the emerging use of language models (LMs) for text analysis and interaction with vast collections of documents in many specialised domains, one of which is climate change. This paper aims to explain the inner workings of existing LMs and LM-based systems that have been utilized to either classify texts or draw insights from climate-relevant documents, by describing the system architecture, the training approach, and the data used in LM/LM-based system development. Moreover, the paper pinpoints the intended use and audience for each LM and LM-based system, and discusses their uptake, accessibility, and transparency in terms of data provenance and carbon footprint. To the best of my knowledge, there is no existing review of LMs and LM-based systems developed for the climate change domain that provides a similar overview of these tools. Therefore, I believe the readership of the journal would benefit from a systematic review of existing works at the intersection of climate change and language models.

I am the sole author of this paper and declare no conflict of interest. I conceptualised the project, designed the methodology, conducted the analysis, and wrote the paper. Parts of the manuscript have been read by two domain-related experts, who are mentioned in the section “Acknowledgements”. The Principal Investigator of my research group, in the context of which this manuscript was written, is also mentioned in the “Acknowledgements” section.

Submission notes:

1. Data Availability Statement: On 21 November 2024, your Editorial Office communicated to me that the manuscript needed to be resubmitted with a Data Availability Statement. To this end, I reworded the existing Data Availability Statement to make it clearer that this is a survey paper and that reviewed papers and data are referenced and available in their respective repositories.

2. GitHub repository: I created a private GitHub repository with information about the reviewed language models. To make it available to the reviewers, I had to anonymise it and include this link in the article. The link to the repository will be updated accordingly if the article is considered for publication.

3. Footer text on the first page: In the footer text, 2020 is listed as the publication date. I downloaded the latest LaTeX template files from the section “Preparing your Materials” of the Author Instructions on the Environmental Data Science website at Cambridge University Press (https://www.cambridge.org/core/journals/environmental-data-science/information/author-instructions/preparing-your-materials#fndtn-new). I would be happy to follow your instructions and find a solution to this if necessary.

Yours sincerely,

Elena Volkanovska

Review: Language models for the analysis of and interaction with climate change documents — R0/PR2

Conflict of interest statement

None

Comments

Dear Authors,

I think it is a good time to publish a summary of research that aims to assess climate change documents. You do a good job of outlining past developments and motivating the overview for your audience.

However, I think the paper could profit from both more comprehensiveness and more structure. In terms of comprehensiveness, I think the paper’s coverage cuts off a bit too early. The “snapshot” idea makes a lot of sense, but it would be better if the “snapshot” were up to date as of 2025. I would recommend the following papers or streams of papers:

- Classifiers: In my opinion, there are some more projects to consider. Besides the mentioned project by Mehra et al. (2022), there are two more variations of using BERT for ESG with FinBERT (https://onlinelibrary.wiley.com/doi/full/10.1111/1911-3846.12832) and the pretrained ESG-BERT (https://www.sciencedirect.com/science/article/pii/S1544612324000096) with some more applications in the direction of nature (https://arxiv.org/abs/2312.17337). Besides, ClimateFinanceBERT (https://www.nature.com/articles/s41558-022-01482-7) offers a specific method in the classification space as well.

- RAG for climate: I think one topic that is touched on in 5.1.1 and 5.1.2 is RAG for analysing climate disclosures. I would see this as the current state of the art in analysing documents, so it would be good if it found more representation. RAG has been used both for QA and for analysing entire documents. Example projects in the QA space could be ChatNetZero (https://aclanthology.org/2024.climatenlp-1.6/) or My Climate Advisor (https://aclanthology.org/2024.climatenlp-1.3/). Examples in analysing company reports would be ChatReport (https://aclanthology.org/2023.emnlp-demo.3/), more applied ones for transition plans (https://iopscience.iop.org/article/10.1088/2515-7620/ad9e88/meta), law and policy documents (https://arxiv.org/abs/2410.23902), energy transitions (https://www.sciencedirect.com/science/article/pii/S1544612324000096), or nature-related aspects (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4860331). There is also Project Gaia by the European Central Bank (https://www.bis.org/about/bisih/topics/suptech_regtech/gaia.htm), as well as specific resources for improving RAG for climate documents (https://aclanthology.org/2024.emnlp-main.969/).

- Fact-checking of climate claims with LMs: This could fall under RAG, but also be its own topic. You cited Stammbach et al. (2023), but more advanced versions could be found in LLM-based systems (https://aclanthology.org/2024.climatenlp-1.9/), even using advanced methodologies (https://www.nature.com/articles/s44168-025-00215-8). Specific claims like SDG targets have also already been looked at (https://aclanthology.org/2024.climatenlp-1.19/).

In terms of structure: In my eyes, it is ok to structure around individual projects. However, you could also combine more projects together to allow the reader to grasp an idea of what has been done in every direction. It is your design choice, but with more projects coming, it may make sense to combine, for example, all classification models or all RAG models on the topic in one section and compare them instead of describing one project after another. This could enhance the understandability and usability of the project.

In conclusion, I would recommend extending the contents and potentially developing the paper to bring all these projects into one easily understandable comparison/overview framework.

Review: Language models for the analysis of and interaction with climate change documents — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

This survey paper provides a comprehensive overview of the development and application of language models in the climate change domain. It explains how these models are used for various tasks such as text classification, question answering and text summarization. Here are the main contributions:

1- Categorization of domain-specific models (e.g., ClimateBERT and ClimateGPT) and systems integrating general-purpose LLMs (e.g., ChatClimate)

2- Detailed comparison of the models in terms of architecture, data sources, tasks, evaluation methods and accessibility

3- Differentiation between fine-tuned and from-scratch language models, and examination of model performance, computational efficiency and carbon footprint

4- A special focus on transparency, reproducibility and sustainability in model development

5- A GitHub repository for community updates to the list of language models for climate change

The paper is in line with the goals of the journal. It provides new insights into the domain-specific adaptations of language models, which is a current trend in environmental data science. It reviews data-intensive methods such as transformer-based models, instruction-tuning, RAG, etc. It would be a valuable resource for interdisciplinary researchers who want to apply language models to environmental contexts.

The paper is technically correct and scientifically sound. No methodological errors or scientific inaccuracies were identified.

The paper is written clearly and is well-structured. Given the scope of the survey, its length can be considered appropriate. However, some sections (e.g., model architecture and training details) could be written more concisely.

- General suggestions for improving the paper:

Further discussion on how these domain-specific language models and systems utilising generic LMs impact climate policy, adaptation, and mitigation efforts, for stronger integration with climate-specific use cases.

A comparative table summarizing the key benchmark performances per task, to enhance the structure and increase the clarity.

A section comparing domain-specific and general-purpose LMs in terms of trade-offs in performance, accessibility, and robustness.

Expanded discussion of the sustainability topic, for example, whether language models optimized for carbon efficiency, such as TinyBERT or DistilBERT, could be suitable for climate science use cases.

- Detailed suggestions:

Section 4

Simplification of the architecture and training descriptions; the current level of detail may be overwhelming for a survey, especially in sections 4.1.1 and 4.3.1.

A small summary table, for each domain-specific model, listing the training/fine-tuning time, hardware used for training/fine-tuning, and model size (e.g., number of parameters), along with the availability of training/fine-tuning data. This helps researchers understand the cost and effort behind each model, what kind of computation resources are needed, and whether adapting or reusing a model is practical. In addition, it provides context for performance comparisons.

Summarization of the evaluation results in a tabular format with clear win/loss comparisons.

In the ClimateBERT and ClimateGPT-2 sections, it is not clear which datasets were used for evaluation (as opposed to training or fine-tuning). Adding details about these datasets, such as their size and properties, should be considered. Are they publicly available? How were they built?

Section 5

Expand section 5 on general-purpose LM-based tools such as ChatClimate (the ClimateGPT section also mentions RAG), particularly regarding their retrieval mechanisms. What methods and processes did these systems use for retrieval? How did they measure the retrieval and generation performance of their systems separately?

Section 7

While the paper summarizes how human evaluations were used in some of the studies, it doesn’t discuss their limitations. Most evaluations rely on a small number of students or experts, and the scoring methods differ across studies. This makes results hard to compare. It is not mentioned whether evaluators checked if answers were based on reliable sources, or how subjectivity and bias were addressed. The paper also does not mention the lack of standardized guidelines for human evaluations in this domain. Adding a short discussion on these gaps would improve the survey and highlight the need for more consistent, transparent, and trustworthy evaluation practices, especially since these tools may be used to support real-world climate decisions.

Small typos and writing errors

Page 6 line 36: …. at three levels: the final training process, the final training process and all other …..

Page 20 line 24: … to construe an answer….

Recommendation: Language models for the analysis of and interaction with climate change documents — R0/PR4

Comments

Thank you again for submitting your manuscript to EDS! It took us quite a long time to secure two independent reviews - sorry for letting you wait for an answer for so long. But the result is very positive: both reviewers recommend only minor revisions, including some additional projects, improvements to the structure, and additional synthesis. Please take these comments very seriously and incorporate as much as possible into your manuscript. I hope that you can implement these changes smoothly, and I am looking forward to the submission of the revised manuscript.

Decision: Language models for the analysis of and interaction with climate change documents — R0/PR5

Comments

No accompanying comment.

Author comment: Language models for the analysis of and interaction with climate change documents — R1/PR6

Comments

Elena Volkanovska, M.A.

Corresponding Author

elena.volkanovska@tu-darmstadt.de

Institute of Linguistics and Literary Studies

Technical University of Darmstadt

Date: 31.08.2025

Attn.: Claire Monteleoni, Editor-in-Chief of Environmental Data Science

Dear Prof. Claire Monteleoni,

I would like to submit the revision of my survey article titled: “Language Models for the Analysis and Interaction with Climate Change Documents” for consideration in the special collection entitled “Tackling Climate Change with Machine Learning” by the journal Environmental Data Science. I confirm that this work is original and has neither been published elsewhere, nor is currently under consideration for publication elsewhere.

In this paper, I review 22 language models developed using climate-specific text data and six systems for climate change question-answering built by using a generic large language model and external climate-relevant resources. I believe that this manuscript is appropriate for publication in the special collection by the journal Environmental Data Science, because it presents an overview of how language models and natural language processing methods are being used to process or interact with climate change texts, which is within the scope of interests of the special collection. I have followed the review suggestions given by the two reviewers closely and incorporated them in the paper.

To my mind, an overview of existing works in this field is needed in view of the emerging use of language models (LMs) for text analysis and interaction with vast collections of documents in many specialised domains, one of which is climate change. This paper aims to explain the inner workings of existing LMs and LM-based systems that have been utilised to either classify texts or draw insights from climate-relevant documents, by describing the system architecture, the training approach, and the data used in LM/LM-based system development. Moreover, the paper pinpoints the intended use and audience for each LM and LM-based system, and discusses their uptake, accessibility, and transparency in terms of data provenance and carbon footprint. It is also pointed out that there exist no standardised processes for the evaluation of such systems, despite such systems becoming more common than ever.

To the best of my knowledge, there is no existing review of LMs and LM-based systems developed for the climate change domain that provides a similar overview of these tools. Therefore, I believe the readership of the journal would benefit from a systematic review of existing works at the intersection of climate change and language models.

I am the sole author of this paper and declare no conflict of interest. I conceptualised the project, designed the methodology, conducted the analysis, and wrote the paper. Parts of the manuscript have been read by two domain-related experts, who are mentioned in the section “Acknowledgements”. The Principal Investigator of my research group, in the context of which this manuscript was written, is also mentioned in the “Acknowledgements” section.

Submission notes:

1. Data Availability Statement: On 21 November 2024, your Editorial Office communicated to me that the manuscript needed to be resubmitted with a Data Availability Statement. As the originally submitted manuscript already contained a Data Availability Statement, I reworded the Statement to make it clearer that this is a survey paper and that reviewed papers and data are referenced and available in their respective repositories.

2. GitHub repository: I created a GitHub repository with information about the reviewed language models. To make it available to the reviewers, I had to anonymise it and include this link in the article. The link to the repository will be updated accordingly if the article is considered for publication.

3. Footer text on the first page: In the footer, 2020 is listed as the publication date. I downloaded the latest LaTeX template files from the section “Preparing your Materials” of the Author Instructions on the Environmental Data Science website at Cambridge University Press (https://www.cambridge.org/core/journals/environmental-data-science/information/author-instructions/preparing-your-materials#fndtn-new). I would be happy to follow your instructions and find a solution to this if necessary.

Yours sincerely,

Elena Volkanovska

Review: Language models for the analysis of and interaction with climate change documents — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

Dear authors,

I think the revision has improved the paper. Specifically, adding a wider scope of projects is beneficial. This paper now delivers a broad overview of projects at the intersection of climate change and NLP.

While I think it would have been a good choice to group projects into topics, I acknowledge your argumentation for going with your structure.

Given that you have responded to my comments, I do not have major concerns anymore that hinder a publication.

Best,

Reviewer

Review: Language models for the analysis of and interaction with climate change documents — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

The author has satisfactorily addressed the main concerns raised in my previous review. The revisions have improved the clarity and overall quality of the manuscript. I have no additional major comments at this point.

Recommendation: Language models for the analysis of and interaction with climate change documents — R1/PR9

Comments

Thank you for the revised version of the article. All reviewers have now recommended accepting your article for publication, and I am happy to follow their judgment. Just a small note for future responses to reviewer comments: it would have been easier for me as an editor to grasp the extent to which you engaged with reviewer comments if you had provided a point-by-point response that also cites the original comments and gives more detail on what was changed in the manuscript (e.g., citing page or line numbers and highlighting key added or edited text). For the final publication, please make sure you double-check the correct formatting and completeness of all references during the proofing stage. Please also make sure that the linked table in the Data Availability statement (https://github.com/volkanovska/Language-models-for-climate-change-texts) is publicly accessible upon publication.

Decision: Language models for the analysis of and interaction with climate change documents — R1/PR10

Comments

No accompanying comment.