Knowledge-driven neural models for extraction and analysis of sustainability events from the web

Published online by Cambridge University Press: 02 January 2025

Abir Naskar
Affiliation:
TCS Research, India.
Tushar Goel
Affiliation:
TCS Research, India.
Vipul Chauhan
Affiliation:
TCS Research, India.
Ishan Verma
Affiliation:
TCS Research, India.
Tirthankar Dasgupta*
Affiliation:
TCS Research, India.
Lipika Dey
Affiliation:
TCS Research, India.
* Corresponding author: Tirthankar Dasgupta; Email: dasgupta.tirthankar@tcs.com

Abstract

Sustainability practices of a company reflect its commitments to the environment, societal good, and good governance. Institutional investors take these practices into account when making decisions, since they are known to affect public opinion and thereby the stock indices of companies. Though sustainability scores are usually derived from information available in self-published reports, news articles, publications by regulatory agencies, and social media posts also contain critical information that may affect the image of a company. Language technologies have a critical role to play in analyzing this information. In this paper, we present an event detection model for detecting sustainability-related incidents and violations in reports published by various monitoring and regulatory agencies. The proposed model uses a multi-tasking sequence labeling architecture that operates on transformer-based document embeddings. To train and evaluate the model, we have created a large annotated corpus of relevant articles published over three years (2015–2018). Knowledge about sustainability practices and incident reporting, as codified in the Global Reporting Initiative (GRI) standards, has been used for this task. As measured by cross-validation, the proposed event detection model achieves high accuracy in detecting sustainability incidents and violations reported about an organization. The model is thereafter applied to articles published from 2019 to 2022, and the paper also presents insights obtained through an aggregated analysis of the incidents identified in them. The proposed model is envisaged to play a significant role in sustainability monitoring by detecting organizational violations as soon as they are reported by regulatory agencies, thereby supplementing the Environmental, Social, and Governance (ESG) scores issued by third-party agencies.
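The abstract names a sequence labeling architecture but does not state its tag set. As a minimal sketch, assuming a conventional BIO tagging scheme (an assumption; the paper's actual label inventory may differ), the function below shows how token-level predictions from such a tagger could be grouped into event and entity spans. The function name `bio_to_spans` and the labels `ORG`, `EVENT`, and `VIOLATION` are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Assumes a standard BIO scheme: "B-X" starts a span of type X, "I-X"
    continues it, and "O" is outside any span. An "I-X" that does not
    continue a span of the same type is treated as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and reset the accumulator.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return spans


# Hypothetical labels; the paper's knowledge schema may use different ones.
tokens = ["[ORGName]", "fined", "for", "toxic", "waste", "dumping"]
tags = ["B-ORG", "B-EVENT", "O", "B-VIOLATION", "I-VIOLATION", "I-VIOLATION"]
print(bio_to_spans(tokens, tags))
# [('ORG', '[ORGName]'), ('EVENT', 'fined'), ('VIOLATION', 'toxic waste dumping')]
```

Treating a stray `I-X` as a span start is one common convention for repairing ill-formed tagger output; a strict decoder might instead discard such tokens.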

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. A schematic diagram of unified sustainability information analytics.

Figure 2. ESG incident knowledge schema illustrated along with example sentences from text containing relevant entities. Different events and entities in text are marked in bold.

Table 1. Sample sustainability news texts with their annotated entities and events. All target organization names are intentionally masked with the token [ORGName] to maintain anonymity.

Figure 3. BERT-based multi-tasking network for ESG sequence classification and labeling.

Figure 4. Comparison of running times across different models.

Table 2. Accuracy of classifying sustainability events as awards or violations.

Table 3. Accuracy on the sustainability event and entity extraction task.

Table 4. Data statistics.

Table 5. Sample violation events extracted from news articles of different categories and mapped to the GRI Standards. Target organization names are masked with the token [ORGName] to maintain anonymity.

Figure 5. Violation trends by volume for 28 GRI indicators (2020–2022).

Figure 6. Distribution of incidents of discrimination (GRI 406) across organizational sectors in 2020.

Figure 7. Distribution (in %) of discrimination violations across years.

Figure 8. Distribution of violation clusters in the environmental sector.

Figure 9. Distribution of violation clusters in the occupational health and safety sector.
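The abstract mentions an aggregated analysis of detected incidents but does not specify how it is computed. As an illustrative sketch, assume that each detected violation has been reduced to a hypothetical (year, GRI indicator) pair. Under that assumption, per-indicator yearly volumes and the yearly percentage share of one indicator's incidents could be computed as shown below. The names `volume_by_indicator` and `yearly_percentage` and the toy input are hypothetical. The toy input is not data from the paper, and the paper's own aggregation may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation: each detected violation reduced to (year, GRI indicator).
Incident = Tuple[int, str]


def volume_by_indicator(incidents: Iterable[Incident]) -> Dict[str, Dict[int, int]]:
    """Count incidents per GRI indicator and year (a volume trend per indicator)."""
    table: Dict[str, Dict[int, int]] = {}
    for (year, indicator), n in Counter(incidents).items():
        table.setdefault(indicator, {})[year] = n
    return table


def yearly_percentage(incidents: Iterable[Incident], indicator: str) -> Dict[int, float]:
    """Share (in %) of one indicator's incidents that fall in each year."""
    per_year = Counter(year for year, ind in incidents if ind == indicator)
    total = sum(per_year.values())
    if total == 0:
        return {}
    return {year: 100.0 * n / total for year, n in sorted(per_year.items())}


# Toy input for illustration only; these are not results from the paper.
sample = [
    (2020, "GRI 406"), (2020, "GRI 406"), (2021, "GRI 406"),
    (2021, "GRI 403"), (2022, "GRI 406"),
]
print(volume_by_indicator(sample))
# {'GRI 406': {2020: 2, 2021: 1, 2022: 1}, 'GRI 403': {2021: 1}}
print(yearly_percentage(sample, "GRI 406"))
# {2020: 50.0, 2021: 25.0, 2022: 25.0}
```

Here the percentage is normalized within a single indicator across years. This is only one possible reading of a distribution across years; normalizing by all violations within each year would give a different figure.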