A new approach to impact case study analytics

Published online by Cambridge University Press: 28 September 2022

Jiajie Zhang
Affiliation:
School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom
Paul Watson*
Affiliation:
School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom; The Alan Turing Institute, London, United Kingdom
Barry Hodgson
Affiliation:
School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom; National Innovation Centre for Data, Newcastle upon Tyne, United Kingdom
*Corresponding author. E-mail: paul.watson@newcastle.ac.uk

Abstract

The 2014 Research Excellence Framework (REF) assessed the quality of university research in the UK. Twenty per cent of the assessment was allocated according to peer review of the impact of research, reflecting the growing importance of impact in UK government policy. Impact is defined as a change or benefit, beyond academia, to the economy, society, culture, public policy or services, health, the environment, or quality of life. Each institution submitted a set of four-page impact case studies. These are predominantly free-form descriptions of, and evidence for, the impact of research. Numerous analyses of these case studies have been conducted, but they have used either qualitative methods or basic forms of text searching. These approaches have limitations, including the time required to analyse the data manually and the frequently poor quality of the answers obtained by applying computational analysis to unstructured free-text data that lacks context. This paper describes a new system to address these problems. At its core is a structured, queryable representation of the case study data. We describe the ontology design used to structure the information and how semantic web technologies are used to store and query the data. Experiments show that this gives two significant advantages over existing techniques: improved accuracy in question answering and, by integrating data from external sources, the capability to answer a broader range of questions. We then investigate whether machine learning can predict each case study’s grade using this structured representation. The results provide accurate predictions for computer science impact case studies.
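
To make the idea of a structured, queryable representation concrete, the following is a minimal sketch in plain Python, not the system described in the paper. It stores a few subject-predicate-object triples and answers a question of the kind the paper poses (which case studies had funding from a given funder). The case study identifiers, the `ex:` predicate names, and the helper functions `match` and `case_studies_funded_by` are all hypothetical and chosen only for illustration; the actual ontology and the semantic web tooling are those described in the article itself.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data: every identifier and predicate name here is an
# illustrative assumption, not taken from the paper's ontology or data.
TRIPLES: List[Triple] = [
    ("case:A", "rdf:type", "ex:ImpactCaseStudy"),
    ("case:A", "ex:hasFunding", "funding:1"),
    ("funding:1", "ex:fundedBy", "org:EPSRC"),
    ("case:B", "rdf:type", "ex:ImpactCaseStudy"),
    ("case:B", "ex:hasFunding", "funding:2"),
    ("funding:2", "ex:fundedBy", "org:OtherFunder"),
    ("case:C", "rdf:type", "ex:ImpactCaseStudy"),
    ("case:C", "ex:hasFunding", "funding:3"),
    ("funding:3", "ex:fundedBy", "org:EPSRC"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def case_studies_funded_by(triples: List[Triple], funder: str) -> List[str]:
    """Join two triple patterns: case -> funding node -> funder."""
    # Step 1: find the funding nodes that point at the requested funder.
    funding_nodes = {s for s, _, _ in match(triples, p="ex:fundedBy", o=funder)}
    # Step 2: find the case studies linked to any of those funding nodes.
    cases = {s for s, _, o in match(triples, p="ex:hasFunding") if o in funding_nodes}
    return sorted(cases)


if __name__ == "__main__":
    print(case_studies_funded_by(TRIPLES, "org:EPSRC"))  # ['case:A', 'case:C']
```

Because the funder is an explicit node rather than a string buried in free text, the query returns only case studies with a funding relationship to it and ignores passing mentions, which is the kind of accuracy gain the abstract attributes to the structured representation.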

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Figure 1. Example of an impact case study’s summary of impact section.

Figure 2. The research impact ontology. Purple nodes relate to organizations, cyan to activities, and green to research outputs. The nodes listed on the left are subclasses of activities and research outputs.

Figure 3. Funding class (id: 21791).

Figure 4. Collaboration class (id: 21791).

Figure 5. An example of the graph representation of an Impact Case Study (id: 21791).

Table 1. How was IBM involved in the case studies?

Table 2. Impact case study research funded by EPSRC

Table 3. Total EPSRC funding

Table 4. Open source applications

Table 5. Spin-off companies included in case studies

Table 6. Companies that acquired spin-offs

Table 7. Countries in which companies included in impact case studies are based, measured by the count of unique company entities found in the “Underpinning research,” “Details of the impact,” and “Sources to corroborate the impact” sections (Sections 2, 4, and 5) of the case studies

Figure 6. The global reach of industrial impacts arising from research undertaken in UoA-11.

Figure 7. The impact of using different variance thresholds to predict scores.

Figure 8. Grade distribution of different parts of the impact case studies.

Table 8. Unit of assessment features

Table 9. Case study features

Figure 9. Feature selection using the deterministic wrappers method.

Figure 10. Heatmap visualization for selected features.

Table 10. The confusion matrix of classification

Table 11. Results for the full and low variance profiles using only UoA features

Figure 11. The learning curve of three different feature sets using the case studies within the low variance threshold.

Table 12. XGBoost and SVM classification performance (precision, recall, and F1-score) with 5 and 9 classes. The bold numbers indicate the best result for each feature set for the weighted and macro averages.

Table 13. XGBoost and SVM regression performance. The bold numbers indicate the best result for each feature set.

Figure 12. Classification report using XGBoost with three different feature sets.

Table 14. XGBoost AUC and Hamming loss with 5 and 9 classes. The bold numbers indicate the best result for each feature set.

Figure 13. Validation curves of classifiers using UoA and all features in 9-class classification.

Figure 14. Tree visualization of the decision tree for grading case studies.

Figure 15. Feature importance (weight, coverage, and total gain).

Figure 16. SHAP force plot.

Figure 17. SHAP summary plot.

Supplementary material: PDF

Zhang et al. supplementary material

Appendix A
