
Citation function, polarity and influence classification

Published online by Cambridge University Press:  09 April 2017

MYRIAM HERNÁNDEZ-ALVAREZ
Affiliation:
Escuela Politécnica Nacional, Facultad de Ingeniería de Sistemas, Quito, Ecuador e-mail: myriam.hernandez@epn.edu.ec
JOSÉ M. GOMEZ SORIANO
Affiliation:
Dpto. de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, España e-mails: jmgomez@ua.es; patricio@dlsi.ua.es
PATRICIO MARTÍNEZ-BARCO
Affiliation:
Dpto. de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, España e-mails: jmgomez@ua.es; patricio@dlsi.ua.es

Abstract

Current methods for assessing the impact of authors and scientific media employ tools such as H-Index, Co-Citation and PageRank. These tools are primarily based on citation counting, which treats all citations as equal. Methods of this type can create perverse incentives to publish controversial or incomplete papers, because mixed or negative reviews often generate larger citation counts and better indexes, regardless of whether the citations were critical or exerted minimal influence on the citing document. Passing citations that merely establish background, and have no real impact on the citing paper, are common in scientific literature; nevertheless, they carry the same weight in impact evaluations. Notable researchers have emphasized the need to correct this situation by developing estimation methods that consider the different roles of citations in citing papers. To accomplish this type of evaluation, a citation context analysis should be applied to determine the nature of the citations. We propose that citations be categorized along four dimensions – FUNCTION, POLARITY, ASPECTS and INFLUENCE – because these dimensions provide adequate information for building a qualitative method to measure the impact of a given publication on a citing paper. In this paper, we use the words influence and impact interchangeably. We present a method for obtaining this information using our proposed classification scheme and a manually annotated corpus, which is marked with meaningful keywords and labels that help identify the characteristics or properties constituting what we call ASPECTS. We develop a classification scheme that builds on the definitions of purpose shared by previous works.
Our contribution is to abstract purpose classes from several other schemes and to divide a complex structure into more manageable parts, attaining a simple system that combines low-granularity dimensions but nevertheless produces a fine-grained classification. For annotators, the classification process is simple: in a first step, the coders distinguish only four primary classes, and in a second pass, they add the information contained in the ASPECTS keywords and labels to obtain the more specific functions. In this way, we obtain a high-granularity labeling that gives enough information about the citations to characterize and classify them, and we achieve this detailed coding through a straightforward process in which the level of human error can be minimized.
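
As an illustration only, the two-pass annotation described above could be represented roughly as follows. This is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, the abstract does not list the actual label values (the strings "use", "background", "significant" and "perfunctory" below are invented placeholders), and assigning POLARITY and INFLUENCE during the first pass is an assumption. The last two functions contrast plain citation counting, which treats all citations as equal, with a count restricted to citations marked as influential.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CitationAnnotation:
    """One citation context labeled along the four dimensions.

    All label values are hypothetical placeholders; the abstract names the
    dimensions (FUNCTION, POLARITY, ASPECTS, INFLUENCE) but not their classes.
    """
    context: str
    function: Optional[str] = None    # coarse class chosen in the first pass
    polarity: Optional[str] = None
    influence: Optional[str] = None
    aspects: List[str] = field(default_factory=list)  # keywords/labels, second pass


def first_pass(context: str, function: str, polarity: str,
               influence: str) -> CitationAnnotation:
    """First pass: the coder picks only coarse labels (assumed to include
    polarity and influence; the abstract does not say when these are set)."""
    return CitationAnnotation(context, function, polarity, influence)


def second_pass(annotation: CitationAnnotation,
                aspects: List[str]) -> CitationAnnotation:
    """Second pass: attach ASPECTS keywords and labels to the same citation."""
    annotation.aspects.extend(aspects)
    return annotation


def specific_function(annotation: CitationAnnotation) -> Optional[str]:
    """Combine the coarse function with its aspects into a finer-grained label."""
    if not annotation.aspects:
        return annotation.function
    return f"{annotation.function}:{'+'.join(sorted(annotation.aspects))}"


def raw_count(annotations: List[CitationAnnotation]) -> int:
    """Plain citation counting: every citation weighs the same."""
    return len(annotations)


def influential_count(annotations: List[CitationAnnotation],
                      label: str = "significant") -> int:
    """Count only citations whose INFLUENCE matches a (hypothetical) label."""
    return sum(1 for a in annotations if a.influence == label)
```

Under these assumptions, a citation first coded as "use" and then tagged with the aspect "tool" would receive the finer label "use:tool", while a background mention with no aspects keeps its coarse label and is excluded from the influential count.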

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 

