
Citation function, polarity and influence classification


Current methods for assessing the impact of authors and scientific media employ tools such as H-Index, Co-Citation and PageRank. These tools are primarily based on citation counting, which treats all citations as equal. Methods of this type can create perverse incentives to publish controversial or incomplete papers, because mixed or negative reviews often generate larger citation counts and better indexes, regardless of whether the citations were critical or exerted minimal influence on the citing document. Passing citations, which are used to establish background and have no real impact on the citing paper, are common in scientific literature, yet they carry equal weight in impact evaluations. Notable researchers have emphasized the need to correct this situation by developing estimation methods that consider the different roles of citations in citing papers. To accomplish this type of evaluation, citation context analysis should be applied to determine the nature of the citations. We propose that citations be categorized along four dimensions – FUNCTION, POLARITY, ASPECTS and INFLUENCE – because these dimensions provide adequate information for building a qualitative method that measures the impact of a given publication on a citing paper. In this paper, we use the words influence and impact interchangeably. We present a method for obtaining this information using our proposed classification scheme and a manually annotated corpus, which is marked with meaningful keywords and labels that help identify the characteristics or properties constituting what we call ASPECTS. We develop a classification scheme that takes into account the definitions of purpose shared by previous works.
Our contribution is to abstract purpose classes from several other schemes and to divide a complex structure into more manageable parts, yielding a simple system that combines coarse-grained dimensions but nevertheless produces a fine-grained classification. For annotators, the classification process is simple: in a first step, the coders distinguish only four primary classes, and in a second pass, they add the information contained in the ASPECTS keywords and labels to obtain the more specific functions. In this way, we obtain a fine-grained labeling that gives enough information about the citations to characterize and classify them, and we achieve this detailed coding through a straightforward process in which human error can be minimized.
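The two-pass coding described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the annotation tool or label set used in the paper: the abstract does not name the four primary classes, the ASPECTS keywords, or the polarity and influence values, so every label below (`PRIMARY_A`, `keyword_x`, `"positive"`, `"significant"`) is a hypothetical placeholder, as are the function names and the refinement table.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical placeholders: the abstract says there are four primary
# classes but does not name them.
PRIMARY_FUNCTIONS: Tuple[str, ...] = (
    "PRIMARY_A", "PRIMARY_B", "PRIMARY_C", "PRIMARY_D",
)

# Hypothetical refinement table mapping (primary class, ASPECTS keyword)
# to a more specific function label.
REFINEMENTS: Dict[Tuple[str, str], str] = {
    ("PRIMARY_A", "keyword_x"): "PRIMARY_A/specific_1",
    ("PRIMARY_A", "keyword_y"): "PRIMARY_A/specific_2",
    ("PRIMARY_B", "keyword_x"): "PRIMARY_B/specific_1",
}


@dataclass(frozen=True)
class CitationAnnotation:
    """One annotated citation context along the four dimensions."""
    context: str                 # sentence(s) surrounding the citation
    function: str                # one of the four primary classes
    polarity: str                # free-form here; values are assumptions
    influence: str               # free-form here; values are assumptions
    aspects: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Reject anything outside the coarse first-pass label set.
        if self.function not in PRIMARY_FUNCTIONS:
            raise ValueError(f"unknown primary function: {self.function!r}")


def first_pass(context: str, function: str, polarity: str,
               influence: str) -> CitationAnnotation:
    """First step: the coder picks only a coarse primary class."""
    return CitationAnnotation(context, function, polarity, influence)


def second_pass(annotation: CitationAnnotation,
                aspects: Iterable[str]) -> CitationAnnotation:
    """Second step: the coder adds ASPECTS keywords to the same record."""
    return replace(annotation, aspects=annotation.aspects | frozenset(aspects))


def specific_functions(annotation: CitationAnnotation) -> List[str]:
    """Derive fine-grained labels from primary class plus ASPECTS.

    Falls back to the primary class when no keyword refines it.
    """
    refined = sorted(
        REFINEMENTS[(annotation.function, aspect)]
        for aspect in annotation.aspects
        if (annotation.function, aspect) in REFINEMENTS
    )
    return refined or [annotation.function]


coarse = first_pass("We extend the approach of [3].",
                    "PRIMARY_A", "positive", "significant")
fine = second_pass(coarse, {"keyword_x", "keyword_y"})
print(specific_functions(coarse))
print(specific_functions(fine))
```

Keeping the first-pass label set small and deriving the specific functions from the added keywords mirrors the idea that coders make only coarse decisions at each step while the combined record still carries fine-grained information.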

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering