
Citation function, polarity and influence classification


Current methods for assessing the impact of authors and scientific media employ tools such as the H-index, co-citation and PageRank. These tools are based primarily on citation counting, which treats all citations as equal. Methods of this type can create perverse incentives to publish controversial or incomplete papers, because mixed or negative reviews often generate larger citation counts and better indexes, regardless of whether the citations were critical or exerted minimal influence on the citing document. Passing citations, which are used to establish background and have no real impact on the citing paper, are common in the scientific literature, yet they carry equal weight in impact evaluations. Notable researchers have emphasized the need to correct this situation by developing estimation methods that consider the different roles of citations in citing papers. This type of evaluation requires a citation context analysis to determine the nature of each citation. We propose that citations be categorized along four dimensions: FUNCTION, POLARITY, ASPECTS and INFLUENCE. These dimensions provide adequate information for building a qualitative method to measure the impact of a given publication on a citing paper. In this paper, we use the words influence and impact interchangeably. We present a method for obtaining this information using our proposed classification scheme and a manually annotated corpus, which is marked with meaningful keywords and labels that help identify the characteristics or properties constituting what we call ASPECTS. The classification scheme builds on the definitions of purpose shared by previous works.
Our contribution is to abstract purpose classes from several other schemes and to divide a complex structure into more manageable parts, yielding a simple system that combines low-granularity dimensions but nevertheless produces a fine-grained classification. For annotators, the classification process is simple: in a first step, the coders distinguish only four primary classes, and in a second pass they add the information contained in the ASPECTS keywords and labels to obtain the more specific functions. In this way, we obtain a high-granularity labeling that gives enough information about the citations to characterize and classify them, and we achieve this detailed coding with a straightforward process in which the level of human error can be minimized.
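To make the contrast between plain citation counting and a context-aware measure concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the record type `CitationAnnotation`, the label strings (`"significant"`, `"perfunctory"`, `"background"`, and so on), and the weights in `INFLUENCE_WEIGHT` are all hypothetical placeholders chosen for illustration. Only the four dimension names come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class CitationAnnotation:
    """One annotated citation; field values are hypothetical labels."""
    cited_paper: str
    function: str                  # FUNCTION: primary class from the first pass
    polarity: str                  # POLARITY: e.g. "positive", "negative", "neutral"
    influence: str                 # INFLUENCE: e.g. "significant", "perfunctory"
    aspects: Tuple[str, ...] = ()  # ASPECTS: keywords/labels added in the second pass


# Assumed weights: a passing (perfunctory) citation contributes nothing.
INFLUENCE_WEIGHT: Dict[str, float] = {"significant": 1.0, "perfunctory": 0.0}


def raw_count(annotations: Iterable[CitationAnnotation], paper: str) -> int:
    """Plain citation counting: every citation of `paper` counts as 1."""
    return sum(1 for a in annotations if a.cited_paper == paper)


def weighted_count(annotations: Iterable[CitationAnnotation], paper: str) -> float:
    """Influence-aware count: each citation is scaled by its INFLUENCE label."""
    return sum(
        INFLUENCE_WEIGHT.get(a.influence, 0.0)
        for a in annotations
        if a.cited_paper == paper
    )


# Illustrative data only; these are not drawn from the annotated corpus.
example = [
    CitationAnnotation("P1", "use", "positive", "significant", ("method",)),
    CitationAnnotation("P1", "background", "neutral", "perfunctory"),
    CitationAnnotation("P1", "background", "neutral", "perfunctory"),
    CitationAnnotation("P2", "critique", "negative", "significant", ("weakness",)),
]
```

With this toy data, `raw_count(example, "P1")` treats the two background mentions the same as the one substantive use, whereas `weighted_count(example, "P1")` counts only the citation labeled as significant. A real scheme would presumably also draw on FUNCTION, POLARITY and ASPECTS rather than on INFLUENCE alone.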


Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering