NLP-driven citation analysis for scientometrics

Rahul Jha, Amjad Abu-Jbara, Vahed Qazvinian and Dragomir R. Radev

Abstract

This paper summarizes ongoing research in Natural-Language-Processing-driven citation analysis. It describes experiments and motivating examples of how this work can enhance traditional scientometric analysis, which simply treats a citation as a ‘vote’ from the citing paper to the cited paper. In particular, we describe our dataset for citation polarity and citation purpose, present experimental results on the automatic detection of these indicators, and demonstrate the use of such annotations for studying research dynamics and for scientific summarization. We also look at two complementary problems that arise in Natural-Language-Processing-driven citation analysis for a specific target paper. The first is extracting citation context: the implicit citation sentences that refer to the target paper but contain no explicit anchor to it. The second is extracting reference scope: identifying, within a complex citing sentence that cites multiple papers, the segment relevant to the target paper. We show how these tasks can help improve sentiment analysis and citation-based summarization.
