
Survey about citation context analysis: Tasks, techniques, and resources

  • MYRIAM HERNÁNDEZ-ALVAREZ and JOSÉ M. GOMEZ
Abstract

Bibliometric calculations currently used to assess the quality of researchers, articles, and scientific journals have serious structural problems. Many authors have noted the weakness of citation counts: they are purely quantitative and do not differentiate between high- and low-citing papers. If a paper’s reputation is evaluated simply by the number of its citations, then incomplete, incorrect, or controversial articles may be promoted regardless of their relevance. This creates perverse incentives for researchers, who may publish many incorrect or incomplete papers to achieve high impact indexes. It is therefore essential to improve the objective criteria for automatic article-quality assessment. Obtaining these new criteria, however, requires advances in the automatic detection of the context, polarity, and function of bibliographic references.

We present an overview of the general concepts and review contributions toward solving these problems, with the aim of identifying trends and suggesting possible directions for future research.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering