
In-depth analysis of the impact of OCR errors on named entity recognition and linking

Published online by Cambridge University Press: 18 March 2022

Ahmed Hamdi*
Affiliation:
University of La Rochelle, Laboratoire L3i, Faculté des Sciences et Technologies, Bâtiment Pascal, Avenue Michel Crépeau, 17042 La Rochelle, France
Elvys Linhares Pontes
Affiliation:
University of La Rochelle, Laboratoire L3i, Faculté des Sciences et Technologies, Bâtiment Pascal, Avenue Michel Crépeau, 17042 La Rochelle, France
Nicolas Sidere
Affiliation:
University of La Rochelle, Laboratoire L3i, Faculté des Sciences et Technologies, Bâtiment Pascal, Avenue Michel Crépeau, 17042 La Rochelle, France
Mickaël Coustaty
Affiliation:
University of La Rochelle, Laboratoire L3i, Faculté des Sciences et Technologies, Bâtiment Pascal, Avenue Michel Crépeau, 17042 La Rochelle, France
Antoine Doucet
Affiliation:
University of La Rochelle, Laboratoire L3i, Faculté des Sciences et Technologies, Bâtiment Pascal, Avenue Michel Crépeau, 17042 La Rochelle, France
*Corresponding author. E-mail: ahmed.hamdi@univ-lr.fr

Abstract

Named entities (NEs) are among the most relevant types of information for indexing digital documents and thus for retrieving them easily. It has long been observed that NEs are key to accessing the contents of digital library portals, as they appear in most user queries. However, most digitized documents are indexed through their optical character recognition (OCR) output, which contains numerous errors. Although OCR engines have improved considerably over the last few years, OCR errors still significantly impair document access. Previous work evaluated the impact of OCR errors on named entity recognition (NER) and named entity linking (NEL) separately. In this article, we experiment with a variety of OCRed documents exhibiting different levels and types of OCR noise to assess in depth the impact of OCR on named entity processing. We provide a detailed analysis of the OCR errors that affect the performance of NER and NEL. We then present the resulting exhaustive study, together with recommendations on suitable documents, OCR quality levels, and post-OCR correction strategies required to perform reliable NER and NEL.

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

