Morphologically rich Urdu grammar parsing using Earley algorithm

Published online by Cambridge University Press:  16 April 2015

QAISER ABBAS*
Affiliation:
Fachbereich Sprachwissenschaft, Universität Konstanz, 78457 Konstanz, Germany e-mail: qaiser.abbas@uni-konstanz.de

Abstract

This work presents the development and evaluation of an extended Urdu parser. It examines issues that arise with this parser and describes the changes made to the Earley algorithm to obtain accurate and relevant results. The parser uses a morphologically rich context-free grammar extracted from a linguistically rich Urdu treebank. With sufficient encoded information, this grammar is comparable to the state-of-the-art parsing requirements for the morphologically rich Urdu language. Together, the extended parsing model and the linguistically rich extracted grammar yield better evaluation results in the Urdu/Hindi parsing domain. The parser achieves an f-score of 87%, which outperforms existing treebank-based parsing work on Urdu/Hindi.
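For orientation, the sketch below shows a textbook Earley recognizer in Python with its three standard operations (predictor, scanner, completer). It is not the extended algorithm this article describes, and it does not use the treebank-extracted grammar: the function name `earley_recognize`, the toy grammar, the tag names, and the placeholder tokens are all hypothetical and chosen only for illustration. The sketch also assumes a grammar without empty productions.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs, position of the dot, and the chart
# index where the rule was first predicted.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, lexicon, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    grammar: dict mapping a nonterminal to a list of right-hand sides (tuples)
    lexicon: dict mapping a word to the set of preterminal tags it can take
    Assumes no empty (epsilon) productions.
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in chart[i]:
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while we process it
            item = chart[i][j]
            j += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for rhs in grammar[sym]:
                        add(i, Item(sym, rhs, 0, i))
                elif i < len(tokens) and sym in lexicon.get(tokens[i], ()):
                    # Scanner: the next word carries the expected tag.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.origin]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[-1])


# Hypothetical verb-final toy grammar; tags and tokens are placeholders.
TOY_GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",)],
    "VP": [("NP", "V")],
}
TOY_LEXICON = {"noun1": {"N"}, "noun2": {"N"}, "verb1": {"V"}}

accepted = earley_recognize(TOY_GRAMMAR, TOY_LEXICON, "S",
                            ["noun1", "noun2", "verb1"])
rejected = earley_recognize(TOY_GRAMMAR, TOY_LEXICON, "S",
                            ["noun1", "verb1"])
```

Under these assumptions the first call accepts the three-token input and the second rejects the two-token input, because the toy `VP` rule requires an object `NP` before the verb. A recognizer only answers yes or no; a parser would additionally keep back-pointers in the completer step to recover trees.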

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 
