Morphologically rich Urdu grammar parsing using Earley algorithm


This work presents the development and evaluation of an extended Urdu parser. It discusses the issues that arise with this parser and describes the changes made to the Earley algorithm to obtain accurate and relevant results. The parser uses a morphologically rich context-free grammar extracted from a linguistically rich Urdu treebank. This grammar encodes enough information to be comparable with state-of-the-art parsing requirements for the morphologically rich Urdu language. Together, the extended parsing model and the linguistically rich extracted grammar yield better evaluation results in the Urdu/Hindi parsing domain. The parser achieves an f-score of 87%, outperforming existing treebank-based parsing work on Urdu/Hindi.
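For readers unfamiliar with the baseline that the parser extends, the following is a minimal sketch of the standard Earley recognizer (predictor, scanner, completer), not the extended algorithm described in this work. The toy grammar, the category labels (`S`, `KP`, `NP`, `VP`, `N`, `V`, `CM`), the romanized lexicon and the function name `earley_recognize` are all hypothetical illustrations; they are not taken from the URDU.KON-TB treebank or its tagset. The sketch assumes a grammar without empty (epsilon) rules.

```python
from collections import namedtuple

# A chart item: rule lhs -> rhs, position of the dot, and the column where it started.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, lexicon, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    grammar: dict mapping a nonterminal to a list of right-hand sides (tuples).
    lexicon: dict mapping a word to the set of preterminal tags it may carry.
    Assumes no epsilon rules (hypothetical simplification for this sketch).
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(col, item):
        if item not in chart[col]:
            chart[col].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        for item in chart[i]:  # the list may grow while we iterate over it
            if item.dot < len(item.rhs):
                nxt = item.rhs[item.dot]
                if nxt in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for rhs in grammar[nxt]:
                        add(i, Item(nxt, rhs, 0, i))
                elif i < len(tokens) and nxt in lexicon.get(tokens[i], ()):
                    # Scanner: the next word carries the expected tag.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in chart[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[-1])


# Hypothetical toy grammar for a verb-final clause with a case-marked subject.
GRAMMAR = {
    "S": [("KP", "VP")],
    "KP": [("NP", "CM")],
    "NP": [("N",)],
    "VP": [("NP", "V")],
}
LEXICON = {"ali": {"N"}, "ne": {"CM"}, "kitab": {"N"}, "parhi": {"V"}}

accepted = earley_recognize(GRAMMAR, LEXICON, "S", ["ali", "ne", "kitab", "parhi"])
rejected = earley_recognize(GRAMMAR, LEXICON, "S", ["ali", "kitab", "parhi"])
```

Here `accepted` covers the full toy sentence, while `rejected` drops the case marker that the `KP` rule requires. A grammar with richer morphological annotation would, under this sketch, simply use more specific category labels on both sides of the rules.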

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering