Extensive data for morphology: using the World Wide Web


This paper presents a number of recent studies in French morphology that make extensive use of data. These data, which concern derived words, were collected automatically from digital corpora, mostly from the Web. The main point developed here is that this massive increase in the amount of available data can substantially modify the results of a morphological study and can lead to new theoretical conclusions that would not have been possible with traditional data such as wordlists gathered from dictionaries. However, using the Web as a corpus raises several technical and methodological questions, which are addressed through examples and discussions of the different tools and techniques available. We illustrate our thesis through the study of the suffixal forms -esque, -este, -able and -ment.

Corresponding author
Address for correspondence: Fabio Montermini, Université de Toulouse-Le Mirail, Maison de la Recherche, 5, allées Antonio Machado, F-31058 Toulouse Cedex 9, France e-mail:
Journal of French Language Studies
  • ISSN: 0959-2695
  • EISSN: 1474-0079