
EHME: a New Word Database for Research in Basque Language

Joana Acha, Itziar Laka, Josu Landa and Pello Salaburu

This article presents EHME (Euskal Hiztegiaren Maiztasun Egitura, the frequency dictionary of Basque structure), an online program that enables researchers in psycholinguistics to extract word and nonword stimuli based on a broad range of statistics describing the properties of Basque words. The database comprises 22.7 million tokens. In addition to the classical indices (word frequency, orthographic structure, orthographic similarity, bigram and biphone frequency, and syllable-based measures), the available properties include morphological structure frequency and word-similarity measures. Measures are indexed at the lemma, morpheme and word levels. We also report reliability and validation analyses. The application is freely available and enables users both to extract words that meet specific statistical criteria and to obtain the statistical characteristics of a given list of words.
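The abstract lists several classical indices without defining them. As a rough illustration, and not EHME's actual implementation, the Python sketch below computes three of them on a toy token list: frequency per million tokens, orthographic neighbours in the classic Coltheart's N sense (same-length words differing by exactly one letter), and mean bigram frequency weighted by token counts. It then filters words by a frequency floor and a neighbourhood ceiling, in the spirit of criterion-based stimulus selection. The toy corpus and all function names (`per_million`, `substitution_neighbours`, `select_stimuli`, and so on) are hypothetical; EHME's own definitions may differ, for example in how bigram position or lemma-level counts are handled.

```python
from __future__ import annotations

from collections import Counter


def per_million(counts: Counter, total_tokens: int) -> dict[str, float]:
    """Raw token counts rescaled to occurrences per million tokens."""
    return {w: c * 1_000_000 / total_tokens for w, c in counts.items()}


def substitution_neighbours(word: str, lexicon: set[str]) -> list[str]:
    """Coltheart-style neighbours: same length, exactly one letter different.

    The word itself has zero differing letters, so it is excluded automatically.
    """
    return sorted(
        other
        for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )


def bigrams(word: str) -> list[str]:
    """Adjacent letter pairs, e.g. 'gure' -> ['gu', 'ur', 're']."""
    return [word[i : i + 2] for i in range(len(word) - 1)]


def bigram_counts(counts: Counter) -> Counter:
    """Token-weighted bigram frequencies: a word's bigrams count once per token."""
    totals: Counter = Counter()
    for w, c in counts.items():
        for b in bigrams(w):
            totals[b] += c
    return totals


def mean_bigram_frequency(word: str, bigram_freq: Counter) -> float:
    """Mean corpus frequency of the bigrams that make up `word` (0.0 if none)."""
    bs = bigrams(word)
    return sum(bigram_freq[b] for b in bs) / len(bs) if bs else 0.0


def select_stimuli(
    lexicon: set[str],
    freq_pm: dict[str, float],
    min_freq: float,
    max_neighbours: int,
) -> list[str]:
    """Words at or above a frequency floor with at most `max_neighbours` neighbours."""
    return sorted(
        w
        for w in lexicon
        if freq_pm.get(w, 0.0) >= min_freq
        and len(substitution_neighbours(w, lexicon)) <= max_neighbours
    )


# Toy corpus of Basque tokens (illustrative only, not EHME data).
tokens = "gure etxea hara gure zure hala hura etxea gure".split()
counts = Counter(tokens)
lexicon = set(counts)
freq_pm = per_million(counts, len(tokens))
bigram_freq = bigram_counts(counts)

if __name__ == "__main__":
    print(substitution_neighbours("hara", lexicon))  # ['hala', 'hura']
    print(mean_bigram_frequency("gure", bigram_freq))  # 4.0
    print(select_stimuli(lexicon, freq_pm, 200_000, 1))  # ['etxea', 'gure']
```

Counting bigrams per token rather than per word type is one common choice; a type-based count, or position-specific bigrams, would give different values on the same corpus.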

Corresponding author
Correspondence concerning this article should be addressed to Joana Acha, Department of Basic Cognitive Processes, Universidad del País Vasco UPV/EHU, Tolosa Hiribidea, 20018 Donostia (Spain). E-mail:
The Spanish Journal of Psychology
  • ISSN: 1138-7416
  • EISSN: 1988-2904