Automatic generation of nominal phrases for Portuguese and Galician

Published online by Cambridge University Press:  03 June 2024

María José Domínguez Vázquez*
Affiliation:
Universidade de Santiago de Compostela, Instituto da Lingua Galega - ILG, Santiago de Compostela, Spain
Alberto Simões
Affiliation:
2Ai, School of Technology, IPCA, Barcelos, Portugal
Daniel Bardanca Outeiriño
Affiliation:
Universidade de Santiago de Compostela, CiTIUS, Santiago de Compostela, Spain
María Caíña Hurtado
Affiliation:
Universidade de Santiago de Compostela, Instituto da Lingua Galega - ILG, Santiago de Compostela, Spain
José Luis Iglesias Allones
Affiliation:
Universidade de Santiago de Compostela, Instituto da Lingua Galega - ILG, Santiago de Compostela, Spain
*Corresponding author: María José Domínguez Vázquez; Email: majo.dominguez@usc.es

Abstract

This paper presents XeraWord, an innovative tool for automatically generating nominal phrases. XeraWord can be used for a range of tasks, from language teaching to the creation of examples in lexicography and the development of resources for natural language processing. Xera was the first experiment in this area, allowing the automatic generation of nominal phrases in three languages: German, French and Spanish. The tool has since been extended to support two further languages, Portuguese and Galician.

We start by presenting the theory behind the development of Xera and its new version, XeraWord, namely the base methodology applied and the natural language processing resources used to support it. We then present TraduWord, a tool developed specifically to construct resources for new languages, which allows the semi-automatic translation of the data required for nominal phrase generation. We discuss its advantages and disadvantages, analysing the quality of the translated resources as well as the amount of manual work required to validate and correct them.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Table 1. Corpora results for the search of conversación con + determinant + director/-a in Sketch Engine

Figure 1. Methodological approach.

Figure 2. Semantic and formal description of the argument patterns for the German noun Umzug.

Figure 3. Lexical expansion using WordNet.

Table 2. Examples of cosine similarities between words

Figure 4. Sample JSON output for the inflexion tool.

Table 3. Data in the current pilot phase of XeraWord

Figure 5. Part of one lexical package for the structure N1 of the Spanish noun mudanza [move].

Table 4. Example for the description levels of a lemma

Figure 6. Ontological features with the semantic role agent for the Galician noun olor [smell] using the structure

Figure 7. Example of the spreadsheet generated for the lexical package referring to the Portuguese noun presença [presence].

Table 5. Percentage of terms translated with TraduWord

Table 6. Typology of problems found in the translations obtained from MyMemory

Table 7. Distribution of translations according to their source for the Portuguese noun cheiro [smell]

Table 8. Manual intervention in the translations according to their source for the Portuguese noun cheiro [smell]

Figure 8. Query interface of XeraWord.

Table 9. Examples generated for Galician and Portuguese