Lexicon or grammar? Using memory-based learning to investigate the syntactic relationship between Belgian and Netherlandic Dutch

Published online by Cambridge University Press:  21 May 2021

Robbert De Troij*
Affiliation:
Quantitative Lexicology and Variational Linguistics, KU Leuven, Blijde-Inkomststraat 21, 3000 Leuven, Belgium; Centre for Language Studies, Radboud University Nijmegen, 6500 HD Nijmegen, the Netherlands
Stefan Grondelaers
Affiliation:
Centre for Language Studies, Radboud University Nijmegen, 6500 HD Nijmegen, the Netherlands
Dirk Speelman
Affiliation:
Quantitative Lexicology and Variational Linguistics, KU Leuven, Blijde-Inkomststraat 21, 3000 Leuven, Belgium
Antal van den Bosch
Affiliation:
KNAW Meertens Instituut, Oudezijds Achterburgwal 185, 1024 DK Amsterdam, the Netherlands
*Corresponding author. E-mail: robbert.detroij@kuleuven.be

Abstract

This article builds on computational tools to investigate the syntactic relationship between the highly related European national varieties of Dutch, viz. Belgian Dutch (BD) and Netherlandic Dutch (ND). It reports on a series of memory-based learning analyses of the post-verbal distribution of er “there” in adjunct-initial existential constructions like Op het dak staat (er) een schoorsteen “On the roof (there) is a chimney”, which has been claimed to be among the most notoriously difficult variables in Dutch. On the basis of balanced datasets extracted from Flemish and Dutch newspaper corpora, it is shown that er’s distribution in both national varieties can be learned to a considerable extent from bare lexical input which is not assigned to higher-level categories. However, whereas this yields good results for ND, BD scores are consistently lower, suggesting that BD cannot do with lexical features alone to attain accuracy scores comparable to ND. This ties in with earlier findings that the more advanced standardization of ND materializes in a higher lexical collocability, whereas Flemish speakers need additional higher-level linguistic information to insert er.
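
As a rough illustration of what learning er's distribution "from bare lexical input" can look like, the following is a minimal sketch of a memory-based (nearest-neighbour) classifier in Python. It is not TiMBL and not the authors' setup: the feature tuples, the labels `er` and `no_er`, and the function names are invented for illustration, and the distance is a plain overlap count without the feature weighting that TiMBL offers.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory, instance, k=1):
    """Return the majority label among the k stored instances nearest to `instance`."""
    # sorted() is stable, so ties keep the order in which instances were stored.
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy instances (NOT corpus data): each tuple holds bare word forms
# (adjunct preposition, verb, subject head), paired with an invented label.
memory = [
    (("op", "staat", "schoorsteen"), "no_er"),
    (("in", "is", "probleem"), "er"),
    (("op", "ligt", "boek"), "no_er"),
    (("in", "is", "verschil"), "er"),
]

# Classify an unseen instance by majority vote among its 3 nearest neighbours.
prediction = classify(memory, ("in", "is", "schoorsteen"), k=3)
print(prediction)
```

The learner stores every training instance verbatim and defers all generalization to classification time, which is why word forms can serve as features without first being mapped to higher-level categories.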

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Figure 1. Mosaic plot of er’s distribution across the four newspapers (NRC and AD for ND; DS and HLN for BD). The area of each tile of the plot is proportional to the number of observations it represents.

Table 1. Slice of the WIN datasets (the first two examples are from LeNC and the last two are from TwNC)

Table 2. Number of feature values in the WIN datasets

Table 3. Slice of the PAR datasets (the first two examples are from LeNC and the last two are from TwNC)

Table 4. Number of feature values of the PAR datasets

Figure 2. Number of feature values by feature value frequency in LeNC (light gray) and TwNC (dark gray), for both WIN and PAR.

Table 5. Selected values for the TiMBL hyperparameters

Figure 3. Boxplots capturing the accuracy for increasing sample sizes, both for WIN versus PAR feature representations and intra- versus cross-varietal training and testing.

Figure 4. Effect plots for a linear regression model predicting TiMBL accuracies.