Published online by Cambridge University Press: 22 March 2010
How do infants find the words in the speech stream? Computational models help us understand this feat by revealing the advantages and disadvantages of different strategies that infants might use. Here, we outline a computational model of word segmentation that aims both to incorporate cues proposed by language acquisition researchers and to establish the contributions different cues can make to word segmentation. We present experimental results from modified versions of Venkataraman's (2001) segmentation model that examine the utility of: (1) language-universal phonotactic cues; (2) language-specific phonotactic cues which must be learned while segmenting utterances; and (3) their combination. We show that the language-specific cue improves segmentation performance overall, but the language-universal phonotactic cue does not, and that their combination results in the most improvement. Not only does this suggest that language-specific constraints can be learned simultaneously with speech segmentation, but it is also consistent with experimental research that shows that there are multiple phonotactic cues helpful to segmentation (e.g. Mattys, Jusczyk, Luce & Morgan, 1999; Mattys & Jusczyk, 2001). This result also compares favorably to other segmentation models (e.g. Brent, 1999; Fleck, 2008; Goldwater, 2007; Johnson & Goldwater, 2009; Venkataraman, 2001) and has implications for how infants learn to segment.
This work was supported by a University of Delaware Research Foundation grant to the second author, and by NIH (5R01HD050199) and NSF grants (BCS-0642529) to the third author. We thank Vijay Shanker for valuable discussions, and Regine Lai and Aimee Stahl for feedback on the manuscript.