The Kestrel TTS text normalization system


This paper describes the Kestrel text normalization system, a component of the Google text-to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars that are compiled into libraries of weighted finite-state transducers (WFSTs). While the use of WFSTs for text normalization is itself not new, Kestrel differs from previous systems in its separation of the initial tokenization and classification phase of analysis from verbalization. Input text is first tokenized and the tokens are classified using WFSTs. As part of the classification, detected semiotic classes (expressions such as currency amounts, dates, times, and measure phrases) are parsed into protocol buffers. The protocol buffers are then verbalized, with possible reordering of the elements, again using WFSTs. This paper describes the architecture of Kestrel and the protocol buffer representations of semiotic classes, and presents some examples of grammars for various languages. We also discuss applications and deployments of Kestrel as part of the Google TTS system, which runs on both server and client side on multiple devices, and is used daily by millions of people in nineteen languages and counting.
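The separation of classification from verbalization described above can be illustrated with a minimal Python sketch. It is not Kestrel's implementation: a regular expression stands in for the WFST classifier, a dataclass stands in for a protocol buffer message, and the names Money, Word, classify, verbalize, and normalize are hypothetical. The toy grammar only recognizes whole-dollar amounts below 100.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Money:
    """Hypothetical stand-in for a protocol buffer describing a currency amount."""
    currency: str
    amount: int


@dataclass
class Word:
    """An ordinary token that needs no special verbalization."""
    text: str


Token = Union[Money, Word]

# Toy classifier pattern (assumption): "$" followed by one or two digits.
_MONEY = re.compile(r"^\$(\d{1,2})$")

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n < 100:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]} {_ONES[ones]}"


def classify(text: str) -> List[Token]:
    """Stage 1: tokenize on whitespace and parse semiotic classes into records."""
    tokens: List[Token] = []
    for raw in text.split():
        match = _MONEY.match(raw)
        if match:
            tokens.append(Money(currency="usd", amount=int(match.group(1))))
        else:
            tokens.append(Word(raw))
    return tokens


def verbalize(tokens: List[Token]) -> str:
    """Stage 2: turn records into words, reordering fields where needed."""
    out = []
    for tok in tokens:
        if isinstance(tok, Money):
            # The written form puts the currency first; the spoken form puts it last.
            unit = "dollar" if tok.amount == 1 else "dollars"
            out.append(f"{number_to_words(tok.amount)} {unit}")
        else:
            out.append(tok.text)
    return " ".join(out)


def normalize(text: str) -> str:
    """Run both stages in sequence."""
    return verbalize(classify(text))
```

Because the intermediate record carries only the parsed fields, the verbalizer in this sketch can emit them in a different order from the written form, which is the kind of reordering the abstract mentions. Tokens the toy classifier does not recognize, such as "$100" or an amount with trailing punctuation, pass through unchanged.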



Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering