
Leveraging bilingual terminology to improve machine translation in a CAT environment*

  • MIHAEL ARCAN (a1), MARCO TURCHI (a2), SARA TONELLI (a2) and PAUL BUITELAAR (a1)
Abstract

This work focuses on the extraction of automatically aligned bilingual terminology and its integration into a Statistical Machine Translation (SMT) system in a Computer Aided Translation (CAT) scenario. We evaluate the proposed framework, which takes as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. To this end, we investigate several strategies to extract and align terminology across languages and to integrate it into an SMT system. We compare two terminology injection methods that can easily be used at run-time without altering the normal operation of an SMT system: XML markup and a cache-based model. We test the cache-based model on two domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 BLEU points over the widely used XML markup approach.
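To make the XML markup approach more concrete, here is a minimal sketch of how entries from a bilingual term list could be wrapped around matching source phrases before decoding. It assumes Moses-style XML input, where a source span is enclosed in a tag carrying a `translation` attribute; the helper name `mark_up_terms`, the tag name `term` and the example term pairs are hypothetical and are not taken from the paper. It does not cover the cache-based model, and matching relies on regular-expression word boundaries, so terms that begin or end with punctuation (e.g. "C++") would need extra handling.

```python
import re
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


def mark_up_terms(sentence: str, term_dict: Dict[str, str], tag: str = "term") -> str:
    """Wrap known source terms in Moses-style XML markup (hypothetical helper).

    term_dict maps a source-language term to its target-language translation.
    Matching is case-sensitive and restricted to word boundaries.
    """
    if not term_dict:
        # No terms to inject: only escape XML special characters.
        return escape(sentence)
    # Try longer terms first so a multi-word term ("hard disk")
    # wins over a shorter term it contains ("disk").
    terms = sorted(term_dict, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    pieces = []
    last = 0
    for match in pattern.finditer(sentence):
        # Copy (escaped) text between the previous match and this one.
        pieces.append(escape(sentence[last:match.start()]))
        src = match.group(1)
        # quoteattr adds the surrounding quotes and escapes the value.
        pieces.append(f"<{tag} translation={quoteattr(term_dict[src])}>{escape(src)}</{tag}>")
        last = match.end()
    pieces.append(escape(sentence[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    terms = {"hard disk": "disco rigido", "disk": "disco"}
    print(mark_up_terms("replace the hard disk now", terms))
```

Run on the example, this prints `replace the <term translation="disco rigido">hard disk</term> now`. In Moses, annotated input of this kind is usually decoded with an option such as `-xml-input exclusive` (only the suggested translation is allowed for the span) or `-xml-input inclusive` (the suggestion competes with phrase-table entries).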

Footnotes
* This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (Insight).


Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering
