Extracting paraphrase patterns from bilingual parallel corpora

Published online by Cambridge University Press: 16 September 2009

SHIQI ZHAO, HAIFENG WANG, TING LIU, SHENG LI

Affiliations:
Shiqi Zhao, Ting Liu, Sheng Li: Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China (e-mails: zhaosq@ir.hit.edu.cn, tliu@ir.hit.edu.cn, lisheng@ir.hit.edu.cn)
Haifeng Wang: Toshiba (China) Research and Development Center, No. 1, East Chang An Ave., Dongcheng District, Beijing 100738, China (e-mail: wanghaifeng@rdc.toshiba.com.cn)

Abstract

Paraphrase patterns are semantically equivalent patterns that are useful for both paraphrase recognition and paraphrase generation. This paper presents a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, in which English paraphrase patterns are extracted using patterns in another language as pivots. We use log-linear models to compute the paraphrase likelihood of pattern pairs, with feature functions based on maximum likelihood estimation (MLE), lexical weighting (LW), and monolingual word alignment (MWA). Using this method, we extract more than 1 million pairs of paraphrase patterns from about 2 million pairs of bilingual parallel sentences, with a precision above 78%. Experimental results show that the method significantly outperforms a well-known method, discovery of inference rules from text (DIRT). The experiments also show that the log-linear model with the proposed feature functions is effective. We analyze the extracted paraphrase patterns in detail; in particular, we find that they can be classified into five types, which are useful in multiple natural language processing (NLP) applications.
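The abstract does not spell out the scoring formulas, so the following is only a minimal sketch of the general pivot idea under stated assumptions, not the authors' implementation. It assumes the common pivot formulation p(e2 | e1) ≈ Σ_c p(c | e1) · p(e2 | c), where c ranges over foreign-language pivot patterns aligned to the English pattern e1, as an MLE-style feature, and an unnormalised log-linear score Σ_i λ_i log h_i for ranking candidates. The probability tables, the patterns "X solves Y" etc., the pivot labels `c1`/`c2`, the feature names and weights, and all function names are hypothetical; LW and MWA features would simply be further entries in the feature dictionary.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical conditional-probability tables (e.g. estimated from a word-aligned
# bilingual corpus): pattern -> {aligned pattern in the other language: probability}.
Table = Dict[str, Dict[str, float]]


def pivot_probability(e1: str, e2: str, e_to_c: Table, c_to_e: Table) -> float:
    """Pivot estimate p(e2 | e1) = sum over pivot patterns c of p(c | e1) * p(e2 | c)."""
    total = 0.0
    for c, p_c_given_e1 in e_to_c.get(e1, {}).items():
        total += p_c_given_e1 * c_to_e.get(c, {}).get(e2, 0.0)
    return total


def candidate_paraphrases(e1: str, e_to_c: Table, c_to_e: Table) -> List[Tuple[str, float]]:
    """Patterns e2 != e1 sharing at least one pivot with e1, ranked by pivot probability."""
    candidates = {
        e2
        for c in e_to_c.get(e1, {})
        for e2 in c_to_e.get(c, {})
        if e2 != e1
    }
    scored = [(e2, pivot_probability(e1, e2, e_to_c, c_to_e)) for e2 in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Unnormalised log-linear score sum_i lambda_i * log h_i; every h_i must be > 0.

    The normalising constant of a log-linear model is the same for all candidates
    of a given e1, so it can be dropped when only ranking is needed.
    """
    return sum(weights[name] * math.log(value) for name, value in features.items())


if __name__ == "__main__":
    # Toy tables: one English pattern with two hypothetical pivot patterns.
    e_to_c = {"X solves Y": {"c1": 0.6, "c2": 0.4}}
    c_to_e = {
        "c1": {"X solves Y": 0.5, "X resolves Y": 0.5},
        "c2": {"X resolves Y": 0.25, "X settles Y": 0.75},
    }
    for e2, p in candidate_paraphrases("X solves Y", e_to_c, c_to_e):
        print(f"{e2}: {p:.2f}")
```

With these toy tables, "X resolves Y" scores 0.6 × 0.5 + 0.4 × 0.25 = 0.40 and is ranked above "X settles Y" at 0.4 × 0.75 = 0.30. Working in log space keeps the combination of several small probabilities numerically stable and lets each feature's weight λ_i be tuned independently.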

Type: Papers
Copyright © Cambridge University Press 2009
