College of Information Science and Engineering, Ritsumeikan University, Japan
(+81)-77-561-5065
yohei[at]fc.ritsumei.ac.jp

Bilingual Dictionary Induction

Semi-automatic creation of bilingual dictionaries among Indonesian ethnic languages

  • The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. The pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We propose constraint-based bilingual lexicon induction for closely related languages. The method extends the constraints of the recent pivot-based induction technique and enables multiple symmetry assumption cycles, which reach many more cognates in the transgraph. We further identify cognate synonyms to obtain many-to-many translation pairs. Our method shows the potential to complement other bilingual dictionary creation methods, such as word alignment models that rely on parallel corpora for high-resource languages, while also handling low-resource languages well.
  • Publications:
    1. Arbi Haza Nasution, Yohei Murakami, and Toru Ishida. 2017. A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 17, 2, Article 9 (November 2017), 29 pages. DOI: https://doi.org/10.1145/3138815 [full paper]
    2. Arbi Haza Nasution, Yohei Murakami, and Toru Ishida. 2016. Constraint-based bilingual lexicon induction for closely related languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 3291–3298, Paris, France, May. [full paper]
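
The pivot language idea mentioned above can be illustrated with a minimal sketch. This is not the constraint-based method from the publications; it is a simplified, assumed scoring scheme in which a word in language A and a word in language C become a candidate pair when they share at least one pivot-language translation, and the pair is scored by the overlap of their pivot translations. The function name `induce_pairs` and the placeholder words (`a1`, `p1`, `c1`, ...) are hypothetical and stand in for real dictionary entries.

```python
from collections import defaultdict


def induce_pairs(a_to_pivot, c_to_pivot):
    """Score candidate (A word, C word) pairs by shared pivot translations.

    Both arguments map a word to the set of its pivot-language translations.
    The score is the Jaccard overlap of the two pivot sets (an assumption
    made for this sketch, not the scoring used in the cited papers).
    """
    # Invert the C-to-pivot dictionary so pivot words can be looked up.
    pivot_to_c = defaultdict(set)
    for c_word, pivots in c_to_pivot.items():
        for pivot in pivots:
            pivot_to_c[pivot].add(c_word)

    scores = {}
    for a_word, a_pivots in a_to_pivot.items():
        for pivot in a_pivots:
            # Every C word reachable through this pivot is a candidate.
            for c_word in pivot_to_c.get(pivot, ()):
                c_pivots = c_to_pivot[c_word]
                shared = a_pivots & c_pivots
                union = a_pivots | c_pivots
                scores[(a_word, c_word)] = len(shared) / len(union)
    return scores


# Toy placeholder dictionaries (hypothetical, not real language data).
a_to_pivot = {"a1": {"p1", "p2"}, "a2": {"p2"}}
c_to_pivot = {"c1": {"p1", "p2"}, "c2": {"p2", "p3"}}

scores = induce_pairs(a_to_pivot, c_to_pivot)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

In this toy example the pair (`a1`, `c1`) ranks highest because the two words share all of their pivot translations, while pairs connected through a single ambiguous pivot word score lower. A thresholded ranking like this would still need the additional constraints and cognate evidence described above to filter out wrong pairs.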
