Semi-automatic creation of bilingual dictionaries among various Indonesian ethnic languages
The constraint-based approach has proven useful for inducing bilingual dictionaries for closely related low-resource languages. When we want to create multiple bilingual dictionaries linking several languages, we need to consider manual creation by native speakers if no machine-readable dictionaries are available as input. Because planning the creation of these dictionaries requires weighing various methods and their costs, plan optimization is essential. Utilizing both the constraint-based approach and a plan optimizer, we design a collaborative process for creating 10 bilingual dictionaries from every combination of 5 languages, i.e., Indonesian, Malay, Minangkabau, Javanese, and Sundanese. We further design online collaborative dictionary generation to bridge the spatial gap between native speakers. To evaluate our optimal plan, we define a heuristic plan that relies only on manual investment by native speakers and use total cost as the evaluation metric. The optimal plan outperformed the heuristic plan with a 63.3% cost reduction.
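The count of 10 dictionaries follows from taking every unordered pair of the 5 languages. As a minimal illustration (the names `LANGUAGES` and `language_pairs` are hypothetical and not taken from the paper), the pairs could be enumerated as follows:

```python
from itertools import combinations

# The five languages named in the abstract.
LANGUAGES = ["Indonesian", "Malay", "Minangkabau", "Javanese", "Sundanese"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages.

    Hypothetical helper for illustration only; one bilingual
    dictionary is assumed to correspond to each pair.
    """
    return list(combinations(languages, 2))


pairs = language_pairs(LANGUAGES)
print(len(pairs))  # number of bilingual dictionaries to create
for first, second in pairs:
    print(f"{first}-{second}")
```

This only enumerates the target dictionaries; it says nothing about which creation method (constraint-based induction or manual creation) the plan optimizer would assign to each pair.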
Arbi Haza Nasution, Yohei Murakami, and Toru Ishida. Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 3397-3404, Miyazaki, Japan, May 2018. [full paper]
College of Information Science and Engineering, Ritsumeikan University, Japan