College of Information Science and Engineering, Ritsumeikan University, Japan
(+81)-77-561-5065
yohei[at]fc.ritsumei.ac.jp

[OLD]

Semi-Automatic Creation of Bilingual Dictionaries among Indonesian Ethnic Languages

Indonesian Languages Endangerment

Indonesia has a population of 221,398,286 and 707 living languages, which account for 57.8% of the Austronesian family and 30.7% of the languages in Asia [1]. There are 341 Indonesian ethnic languages facing varying degrees of endangerment (in trouble or dying), and some of their native speakers do not speak Bahasa Indonesia well because they live in remote areas. Unfortunately, 13 Indonesian ethnic languages are already extinct. To help save low-resource languages such as the Indonesian ethnic languages from endangerment, we are working to enrich their most basic language resource: the bilingual dictionary.

International Research Collaboration

We consider two factors in selecting the target languages: language similarity and number of speakers. To ensure that the created bilingual dictionaries will be useful to many users, we listed the top 10 Indonesian ethnic languages ranked by number of speakers and selected Javanese and Sundanese from that list. To find and coordinate native speakers of those languages, we collaborated with Telkom University. Since our constraint-based approach works best on closely related languages, we selected Malay and Minangkabau based on their relatedness to Indonesian. To find and coordinate native speakers of those languages, we collaborated with the Islamic University of Riau. Hence, we target 5 languages: Indonesian (ind), Malay (zlm), Minangkabau (min), Javanese (jav), and Sundanese (sun). In the first experiment (2016), we created bilingual dictionaries for all combinations of these 5 languages. In the second experiment (2019), we added Banjarese and Palembang to the set of target languages. As the third experiment, we plan to enrich the bilingual dictionaries of the original 5 languages to 4,000 translation pairs each at the end of 2020.
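
To make "all combinations" concrete, the following minimal sketch enumerates the language pairs for the five target languages using their ISO 639-3 codes. It assumes that each bilingual dictionary corresponds to one unordered pair (so ind-zlm and zlm-ind count as the same dictionary); the function name `dictionary_pairs` is hypothetical and is not part of the project's tooling.

```python
from itertools import combinations

# ISO 639-3 codes of the five target languages listed in the text.
LANGUAGES = ["ind", "zlm", "min", "jav", "sun"]


def dictionary_pairs(languages):
    """Return every unordered language pair.

    Assumption: one bilingual dictionary per unordered pair.
    """
    return list(combinations(languages, 2))


pairs = dictionary_pairs(LANGUAGES)
for first, second in pairs:
    print(f"{first}-{second}")
print(len(pairs))  # 10 pairs for 5 languages
```

Under the same assumption, extending the list to seven languages (for example, after adding Banjarese and Palembang) would give 21 pairs.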

Bilingual Dictionaries