INDONESIAN LANGUAGE SPHERE PROJECT
Building a dictionary development ecosystem for low-resource languages
Indonesia has a population of 221,398,286 and 707 living languages, which account for 57.8% of the Austronesian family and 30.7% of the languages in Asia. There are 341 Indonesian ethnic languages facing varying degrees of language endangerment (in trouble or dying), and some of their native speakers do not speak Bahasa Indonesia well because they live in remote areas. Unfortunately, 13 Indonesian ethnic languages are already extinct. To save low-resource languages such as Indonesian ethnic languages from endangerment, we are trying to enrich the most basic language resource, the bilingual dictionary. Lately, low-resource languages have been receiving more attention from UNESCO, ELRA, ACM, and others.
These are the statistics on the languages, bilingual dictionaries, and publications of this project.
Created Bilingual Dictionaries
In the first experiment (2016), we created bilingual dictionaries for all combinations of 5 languages (Indonesian, Malay, Minangkabau, Sundanese, and Javanese). In the second experiment (2019), we added Banjarese and Palembang to the family. At the end of 2020, we plan to expand the bilingual dictionaries of the original 5 languages to 4,000 translation pairs each as the third experiment.
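As a rough illustration of what "all combinations" implies for the number of dictionaries, the sketch below enumerates language pairs. It assumes each combination is an unordered pair of two distinct languages (so Indonesian-Malay and Malay-Indonesian count once); the project text does not spell this out. The function name `language_pairs` is hypothetical, and applying the same all-pairs scheme to the seven languages of the second experiment is likewise an assumption, not a reported result.

```python
from itertools import combinations

# Languages named in the first (2016) and second (2019) experiments.
LANGUAGES_2016 = ["Indonesian", "Malay", "Minangkabau", "Sundanese", "Javanese"]
LANGUAGES_2019 = LANGUAGES_2016 + ["Banjarese", "Palembang"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages (hypothetical helper).

    Assumption: one bilingual dictionary per unordered pair.
    """
    return list(combinations(languages, 2))


pairs_2016 = language_pairs(LANGUAGES_2016)
pairs_2019 = language_pairs(LANGUAGES_2019)

for source, target in pairs_2016:
    print(f"{source} - {target}")

print(len(pairs_2016))  # 5 languages give 10 unordered pairs
print(len(pairs_2019))  # 7 languages would give 21 unordered pairs
```

Under this reading, the number of dictionaries grows quadratically with the number of languages, which is one reason adding even two languages noticeably enlarges the work.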
We work on computational linguistics, natural language processing, machine learning, and crowdsourcing approaches to enrich the language resources of Indonesian ethnic languages.
Arbi Haza Nasution
Head of Department
Department of Informatics Engineering, Universitas Islam Riau, Indonesia
College of Information Science and Engineering, Ritsumeikan University, Japan
Global Center for Science and Engineering, Waseda University, Japan