Indowordnet’s help in Indian language machine translation

S., Sreelekha; Bhattacharyya, Pushpak

doi:10.1007/s00146-019-00907-w

Open Forum
Published: 06 September 2019

Volume 35, pages 689–698, (2020)
AI & SOCIETY

Sreelekha S.¹ &
Pushpak Bhattacharyya¹

Abstract

Machine Translation (MT) system development for language pairs with insufficient digitally available resources, such as Indian–Indian and English–Indian pairs, faces difficulty in translating various lexical phenomena. In this paper, we present a comparative study of 440 phrase-based statistical models trained for 110 language pairs across 11 Indian languages. We developed 110 baseline statistical machine translation (SMT) systems. We then augmented the training corpus with Indowordnet synset word entries from the lexical database and trained a further 110 models on top of the baseline systems. We carried out a detailed performance comparison using several evaluation metrics: BLEU, METEOR, and TER. We observed significant improvement in the evaluated translation quality across all 440 models after using Indowordnet. These experiments give detailed insight into two things: (1) the use of a lexical database with synset mapping for resource-poor languages and (2) the efficient use of Indowordnet synset mapping. Moreover, synset-mapped lexical entries helped the SMT system handle ambiguity to a great extent during translation.
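As a rough illustration of the setup described in the abstract, the following Python sketch shows how 11 languages yield 110 directed language pairs, and one possible way to add synset-aligned word entries to a parallel training corpus. The abstract does not name the languages or say how synset entries were paired, so the language labels, the `synset_entries` layout, and the pairing of every source word with every target word in a synset are assumptions made here for illustration, not the authors' procedure.

```python
from itertools import permutations


def directed_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def augment_corpus(parallel_sentences, synset_entries, src, tgt):
    """Append synset-aligned word pairs to a parallel corpus.

    parallel_sentences: list of (source_sentence, target_sentence) tuples.
    synset_entries: hypothetical layout, synset_id -> {language: [words]}.
    Each source word is paired with each target word of the same synset
    (an assumed pairing scheme) and added as an extra one-line entry.
    """
    augmented = list(parallel_sentences)
    for words_by_lang in synset_entries.values():
        for src_word in words_by_lang.get(src, []):
            for tgt_word in words_by_lang.get(tgt, []):
                augmented.append((src_word, tgt_word))
    return augmented


if __name__ == "__main__":
    # Placeholder labels; the abstract does not list the 11 languages.
    languages = [f"L{i}" for i in range(1, 12)]
    print(len(directed_pairs(languages)))  # 11 * 10 = 110
```

A baseline model would be trained on `parallel_sentences` alone and the augmented model on the output of `augment_corpus`, so the two can be compared on the same test set.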

Acknowledgements

The authors acknowledge the pre-print version of this article on arXiv.org. The authors thank the Department of Science & Technology, Govt. of India, for providing funding under the Woman Scientist Scheme (WOS-A), project code SR/WOS-A/ET/1075/2014.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India
Sreelekha S. & Pushpak Bhattacharyya

Corresponding author

Correspondence to Sreelekha S.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

S., S., Bhattacharyya, P. Indowordnet’s help in Indian language machine translation. AI & Soc 35, 689–698 (2020). https://doi.org/10.1007/s00146-019-00907-w

Received: 12 October 2018
Accepted: 09 August 2019
Published: 06 September 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00146-019-00907-w
