IndicBERT v2 is a multilingual BERT model trained on IndicCorpv2, covering 24 Indic languages. IndicBERT performs competitively with strong baselines and achieves the best results on 7 out of 9 tasks in the IndicXTREME benchmark.
The models are trained with various objectives and datasets. The list of models is as follows:
All the code for pretraining and fine-tuning the IndicBERT model is available in this GitHub repository.
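For quick experimentation, the checkpoints can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch, assuming the model is published on the Hugging Face Hub under an identifier such as `ai4bharat/IndicBERTv2-MLM-only`; substitute the checkpoint name for the variant you want to use.

```python
# Minimal sketch: load an IndicBERT v2 checkpoint with Hugging Face transformers.
# The model identifier below is an assumed Hub name; replace it with the
# checkpoint you actually intend to use.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "ai4bharat/IndicBERTv2-MLM-only"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Encode a sample sentence and run a forward pass to get masked-LM logits.
text = "IndicBERT v2 supports 24 Indic languages."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```

For downstream tasks such as classification or NER on IndicXTREME, the same checkpoint can be loaded through the corresponding `AutoModelFor*` class and fine-tuned as with any BERT-style encoder.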
@article{Doddapaneni2022towards,
title={Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages},
author={Sumanth Doddapaneni and Rahul Aralikatte and Gowtham Ramesh and Shreyansh Goyal and Mitesh M. Khapra and Anoop Kunchukuttan and Pratyush Kumar},
journal={ArXiv},
year={2022},
volume={abs/2212.05409}
}