Microsoft Plans to Bring Speech Data for 3 Indian Languages
Microsoft’s Indian Language Speech Corpus, provided through the Microsoft Research Open Data initiative, is considered the largest publicly available Indian-language speech dataset. The package offers test data for Telugu, Tamil, and Gujarati.
On September 6, 2018, Microsoft launched the Speech Corpus package, which brings conversational and phrasal speech training and test data for Telugu, Tamil, and Gujarati, including audio and the corresponding transcripts. The dataset is available for free to researchers and academics, who can use it to build Indian-language speech recognition for applications that require speech. This should make digital marketing work more easily across Indian languages and make internet-based marketing efforts more accurate.
Microsoft believes that India’s increasing digital literacy needs to be supported by a multilingual digital world, and the Microsoft Indian Language Speech Corpus aims to reduce language barriers. The corpus can address differences in enunciation, accent, diction, and slang. It is available for free to those who are developing speech recognition systems in Tamil, Telugu, and Gujarati.