Building a self-learning word prediction and auto-correct module for Firefox OS and the Open Web that handles multilingual input
Submitted by Rabimba Karanjai (@rabimba) on Wednesday, 15 July 2015
Language input on mobile devices has always posed a challenge: how to provide an intuitive experience along with ease of typing. One approach towards that end is predictive text input. But predictions are only as good as the wordlist they are generated from. The same approach is often much harder to implement for localized languages such as Hindi and Bengali (India, Bangladesh), and for languages that require an IME (input method editor) to type effectively. One approach is to learn from the user's typing preferences and adjust the dictionary weights to improve predictions. This talk will discuss how this can be implemented in Firefox OS and how the same approach can be used universally for Open Web Apps without locking in to any specific language. We will also briefly discuss how it improves predictions for localized languages, the challenges some transliteration systems face, and how we can tackle them.
A predictive text input system predicts the user’s next input word from the characteristics of natural language and the user’s text input history. It can dramatically reduce the burden of text input, especially in environments where standard full-size keyboards cannot be used. For example, when a user types the “a” and “p” keys intending to enter “application”, the system suggests candidates such as “apple” and “application”. Candidate words are usually selected based on word frequencies and the user’s usage pattern, but it would be better if the system could also predict words based on the context of the text being composed.
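The frequency-plus-usage ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Firefox OS implementation: `PrefixPredictor`, `record_use`, `suggest` and the `user_weight` factor are hypothetical names, and the wordlist frequencies are made up. Each candidate's score is its corpus frequency plus a bonus for every time the user actually committed that word, so the ranking adapts to the individual user.

```python
from __future__ import annotations

from bisect import bisect_left, insort
from collections import Counter


class PrefixPredictor:
    """Hypothetical sketch: rank prefix completions by corpus frequency plus learned user usage."""

    def __init__(self, word_freqs: dict[str, int], user_weight: int = 10) -> None:
        self.freqs = dict(word_freqs)      # static corpus frequencies (assumed input)
        self.words = sorted(self.freqs)    # sorted, so words sharing a prefix are contiguous
        self.user_counts = Counter()       # how often the user committed each word
        self.user_weight = user_weight     # assumed tuning knob, not from the source

    def record_use(self, word: str) -> None:
        """Learn from the user: count each committed word, adding unknown words to the dictionary."""
        if word not in self.freqs:
            self.freqs[word] = 0
            insort(self.words, word)
        self.user_counts[word] += 1

    def score(self, word: str) -> int:
        return self.freqs[word] + self.user_weight * self.user_counts[word]

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        """Return up to k words starting with `prefix`, best score first (ties alphabetical)."""
        matches = []
        for word in self.words[bisect_left(self.words, prefix):]:
            if not word.startswith(prefix):
                break  # left the contiguous block of matching words
            matches.append(word)
        return sorted(matches, key=lambda w: (-self.score(w), w))[:k]


if __name__ == "__main__":
    p = PrefixPredictor({"apple": 50, "application": 30, "apply": 20, "banana": 40})
    print(p.suggest("ap"))  # ['apple', 'application', 'apply']
    for _ in range(3):
        p.record_use("application")
    print(p.suggest("ap"))  # ['application', 'apple', 'apply']
```

A trie is the more common structure for prefix lookup; a sorted list with binary search is used here only to keep the sketch short.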
Transliteration has also been one of the common methods of multilingual text input for localized and Asian languages. One such method is predictive transliteration, in which the user inputs a word by intuitively combining letters of the input alphabet phonetically, and the predictive transliteration system is expected to convert it correctly into the target language.
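One generic way to make transliteration predictive is to enumerate every target-script spelling the phonetic input could stand for and let a target-language lexicon pick the likely word. The sketch below assumes a tiny hypothetical romanization table for Hindi (Devanagari) and a hypothetical lexicon with made-up frequencies; `TABLE`, `LEXICON`, `spellings` and `transliterate` are illustrative names, and this is not necessarily the approach presented in the talk.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy romanization table: each Latin chunk maps to one or more
# Devanagari spellings. A real table would be much larger and would model
# inherent vowels, vowel signs (matras) and conjuncts systematically.
TABLE: dict[str, list[str]] = {
    "na": ["न", "ना"],
    "ma": ["म", "मा"],
    "s": ["स्"],          # consonant with virama, so it joins the next consonant
    "te": ["ते", "टे"],   # ambiguous: dental or retroflex "t"
}

# Hypothetical Hindi lexicon with made-up frequencies, used only for ranking.
LEXICON: dict[str, int] = {"नमस्ते": 100, "मन": 60, "माना": 40}


def spellings(latin: str) -> set[str]:
    """Every Devanagari string obtainable by segmenting `latin` into TABLE keys."""

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> frozenset[str]:
        if i == len(latin):
            return frozenset({""})
        out = set()
        for key, options in TABLE.items():
            if latin.startswith(key, i):
                for rest in from_pos(i + len(key)):
                    out.update(opt + rest for opt in options)
        return frozenset(out)

    return set(from_pos(0))


def transliterate(latin: str, k: int = 3) -> list[str]:
    """Prefer candidates found in the lexicon (most frequent first); else fall back to raw spellings."""
    candidates = spellings(latin.lower())
    known = [c for c in candidates if c in LEXICON]
    if known:
        return sorted(known, key=lambda c: (-LEXICON[c], c))[:k]
    return sorted(candidates)[:k]


if __name__ == "__main__":
    print(transliterate("namaste"))  # ['नमस्ते']
    print(transliterate("mana"))     # ['मन', 'माना']
```

For "mana" the table yields four spellings (मन, मना, मान, माना); the lexicon keeps the two real words and orders them by frequency. Learning from the user, as described in the next paragraph, would amount to updating these lexicon weights.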
In both cases it is of paramount importance to learn how the user uses words and to use those usage patterns to dynamically improve predictions and produce better output candidates. The talk will be organised as follows. In the first part we will briefly discuss how to integrate a modular learning algorithm into the prediction engine of Firefox OS (Gaia). Then we will talk about the specific challenges a phonetic transliteration system must address and how we can address them. We will finish with the limitations of the present approach and what can be done to improve it.
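One way a learning algorithm could be kept modular is sketched below, under assumptions: the engine ranks dictionary candidates and asks each plugged-in learner for a score bonus, so learners can be added or swapped without touching the engine. `Learner`, `BigramLearner`, `Engine`, `observe`, `bonus` and `commit` are hypothetical names, not Gaia's actual prediction engine API. The bigram learner shown is one example of a learner that also captures the context of the text being composed.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Optional, Protocol


class Learner(Protocol):
    """Hypothetical plug-in interface for anything that learns from committed words."""

    def observe(self, prev: Optional[str], word: str) -> None: ...

    def bonus(self, prev: Optional[str], word: str) -> float: ...


class BigramLearner:
    """Learns which words the user types after a given previous word (context)."""

    def __init__(self, weight: float = 5.0) -> None:
        self.weight = weight               # assumed tuning knob
        self.pairs = defaultdict(Counter)  # previous word -> Counter of following words

    def observe(self, prev: Optional[str], word: str) -> None:
        self.pairs[prev][word] += 1

    def bonus(self, prev: Optional[str], word: str) -> float:
        following = self.pairs.get(prev)   # .get avoids creating empty entries
        return self.weight * following[word] if following else 0.0


class Engine:
    """Ranks dictionary candidates; learners can be added or swapped without changing this class."""

    def __init__(self, base_freqs: dict[str, float], learners: list[Learner]) -> None:
        self.base = dict(base_freqs)
        self.learners = learners
        self.prev: Optional[str] = None  # last committed word, used as context

    def score(self, word: str) -> float:
        return self.base[word] + sum(learner.bonus(self.prev, word) for learner in self.learners)

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        candidates = [w for w in self.base if w.startswith(prefix)]
        return sorted(candidates, key=lambda w: (-self.score(w), w))[:k]

    def commit(self, word: str) -> None:
        """Called when the user accepts a word: every learner observes it in context."""
        for learner in self.learners:
            learner.observe(self.prev, word)
        self.base.setdefault(word, 0.0)  # unknown words join the dictionary
        self.prev = word


if __name__ == "__main__":
    engine = Engine({"more": 9, "most": 7, "morning": 5}, [BigramLearner()])
    print(engine.suggest("mo"))  # ['more', 'most', 'morning']
    for word in ["good", "morning", "good"]:
        engine.commit(word)
    print(engine.suggest("mo"))  # ['morning', 'more', 'most'] (after "good")
```

In this sketch no learner inspects the script or language of the words, which is one way the same engine could serve any language without being locked in to one.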
Full-time graduate researcher, part-time hacker and FOSS enthusiast.
I used to write code for Watson and do a bunch of other things at their lab (mostly algorithms, NLP, ontologies and reading papers, among other stuff). At present I am an intern at the Almaden Research Center, and I am crawling my way towards a PhD at Rice University.
My present interests lean towards security, primarily static analysis, and marginally towards systems.