A reader asks:
How far away are we from implementing these chatbots in Indian languages – Kannada, Tamil, Telugu and Malayalam?
I have some thoughts on this, but I am by no means well informed on the subject of non-English chatbots. So if you are working on chatbots in Indian languages, it would be nice if you could pitch in. (I use Disqus conditional load, so the comments section only shows up when you scroll down to the bottom of the page and hover for a couple of seconds.)
Natural Language Processing
The key features in chatbots are all based on recent (and sometimes older) developments in Natural Language Processing (NLP).
It is a little hard to explain without going into detail, but NLP itself comprises multiple techniques, and some of the best recent developments in NLP (e.g. dependency parsing) were due to the evolution of machine learning algorithms.
An important thing to note is that NLP in non-English languages is itself far behind English language NLP, to the best of my knowledge.
When we talk about chatbots in non-English languages, we are faced with two challenges. Not only is NLP for non-English languages below par compared to what you have for English, you also need to account for the existing error rates even in English language chatbots.
So how good are the best chatbots?
The best known chatbot, as in the most conversational, is most likely Mitsuku. And even Mitsuku feels a little “artificial” when you interact with it.
I remember reading how Google used Machine Learning (ML) and started getting much better results for Google Translate when compared to rule-based translation websites. I don’t have a link or reference handy, but I think the other website in the comparison was BabelFish. The important takeaway from the article was that ML will dominate software translation in the future, and it is already providing a lot of improvement.
The translation approach
Since translation is becoming quite good nowadays, there is actually a case to make for translating the user's non-English message into English first using Google Translate, then passing it on to a chatbot built in (say) API.AI, collecting the answer, and translating it back to the user's language using Google Translate once again.
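To make the round trip concrete, here is a minimal sketch of that pipeline. It is not tied to any real service: `translate` and `chatbot` are hypothetical callables you would replace with your own wrappers around a translation service and an English language chatbot, and the function names and the `kn` language code are just my assumptions for illustration.

```python
from typing import Callable

# Hypothetical signatures (assumptions, not a real API):
#   translate(text, source_lang, target_lang) -> translated text
#   chatbot(english_text) -> english reply
Translate = Callable[[str, str, str], str]
Chatbot = Callable[[str], str]


def reply_via_english(message: str, user_lang: str,
                      translate: Translate, chatbot: Chatbot) -> str:
    """Answer a non-English message using an English-only chatbot."""
    # Step 1: translate the user's message into English.
    english_in = translate(message, user_lang, "en")
    # Step 2: let the English language chatbot produce the answer.
    english_out = chatbot(english_in)
    # Step 3: translate the answer back into the user's language.
    return translate(english_out, "en", user_lang)


# Stand-in implementations so the sketch runs without any external service.
def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    # Only tags the text with the direction; a real wrapper would translate it.
    return f"[{source_lang}->{target_lang}] {text}"


def fake_chatbot(english_text: str) -> str:
    # Echoes the input; a real wrapper would query the chatbot platform.
    return f"echo: {english_text}"


if __name__ == "__main__":
    print(reply_via_english("hello", "kn", fake_translate, fake_chatbot))
```

Notice that the message passes through translation twice, so any translation error on the way in gets carried into the chatbot's answer and then translated again on the way out.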
While this short-term measure would probably work better than you might expect, it will also likely be worse than a chatbot built in the English language itself.
And it’s not only the below-par NLP. There are two other reasons for this.
One, improving ML accuracy requires a lot of data. And ML also needs a high quality training (annotated) dataset. While these may be available in the more popular non-English languages such as Spanish and Chinese, I would be surprised if high quality annotated datasets are available for regional South Indian languages.
Two, take a look at the Conversation Design Toolkit I put together based on Google’s documentation. You will see that you need to be aware of idioms and colloquialisms to create better (more conversational) chatbots, and these may not even have equivalents in non-English languages. This makes the task even more challenging.
In conclusion, I feel we are a while away from chatbots in Indian languages which work as well as English language chatbots – and the English language ones themselves fall short of natural.
Again, if you wish to share an alternate viewpoint, or if you simply have some data/stats to share on this topic, it would be of great interest to this blog’s audience.