What if a specific system entity isn’t available in all languages in a multi-lingual bot?
A reader of my article on creating multi-lingual bots (not to be confused with non-English bots, which use only a single language) sent me this question:
If the system entities aren't available for another language, say Bahasa Indonesia, does this mean that I cannot add Bahasa Indonesia as one of the languages of the multi-language bot?
The answer to this question is a little involved, and depends on many factors. Also, if your bot will “evolve” over time, then you definitely need to read this article.
Check all the system entities available for your language
You can do this by going to the Dialogflow system entities examples page and filtering by language.
The first thing you will note is that the English language has 50 system entities (and yes, I actually counted them 🙂).
By comparison, the Indonesian language has only 37 system entities. This means 13 system entities that are present in English are not available in Indonesian as of this writing. This will likely change over time as Dialogflow adds support for more system entities in different languages.
I don’t know which entities are missing in Indonesian, although it would be a good idea if someone were to create a resource page just for this purpose. 🙂
Design a flowchart for all the languages
I would suggest getting my Dialogflow Flowcharts course, which goes into plenty of detail about describing and annotating entities when creating a flowchart for your Dialogflow bot. I am obviously biased, but the price of the course will be a fraction of the cost of starting to create a multi-lingual bot and realizing later that it will not behave in the same way for all the languages.
Now we come to the interesting part.
Check for multi-language support for your entities
Check to see (using the table above) if all the entities you have used in your flowchart are supported in all the languages in your multi-lingual bot.
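If your flowchart uses more than a handful of entities, a few lines of Python can do this comparison for you. The following is only a minimal sketch: the `SUPPORTED` table and the `FLOWCHART_ENTITIES` set are placeholders I made up for illustration, not the real per-language lists, so you would need to fill them in by hand from the system entities page.

```python
# Hypothetical support table: language code -> system entities available in it.
# The values below are placeholders, NOT the real lists from Dialogflow.
SUPPORTED = {
    "en": {"@sys.date", "@sys.number", "@sys.color", "@sys.email"},
    "id": {"@sys.date", "@sys.number", "@sys.email"},
}

# Entities the flowchart relies on (also illustrative).
FLOWCHART_ENTITIES = {"@sys.date", "@sys.color"}


def missing_entities(used, supported):
    """Map each language to the used entities it lacks, skipping languages with no gaps."""
    gaps = {}
    for lang, available in supported.items():
        missing = sorted(used - available)
        if missing:
            gaps[lang] = missing
    return gaps


print(missing_entities(FLOWCHART_ENTITIES, SUPPORTED))
```

An empty result means you are in Case 1 below. Anything else tells you exactly which entity and language combinations need one of the workarounds.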
Case 1: All system entities are supported in all languages
If all the system entities are supported in all languages, then that is obviously the best case. You can just start designing your multi-lingual bot.
Case 2: Not all system entities are supported in all languages
If a given system entity is not supported in one of the languages, you have three options.
Option 1: Use a different agent for each language (i.e. one Dialogflow project per language). This is obviously a lot of work to manage, but it is probably the best choice if you are willing to do the extra work.
Option 2: You can redesign your entire flowchart to not use the specific system entity.
Option 3: You can also have different capabilities for your bot across different languages (that is, the bot will be less powerful in one language than in the other). Basically, you avoid creating the entire conversation branch which uses the non-existent system entity and just let it go to the fallback intent. Usually, this third option is the worst tradeoff. Not only is your bot less capable, you still need to do extra work to manage it because it will not behave the same across all languages.
Redesign your flowchart
Here it gets more interesting again.
Sometimes you can redesign the flowchart in such a way that it uses a system entity in English, and you can reconstruct the same behavior in the other language by using a regex entity.
That will work sometimes, but probably will not work in many cases (you should test and verify this yourself, as I am no language expert). As a small bonus, sometimes this regex entity will become more powerful than even the English language system entity if you can incorporate both numbers and words.
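To make the "numbers and words" idea concrete, here is a rough Python sketch for prototyping such a pattern before you put it into a regex entity. The word list (Indonesian for one to five) and the names `WORD_VALUES`, `QUANTITY_PATTERN` and `extract_quantity` are my own assumptions for illustration. The pattern sticks to very basic syntax, but Python's regex engine is not the one Dialogflow uses, so test the final pattern in Dialogflow itself.

```python
import re

# Illustrative word list only (Indonesian one to five); a real bot needs more.
WORD_VALUES = {"satu": 1, "dua": 2, "tiga": 3, "empat": 4, "lima": 5}

# Matches either a run of digits or one of the number words above.
QUANTITY_PATTERN = re.compile(
    r"\b(\d+|" + "|".join(WORD_VALUES) + r")\b", re.IGNORECASE
)


def extract_quantity(text):
    """Return the first quantity in text as an int, or None if nothing matches."""
    match = QUANTITY_PATTERN.search(text)
    if match is None:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else WORD_VALUES[token]
```

Note that the regex only finds the token. Turning a matched word such as "dua" into the number 2 still has to happen in your own code, which is what the lookup table is for.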
What if you cannot design a regex entity either?
In that case, you will have to resort to the last option.
Use a wildcard entity if there is no other option
A wildcard entity allows the user to type in freeform text, and you can parse it on the backend to extract the entity of your choice.
In other words, you will use the wildcard entity in all your languages to get specific entity input. You have effectively slightly dumbed down your bot in all languages, but that is always a problem when you use a lowest common denominator approach.
Now, obviously, you need to be quite careful when using these wildcard entities because the wildcard entity has a tendency to capture everything, including garbage input. It is very hard to parse garbage input and extract an entity from it in your backend code. 🙂
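Here is roughly what that backend parsing could look like. Treat it as a sketch under several assumptions: the webhook body is taken to be a Dialogflow ES style dictionary with `queryResult.parameters` and `queryResult.languageCode`, the parameter name `color_text` and the `COLOR_WORDS` table are hypothetical, and the reply text is hardcoded in English for brevity.

```python
# Hypothetical lookup: language code -> canonical value -> words that mean it.
COLOR_WORDS = {
    "en": {"red": ["red"], "blue": ["blue"], "green": ["green"]},
    "id": {"red": ["merah"], "blue": ["biru"], "green": ["hijau"]},
}


def parse_color(raw_text, language_code):
    """Return the canonical color found in raw_text, or None for unusable input."""
    base_language = language_code.split("-")[0].lower()  # "en-US" -> "en"
    words = [word.strip(".,!?") for word in raw_text.lower().split()]
    for canonical, synonyms in COLOR_WORDS.get(base_language, {}).items():
        if any(word in synonyms for word in words):
            return canonical
    return None


def handle_webhook(request_json):
    """Build a reply dict from a webhook request body that is already parsed JSON."""
    query = request_json.get("queryResult", {})
    # "color_text" is a hypothetical name for the wildcard parameter.
    raw = query.get("parameters", {}).get("color_text", "")
    language = query.get("languageCode", "en")
    color = parse_color(raw, language)
    if color is None:
        # Garbage or unrecognized input: ask again instead of guessing.
        return {"fulfillmentText": "Sorry, which color did you mean?"}
    return {"fulfillmentText": "Got it: " + color}
```

The important design choice is the `None` branch. When the wildcard has swallowed something you cannot parse, it is safer to ask the user again than to push a bad value further into the conversation.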
But if you design your chatbot well, and you understand how candidate intents work, you can minimize the risk of the wildcard entity capturing unnecessary stuff.