Recently, on the Dialogflow Product forum, there was this question:
There is also a reply by a Google employee, which you should go and read.
There are two interesting things about this question:
- The asker is effectively trying to fit their entire chatbot into a single intent
- The asker thinks that everything should be an entity (which is probably a direct consequence of point 1)
But I was more interested in why someone would think that the entire sentence is full of candidates for entities. In my opinion, you should actually try to use an entity only when you absolutely need to.
So how can you decide what should and shouldn’t be an entity?
Here are some ideas.
Learn about the different types of nouns
Did you know that English grammar recognizes several different types of nouns? Here is a quick list:
- proper nouns
- common nouns
- material nouns
- countable nouns
- uncountable or mass nouns
- collective nouns
- concrete nouns
- abstract nouns
Here is an article which describes them in detail.
Does this mean there is a 1-to-1 mapping between these different types of nouns and Dialogflow entities? No, not that I am aware of.
But knowing these different types of nouns will be very handy as you decide what to use and what not to use as an entity. For example, a proper noun is most likely an entity (though there are probably some exceptions to this rule).
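If you want to play with this idea in code, here is a crude Python sketch. It is not part of Dialogflow, and the function name `proper_noun_candidates` is hypothetical. It assumes that capitalized words which do not start the sentence are proper nouns. That assumption is often wrong (lowercase brand names, titles where every word is capitalized), so treat the output only as a list of words worth a second look.

```python
import re


def proper_noun_candidates(sentence: str) -> list[str]:
    """Group runs of capitalized words, skipping the sentence-initial word.

    Crude heuristic (an assumption, not a real NLP method): it misses
    lowercase proper nouns and flags capitalized common nouns.
    """
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    candidates: list[str] = []
    run: list[str] = []
    for i, word in enumerate(words):
        # The first word is capitalized just because it starts the sentence.
        if i > 0 and word[0].isupper():
            run.append(word)
        else:
            if run:
                candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates


print(proper_noun_candidates("What is the job title of Frank Gallagher?"))
# ['Frank Gallagher']
```

Grouping consecutive capitalized words keeps multi-word names like "Frank Gallagher" together, which is usually how you would want to annotate them.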
Learn about Part-of-speech tagging
In general, learning about Natural Language Processing will help you build out better chatbots (even if you don’t use Dialogflow).
If you don’t want to spend all that effort, here is a quick hack: at least learn about part-of-speech (POS) tagging and some general principles of how it is implemented.
So what is part-of-speech tagging? Say you have a well-formed English sentence. Part-of-speech tagging assigns a tag to each word, identifying which part of speech it is, e.g. pronoun, noun, verb, adjective, etc. (see the next section).
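To make the idea concrete, here is a toy tagger in Python. It is a minimal sketch, not how real taggers work: the `LEXICON` dictionary is a hypothetical hand-written lookup table, and unknown words fall back to simple guesses. Real POS taggers learn from annotated text and take the surrounding words into account. The tag names follow the Penn Treebank convention (WP for wh-pronoun, VBZ for a present-tense verb, NN for a common noun, NNP for a proper noun, and so on).

```python
import re

# Hypothetical toy lexicon mapping lowercase words to Penn Treebank tags.
# Real taggers learn these from annotated corpora and use context as well.
LEXICON = {
    "what": "WP",   # wh-pronoun
    "is": "VBZ",    # verb, 3rd person singular present
    "the": "DT",    # determiner
    "job": "NN",    # noun, singular
    "title": "NN",
    "of": "IN",     # preposition
}


def tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each token by dictionary lookup, with crude fallbacks for unknown words."""
    tagged = []
    for token in re.findall(r"\w+|[^\w\s]", sentence):
        if token.lower() in LEXICON:
            tagged.append((token, LEXICON[token.lower()]))
        elif not token[0].isalnum():
            # Punctuation (simplified: Penn Treebank uses different tags per mark).
            tagged.append((token, "."))
        elif token[0].isupper():
            tagged.append((token, "NNP"))  # unknown capitalized word: guess proper noun
        else:
            tagged.append((token, "NN"))   # unknown lowercase word: guess common noun
    return tagged


print(tag("What is the job title of Frank Gallagher?"))
# [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('job', 'NN'), ('title', 'NN'),
#  ('of', 'IN'), ('Frank', 'NNP'), ('Gallagher', 'NNP'), ('?', '.')]
```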
Read this Stack Overflow answer to get an idea of the tags that are typically used in POS tagging.
Use the Parts of Speech online tool
The next tip is to use the Parts of Speech online tool.
Here is how it parses a sentence that the questioner asked:
Notice that “What” is marked as a pronoun – that is, it refers to some other noun (the job title) in the sentence. Often, pronouns can be omitted from the sentence and you will still get the gist of what it is about. In this case, reading “job title of Frank Gallagher?” still gives you a very good idea of the user’s intent. That’s why the Google employee answered that “question words” shouldn’t be annotated as entities in this case. And in general, pronouns are not good candidates for entities, in my opinion.
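Here is a small Python sketch of that reasoning. It assumes you already have tagged output for the sentence (written out by hand below in Penn Treebank tags; a real tagger or online tool may label some words differently), and the helper name `split_for_annotation` is hypothetical. It drops question words such as “What” and groups consecutive proper nouns into entity candidates.

```python
# Tagged output for the example question, written out by hand using Penn
# Treebank tags. Assumption: a real tagger or online tool may differ.
TAGGED = [
    ("What", "WP"), ("is", "VBZ"), ("the", "DT"), ("job", "NN"),
    ("title", "NN"), ("of", "IN"), ("Frank", "NNP"), ("Gallagher", "NNP"),
    ("?", "."),
]

# Penn Treebank tags for question words (what, whose, which, where, ...).
WH_TAGS = {"WP", "WP$", "WDT", "WRB"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def split_for_annotation(tagged: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (words left after dropping question words, proper-noun entity candidates)."""
    kept = [word for word, t in tagged if t not in WH_TAGS]

    entities: list[str] = []
    run: list[str] = []
    for word, t in tagged:
        if t in PROPER_NOUN_TAGS:
            run.append(word)  # extend the current run of proper nouns
        elif run:
            entities.append(" ".join(run))  # a run of proper nouns just ended
            run = []
    if run:
        entities.append(" ".join(run))
    return kept, entities


kept, entities = split_for_annotation(TAGGED)
print(kept)      # ['is', 'the', 'job', 'title', 'of', 'Frank', 'Gallagher', '?']
print(entities)  # ['Frank Gallagher']
```

In Dialogflow terms, only “Frank Gallagher” would be a candidate for annotation in the training phrase; the question word is left alone.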
In the same way, you can use the tool to develop useful insights into how you declare your intents and how you choose your entities. (Of course, you first need to learn about the types of nouns and about POS tagging to be able to do this.)
Do you have any tips for how to choose entities? Let me know in the comments.