Dialogflow RegEx : How to extract regular expression style entities

You probably know that Dialogflow doesn’t have inherent support for entities which can be expressed as regular expressions. It is possible they might come up with support for regex based entity extraction at some point in the future, but until then, you can use the following idea.

Update Sep 2019: Regex entities are now available in Dialogflow (article coming soon).

Before we get into the idea: the suggested solution is to simply use the wildcard entity (@sys.any) and validate the captured value in your webhook after the intent fires. Needless to say, this isn’t ideal for a number of reasons. For example, let us consider what the article itself suggests:

What do these recommendations imply?

1 A larger number of training examples helps when you use the wildcard entity in an intent. This is because the wildcard entity, by its very nature, aggressively captures any and all input.

2 Annotating the entire training phrase as the wildcard entity is not a good idea (since Dialogflow doesn’t have anything else to go on), which means that if the only thing the user types is the entity value, matching can be very challenging.

3 Prompting the user for confirmation is obviously a strategy for playing it safe, but you do end up potentially annoying the user.

4 Restricting the intent with an explicit input context is helpful to guard the intent, but what if the expected behavior is that the user does type in the entity value in their first message to the bot?

You need to create a custom integration

This answer only works if you use a custom integration. Perhaps that’s not what you wanted or expected, but it is a prerequisite. Besides, by using a custom integration, you also get a host of other benefits.

If you can create a custom integration, this is how you would do it.

1 Create a composite entity corresponding to the regex entity you want

There are different ways to do this, but you can basically take the example values and build a composite entity around them. Here is the key: make sure you use a hyphen as the separator between the different pieces of your regex.

An example: I recently found out about the Swedish national ID, which has a specific regex format.

For example, here are a couple of numbers given in the example:

Here is the key: break the entity down into as many logical pieces as possible.

For example, for the numbers above we have YY - MM - DD - dddd

where the last 4 digits could be any combination. (Of course, it is limited by the checksum value for a given person, but in theory those 4 digits could take any combination).
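To make the breakdown concrete, here is a minimal Python sketch of the YY, MM, DD, dddd structure as a regular expression with named groups. The pattern, the group names and the `split_personnummer` helper are illustrative assumptions, not part of Dialogflow, and checksum validation is deliberately left out.

```python
import re

# Assumed structure: YY MM DD dddd, with optional hyphens/spaces between pieces.
# The checksum of the last four digits is not validated here.
PERSONNUMMER = re.compile(
    r"\b(?P<yy>\d{2})\s*-?\s*"
    r"(?P<mm>0[1-9]|1[0-2])\s*-?\s*"
    r"(?P<dd>0[1-9]|[12]\d|3[01])\s*-?\s*"
    r"(?P<last4>\d{4})\b"
)


def split_personnummer(text):
    """Return the four logical pieces as a dict, or None if nothing matches."""
    match = PERSONNUMMER.search(text)
    return match.groupdict() if match else None


print(split_personnummer("My personnummer is 8112289874"))
# {'yy': '81', 'mm': '12', 'dd': '28', 'last4': '9874'}
```

Each named group corresponds to one of the sub-entities that make up the composite entity.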

Here is how I define the entities:

First, the Year

Next, the month:

Then, the day of month:

The four random digits at the end:

Now we can define the personnummer composite entity:

2 Create an intent with potential user phrases and test if the composite entity works

This is an important step. You want to make sure your composite entity, as defined, is annotated correctly by Dialogflow.

For example, here is my intent

3 Preprocess the input

Now you might be saying: “But Aravind, the whole problem is that no user is going to type the value in the format I have defined in Dialogflow”.

This is where the fun begins. 🙂

You should preprocess the input to change it into the format you have defined in your Dialogflow composite entity.

For example, say the user types in:

My personnummer is 8112289874

You should change it to

My personnummer is 81 - 12 - 28 - 9874

before sending the phrase to Dialogflow when you call the detectIntent API method.

Most programming languages will allow you to do something like this using some kind of regex-replace function.

For example, here is a regex-based replacement function for the personnummer in PHP:

And this is what the original and modified text strings will look like with the preg_replace method above (screenshot from my console output):
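For readers not using PHP, the same regex-replace idea can be sketched in Python. This is an illustrative equivalent, not the PHP snippet referred to above. The names `TEN_DIGITS` and `preprocess` are hypothetical, and the sketch assumes the number is typed as ten consecutive digits and that plain hyphens with surrounding spaces are the separator used in the composite entity.

```python
import re

# Assumption: the personnummer arrives as ten consecutive digits (YYMMDDdddd).
TEN_DIGITS = re.compile(r"\b(\d{2})(\d{2})(\d{2})(\d{4})\b")


def preprocess(text):
    """Rewrite a bare ten-digit number into the hyphen-separated form
    that the composite entity was defined with."""
    return TEN_DIGITS.sub(r"\1 - \2 - \3 - \4", text)


original = "My personnummer is 8112289874"
print(original)              # My personnummer is 8112289874
print(preprocess(original))  # My personnummer is 81 - 12 - 28 - 9874
```

Text that contains no ten-digit run passes through unchanged, so the function is safe to apply to every user message before it is sent on.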

4 Use the preprocessed input in the detectIntent API call

Now you will send the modified text to Dialogflow. That is, you will call Dialogflow’s detectIntent API method and the input query will be the modified text.

When you do that, you will not only be able to get the right intent matched, but you will also enable very good entity extraction.
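As a rough sketch of this step, the snippet below sends the preprocessed text through the `google-cloud-dialogflow` Python client (v2 API). It assumes the package is installed and credentials are configured. The function names and the `language_code` default are placeholders chosen for illustration, and this is not the quickstart code mentioned in this article.

```python
import re

# Same assumed preprocessing as in step 3: ten consecutive digits -> YY - MM - DD - dddd.
TEN_DIGITS = re.compile(r"\b(\d{2})(\d{2})(\d{2})(\d{4})\b")


def preprocess(text):
    return TEN_DIGITS.sub(r"\1 - \2 - \3 - \4", text)


def detect_intent_preprocessed(project_id, session_id, text, language_code="en"):
    """Call detectIntent with the preprocessed text instead of the raw input."""
    # Imported lazily so preprocess() can be used without the client installed.
    from google.cloud import dialogflow

    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)
    text_input = dialogflow.TextInput(
        text=preprocess(text), language_code=language_code
    )
    query_input = dialogflow.QueryInput(text=text_input)
    response = client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    result = response.query_result
    return result.intent.display_name, result.fulfillment_text
```

A call such as `detect_intent_preprocessed("my-project-id", "session-1", "My personnummer is 8112289874")` (with your own project ID) would return the matched intent name and the fulfillment text.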

I modified my quickstart code and here is a console screenshot of the output I got from using this technique:

Note: the fact that you are seeing the fulfillment text output the number shows that Dialogflow was able to extract the correct entity.

At the same time, the exact format of the output (and why it doesn’t precisely match the input) depends on how Dialogflow chooses to print an entity value when using the format $entity. For reference, here is what I have in the Text response section for this intent.

On the other hand, if Dialogflow is unable to extract the entity value correctly, the fulfillment text “crashes” and you will see an empty response instead. You can check this out for yourself by triggering an intent but providing an entity value which doesn’t pattern match correctly. 

In a future article, I will explain how this technique is better than the recommended solution mentioned at the top of this article.

Middleware Course

I am planning to create a course on creating middleware which can improve your Dialogflow agent’s abilities. Here is a quick list of things you can do when you build your own custom middleware.

In the meantime, if you need help with this or similar questions, you can get in touch with me here.


  • Hi Aravind,
    I want to create 2 entities corresponding to ZQEA2-1FHCV-DRQ52-KJKCS-JFCGC and 3998532544934535B; each is a random string of A-Z and 0-9 characters.
    Any advice would be greatly appreciated.

    • My advice would be to go for it. 🙂

      Jokes aside, what you are describing should fall under what I have explained in this article. Any reason you think it will not work?

      • So I need to create an entity containing every digit and every letter of the alphabet, and then combine them multiple times in a composite entity. It’s the only option?

        • I am supposing you have already seen that simply using the wildcard entity has some limitations, and we go for this approach only when you want to avoid those limitations.

          >> It’s the only option?

          No, you can also use the wildcard entity. But it is the most reasonable option which will also allow Dialogflow to do the “entity matching” heavy lifting, and overall make your intent mapping process work better.

          • It isn’t immediately clear which step isn’t working. Here are some suggestions:

            Option 1: Change the “char” entity from individual characters to groups of 2 characters. Yes, it means you will have 36 ^ 2 = 1296 entity values (10 digits + 26 letters, in any combination for the two characters).

            Option 2: Compare the accuracy of the intent mapping by just using a wildcard entity. As you might imagine, the approach I am suggesting here is much better suited to the kind of “clumped” entities – a very good example being the Personnummer. You are extending the approach to something which is much harder to pull off because there isn’t any underlying “pattern” for a GUID which makes it easier for the entity detection. As I have already mentioned, this is an alternative approach, but it may not be worth the trouble for certain scenarios.

            Option 3: Get in touch for 1-on-1 Skype consultation via my contact form if it is urgent.

    • I am not sure why it matters, considering it is just a name and the only requirement is that you are consistent. But I use that spelling because that is how it seems to be spelled in Swedish. You can click on the link to the Wiki article and confirm this for yourself.