
Dialogflow Machine Learning Algorithm

I get some variant of this question quite often from readers or coaching clients:

What type of machine learning is happening within the black box? Any ideas?

While the short answer is “No, I don’t, since Dialogflow isn’t open source as of this writing”, that doesn’t mean you cannot reverse engineer it and learn as much as you can. A useful tool for this already exists inside Dialogflow: the score coming back in the JSON response.

Here are a few things which can help you understand what is going on. I have also added the relevant thumbnail from my Dialogflow Conversation Design for those who might be interested in pursuing this further.

The candidate list

Imagine this. You have designed a chatbot. And you are giving a demo of your chatbot to a friend.

They have tried a couple of messages, and both work just fine. And then they try a third message. You are anticipating a certain response from the chatbot based on what you think should happen. Instead, a different response comes back.

And you are wondering: why did that response get selected by Dialogflow? To answer this question, you should first understand that there is such a thing as a list of intents which are all “viable candidates” at that point in the conversation.

[Thumbnail: Candidate list]
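To make the idea of a candidate list concrete, here is a minimal sketch. It assumes an intent is a viable candidate when every one of its input contexts is currently active; the intent names and context names are hypothetical, and this is a simplified illustration rather than Dialogflow's actual implementation.

```python
def candidate_intents(intents, active_contexts):
    """Return names of intents whose input contexts are all active.

    Assumption for this sketch: an intent with no input contexts is
    always a candidate.
    """
    active = set(active_contexts)
    return [
        name
        for name, input_contexts in intents.items()
        if set(input_contexts) <= active
    ]

# Hypothetical agent: intent name -> list of input contexts.
intents = {
    "order.pizza": [],
    "order.pizza.size": ["awaiting_size"],
    "order.cancel.confirm": ["awaiting_cancel_confirmation"],
}

candidates = candidate_intents(intents, ["awaiting_size"])
print(candidates)
```

With only `awaiting_size` active, the cancel confirmation intent drops out of the list, so it cannot be the one that answers the third message in the demo.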

The score coming back in the JSON response

A second thing you should understand is that there is a field in the JSON response called score (named intentDetectionConfidence in API v2), which comes back each time you try out a message in the Dialogflow simulator.

As it turns out, we can use the intent detection confidence value to do a bunch of testing and understand what is going on under the hood.
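If you copy the JSON out of the simulator, pulling the score out takes only a few lines. The sketch below uses the v2 field names (queryResult, intent.displayName, intentDetectionConfidence), but the response itself is a made-up example, including the intent name and the confidence value.

```python
import json

# Illustrative response only: the field names follow the v2 response shape,
# but every value here is invented for the example.
raw_response = """
{
  "responseId": "example-response-id",
  "queryResult": {
    "queryText": "what are your opening hours",
    "intent": {"displayName": "store.hours"},
    "intentDetectionConfidence": 0.82
  }
}
"""

def matched_intent_and_score(response_text):
    """Return (intent display name, confidence) from a v2-style response."""
    query_result = json.loads(response_text)["queryResult"]
    name = query_result.get("intent", {}).get("displayName")
    score = query_result.get("intentDetectionConfidence", 0.0)
    return name, score

intent_name, confidence = matched_intent_and_score(raw_response)
print(f"{intent_name}: {confidence:.2f}")
```

Logging this pair for every test message gives you the raw data for all the experiments described in the rest of this article.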

Scoring the user’s message


Words which share a common root, such as intent, intend, intended, and intention, are treated by Dialogflow, as well as by other NLU bot frameworks, as similar or even identical (from the viewpoint of the algorithm which processes them). This root is called the “stem” of the word, and stemming is what helps Dialogflow manage multiple variants of the same basic “word concept” so it can do better intent matching.
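For readers who have not seen stemming before, here is a toy suffix stripper. It only illustrates the general idea of collapsing word variants onto one stem; it is not Dialogflow's algorithm (which remains a black box), and the suffix list and example words are my own choices for the sketch.

```python
# Toy suffix stripper for illustration only. This is NOT how Dialogflow
# stems words; it just shows variants collapsing onto a shared stem.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

variants = ["book", "books", "booked", "booking"]
stems = {word: toy_stem(word) for word in variants}
print(stems)
```

All four variants end up with the same stem, so a matcher working on stems would see them as the same word.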

In the video below, I show how stemming can impact Dialogflow’s intent mapping.



Some words in the English language are not high in information value. Words such as “a” and “the” (often called stop words) are so common that they are generally not very useful for intent mapping; they are considered somewhat superfluous.

I show an example of that in my video.
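The effect of ignoring low-information words can be sketched in a few lines. The stop-word list below is a hypothetical one picked for this example, and the whitespace tokenizer is deliberately naive; Dialogflow's real handling is not public.

```python
# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"a", "an", "the", "to", "of", "is"}

def content_words(message):
    """Lowercase the message, split on whitespace, and drop stop words."""
    return [word for word in message.lower().split() if word not in STOP_WORDS]

first = content_words("book a table")
second = content_words("Book the table")
print(first, second, first == second)
```

Once the stop words are gone, the two messages look identical, which is one plausible reason swapping “a” for “the” barely moves the score.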


ML Threshold

You probably already know that raising the ML Threshold means only phrases close to what is already in your training set will match the intent, while lowering it lets looser matches through. But how does it work in practice?

I show an example below.

[Thumbnail: ML Threshold]
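The threshold logic can be pictured as a simple comparison. This is a minimal sketch, assuming the best candidate is kept when its confidence reaches the threshold and the fallback intent is used otherwise; the intent name, the score of 0.72, and the helper function are all hypothetical.

```python
def resolve_intent(intent_name, confidence, threshold,
                   fallback="Default Fallback Intent"):
    """Keep the matched intent only if its confidence reaches the threshold."""
    if confidence < threshold:
        return fallback
    return intent_name

# Same message and score, three different threshold settings.
results = {
    threshold: resolve_intent("store.hours", 0.72, threshold)
    for threshold in (0.3, 0.6, 0.9)
}
for threshold, outcome in results.items():
    print(threshold, outcome)
```

The message and its score never change here; only the threshold does. That is why the same user phrase can match at one setting and fall through to the fallback at another.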

Term Reinforcement

You can tell Dialogflow to give higher weight to certain words and phrases by repeating them across multiple training phrases. This is why the Dialogflow team encourages bot creators to use about 10-15 training phrases per intent.

The repetition tells Dialogflow which are the most important words that you really would like to pattern match on. The words which don’t get repeated as much in the training phrases are given less weight (and thus less importance) as the matching happens.

[Thumbnail: Term Reinforcement]
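One way to picture term reinforcement is to weight each word by how many training phrases contain it. The sketch below does exactly that and nothing more; it is my own simplification for illustration, not the weighting Dialogflow actually uses, and the training phrases are invented.

```python
from collections import Counter

def term_weights(training_phrases):
    """Weight each word by the share of training phrases that contain it."""
    counts = Counter()
    for phrase in training_phrases:
        # Use a set so a word counts once per phrase.
        counts.update(set(phrase.lower().split()))
    total = len(training_phrases)
    return {word: n / total for word, n in counts.items()}

# Hypothetical training phrases for a single refund intent.
phrases = [
    "i want a refund",
    "refund my order",
    "can i get a refund please",
    "refund this purchase",
]
weights = term_weights(phrases)
for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word:10} {weight:.2f}")
```

The word “refund” appears in every phrase and gets the top weight, while one-off words such as “order” sit at the bottom, which mirrors the behaviour described above.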

Other ways to get insight into what’s going on

In addition to the ideas I have described above, you can play around with the JSON score (intent detection confidence) in a few more ways to understand what really happens under the hood:

  • use different entity values and see if/how it affects the score
  • introduce a typo into the entity value
  • introduce a typo into a non-entity word (that is, a word which you have declared in the training phrase)
  • use close synonyms instead of the words already in your training phrases
  • create contexts with a lifespan greater than 1 (which I don’t usually recommend) and see if higher lifespan contexts produce higher scores for the same training phrase
  • see if putting everything within a followup intent tree affects the score
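Experiments like the ones listed above are easier to compare if you record the score for a baseline message and for each variant side by side. Here is a small harness sketch. The `detect` argument stands for whatever function you use to send a message to your agent and return its confidence; the `fake_detect` scorer below is a stand-in invented so the example runs offline, and its numbers say nothing about how Dialogflow scores.

```python
def compare_variants(detect, baseline, variants):
    """Score a baseline message and each variant, returning score deltas.

    `detect` is any callable that takes a message and returns a confidence.
    """
    base_score = detect(baseline)
    rows = []
    for label, message in variants.items():
        score = detect(message)
        rows.append((label, message, score, round(score - base_score, 3)))
    return base_score, rows

# Stand-in scorer (hypothetical): share of expected words found in the message.
def fake_detect(message):
    expected = {"book", "a", "flight", "to", "paris"}
    words = set(message.lower().split())
    return len(words & expected) / len(expected)

variants = {
    "entity typo": "book a flight to pariss",
    "non-entity typo": "bok a flight to paris",
    "synonym": "reserve a flight to paris",
}
base, rows = compare_variants(fake_detect, "book a flight to paris", variants)
for label, message, score, delta in rows:
    print(f"{label:16} {score:.2f} ({delta:+.2f})  {message}")
```

Swap `fake_detect` for a function that calls your own agent and the same table shows which kind of change costs you the most confidence.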


At the end of the day, while all of these can help, you will still need a good amount of testing and trial and error if you want your chatbot to be as accurate as possible in handling the user’s messages. But I hope this article serves as a good starting point for you to go and explore how things work.


  • Aravind, frankly, did not expect to get another discovery on your next article. Although it would be necessary to get used to this already))) In these seemingly simple examples, you showed how to work with the black box. And this is not necessarily DialogFlow. Such a cunning research approach can be applied with any closed systems. Thanks a lot!!!

    • I appreciate the enthusiasm, but I think at this rate people are going to think I am paying you to write these comments 🙂