
Simplex: Creating a simple Google Duplex clone using Dialogflow

Looking for a Google Duplex SDK or maybe a Google Duplex API? I don’t know of any, but in this article, I explain how the haircut scheduling call demoed for Google Duplex can be built using Dialogflow. We will be building Simplex, which is a very simple Google Duplex clone and will present some ideas on how a Google Duplex like agent could be created.

What is Google Duplex?

During I/O 2018, Google showed a glimpse of a technology called Google Duplex. If you haven’t seen the video, here it is:

Can we use Dialogflow to do something similar? Yes, but we first need to look at a few disclaimers.

Disclaimers

First, since Google Duplex is supposed to help with more than simply scheduling appointments with hair salons, this article is going to be quite narrow in scope. In addition, since we don’t know the full capabilities of Google Duplex, it is very hard to say whether something like it can be built using only Dialogflow. My guess is probably not.

Second, I am aware that some people are asking whether this technology is, on the whole, a good thing for society. It’s probably too early to say, and obviously I hope that it is only used in appropriate ways.

Third, I don’t have any special insight into any of these technologies. I write this as a Dialogflow consultant, and that’s the extent of my insight into the Google Duplex technology.

Transcript

Here is the full transcript of the call:


Hair Salon Person (HSP): Hello, how can I help you?

Google Duplex (GD): Hi, I’m calling to book a women’s haircut for a client. I’m looking for something on May 3rd

HSP: Sure, give me one second

GD: Mm-hmm

HSP: Sure, what time are you looking for around?

GD: At 12 pm

HSP: We do not have a 12pm available. The closest we have to that is a 1:15

GD: Do you have anything between 10am and 12pm?

HSP: Depending on what service she would like? What service is she looking for?

GD: Just a women’s haircut, for now

HSP: OK, we have a 10 o’clock

GD: 10am is fine

HSP: Ok, what’s her first name?

GD: The first name is Lisa

HSP: Ok, perfect. So I will see Lisa at 10 o’clock on May 3rd

GD: OK, great thanks.

HSP: Great. Have a great day. Bye.


Mm-hmm

No doubt, the most impressive and funny moment in the video is when the Assistant says “Mm-hmm”.

But you might be surprised to know that Google probably had to add that response in (or some other phrase which means the same thing) to get the conversation working correctly. This is because Dialogflow is based on a request-response model, and it cannot handle two consecutive messages from the end user without interjecting a response in the middle.

Also, if the Assistant had merely remained silent, that might have prompted the hair salon person to say something like “Hello? Are you still there?” and might have messed up the dialogue.

Goal of the agent

We can now take a look at the transcript and figure out how to build the corresponding intents.

Before we do that, we need to understand what the Assistant is trying to achieve. It is trying to provide two pieces of information to the hair salon person (HSP) – the date and time of the appointment. The HSP asks for the client’s name, but this is not the Assistant’s concern. If there were some way for the hair salon to make the booking without using the client’s name (e.g. using only a phone number), then the Assistant would be fine with that too.

The Assistant needs to be certain of the date and time (to add to the calendar).

Is this slot filling?

People familiar with Dialogflow might be wondering if this is an example of slot filling. It is not. The agent isn’t collecting information, but rather providing it. Also, slot filling doesn’t really allow for handling cases like the HSP saying she doesn’t have the specific time requested.

Flowchart

I have created a conversation flow diagram using XMind which is based on the general principles outlined in my previous blog post. [1]

Welcome Intent

The conversation starts with the HSP saying : “Hello, how can I help you?”

You would probably expect some variant of this greeting, and look for phrases like “how can I help” or “what can I do for you?”

To this the agent should simply reply with the task it is trying to fulfill, and also add some extra information to make it easier for the HSP (by providing the date in this case).

Note that the Welcome intent (which I have named UserSaysGreeting) follows the general naming convention I recommend: name the intent after what the user said to trigger it. It also follows the other convention of setting an output context which describes the state of the system.
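As a rough illustration of these two conventions, the welcome intent could be sketched as a simplified Python dict, loosely modeled on Dialogflow’s intent export format. Everything here apart from the intent name and the transcript lines is my assumption (in particular the context name), not an exact Dialogflow schema:

```python
# Hypothetical, simplified sketch of the welcome intent -- not the real
# Dialogflow export schema, just an illustration of the two conventions.
user_says_greeting = {
    # Convention 1: the intent name describes what the user said to trigger it.
    "name": "UserSaysGreeting",
    "trainingPhrases": [
        "Hello, how can I help you?",
        "Hi, what can I do for you?",
    ],
    # The agent states its task and volunteers the date to help the HSP.
    "responses": [
        "Hi, I'm calling to book a women's haircut for a client. "
        "I'm looking for something on May 3rd",
    ],
    # Convention 2: the output context describes the state of the system.
    # (This context name is my own guess for illustration.)
    "outputContexts": ["awaiting_appointment_time"],
}
```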

Waiting Intent

At this point, the HSP says “Sure, give me a second”.

While this exchange looks very cool in the demo (and it is pretty cool), it is just an expected request-response pair, and you should be able to handle it without much trouble.

HSP Asks for appointment time

At this point in the conversation, we can expect the HSP to ask for the time of the appointment. The agent responds with the first choice time.

In reality, this response should not be hard-coded into the Text Response area. Rather, it should come back from the webhook based on the schedule of the Google Assistant Owner (GAO) account. But for our purposes it is sufficient for now.
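In other words, the webhook would look at the GAO’s free slots and propose the most preferred one. A minimal sketch of that idea in Python; the slot data and function name are my own assumptions for illustration, not part of any Dialogflow API:

```python
def first_choice_response(free_slots):
    """Suggest the GAO's most preferred free slot.

    free_slots is assumed to be a list of (hour, minute) tuples,
    sorted by the GAO's preference for the requested date.
    """
    if not free_slots:
        # No slot at all on this date -- fall through to the second-choice date.
        return "Actually, is there any availability on another day?"
    hour, minute = free_slots[0]
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12
    clock = f"{hour12}:{minute:02d}" if minute else f"{hour12}"
    return f"At {clock} {suffix}"

# The demo's first request: a 12 pm slot on May 3rd.
print(first_choice_response([(12, 0), (10, 0), (11, 30)]))  # At 12 pm
```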

No availability at the first suggested time

At this point, we handle the possibility that the HSP says they are not available at the time the agent suggested. Note that the HSP suggests an alternate time, which is processed on the webhook end to see if it is still feasible for the GAO.

Webhook logic

The webhook parses the value coming in the time parameter, and checks to see if it is between 10 AM and 12 PM. If it is, then it suggests that the time is fine and moves on to the next step in the conversation.

If the time is outside the range, then it sets the output context to awaiting_confirmation_newtime and asks if there is any availability between 10 and 12.
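A minimal sketch of this webhook logic in Python, using the general shape of a Dialogflow v2 fulfillment request and response. The parameter name `time` and the context name follow the article; the parsing details are my assumption (Dialogflow’s @sys.time parameter arrives as an ISO-8601 datetime string):

```python
from datetime import time

# The window that fits the GAO's schedule on the first-choice date.
EARLIEST, LATEST = time(10, 0), time(12, 0)

def handle_proposed_time(request):
    """Accept the HSP's proposed time if it falls in the GAO's window;
    otherwise ask about the window and set awaiting_confirmation_newtime."""
    # @sys.time arrives like "2018-05-03T13:15:00-07:00"; keep the time part.
    iso = request["queryResult"]["parameters"]["time"]
    proposed = time.fromisoformat(iso[11:19])
    if EARLIEST <= proposed <= LATEST:
        hour12 = proposed.hour % 12 or 12
        suffix = "am" if proposed.hour < 12 else "pm"
        clock = f"{hour12}:{proposed.minute:02d}" if proposed.minute else f"{hour12}"
        return {"fulfillmentText": f"{clock} {suffix} is fine"}
    # Outside the window: switch to the new-time branch of the conversation.
    return {
        "fulfillmentText": "Do you have any availability between 10 AM and 12 PM?",
        "outputContexts": [{
            "name": request["session"] + "/contexts/awaiting_confirmation_newtime",
            "lifespanCount": 2,
        }],
    }
```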

HSP suggests a new time on original date

Now the HSP responds to the message which says “Do you have any availability between 10 AM and 12 PM?”. Note that this message is only generated on the webhook end and isn’t shown in any of the screenshots.

Also note that the input context is awaiting_confirmation_newtime, the same as what was set by the webhook. Generally speaking, you should try to avoid updating the context from a webhook. But this is an example where it is OK to update the context from the webhook, since there is a requirement to go down completely different branches of the conversation logic based on the input (time) supplied by the user.

Why do we call the webhook?

It is possible that even if this intent fires, the HSP has suggested something close to 10 AM or 12 PM but not actually between them. We would need to check this value and confirm that the time indeed falls within the range that fits the GAO’s schedule.

No other availability on first choice date

When the HSP hears the message “Do you have any availability between 10 AM and 12 PM?”, another possibility is that there is no other time available: she may have already suggested the last available time slot, or she may know there is no slot available within the specified interval.

When the HSP says she has no availability, the Assistant will now try and schedule on the second choice date.

Note that I am making up this date and time myself, as a way to explain the possibilities of using this approach. That is, this isn’t an agent which is created by “overfitting” on the input example and one which will not work in any other scenario. It is intended to be quite flexible [2].

HSP asks for haircut type

We add an intent to handle the question where the HSP asks for the haircut type.

Notice that in the conversation transcript, the HSP is asking for the haircut type to decide on a second option for time. The intent is designed to also handle this scenario, but it will also work even if this question is asked earlier or later in the conversation. [3]

HSP asks for client’s first name

This intent is straightforward, and here too the response will not be hard-coded but will come from the webhook. But the hard-coded version is close enough for our requirements. [4]

HSP confirms appointment

This intent is required for completing the conversation. Here, we expect the HSP to say something that tells us she has actually booked the appointment. Add plenty of variations of the training phrases to this intent to make sure it triggers reliably.

Also notice that when this intent fires, the context changes from awaiting_confirmation to confirmed_awaiting_goodbye.

At this point, the agent is ready to end the call.

Goodbye Intent

Finally, there will be some kind of exchange of messages to mark the end of the conversation.

Rinse and Repeat

You can use a very similar set of intents (but with a different input context) to handle the scenario where the HSP doesn’t have any availability at all on the first proposed date. You can see the list of intents you would need in the flowchart.

Summary

This article provided an outline of how you might use Dialogflow to create Simplex, a simple clone of Google Duplex.

Where are the fallback intents?

As you might have noticed, I haven’t included any fallback intents here. They will certainly be necessary, but you would want to build them after figuring out what kind of fallbacks can be handled in such an agent.

Where are the entities?

You could argue that the haircut type should be an entity, but I have omitted it for simplicity. Plus, it will probably not affect the input/output contexts we are using.

Will this handle corner cases?

It can be extended to handle corner cases. The benefit of using my approach is that it is a fairly flexible system and you can add/remove intents as necessary depending on how many corner cases you wish to handle.

This is too simplistic

Maybe you think this is too simplistic and will not work in the majority of cases. Please send in a sample conversation you have had in real life and I will try and incorporate it into this agent. 🙂

This is way too complex. I can do all this in a single slot filling intent.

I wouldn’t recommend doing that, but if you can achieve it in a way that still leaves your chatbot easy to debug later, good for you.

Get the agent ZIP file

You can get the ZIP file for the agent as well as the sample code used in the webhook by joining the MBD Membership course.

References

[1] You can take a more in depth look at creating these flowcharts in my Dialogflow Flowcharts course. Or you can get the Learn Dialogflow bundle and get all the courses (including Flowcharts) which will give you a lot of insight into the reasons why I have designed this agent the way I have.

[2] The agent is flexible, but within limits. We can make a reasonable assumption that the person at the hair salon is very busy, is working off some kind of imaginary “slot filling script” where she needs to get certain pieces of information, and would like to complete the call quickly and get back to business.

[3] In fact, it is possible that the HSP doesn’t even ask for the haircut type during the whole conversation. The agent can handle this possibility too.

[4] Similarly, it is possible that the HSP doesn’t ask for the client name during the whole conversation, and our agent has been designed to handle that scenario.
