Looking for a Google Duplex SDK or a Google Duplex API? I don't know of any, but in this article I explain how the haircut scheduling call demoed for Google Duplex can be built using Dialogflow. We will build Simplex, a very simple Google Duplex clone, and I will present some ideas on how a Google Duplex-like agent could be created.
What is Google Duplex?
During I/O 2018, Google showed a glimpse of a technology called Google Duplex. If you haven't seen the video, here it is:
Can we use Dialogflow to do something similar? Yes, but we first need to look at a few disclaimers.
First, since Google Duplex is supposed to help out with things other than simply scheduling appointments with hair salons, this article is going to be quite narrow. In addition, since we don't know the full capabilities of Google Duplex, it is very hard to say if something like Google Duplex can be built using only Dialogflow. My guess is probably not.
Second, I am aware that some people are asking whether this technology is, overall, a good thing for society. It's probably too early to say, and obviously I hope that it is only used in appropriate ways.
Third, I don't have any special insight into any of these technologies. I write this as someone who specializes in Dialogflow development, and that's the extent of my insight into the Google Duplex technology.
Here is the full transcript of the call:
Hair Salon Person (HSP): Hello, how can I help you?
Google Duplex (GD): Hi, I’m calling to book a women’s haircut for a client. I’m looking for something on May 3rd
HSP: Sure, give me one second
HSP: Sure, what time are you looking for around?
GD: At 12 pm
HSP: We do not have a 12pm available. The closest we have to that is a 1:15
GD: Do you have anything between 10am and 12pm?
HSP: Depending on what service she would like? What service is she looking for?
GD: Just a women’s haircut, for now
HSP: OK, we have a 10 o’clock
GD: 10am is fine
HSP: Ok, what’s her first name?
GD: The first name is Lisa
HSP: Ok, perfect. So I will see Lisa at 10 o’clock on May 3rd
GD: OK, great thanks.
HSP: Great. Have a great day. Bye.
No doubt, the most impressive and funny moment in the video is when the Assistant says "Mm-hmm".
But you might be surprised to know that Google probably had to add that response in (or some other phrase which means the same thing) to get the conversation working correctly. This is because Dialogflow is based on a request-response model, and it cannot handle two consecutive messages from the end user without interjecting a response in the middle.
Also, if the Assistant had merely remained silent, that might have prompted the hair salon person to say something like "Hello? Are you still there?" and might have messed up the dialogue.
Goal of the agent
So we can take a look at the transcript and figure out how to build out the corresponding intents.
Before we do that, we need to understand what the Assistant is trying to do. It is trying to provide two pieces of information to the hair salon person (HSP): the date and the time of the appointment. The HSP asks for the client's name, but this is not the Assistant's concern. If there were some way for the hair salon to make the booking without using the client's name (e.g. using only a phone number), then the Assistant would be fine with that too.
The Assistant needs to be certain of the date and time (to add to the calendar).
Is this slot filling?
People familiar with Dialogflow might be wondering if this is an example of slot filling. It is not. The agent isn't collecting information, but rather providing it. Also, slot filling doesn't really allow for handling cases like the HSP saying she doesn't have the specific time requested.
I have created a conversation flow diagram using XMind which is based on the general principles outlined in my previous blog post. 
The conversation starts with the HSP saying: "Hello, how can I help you?"
You would probably expect some variant of this greeting, and look for the phrase "how can I help" or "what can I do for you?"
To this the agent should simply reply with the task it is trying to fulfill, and also add some extra information to make it easier for the HSP (by providing the date in this case).
Note that the Welcome intent (which I have named UserSaysGreeting) is following the general conventions I recommend: put the end result of the intent being triggered as the name of the intent. In addition, it also follows the other convention of creating an output context which explains the state of the system.
At this point, the HSP says "Sure, give me a second".
While it seems very cool (and it is pretty cool), this is just an expected request-response pair, and you should be able to handle this without much issue.
HSP Asks for appointment time
At this point in the conversation, we can expect the HSP to ask for the time of the appointment. The agent responds with the first choice time.
In reality, this response should not be hard coded into the Text Response area. Rather it should be coming back from the webhook based on the schedule of the Google Assistant Owner (GAO) account. But for our purposes it is sufficient right now.
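As a rough sketch of what that webhook lookup might look like, here is a small Python handler. It assumes the owner's preferences have already been loaded into a plain dictionary; the `GAO_PREFERENCES` structure and the `format_time` and `first_choice_reply` helpers are hypothetical names used for illustration and are not part of the Simplex agent. The only Dialogflow-specific detail is the `fulfillmentText` field of the webhook response.

```python
from datetime import time

# Hypothetical preferences for the Google Assistant Owner (GAO).
# In a real agent these would be derived from the owner's calendar.
GAO_PREFERENCES = {
    "first_choice_time": time(12, 0),
}


def format_time(t: time) -> str:
    """Format a time roughly the way it is spoken in the transcript, e.g. '12 pm'."""
    hour12 = t.hour % 12 or 12          # 0 and 12 both map to 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute:
        return f"{hour12}:{t.minute:02d} {suffix}"
    return f"{hour12} {suffix}"


def first_choice_reply(preferences: dict) -> dict:
    """Build a webhook response that states the first-choice time."""
    spoken = format_time(preferences["first_choice_time"])
    return {"fulfillmentText": f"At {spoken}"}
```

Calling `first_choice_reply(GAO_PREFERENCES)` produces the same reply as the hard-coded text response, but the time now comes from data rather than from the intent definition.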
No availability at the first suggested time
At this point, we handle the possibility that the HSP says they are not available at the time the agent proposed. Note that the HSP is suggesting an alternate time, which is processed on the webhook end to see whether it is still feasible for the GAO.
The webhook parses the value coming in the time parameter, and checks to see if it is between 10 AM and 12 PM. If it is, then it suggests that the time is fine and moves on to the next step in the conversation.
If the time is outside the range, then it sets the output context to awaiting_confirmation_newtime and asks if there is any availability between 10 and 12.
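The check described above could be sketched in Python roughly as follows. This is an illustration under several assumptions, not the actual Simplex webhook code: it assumes the Dialogflow ES (v2) webhook format, that the `time` parameter arrives as an ISO 8601 string, that both ends of the 10 AM to 12 PM window are acceptable, and that `awaiting_confirmation` is the context set when the time is accepted. The function name, the acceptance text, and the lifespan of 1 are placeholders.

```python
from datetime import datetime, time

EARLIEST = time(10, 0)  # 10 AM, start of the acceptable window
LATEST = time(12, 0)    # 12 PM, end of the acceptable window (assumed inclusive)


def handle_alternate_time(request: dict) -> dict:
    """Accept the HSP's suggested time if it is in the window, else ask again."""
    session = request["session"]
    # Assumption: the time parameter is an ISO 8601 string such as
    # "2018-05-03T13:15:00-07:00".
    raw = request["queryResult"]["parameters"]["time"]
    suggested = datetime.fromisoformat(raw).time()

    if EARLIEST <= suggested <= LATEST:
        text = "That time is fine."                # placeholder wording
        context = "awaiting_confirmation"          # assumed next state
    else:
        text = "Do you have any availability between 10 AM and 12 PM?"
        context = "awaiting_confirmation_newtime"

    return {
        "fulfillmentText": text,
        "outputContexts": [
            {"name": f"{session}/contexts/{context}", "lifespanCount": 1}
        ],
    }
```

With the 1:15 suggestion from the transcript, the function returns the question about availability between 10 and 12 and sets `awaiting_confirmation_newtime`; with a 10 o'clock suggestion it accepts the time instead.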
HSP suggests a new time on original date
Now the HSP looks at the message which says "Do you have any availability between 10 AM and 12 PM?". Note that this message is only generated on the webhook end and isn't shown in any of the screenshots.
Also note that the input context is awaiting_confirmation_newtime, the same as what was set by the webhook. Generally speaking, you should try to avoid updating the context from a webhook. But this is an example where it is OK to update the context from the webhook, since the conversation needs to go down completely different branches based on the input (time) supplied by the user.
Why do we call the webhook?
It is possible that even if this intent is fired, the HSP might have suggested something close to 10 AM or 12 PM but not between them. We need to check this value and confirm that the time is indeed within the range which fits the GAO's schedule.
No other availability on first choice date
When the HSP sees the message "Do you have any availability between 10 AM and 12 PM?", another possibility is that there is no other time available and she has already suggested the last available time slot. Or she might know there is no slot available within the interval specified.
When the HSP says she has no availability, the Assistant will now try and schedule on the second choice date.
HSP asks for haircut type
We add an intent to handle the question where the HSP asks for the haircut type.
Notice that in the conversation transcript, the HSP is asking for the haircut type to decide on a second option for time. The intent is designed to also handle this scenario, but it will also work even if this question is asked earlier or later in the conversation. 
HSP asks for client's first name
This intent is straightforward. Here too, in a real agent the response would come from the webhook rather than being hard coded, but a hard-coded response is close enough for our requirements.
HSP confirms appointment
This intent is required for completing the conversation. Here, we expect the HSP to say something which tells us that she has actually booked the appointment. You can add a lot of variations of the training phrase on this one to make sure the intent matches reliably.
Also notice that when this intent fires, the context changes from awaiting_confirmation to confirmed_awaiting_goodbye.
At this point, the agent is ready to end the call.
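To make the role of contexts concrete, the gating can be pictured as a small lookup table. Dialogflow does this matching for you, so the Python below is only a mental model, not something you would deploy. The context names come from this article, while the intent names (`HSPConfirmsAppointment`, `HSPSaysGoodbye`) and the `next_context` helper are hypothetical.

```python
from typing import Optional

# Intent name -> (required input context, resulting output context).
# The context names appear in this article; the intent names are hypothetical.
TRANSITIONS = {
    "HSPConfirmsAppointment": ("awaiting_confirmation", "confirmed_awaiting_goodbye"),
    "HSPSaysGoodbye": ("confirmed_awaiting_goodbye", None),  # None: the call is over
}


def next_context(intent: str, active_context: str) -> Optional[str]:
    """Return the context after `intent` fires, or raise if it cannot fire here."""
    required, resulting = TRANSITIONS[intent]
    if active_context != required:
        raise ValueError(
            f"{intent} needs context {required!r}, got {active_context!r}"
        )
    return resulting
```

The point of the table is that the confirmation intent can only fire while awaiting_confirmation is active, which is what keeps a stray "OK, perfect" earlier in the call from being treated as a booking confirmation.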
Finally, there will be some kind of exchange of messages to mark the end of the conversation.
Rinse and Repeat
You can use a very similar set of intents (but with a different input context) to handle the scenario where the HSP doesn't have any availability at all on the first proposed date. You can see the list of intents you would need in the flowchart.
This article provided an outline of how you might use Dialogflow to create Simplex, a simple clone of Google Duplex.
Where are the fallback intents?
As you might have noticed, I haven't included any fallback intents here. They will certainly be necessary, but you would want to build them after figuring out what kind of fallbacks can be handled in such an agent.
Where are the entities?
You could argue that the haircut type should be an entity, but I have omitted it for simplicity. Plus, it will probably not affect the input/output contexts we are using.
Will this handle corner cases?
It can be extended to handle corner cases. The benefit of using my approach is that it is a fairly flexible system and you can add/remove intents as necessary depending on how many corner cases you wish to handle.
This is too simplistic
Maybe you think this is too simplistic and will not work in the majority of cases. Please send in a sample conversation you have had in real life and I will try and incorporate it into this agent. 🙂
This is way too complex. I can do all this in a single slot filling intent.
I wouldn't recommend doing that, but if you can achieve it in a way which is also easy for you to later debug your chatbot, good for you.
Get the agent ZIP file
You can get the ZIP file for the agent as well as the sample code used in the webhook by joining the MBD Membership course.