Text-to-Speech & Speech-to-Text Implementation

One of the best features of chatbots is the ability to recognize and translate voice-based commands, which makes communication simpler through speech-to-text (STT) and text-to-speech (TTS).

Kore.ai’s Bots Platform includes an automatic speech recognition (ASR) engine to enable voice-driven interactions between bots and users. It also allows your chatbot to communicate outside of traditional text interfaces or messaging applications.

This extends your chatbot’s capability to understand voice commands in channels like a website or mobile app, where speech-to-text functionality isn’t typical.

You can use either Kore.ai’s ASR engine or the Google speech engine. To use the Google speech engine, you need a valid Google Speech API key, placed in ‘web-kore-sdk/libs/speech/key.js’.
Setting allowGoogleSpeech makes the SDK use the Google cloud service API. A Google Speech key is required for all browsers except Chrome.
image
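
The browser rule above can be sketched as a small check. This is only an illustration, not code from web-kore-sdk: needsGoogleSpeechKey is a hypothetical helper, and it assumes that Chrome is exempt because it exposes the built-in webkitSpeechRecognition object. Other browsers that expose the same object would also pass this check, so treat it as a rough guide only.

```javascript
// Hypothetical helper, not part of web-kore-sdk.
// Assumption: a browser that exposes webkitSpeechRecognition (as Chrome does)
// can recognize speech without a Google Speech API key.
function needsGoogleSpeechKey(win) {
  var hasBuiltInRecognition = typeof win.webkitSpeechRecognition !== "undefined";
  return !hasBuiltInRecognition;
}
```

In a page you would call needsGoogleSpeechKey(window) and, if it returns true, make sure the key in key.js is filled in before enabling allowGoogleSpeech.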

Here are the steps to enable TTS and STT in the Web SDK:

Step 1: Download the latest web-kore-sdk from GitHub at https://github.com/Koredotcom/web-kore-sdk and refer to the tutorial at https://developer.kore.ai/docs/bots/sdks/kore-ai-web-sdk-tutorial/

Step 2: In the file web-kore-sdk-master > UI > index.html:
a. Uncomment the lines for Google Speech.
image

b. Check that the speech URL and TTS socket URL in botoptions match the values shown in the screenshot.
image

c. Check that isTTSEnabled, isSpeechEnabled, allowGoogleSpeech and autoEnableSpeechAndTTS are all set to true in the chatconfig options.
image
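
Since the screenshot is not reproduced here, the flags from step 2c might look roughly like the sketch below. The four flag names come from this post. The object name chatConfig, the helper missingSpeechFlags and the comments on what each flag does are assumptions for illustration, and the real object in index.html contains other fields as well.

```javascript
// Illustrative only: the real chat config in index.html has more fields,
// and its exact shape may differ between SDK versions.
var chatConfig = {
  isTTSEnabled: true,           // speaker (text-to-speech) option
  isSpeechEnabled: true,        // microphone (speech-to-text) option
  allowGoogleSpeech: true,      // use the Google cloud speech service
  autoEnableSpeechAndTTS: true  // meaning assumed from the flag name
};

// Hypothetical helper: list the speech flags that are not set to true.
function missingSpeechFlags(config) {
  var required = [
    "isTTSEnabled",
    "isSpeechEnabled",
    "allowGoogleSpeech",
    "autoEnableSpeechAndTTS"
  ];
  return required.filter(function (name) {
    return config[name] !== true;
  });
}
```

Calling missingSpeechFlags(chatConfig) in the browser console is a quick way to confirm that none of the four flags was left commented out or set to false.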

Step 3: Add the generated Google Speech API key to web-kore-sdk/libs/speech/key.js.
To generate the Google Speech API key, follow https://cloud.google.com/text-to-speech/docs/quickstart-protocol and https://cloud.google.com/speech-to-text/docs/quickstart-protocol
image

Step 4: In the file web-kore-sdk-master > UI-JAVASCRIPT > index.html:
a. Check that the URLs under botoptions are the same as the ones in the screenshot.
image

b. Check that isTTSEnabled and isSpeechEnabled under chatconfig are set to true.

Step 5: Save all the files and open index.html in any browser. You will see the microphone (speech-to-text) and speaker (text-to-speech) options underneath the chatbot.
Clicking the microphone option captures the user's speech as input.
Text-to-speech works when the speaker button highlighted in the screenshot is unmuted. It can take up to 30 seconds for the bot to start reading out the message.
image

To change the Kore chatbot voice (male to female), please refer to "Change of voice (male to female) for chatbot".
For third-party integration, refer to "Can we integrate third party speech to text API".

Thanks for this. Does the ASR engine work on an on-premise installation?
If so, how can I enable it for the Web Channel?

Thanks,
Jose