Recommendations for machine learning training in Kore.ai NLP Engine

nlp

(Madhu) #1

Here are a few recommendations to be followed when training your bots for machine learning within Kore.ai bots platform.

  • Provide balanced training. Add approximately the same number of sample utterances for each intent the bot needs to detect; a skewed training set produces skewed results.

  • We recommend providing at least 8-10 sample utterances for each intent. A model trained with just 1-2 utterances will not yield any machine learning benefit. Ensure the utterances are genuinely varied, and do not add variations that simply reuse the same words in a different order.

  • Avoid training on common phrases that could apply to every intent, e.g. “I want to”. Such phrases add no discriminating signal between intents.

  • After every change, train the model and check it. Ensure that all the dots in the ML model fall in the true-positive and true-negative quadrants, with no scattered utterances in the other quadrants. Keep training the model until you achieve this.

  • Regularly train the bot with new utterances.

  • Regularly review the failed or abandoned utterances and add them to the utterance list of the appropriate task or intent.
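The balance and variety checks above can be sketched in a few lines of Python. This is a stand-alone illustration of my own — the function name, the 8-utterance floor, and the 2x skew threshold are assumptions for the sketch, not part of the Kore.ai platform:

```python
def audit_training(utterances_by_intent):
    """Flag under-trained or skewed intents and word-reorder near-duplicates.

    `utterances_by_intent` maps an intent name to its list of sample
    utterances. A toy checker, not a Kore.ai API.
    """
    counts = {intent: len(samples) for intent, samples in utterances_by_intent.items()}
    issues = []

    # Recommendation above: at least 8-10 samples per intent.
    for intent, n in counts.items():
        if n < 8:
            issues.append(f"{intent}: only {n} utterances (aim for 8-10+)")

    # Flag a skewed set: largest intent has more than 2x the samples of the smallest.
    if counts and max(counts.values()) > 2 * min(counts.values()):
        issues.append("unbalanced: sample counts vary by more than 2x across intents")

    # Flag pairs that use the same words in a different order.
    for intent, samples in utterances_by_intent.items():
        seen = {}
        for u in samples:
            bag = frozenset(u.lower().split())
            if bag in seen and seen[bag] != u:
                issues.append(f"{intent}: '{u}' reorders the words of '{seen[bag]}'")
            else:
                seen.setdefault(bag, u)
    return issues
```

Run as part of a build or review step, a report like this makes it obvious which intents need more (or more varied) samples before the next training pass.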


(Andy Heydon) #2

The best thing, the bees knees if you will, is to try and avoid using overly long, rambling, and verbose training sentences as examples of what a user might say for an intent.

  • It is very unlikely that a user will say something that matches a very long sentence.
  • Long sentences “bury the lede” of what is most important in aiding and identifying an intent.
  • The NL engine will focus on unique words, e.g. “bees knees” and “rambling”, which will lead to false positives.
  • Longer sentences interfere with other intents because they have more matching words, and that reduces the confidence in those other intents.
  • Don’t be afraid to edit utterances added to the model via unsupervised learning.

The end of the month is coming up and I am about to be paid. I want to make a trade.

  • Intent matching occurs within a sentence; that is, words are not cherry-picked from several input sentences to find a match. An intent is something that is expressed in a single sentence.
  • Each input sentence is evaluated against the ML model individually, so it is impossible for multiple sentences to match in their entirety.
  • There is an increased likelihood of false positives - the end of the month doesn’t directly suggest the user wants to make a trade.
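The per-sentence evaluation described above can be illustrated with a toy scorer. The keyword-overlap scoring here is a simplification of my own for demonstration, not how the Kore.ai NL engine actually ranks intents:

```python
import re

def score_sentence(sentence, keywords):
    """Toy score: fraction of an intent's keywords present in one sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return len(words & keywords) / len(keywords)

def match_intents(utterance, intents):
    """Evaluate each sentence independently, as the thread describes:
    words are never pooled across sentences to build a match.

    `intents` maps an intent name to a set of keywords (an assumption
    of this sketch)."""
    sentences = re.split(r"(?<=[.!?])\s+", utterance.strip())
    results = []
    for s in sentences:
        best = max(intents, key=lambda name: score_sentence(s, intents[name]))
        results.append((s, best, score_sentence(s, intents[best])))
    return results
```

Feeding in the two-sentence example above, only the second sentence scores against a “make a trade” intent; the first sentence contributes nothing, which is exactly why it should not be part of the training utterance.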

Hey, can I pay my bill

  • A variation of the multiple sentences warning is with the use of interjections, like “hey”. Interjections, particularly on a voice channel, are partly a way for humans to lead gently into the conversation.
  • Kore splits interjections into their own sentence and they are normally ignored if there is something else significant in the utterance. If the user says “Hey, I want to pay my bill” then the bot will start the “Pay Bill” intent, but if the user has said just “Hey”, then the response would have been “Hello”.
  • Virtually everything uttered by a user could start with an interjection so from an ML training perspective they add nothing and should be removed.
  • Note that interjections cover more than just exclamations like “Hey”; they also include words like “yes”, “I’m sorry”, and “please”.
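As a rough illustration of cleaning training data along these lines, here is a sketch that strips leading interjections before an utterance goes into the training set. The interjection list and function are my own assumptions for the example, not Kore's actual list or API:

```python
import re

# Hypothetical interjection list for illustration; Kore.ai's actual list is not public.
INTERJECTIONS = {"hey", "hi", "hello", "yes", "no", "please", "sorry", "thanks"}

def strip_leading_interjections(utterance):
    """Drop leading interjections so they do not pollute ML training data.

    Mirrors the behaviour described above: "Hey, can I pay my bill"
    trains as "can I pay my bill", while a bare "Hey" is left intact
    so the bot can still respond with a greeting.
    """
    rest = utterance
    while True:
        m = re.match(r"\s*([A-Za-z']+)[,!.\s]+", rest)
        if m and m.group(1).lower() in INTERJECTIONS:
            rest = rest[m.end():]
        else:
            break
    return rest.strip() or utterance.strip()
```

Running every candidate training utterance through a filter like this keeps the model focused on the words that actually discriminate between intents.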

(Simon Fellows) #3

Great thread, gents. I agree that minimal is better, but I have also found some oddities.

I find the intent utterances can be minimal where the exact word is used in the intent name; however, the utterances for synonyms of that word need to be more explicit. I may have something set up incorrectly, though.

For example.

With “Manage Bloomberg” as the intent, it picks up the variants of “manage” without me needing to add utterances. But if I have a synonym “BB” for “Bloomberg”, then I need to be more specific with the utterances I load or they get ignored.

I still haven’t figured out the best solution on that one yet.


(Andy Heydon) #4

Unfortunately, at the moment the ML engine does not pick up the bot-level synonyms, so you have to supply the different variations in the training sentences yourself.
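Until the engine applies bot-level synonyms, one workaround is to generate those variations mechanically before loading them. A small sketch of my own (the function name and approach are assumptions, not a platform feature):

```python
from itertools import product

def expand_with_synonyms(utterance, synonyms):
    """Generate training variations by substituting synonyms word-by-word.

    `synonyms` maps a word to its alternatives, e.g. {"Bloomberg": ["BB"]}.
    Emits one training sentence per combination, so the ML model sees
    every synonym spelled out explicitly.
    """
    words = utterance.split()
    choices = [[w] + synonyms.get(w, []) for w in words]
    return [" ".join(combo) for combo in product(*choices)]
```

For Simon's example, expanding “Manage Bloomberg” with the synonym map `{"Bloomberg": ["BB"]}` yields both “Manage Bloomberg” and “Manage BB”, ready to paste into the utterance list.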