Recommendations for machine learning training in NLP Engine


(Madhu) #1

Here are a few recommendations to follow when training your bots for machine learning on the bots platform.

  • Give balanced training. For every intent the bot needs to detect, add approximately the same number of sample utterances. A skewed model may give skewed results.

  • We recommend providing at least 8-10 sample utterances for each intent. A model with just 1-2 utterances will not yield any machine learning benefit. Ensure that the utterances are varied, and do not provide variations that just use the same words in a different order.

  • Avoid training common phrases that could apply to every intent, e.g. “I want to”. Ensure that the utterances are varied so the model has more to learn from.

  • After every change, train the model and check it. Ensure that all the dots in the ML model lie on the diagonal (in the True-positive and True-negative quadrants) and that you do not have scattered utterances in the other quadrants. Keep training the model until you achieve this.

  • Regularly train the bot with new utterances.

  • Regularly review the failed or abandoned utterances and add them to the utterance list against a valid task/intent.
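
Several of the checks above can be run on the training data before it is loaded into the platform. The following is only a rough sketch, assuming the utterances are held locally as a dict of intent name to list of sentences. The function names, the `MAX_RATIO` tolerance and the 3-word phrase length are assumptions for illustration and are not part of the platform.

```python
MIN_UTTERANCES = 8   # lower bound suggested in this thread
MAX_RATIO = 2.0      # assumed tolerance for "approximately the same number"


def audit_training_set(training):
    """Return warnings for a {intent: [utterances]} dict."""
    warnings = []
    counts = {intent: len(utts) for intent, utts in training.items()}

    # Too few samples for an intent.
    for intent, n in counts.items():
        if n < MIN_UTTERANCES:
            warnings.append(f"{intent}: only {n} utterances (fewer than {MIN_UTTERANCES})")

    # Largest intent is much bigger than the smallest one.
    if counts and min(counts.values()) > 0:
        if max(counts.values()) / min(counts.values()) > MAX_RATIO:
            warnings.append("training set is unbalanced across intents")

    # Same words in a different order within one intent.
    for intent, utts in training.items():
        seen = {}
        for u in utts:
            key = tuple(sorted(u.lower().split()))
            if key in seen:
                warnings.append(f"{intent}: '{u}' reuses the words of '{seen[key]}'")
            else:
                seen[key] = u
    return warnings


def ngrams(text, n):
    """All runs of n consecutive words in text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrases_in_every_intent(training, n=3):
    """Phrases of n words that occur in at least one utterance of every intent."""
    per_intent = [set().union(*(ngrams(u, n) for u in utts)) for utts in training.values()]
    return set.intersection(*per_intent) if per_intent else set()
```

Any phrase returned by `phrases_in_every_intent` is a candidate for removal, since it cannot help the model tell one intent from another.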

(Andy Heydon) #2

What is the best thing, the bees knees if you will, is to try and avoid using overly long, rambling and verbose training sentences to provide an example of what a user might say for an intent

  • It is very unlikely that a user will say something that matches a very long sentence.
  • Long sentences “bury the lede” of what is most important in aiding and identifying an intent.
  • The NL engine will focus on unique words, e.g. “bees knees” or “rambling”, and that will lead to false positives.
  • Longer sentences interfere with other intents because they have more matching words, and that reduces the confidence in those other intents.
  • Don’t be afraid to edit utterances added to the model via unsupervised learning.

The end of the month is coming up and I am about to be paid. I want to make a trade.

  • Intent matching occurs within a sentence; that is, words are not cherry-picked from several input sentences to try and find a match. An intent is something that is expressed in a single sentence.
  • Each input sentence is evaluated against the ML model individually, so it is impossible for multiple sentences to match in their entirety.
  • There is an increased likelihood of false positives - the end of the month doesn’t directly suggest the user wants to make a trade.
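
To find training utterances like this one, something along the following lines could be used. It is a minimal sketch that assumes a naive split on ., ! and ? followed by whitespace; it is not the sentence splitter the platform uses, and the function names are made up for illustration.

```python
import re

# Assumed, simplistic sentence boundary: terminal punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(utterance):
    """Split an utterance into sentences using the naive boundary above."""
    return [s for s in _SENTENCE_END.split(utterance.strip()) if s]


def multi_sentence_utterances(utterances):
    """Return the utterances that contain more than one sentence."""
    return [u for u in utterances if len(split_sentences(u)) > 1]
```

Each flagged utterance can then be trimmed down to the one sentence that actually expresses the intent, here “I want to make a trade.”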

Hey, can I pay my bill

  • A variation of the multiple sentences warning is with the use of interjections, like “hey”. Interjections, particularly on a voice channel, are partly a way for humans to lead gently into the conversation.
  • Kore splits interjections into their own sentence and they are normally ignored if there is something else significant in the utterance. If the user says “Hey, I want to pay my bill” then the bot will start the “Pay Bill” intent, but if the user says just “Hey”, then the response will be “Hello”.
  • Virtually everything uttered by a user could start with an interjection so from an ML training perspective they add nothing and should be removed.
  • Note that interjections cover not just exclamations like “Hey” but also things like “yes”, “I’m sorry” and “please”.
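
Removing leading interjections from training utterances can be scripted. The sketch below assumes a small hand-picked list containing only the examples mentioned above; it is not the list of interjections the platform recognises, and `strip_leading_interjections` is a hypothetical helper.

```python
import re

# Illustrative list only, taken from the examples in this post.
INTERJECTIONS = ("hey", "yes", "i'm sorry", "please")

_LEADING = re.compile(
    r"^\s*(?:" + "|".join(re.escape(i) for i in INTERJECTIONS) + r")\b[\s,!.]*",
    re.IGNORECASE,
)


def strip_leading_interjections(utterance):
    """Remove any run of listed interjections from the start of an utterance."""
    # Normalise curly apostrophes so that “I’m sorry” matches the list entry.
    text = utterance.replace("\u2019", "'")
    previous = None
    while previous != text:          # repeat to handle e.g. "Yes please, ..."
        previous = text
        text = _LEADING.sub("", text, count=1)
    return text.strip()
```

An utterance that is nothing but an interjection comes back as an empty string and can simply be dropped from the training set.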

(Simon Fellows) #3

Great thread gents. Agree the more minimal the better, but I also find some oddity.

I find the intent utterances can be minimal where the exact word is used in the intent, however I find the utterances for synonyms of that intent need to be more explicit. I may be set up incorrectly though.

For example.

With Manage Bloomberg as the intent, it picks up the variations of manage without the need to add utterances. If I have a synonym for Bloomberg of BB, then I need to be more specific with the utterances loaded or they get ignored.

I still haven’t figured out the best solution on that one yet.

(Andy Heydon) #4

Unfortunately, at the moment the ML engine does not pick up the bot-level synonyms, so you have to supply the different variations in the training sentences.
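
One way to supply those variations without typing each one by hand is to generate them before training. This is a rough sketch only: it assumes the synonyms are available locally as a dict, and `expand_with_synonyms` is a hypothetical helper, not a platform API.

```python
import re


def expand_with_synonyms(utterances, synonyms):
    """Return the utterances plus copies with each term swapped for its synonyms.

    synonyms maps a term to a list of alternatives, e.g. {"Bloomberg": ["BB"]}.
    """
    expanded = []
    for utterance in utterances:
        variants = [utterance]
        for term, alternatives in synonyms.items():
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            next_variants = []
            for v in variants:
                next_variants.append(v)
                if pattern.search(v):
                    for alt in alternatives:
                        # A function replacement keeps alt literal (no backslash escapes).
                        next_variants.append(pattern.sub(lambda _m, a=alt: a, v))
            variants = next_variants
        for v in variants:
            if v not in expanded:    # keep order, drop duplicates
                expanded.append(v)
    return expanded
```

For example, `expand_with_synonyms(["Manage Bloomberg"], {"Bloomberg": ["BB"]})` gives “Manage Bloomberg” and “Manage BB”. The generated list grows quickly with several synonyms, so it is worth reviewing it to keep the intents balanced.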