Facing an ambiguity issue with the List of Values entity type

(Baba Farooq Basha Renumanu) #1

When entity synonyms are similar (for example, when several choices share words), the ideal way to define the synonyms is to enclose each synonym in double quotes ("") and separate the synonyms with commas. Once the synonyms are defined in double quotes, the NL Engine looks for an exact match of the user input rather than a relative match, which avoids the ambiguity scenarios.

(Andy Heydon) #2

Be careful: putting synonyms in double quotes may be overly restrictive.

Consider the following choices:

  1. Good Plan
  2. Better Plan
  3. Best Plan

If the user only says “I want a plan” in response to the entity prompt “Which plan?”, then all three choices are presented because nothing distinguishes them: technically, the NL engine scores each choice the same because they all match the same word, “plan”.

But if the user had responded “the best one”, then the NL engine would have scored choice #3 as the winner because that is the only choice that mentions any input word.

Now, if the synonyms for those choices were in double quotes (“Good Plan”, “Better Plan” and “Best Plan”), then when the user said “the best one”, nothing would match, because the double quotes mean that those words, in that exact order, must be part of the input.
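A toy model can make the difference concrete. The sketch below is hypothetical and does not reproduce the NL engine's actual scoring: it simply counts matched words for unquoted synonyms and requires the whole phrase, contiguous and in order, for quoted ones. The names `tokens`, `matches` and `best_choices` are invented for illustration.

```python
import re

def tokens(text):
    """Lowercase words in the text (a simplistic, hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def matches(synonym, user_input):
    """Return how many synonym words the input matches.

    Unquoted synonym: each word is matched on its own.
    Quoted synonym: all words must appear together, in order, or nothing matches.
    """
    words = tokens(user_input)
    if synonym.startswith('"') and synonym.endswith('"'):
        phrase = tokens(synonym)
        n = len(phrase)
        found = any(words[i:i + n] == phrase for i in range(len(words) - n + 1))
        return n if found else 0
    return sum(1 for w in tokens(synonym) if w in words)

def best_choices(choices, user_input):
    """Return every choice sharing the highest non-zero match count."""
    scores = {c: matches(c, user_input) for c in choices}
    top = max(scores.values())
    return [c for c, s in scores.items() if s == top and s > 0]

unquoted = ["Good Plan", "Better Plan", "Best Plan"]
quoted = ['"Good Plan"', '"Better Plan"', '"Best Plan"']

print(best_choices(unquoted, "I want a plan"))  # all three tie: ambiguous
print(best_choices(unquoted, "the best one"))   # ['Best Plan']
print(best_choices(quoted, "the best one"))     # []: nothing matched
```

With quotes, only an input such as “I want the best plan” would match, and then only the “Best Plan” choice.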

Reserve double quotes for multi-word synonyms whose words should truly only match together, e.g. “credit card”, where you want to avoid a false-positive match on a single word.

(Andy Heydon) #3

As an example of how quotes can be useful, though, consider this situation I encountered recently. The bot developer was complaining that the system was not presenting an ambiguity list when the input contained only “account”.

The entity had 4 choices:

  1. Savings Account
  2. Checking Account
  3. Savings
  4. USD Savings

None of these choice synonyms used quotes, which means that each word is scored and assessed separately. Note that “Savings” features in three of the four choices (though I don’t know why choice #3 is present 😉), which means its relative significance is low: it is not a good indicator of a choice.
The word “Account” is a better indicator, as it is only associated with two choices, whereas “Checking” and “USD” are the best indicators because each is unique to one choice.

Part of what makes the system select “Savings Account” as the unambiguous choice is that, within that first choice, “Account” is more significant than “Savings”. Within “Checking Account”, however, “Account” is not nearly as significant as “Checking” would be if it appeared in the input.
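The reasoning above resembles inverse document frequency (IDF): a word that appears in fewer choices carries more weight. The sketch below assumes that model purely for illustration; it is not the NL engine's documented algorithm, and `idf_weights` and `share_of` are hypothetical names. It computes each word's weight and how much of each choice's total weight the single word “account” covers.

```python
import math
import re

def words(choice):
    """Distinct lowercase words in a choice's synonym."""
    return set(re.findall(r"[a-z]+", choice.lower()))

def idf_weights(choices):
    """Hypothetical significance: log(N / number of choices containing the word)."""
    n = len(choices)
    vocab = set().union(*(words(c) for c in choices))
    return {w: math.log(n / sum(w in words(c) for c in choices)) for w in vocab}

def share_of(word, choice, weights):
    """Fraction of a choice's total word weight contributed by `word`."""
    if word not in words(choice):
        return 0.0
    total = sum(weights[w] for w in words(choice))
    return weights[word] / total if total else 0.0

choices = ["Savings Account", "Checking Account", "Savings", "USD Savings"]
weights = idf_weights(choices)

# Least significant first: savings (3 choices), account (2), checking/usd (1 each).
for k in sorted(weights, key=lambda k: (weights[k], k)):
    print(f"{k:10s} {weights[k]:.3f}")

# How much of each of the first two choices does the input "account" cover?
for c in choices[:2]:
    print(f"{c}: {share_of('account', c, weights):.2f}")
```

Under this toy model, “account” covers about 71% of the weight of “Savings Account” but only about 33% of “Checking Account”, which is in line with the explanation of why the first choice wins. The real engine may combine scores differently.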

One solution would be to enclose the synonyms of the first two choices in quotes, so that both words must be present to match each choice and “Account” by itself is ignored.
Another solution would be to remove the third choice entirely: how is it different from the 1st and 4th options? Removing it rebalances the relative importance of each word, so an input of “Account” would yield an ambiguous list of the first two choices.
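The rebalancing effect of removing choice #3 can be checked under a hypothetical IDF-style weighting (an assumption made for illustration, not the engine's documented behavior). “Savings” drops from three choices to two, so it ends up with the same weight as “Account” and no longer makes “Account” the dominant word of “Savings Account”. The names `doc_freq` and `idf` are invented; the sketch only shows the weights and does not model how the real engine decides that a result is ambiguous.

```python
import math
import re
from collections import Counter

def doc_freq(choices):
    """Count how many choices contain each word (once per choice)."""
    return Counter(w for c in choices for w in set(re.findall(r"[a-z]+", c.lower())))

def idf(choices):
    """Hypothetical significance: log(N / document frequency)."""
    n = len(choices)
    return {w: math.log(n / df) for w, df in doc_freq(choices).items()}

before = ["Savings Account", "Checking Account", "Savings", "USD Savings"]
after = [c for c in before if c != "Savings"]  # drop choice #3

for label, choices in (("before", before), ("after", after)):
    w = idf(choices)
    print(label, {k: round(w[k], 3) for k in ("savings", "account")})
```

Before the removal, “savings” weighs about 0.288 against 0.693 for “account”; afterwards both weigh about 0.405.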