Sub Entity Value is not getting Identified for String Type Sub Entities

tdeshappriya · February 6, 2023, 5:58am

I’m working on a scenario in which the bot asks the user for the first line of the address and the postal code in a single entity. Below is the dialog between the user and the bot.

Bot: May I have the first line of your address and the postal code?
User: My firstline of the address is Park Road and the postal code is 123456

So I have created two subentities and a composite entity with the configurations below:

EntityAddressFirstLine
Entity Type: String
Pattern: My firstline of the address is * and

EntityPostalCode
Entity Type: Number
Pattern: and the postal code is *

EntityValidateAddress
Entity Type: Composite
Composite Pattern: My firstline of the address is @EntityAddressFirstLine and the postal code is @EntityPostalCode

But the user’s input is not being identified by the bot. I’m getting the error “Invalid user input, Entity value not identified” in the debug log.

andy.heydon · February 7, 2023, 12:00am

Hi @tdeshappriya

Using string entities in a composite entity is tricky because strings are greedy by nature: they try to grab as much as they possibly can. The challenge is knowing where to start and stop, which is where entity patterns come in, but you have to be careful not to be too specific. Remember that a user can say anything, and most likely will, so the more specific the patterns are, the smaller the range of user utterances they will match. The downside with string entities in a composite is that one of them can end up extracting the entire sentence, leaving nothing for the other subentities. That is basically what is happening here.

Given the user’s utterance is:
My firstline of the address is Park Road and the postal code is 123456

This will be tokenized and parsed by the NL engine as:
My first line of the address is Park Road and the postal code is 123456

Note that the second word is spell-corrected into two words. The knock-on effect is that the EntityAddressFirstLine pattern will not match, and neither will the composite pattern for EntityValidateAddress, because both are defined with firstline as one word. All patterns are evaluated against the cleaned-up and corrected version of the user’s utterance, not the raw version. So changing the patterns to have first line as two words will make this scenario work, and it will also work when the user has correctly typed the two words separately.

But I want to reiterate that the style of these patterns, with long phrases, is not a sustainable style. For example, the very simple variant of “The firstline of the address is Park Road and my postal code is 123456” will fail to be matched!
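To make the two failure modes above concrete, here is a toy Python simulation. It is not the platform's NL engine: `normalize`, `wildcard_match`, and the one-entry `CORRECTIONS` table are hypothetical stand-ins, and the `*` wildcard is approximated with a regular expression. It only illustrates that a pattern is compared against the corrected utterance, and that a long literal prefix breaks on a small rewording.

```python
import re

# Assumed stand-in for spell correction: a single hard-coded fix.
CORRECTIONS = {"firstline": "first line"}

def normalize(utterance: str) -> str:
    """Return the 'cleaned up' utterance that patterns are compared against."""
    return " ".join(CORRECTIONS.get(word.lower(), word) for word in utterance.split())

def wildcard_match(pattern: str, utterance: str):
    """Return the text captured by '*' in the pattern, or None if it does not match."""
    prefix, suffix = (re.escape(part.strip()) for part in pattern.split("*", 1))
    regex = re.compile(rf"{prefix}\s+(.+?)\s+{suffix}", re.IGNORECASE)
    found = regex.search(utterance)
    return found.group(1) if found else None

one_word = "My firstline of the address is * and"
two_words = "My first line of the address is * and"

raw = "My firstline of the address is Park Road and the postal code is 123456"
variant = "The firstline of the address is Park Road and my postal code is 123456"

print(wildcard_match(one_word, normalize(raw)))       # no match: the utterance was corrected
print(wildcard_match(two_words, normalize(raw)))      # matches and captures the street
print(wildcard_match(two_words, normalize(variant)))  # no match: "The" instead of "My"
```

Even with the two-word fix, the third call shows how little rewording it takes to defeat a pattern that spells out the whole phrase.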

The second point, if you are moving beyond a simple proof of concept, is that this pair of entities doesn’t make a great composite entity use case. A composite entity is meant to capture values that are close and connected to each other, such that the composite patterns are just the names of the subentities in the order that they would appear. Think “large cheese pizza” or “Honda Civic”. You can add static words to the composite pattern, but the more you add, the fewer the potential matches. Composite patterns have an element of inherent flexibility in that they can match with up to two additional words between each mentioned subentity.
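The two-word tolerance can be pictured with a small Python sketch. This is only an illustration of the idea, not how the platform implements it: the `LEXICON` lookup, the subentity names `Size`, `Topping`, and `Item`, and the `match_composite` function are all hypothetical, and punctuation is ignored.

```python
MAX_GAP = 2  # extra words tolerated between subentities, as described above

# Hypothetical lookup from a word to the subentity it belongs to.
LEXICON = {"large": "Size", "small": "Size", "cheese": "Topping", "pizza": "Item"}

def match_composite(pattern, utterance, max_gap=MAX_GAP):
    """Return {subentity: word} if the subentities occur in order with small gaps, else None."""
    words = utterance.lower().split()
    for start, word in enumerate(words):
        if LEXICON.get(word) != pattern[0]:
            continue
        values = {pattern[0]: word}
        pos = start
        for name in pattern[1:]:
            # Inspect the next word plus up to max_gap filler words.
            window = words[pos + 1 : pos + 2 + max_gap]
            hit = next((i for i, w in enumerate(window) if LEXICON.get(w) == name), None)
            if hit is None:
                break
            pos = pos + 1 + hit
            values[name] = words[pos]
        else:
            return values  # every subentity in the pattern was found
    return None

pizza = ["Size", "Topping", "Item"]
print(match_composite(pizza, "large cheese pizza"))
print(match_composite(pizza, "large with extra cheese pizza"))               # two filler words
print(match_composite(pizza, "I want a large size with extra cheese pizza"))  # three filler words
```

The first two utterances match, while the third does not because three words separate "large" from "cheese". An address line and a postal code joined by a phrase such as "and the postal code is" sit well outside that kind of tolerance unless the static words are spelled out in the pattern.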

This close and connected characteristic depends on whether the user is always going to say them at the same time (because they are connected in their mind). I don’t think that would hold up here. It would only take the user mishearing or misreading the prompt and responding with just one of the elements for the composite to fail to match.

So I think you would be better served by treating this scenario as two distinct entities: a zipcode entity (because it is more constrained than a general number) followed by a string entity. The NL platform will always consider every sentence when looking for an entity, so you don’t have to be concerned about extracting them all at once. You can still use entity patterns to guide the NL engine as to where to start looking, but a lighter touch is probably better.
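As a rough illustration of the two-entity approach, the Python sketch below fills two independent slots from whatever the user says, in one turn or several. Everything in it is an assumption made for the example: the `fill_slots` helper, the slot names, the 5 to 6 digit postal code format (chosen only to fit the 123456 example), and the light "address is *" hint. On the platform itself this would be done with entity nodes and entity patterns, not code.

```python
import re

# Assumed postal code format; real formats vary by country.
POSTAL_CODE = re.compile(r"\b\d{5,6}\b")
# Light-touch hint: capture what follows "address is", up to "and" or the end.
ADDRESS_HINT = re.compile(r"address is\s+(.+?)(?:\s+and\b|$)", re.IGNORECASE)

def fill_slots(utterance, slots):
    """Fill whichever of the two slots this utterance supplies; leave the rest untouched."""
    if slots.get("postal_code") is None:
        found = POSTAL_CODE.search(utterance)
        if found:
            slots["postal_code"] = found.group()
    if slots.get("address_line") is None:
        found = ADDRESS_HINT.search(utterance)
        if found:
            slots["address_line"] = found.group(1)
    return slots

# Both values in one sentence, with different wording from the original prompt.
one_turn = fill_slots(
    "The first line of the address is Park Road and my postal code is 123456", {}
)
print(one_turn)

# The user answers only half of the prompt, then supplies the rest later.
two_turns = fill_slots("the postal code is 123456", {})
print(two_turns)
two_turns = fill_slots("the address is Park Road", two_turns)
print(two_turns)
```

Because each slot is looked for on its own, a partial answer leaves the other slot open to be prompted for again instead of failing the whole match.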