We are using an Airport entity node for one of our use cases and have observed some very strange behavior. When the user prompt includes “goa” as the destination, the entity fails to recognize it as the city of Goa and so does not provide the corresponding airport. However, when the prompt includes “Goa”, it is recognized. The settings are set not to auto-correct. Any thoughts on this behavior?
Hi @deepak.bhattad1 ,
Please use the City entity to achieve your requirement; it might be helpful for you.
Thank you,
Srujan Madderla
Kore.ai Community Team
We need the Airport entity because we have to identify airports in the user's utterance.
This is a tricky situation. The problem is that GOA is the IATA code for Genoa Cristoforo Colombo airport, so the question is what should be interpreted when the input is:
- GOA
- Goa
- goa
The first will be Genoa because the IATA code matches.
The second will be the city of Goa, India because the user has gone to the trouble of using the correct casing.
The third is ambiguous because there is no clear hint. Now in our tests, “goa” returns GOI, Goa International because it is the highest “ranked” airport. We are using a popularity ranking so that, for example, if someone types Orlando as an airport then the entity will return MCO instead of the several other local regional airports in the Orlando area.
GOA/Goa is the only case of city name matching a different airport IATA code.
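To make the three cases concrete, here is a minimal sketch of casing-based disambiguation with a popularity ranking. It is only an illustration of the behavior described above, not the platform's actual implementation; the lookup tables and the `resolve_airport` function are hypothetical.

```python
# Hypothetical lookup tables; a real airport entity would use a full database.
IATA_CODES = {"GOA": "Genoa Cristoforo Colombo", "GOI": "Goa International"}

# City name (lowercase) -> airports, ordered by an assumed popularity ranking.
CITY_AIRPORTS = {
    "goa": ["GOI"],
    "genoa": ["GOA"],
    "orlando": ["MCO", "SFB", "ORL"],
}


def resolve_airport(token):
    """Return an IATA code for a single token, or None if nothing matches."""
    if token.isupper() and token in IATA_CODES:
        # All caps reads as a deliberate IATA code.
        return token
    ranked = CITY_AIRPORTS.get(token.lower())
    if ranked:
        # Title case or lowercase: treat it as a city, take the top-ranked airport.
        return ranked[0]
    if token.upper() in IATA_CODES:
        # A lowercase code that is not also a city name.
        return token.upper()
    return None


for text in ("GOA", "Goa", "goa", "Orlando"):
    print(text, "->", resolve_airport(text))
```

In this sketch “GOA” resolves to Genoa, while “Goa” and “goa” both resolve to GOI, and “Orlando” resolves to MCO because it is listed first in the assumed ranking.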
Your question implies that your airport entity is not recognizing “goa” as GOI, so can you provide more details please?
Hello Andy, Thanks for that response and more details.
In our case, “Goa” works fine and identifies GOI (the expected response), but “goa” is not recognized at all. Instead, the bot prompts the user to enter the destination again. So basically the word “goa” is not being recognized as an entity value at all.
Please note that this is observed in Version 9.X on an on-prem server, while the same utterance works as expected on the latest Version 10.X on the cloud. Could there be a difference between the two versions and, more importantly, between an on-prem deployment and the cloud?
There should not be a difference between an On-Prem server and cloud. There is no code difference between the two deployments.
The airport entity has not changed between R9 and R10. We have specific tests for these exact GOA/Goa/goa scenarios and they have not failed in years.
So perhaps you can describe a bit more about your situation:
- What is the actual utterance?
- Is this a simple dialog, are there other entities before and/or after the airport?
- Is “goa” a response to a specific airport entity prompt?
- Are there any entity patterns in use?
- Is NER in use?
Thanks Andy for going through this in further detail. My response to your queries:
- The utterance is “What are the next available flights to goa?”
- The only other entity that we are capturing is Flight #. We also use an entity node, rather than a confirmation node, to ask the user whether they have the flight number handy. The requirement is that the bot should not wait for a Yes or No; it should either match the reply to an intent or, if the reply contains a flight number, map it to the flight number entity.
- “goa” can appear in the initial utterance, or as the answer when the user is prompted for the destination because they do not have the flight number.
- We are not using entity patterns for the airport, but we are using them for the flight number.
- NER is also in use.
Some background on entity processing.
Entities with entity patterns that match something in the user’s utterance, and entities identified by NER, reserve/protect their matched words until that specific entity is processed. That means that an entity can be prohibited from using some words because a future entity has laid claim to them.
So unbounded (“to *”) entity patterns could block off a large part of an utterance if that pattern matches at the beginning of the text.
NER isn’t perfect, and is probably unnecessary for an airport because the standard processing is pretty comprehensive. NER for an airport just means “look here first”. The actual extraction of an airport value follows the default process, because it is a mapping process: turning words into a JSON object. If the words identified by NER are not understandable as an airport then the entity will ignore the NER data and look somewhere else.
You don’t say what type of entity is used for the flight number, but be aware that each word in a user’s utterance can only be used by one entity. So if the flight number is a string, then it could grab too much, consuming “goa” and leaving the airport entity with nothing.
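The “one word, one entity” rule can be illustrated with a small sketch. This is a simplified model and not the platform's real pipeline: it claims words in processing order rather than reserving them ahead of time, and the `extract` function and both matchers are hypothetical.

```python
def extract(utterance, entities):
    """Run (name, matcher) pairs in order; each word may be claimed only once.

    A matcher receives all words plus the indices that are still free, and
    returns the indices it wants to consume.
    """
    words = utterance.split()
    claimed = set()
    results = {}
    for name, matcher in entities:
        free = [i for i in range(len(words)) if i not in claimed]
        taken = matcher(words, free)
        claimed.update(taken)
        results[name] = " ".join(words[i] for i in taken) or None
    return results


def greedy_string_after_to(words, free):
    """Stand-in for an unbounded "to *" string entity: takes everything after "to"."""
    if "to" not in words:
        return []
    start = words.index("to") + 1
    return [i for i in free if i >= start]


def airport(words, free):
    """Stand-in for an airport entity that only knows the word "goa"."""
    for i in free:
        if words[i].strip("?.,!").lower() == "goa":
            return [i]
    return []


utterance = "What are the next available flights to goa?"
print(extract(utterance, [("flight", greedy_string_after_to), ("airport", airport)]))
print(extract(utterance, [("airport", airport), ("flight", greedy_string_after_to)]))
```

When the greedy string entity runs first it consumes “goa?” and the airport entity is left with nothing; with the order reversed, the airport entity gets the word.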
Thank you for that in-depth knowledge on entities.
The flight number entity is a custom (regex) entity, for example SG246, and we have used entity patterns for its extraction. The patterns are like ~arrival * ; * ~arrival; ~departure *; *~ departure; to; for; ~flight 1; 1~flight. As you may have observed, we have used concepts in them.
These patterns may be what is preventing “goa” from being recognized. However, the same utterance with “Goa” is recognized as a city and returns the corresponding airport. Any thoughts on why the capitalized form is recognized but the lowercase form is not?
Also, any thoughts on how to better train the flight number entity to pick up the flight number while also taking care of these capitalisation issues?
Given that a flight number is a specific, unique sequence of characters, entity patterns are probably not needed.
In general, entity patterns are useful when there are two entities of the same type and you need to distinguish between them; a “from” airport and a “to” airport is the classic example. They also help when the entity value is potentially ambiguous; numbers often fall into this category.
A regex is not going to accidentally pick up something different.
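As a rough illustration of why the regex is specific enough on its own, here is a minimal sketch. The format (two letters followed by one to four digits, as in SG246) is an assumption based on the example in this thread, and `find_flight_number` is a hypothetical helper; matching is case-insensitive so that “sg246” is handled too.

```python
import re

# Assumed format: two letters immediately followed by 1-4 digits, e.g. SG246.
FLIGHT_NUMBER = re.compile(r"\b([A-Za-z]{2})(\d{1,4})\b")


def find_flight_number(utterance):
    """Return the first flight number in upper case, or None if there is none."""
    match = FLIGHT_NUMBER.search(utterance)
    if match is None:
        return None
    return match.group(1).upper() + match.group(2)


print(find_flight_number("Is sg246 on time?"))
print(find_flight_number("What are the next available flights to goa?"))
```

The second utterance contains no letter-digit sequence, so nothing is matched and “goa” stays available for the airport entity.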
Now there are alternatives to a regex if you are using the latest R10.1 code. It is sophisticated but does offer more flexibility, particularly with voice channels. It is centered around the Composite entity and the modelNumber entity rule. There would be two subentities, one for the airline and one for the flight number. The airline itself could be handled by a List of Items that can normalize variations because the user can say 2 letters (SG) or the full name (SpiceJet).
I’ve looked into this a bit more and the issue is related to the number of words in the utterance. Part of the challenge is that many IATA airport codes are also normal, common words: “ARE”, “AND”, “THE” and “NEW”, for example. So to avoid false positives, we downplay ambiguous matches in sentences. A specific answer of “goa” to a prompt is OK, but “goa” inside a longer sentence is not.
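A heuristic of this kind might look roughly like the sketch below. The real scoring is not described in this thread, so the code set, the `max_words` threshold and the `accept_match` function are all assumptions made for illustration.

```python
# Illustrative subset of tokens that are both IATA codes and ordinary words.
AMBIGUOUS_CODES = {"ARE", "AND", "THE", "NEW", "GOA"}


def accept_match(token, utterance, max_words=2):
    """Decide whether an airport match on `token` should be trusted.

    max_words is a hypothetical threshold for what counts as a direct answer.
    """
    if token.upper() not in AMBIGUOUS_CODES:
        return True  # nothing to downplay
    is_short_answer = len(utterance.split()) <= max_words
    looks_deliberate = token[:1].isupper()  # "Goa" or "GOA", not "goa"
    return is_short_answer or looks_deliberate


print(accept_match("goa", "goa"))
print(accept_match("goa", "What are the next available flights to goa?"))
print(accept_match("Goa", "What are the next available flights to Goa?"))
```

Under these assumptions a bare “goa” is accepted, lowercase “goa” inside the full question is rejected, and the capitalized “Goa” is accepted in either position.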
Now I do think we could do a better job here. For example, we aren’t using the hint of an entity pattern, and the additional context it gives, to resolve that ambiguity as much as we could. So if you raise a ticket with support, we can track this issue.
Sure Andy. I have created a ticket - Request #36066. Hopefully, you also get a chance to look at it and provide some help.
However, in your response above you referred to a composite entity and mentioned 2 sub-entities. I’m assuming one of the entities you are referring to is an airport and not an airline. Airlines will have a different code and I don’t think there is any entity type for airline.
You asked about a better way to train the flight number entity, and a composite entity using the modelNumber rule is one way to do that.
There isn’t a specific entity type for Airlines, but you can use a List of Items for that. An airline can be referred to by a 2 letter code, and sometimes a 3 letter code is used. There are also synonyms for the name in different forms. The List of Items maps those variations to a consistent value. Note you cannot control what the user might say, so you have to be prepared to react to anything.
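The normalization idea can be sketched in plain Python as follows. This is not platform configuration, just an illustration of mapping variations to one value and pairing the result with the number part; the synonym table is a small illustrative sample and `parse_flight` is a hypothetical helper.

```python
import re

# Canonical 2-letter code -> variations a user might say (illustrative sample).
AIRLINE_SYNONYMS = {
    "SG": ["sg", "sej", "spicejet", "spice jet"],
    "AI": ["ai", "aic", "air india"],
}
LOOKUP = {syn: code for code, syns in AIRLINE_SYNONYMS.items() for syn in syns}


def parse_flight(text):
    """Split text into an airline part and trailing digits, then normalize the airline."""
    match = re.fullmatch(r"\s*(.+?)\s*(\d{1,4})\s*", text)
    if match is None:
        return None
    airline_text = " ".join(match.group(1).lower().split())
    airline = LOOKUP.get(airline_text)
    if airline is None:
        return None  # unknown airline: be prepared to reprompt
    return {"airline": airline, "number": match.group(2)}


print(parse_flight("SG246"))
print(parse_flight("Spice Jet 246"))
print(parse_flight("air india 101"))
```

All spellings of the same airline end up as the same code, so downstream logic only has to deal with one consistent value.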
Now actually you could also use a composite in a different manner for the airport as a workaround. One of the use cases for a composite is to handle fallback. In this case you would have one subentity as an Airport and a second one as List of Items or Custom Concept to handle the special cases that the Airport does not pick up.
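The fallback idea, try the Airport first and then a small list of special cases, can be sketched like this. Both subentities are stand-ins written for this example; the tables and function names are hypothetical and do not reflect how the platform implements a composite.

```python
# Hypothetical fallback list for values the primary entity misses.
SPECIAL_CASES = {"goa": "GOI"}

# Stand-in data for the Airport subentity; here it only knows capitalized city names.
KNOWN_CITIES = {"Goa": "GOI", "Orlando": "MCO"}


def primary_airport(utterance):
    """Stand-in for the Airport subentity."""
    for word in utterance.split():
        code = KNOWN_CITIES.get(word.strip("?.,!"))
        if code:
            return code
    return None


def fallback_airport(utterance):
    """Stand-in for a List of Items or Custom Concept covering special cases."""
    for word in utterance.split():
        code = SPECIAL_CASES.get(word.strip("?.,!").lower())
        if code:
            return code
    return None


def resolve(utterance):
    # The fallback is consulted only when the primary entity finds nothing.
    return primary_airport(utterance) or fallback_airport(utterance)


print(resolve("What are the next available flights to goa?"))
print(resolve("What are the next available flights to Orlando?"))
```

Keeping the fallback list short and explicit means it only covers the known gaps and does not override the primary entity's normal results.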