@balakrishnav
As I mentioned in my first post, the mechanism for extracting specific data from a user’s utterance is an entity. Your dialog has to have an appropriate entity in the flow, and even if you are going to use an NER model those entities still have to exist, because they are needed in the NER annotation process.
So for scenario 1 you will need an entity of type Custom where you can add the regex to it - [A-Za-z]{4}\d{3}. And that is all you have to do; you don’t need to do any extra training. If you are feeling keen you could annotate the ML training for the dialog intent to highlight values for the NER model, but I personally would not bother as it is not going to gain you anything. One of the goals of the Kore.ai platform is to reduce the amount of training a bot developer has to do, which is why entities have processing behind their simple node definition.
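If you want to sanity-check that regex before pasting it into the Custom entity, you can try it in any JavaScript console. This is just an illustration of what the pattern matches, not platform code:

```javascript
// Four letters followed immediately by three digits, anywhere in the text.
const pattern = /[A-Za-z]{4}\d{3}/;

console.log(pattern.test("My code is ABCD123")); // true
console.log(pattern.test("AB123"));              // false - only two letters
```

Note the pattern is unanchored, so it will also match inside longer tokens (e.g. "XABCD1234"); add word boundaries if that matters for your data.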
Scenario 2 is a little different and there are, perhaps, three different ways it could be implemented. (In Kore.ai there is always more than one way of doing things!)
The first step is that you are asking the user a question, “Do you have a Fever or a Cough?”. The standard way to interrupt the dialog flow to ask a question and get a response is with an entity.
The next decision is the type of entity, and for a case like this where the user is choosing one of a set of values you would use an Enumerated List of Items. The list of items entity (note, some of us also use the term “LoV”) has an additional configuration section where you define the list of items and the set of synonyms that indicate each value. At run-time the platform will use those synonyms to determine which choice to select. In this example just “fever” and “high temperature” for one value, and “cough” and “wheeze” for the other, could be sufficient; it is generally best practice not to add words that are not explicitly relevant, to avoid false positives.
Again, there is no need to train an NER model because the platform and the LoV entity will be able to find those words anywhere in a user’s utterance.
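To illustrate what the platform is doing on your behalf, here is a rough sketch of what LoV synonym matching amounts to. This is not the platform’s implementation, just the idea; the value names and synonyms mirror the example above:

```javascript
// Each LoV value maps to the synonyms that indicate it.
const choices = {
  fever: ["fever", "high temperature"],
  cough: ["cough", "wheeze"]
};

function resolveChoice(utterance) {
  const text = utterance.toLowerCase();
  const matches = Object.keys(choices).filter(value =>
    choices[value].some(syn => text.includes(syn))
  );
  // Exactly one match: the choice is unambiguous.
  if (matches.length === 1) return matches[0];
  // None or several: the platform would re-prompt the user to clarify.
  return null;
}

console.log(resolveChoice("I think I have a high temperature")); // "fever"
console.log(resolveChoice("No fever, just cough is there"));     // null (ambiguous)
```

Notice the second example: both “fever” and “cough” are present, so a purely positive synonym match cannot resolve it, which is exactly the ambiguity discussed next.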
Now the LoV entity will work well for many cases, but one of your examples, #4 “No fever, just cough is there”, would likely generate an ambiguity and a subsequent prompt for the user to clarify their choice. The LoV synonyms are all positive selections, so you cannot really indicate the negative situation. You could decide that this is an edge case and the clarification prompt is OK, but consider a more ambiguous case, “no, fever”, where the comma downplays the role of “no” in negating “fever”. It starts to get very tricky.
That leads to two other options that utilize sentence training (or patterns), which may be closer to your desire to always use ML: traits and subintents.
Subintents are a technique to identify something within a dialog that is not easily handled by an entity, so using them to identify an entity value can be a little cumbersome, though it is possible. It is not the way I would recommend, and given ML’s propensity to try to find an answer you will run the risk of false positives. In the interests of brevity I’m not going to describe all the steps here.
That leaves traits. These are a somewhat misunderstood, and sometimes confusing, tool within the Kore platform. At a basic level, traits are just another way of identifying something potentially useful in a user’s utterance. There is no commitment to use whatever the traits engine identifies, but traits can help provide additional depth of meaning. Every user utterance is sent to the traits engine, and the context.traits array in the context object is updated with any results, which can then be referenced in scripts or connections.
Traits are grouped into a confusingly named “Trait Type”. Of the traits in a group, the engine will only identify one. So if you have a single classification of things (fever OR cough) then you would have two traits in a single trait type. But if both traits are independently valid (fever AND cough) then you would have two trait types with a single trait in each one. The individual traits are then trained with either sample phrases or patterns.
Now traits don’t force anything to happen; they are just a background identification scheme, so in the flow you still need a prompt to ask the user the question. Therefore an entity of some kind is still required, and this is what I would probably do:
- Add a List of Items entity to the dialog, like I described above, with simple and obvious synonyms.
- Define this entity as optional - you still get the prompt (if needed) but if the user doesn’t enter a known value then the flow carries on.
- Define two traits in a single group to cover the scenarios where the user is vague or uses idioms to describe their symptoms.
- The entity transitions to a script node where we check for an entity value in context.entities and/or a trait in context.traits. If neither is present then you can always transition back to the entity to prompt again.
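A hedged sketch of that script-node check, written as a plain function so it can be tested outside the platform. The entity name (“Symptom”) and the trait labels (“Fever”, “Cough”) are my assumptions; check context.traits in your own bot for the exact label format the engine produces:

```javascript
// Prefer the LoV entity value; fall back to whatever the traits engine found.
function pickSymptom(context) {
  var symptom = context.entities && context.entities.Symptom; // set if the LoV matched
  var traits = context.traits || [];                          // filled by the traits engine

  if (!symptom) {
    if (traits.indexOf("Fever") !== -1) symptom = "fever";
    else if (traits.indexOf("Cough") !== -1) symptom = "cough";
  }
  // null means neither source fired: transition back to the entity to re-prompt.
  return symptom || null;
}

// Example: the entity missed, but the traits engine spotted "Fever".
console.log(pickSymptom({ entities: {}, traits: ["Fever"] })); // "fever"
```

In the real script node you would store the result (e.g. in the session context) and drive the outgoing connections from it.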
This way the LoV handles the vast majority of utterances, which will be simple and obvious (and the UI might present buttons for those choices as part of the prompt), and traits can be used, if and when needed, to fill in the gaps for irregular and idiomatic expressions.
One thing you don’t mention is that fever and cough are not necessarily mutually exclusive symptoms: a user could say “I have a fever and a cough”. In that case you can turn on the multi-item switch in the entity and separate the traits into distinct trait types.
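For the multi-item case the script-node check would collect values rather than pick one. Again this is only a sketch under my assumptions (entity named “Symptom”, trait labels “Fever”/“Cough”, a multi-item entity returning an array):

```javascript
// Merge the multi-item entity value with anything the traits engine found,
// avoiding duplicates. With one trait per trait type, both traits can fire.
function pickSymptoms(context) {
  var fromEntity = context.entities && context.entities.Symptom;
  var symptoms = Array.isArray(fromEntity) ? fromEntity.slice()
               : fromEntity ? [fromEntity] : [];
  var traits = context.traits || [];

  if (traits.indexOf("Fever") !== -1 && symptoms.indexOf("fever") === -1) symptoms.push("fever");
  if (traits.indexOf("Cough") !== -1 && symptoms.indexOf("cough") === -1) symptoms.push("cough");
  return symptoms;
}

// Entity caught "fever", traits engine caught "Cough" from an idiom.
console.log(pickSymptoms({ entities: { Symptom: ["fever"] }, traits: ["Cough"] }));
// -> ["fever", "cough"]
```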