Handling Sentence Splitting in the FM Engine

This article explains how the Kore.ai Assistant processes user utterances containing periods (“.”) during natural language understanding (NLU).

Understanding Sentence Splitting

The Kore.ai Assistant utilizes a two-step process to interpret user input:

  1. FM Engine Pre-processing:
    The FM engine performs initial processing on the utterance.
    It analyses the sentence structure and identifies potential sentence breaks.

  2. Entity Identification:
    Based on the pre-processed output from the FM engine, the system identifies relevant entities within the utterance.

Current Limitations (Case 1):

  • Utterances with Period and Space:
    If a user’s utterance contains a period followed by a space and additional text (e.g., “my laptop is not working. please help”), the FM engine currently splits the utterance into two parts during pre-processing. It then processes each part sequentially for entity identification.
    This can lead to inaccurate entity identification, as the context of the entire sentence is not considered at once.
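
A minimal sketch of why per-part processing can miss an entity. It is a toy illustration, not the FM engine or Kore.ai's entity model: the vocabularies, the `find_entities` helper, and the rule that a device is reported only when a request word appears in the same text are all assumptions made for this example.

```python
import re
from typing import Dict

# Toy vocabularies (hypothetical; not Kore.ai's entity model).
DEVICES = {"laptop", "phone", "printer"}
REQUESTS = {"help", "fix", "repair"}

def find_entities(text: str) -> Dict[str, str]:
    """Toy rule: report a device only if a request word shares its context."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    device = words & DEVICES
    request = words & REQUESTS
    if device and request:
        return {"device": sorted(device)[0]}
    return {}

utterance = "my laptop is not working. please help"

# Assumed approximation of the split: break after a period followed by whitespace.
fragments = re.split(r"(?<=\.)\s+", utterance)

# Each fragment is processed on its own, so neither sees both words.
per_fragment = [find_entities(f) for f in fragments]

# The whole utterance contains both the device and the request word.
whole = find_entities(utterance)

print(per_fragment)  # [{}, {}]
print(whole)         # {'device': 'laptop'}
```

In this toy setup the device is found only when the full utterance is evaluated at once, which mirrors the limitation described above.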

Solution and Future Enhancements:

  • Feature Request (FR) in Progress:
    We acknowledge this limitation and have an active Feature Request (FR) to address it.
    The plan is to bypass the FM engine in specific scenarios and interact directly with the Machine Learning (ML) engine for improved sentence understanding. This will allow the system to consider the entire utterance for entity identification, even if it contains a period followed by a space.

Expected Behavior (Case 2):

  • Utterances without Space after Period:
    For utterances where a period is followed immediately by text without a space (e.g., “Kore.ai” or “google.com”), the FM engine does not split the sentence during pre-processing. The lack of a space indicates a single sentence or a potential website URL. This behavior is expected and aligns with the intended functionality.
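
The two cases can be approximated with a single rule: split only where a period is followed by whitespace. The sketch below is an assumption about the observable behavior described in this article, not the FM engine's actual implementation; `split_utterance` is a hypothetical helper name.

```python
import re
from typing import List

# Hypothetical approximation of the splitting rule described in this article.
# A break is whitespace that directly follows a period.
_SENTENCE_BREAK = re.compile(r"(?<=\.)\s+")

def split_utterance(utterance: str) -> List[str]:
    """Split an utterance only where a period is followed by whitespace."""
    parts = _SENTENCE_BREAK.split(utterance.strip())
    return [part for part in parts if part]

# Case 1: period followed by a space, so the utterance is split.
print(split_utterance("my laptop is not working. please help"))
# ['my laptop is not working.', 'please help']

# Case 2: period followed immediately by text, so no split occurs.
print(split_utterance("Kore.ai"))     # ['Kore.ai']
print(split_utterance("google.com"))  # ['google.com']
```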

Summary Table:

Scenario | Utterance Example | FM Engine Splitting Behavior | Result
Case 1 (Current Limitation) | “my laptop is not working. please help” | Yes (splits into two parts) | May lead to inaccurate entity identification
Case 2 (Expected Behavior) | “Kore.ai” or “google.com” | No (treats as a single sentence) | Expected behavior