Synonyms - Why do we add escape characters in synonym value and why avoid special characters in synonyms?

While creating a bot, I have an entity of type list of items (lookup).
While specifying its synonyms, I added an escape character in the synonym values, as shown:
synonyms: [
““abc xyz””,
““1234 xyz””
]

I am unable to recall why this was added.
Can you please explain

  • why we are adding escape character and
  • why do we remove special characters from synonym values?

hi @Subrahmanyam,

Anything on this?

Hi Neha,

Please elaborate your requirement/use-case so that we can provide an appropriate solution.

Regards,
Yoga Ramya,
Kore Support Team.


Hey,

  1. I have one entity node where I have specified its type as List of Static values.
    While adding values to the JSON for the static list, I added the “” character in the synonym values, as shown:
    {
    “name”: “test one”,
    “value”: “test one”,
    “synonyms”: [
    "“test 1"”,
    ““test one””
    ]
    },
    {
    “title”: “test two”,
    “synonyms”: [
    "“test 2"”,
    ““test two””
    ],
    “value”: “test two”
    }

Is the escape character really required?

  2. We also remove special characters such as “#,$,*,! etc” from the title and synonyms. Will the bot not work if we keep them?

Regards,
Neha Sheikh

Hi @nehamsheikh,

We have tested the scenario both with the quotes (“”) and with the quotes removed.
There is no functional difference in how the synonym values behave.

We have also tested with special characters in the Name values, and the functionality works as expected.
Please refer to the attached screenshots:
img1 img2

However, if you are providing the data for the enumerated list through a script node, the doubled quotes are not accepted in the script editor (for example, ““Neha””).
The editor expects a single pair of quotes around a value, like “Neha”.

img3

Let us know if you need any clarification on the above.

Regards,
Yoga Ramya.

There seems to be some confusion here with double quotes.
In JSON syntax, strings must be enclosed in double quotes, meaning ASCII character 34 (0x22); see http://json.org/.

This means that for a double quote to be part of the value, it needs to be escaped with a backslash (there are a handful of other control characters that need to be escaped too, but those are generally not appropriate for NL applications).

Any other character is treated as-is. From the samples above, it looks like other characters that look similar to double quotes, but are not ASCII 34, are being used. Unicode has several “double quote” variations, and applications (e.g. Word, Outlook) like to use them because they look “nicer”. Those will be handled as that specific Unicode character in both the JSON parsing and the synonym parsing.
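
To make the distinction concrete, here is a minimal sketch using Python's standard json module. It only illustrates generic JSON parsing, not how the bot platform itself stores or matches synonyms, and the "test one" values are simply borrowed from the samples earlier in the thread.

```python
import json

# An escaped ASCII double quote (U+0022) becomes part of the parsed value.
escaped = json.loads(r'{"synonyms": ["\"test one\""]}')

# Curly quotes (U+201C / U+201D) need no escaping: to the JSON parser
# they are ordinary characters inside the string.
curly = json.loads('{"synonyms": ["\u201ctest one\u201d"]}')

# A plain synonym with no quote characters inside the value.
plain = json.loads('{"synonyms": ["test one"]}')

escaped_value = escaped["synonyms"][0]
curly_value = curly["synonyms"][0]
plain_value = plain["synonyms"][0]

# An unescaped ASCII double quote inside a string is a syntax error.
try:
    json.loads('{"synonyms": [""test one""]}')
    unescaped_is_valid = True
except json.JSONDecodeError:
    unescaped_is_valid = False

# Show the code point of the first character of each parsed value.
for label, value in [("escaped", escaped_value),
                     ("curly", curly_value),
                     ("plain", plain_value)]:
    print(label, repr(value), hex(ord(value[0])))
```

The escaped and curly values both keep their quote characters after parsing, but they are different code points, so a comparison between the two would not treat them as equal.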

Do you mean that all special characters are supported by the bot?

Special characters - !@#$%^&*()_+{}|":>?<
Title value samples - Containing multiple words with combination of special characters
“value 1 ! & value 2 @ value 3”
“value 1 #$ value 2 (45)”
“value 1 , values 2, value % value1”

Please confirm.

Hey,

So as far as I understood

  1. Double quotes with escape characters are added in synonyms for exact match as shown:
    image

  2. Without escape characters, double quotes won’t be added to synonyms value and it will perform approximate match as shown:
    image

Please confirm if I got it right?

Regards,
Neha Sheikh

Hi @nehamsheikh,

We have checked the special characters as mentioned above.
The scenario is working as expected.
Please refer to the screenshot below:
entity

Let us know if you need any further clarification.

Regards,
Yoga Ramya.

Thanks for confirming.

So we can have any special characters in the entity title/synonyms.
The bot will correctly map the value entered in the user utterance and work.

There are no exceptions among the special characters either, right?

Regards,
Neha Sheikh

The challenge with special characters is that they are not a normal part of conversation and people are pretty relaxed in how they use them.

In general, characters other than the sentence terminators (period, question mark, exclamation mark) are treated as word separators, and each is parsed as its own individual word. There are exceptions to that of course, for example a percentage symbol after a number, well-known currency symbols, and hyphenated words.

Often people see some symbols as interchangeable, e.g. forward and backward slash.
Other times people might not supply them because they cannot remember how something is formatted, e.g. should that contract reference be partitioned by a slash or a hyphen?

Also note that some symbols are shorthand for words, so standalone “&” is expanded to “and” and “@” to “at”.

So I would recommend trying to be relaxed about special characters if you want the greatest success in matching against a wide variety of user input. The system will create several variations of synonyms with and without punctuation to try to be as inclusive as possible.
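
As a rough illustration of the idea of generating variations with and without punctuation, here is a small Python sketch. It is not the platform's actual algorithm: the function name synonym_variants and the SHORTHAND table are hypothetical, and the only expansions included are the two mentioned above ("&" and "@").

```python
import re

# Hypothetical shorthand table; only "&" and "@" are mentioned in this thread.
SHORTHAND = {"&": "and", "@": "at"}


def synonym_variants(text: str) -> set:
    """Return the lowercased text plus more relaxed forms of it."""
    lowered = text.lower()
    # Expand standalone shorthand symbols into their word equivalents.
    words = [SHORTHAND.get(token, token) for token in lowered.split()]
    expanded = " ".join(words)
    # Replace remaining punctuation with spaces (underscores count as
    # word characters here), then collapse the leftover whitespace.
    stripped = " ".join(re.sub(r"[^\w\s]", " ", expanded).split())
    return {lowered, expanded, stripped}


variants = synonym_variants("value 1 ! & value 2 @ value 3")
for variant in sorted(variants):
    print(variant)
```

Matching a user utterance against any member of such a set, rather than against the single literal title, is one way to tolerate users who drop or swap symbols.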
