ASR Key Items

In addition to the best practices list on the ASR overview page, the following sections expand on certain key items for creating an effective ASR experience. NICE CXone uses the v11 Nuance Engine for automatic speech recognition; complete documentation and instructions for using this engine can be found through Nuance support. An existing understanding of automatic speech recognition (and the Nuance engine) is crucial for creating a proper ASR-enhanced IVR system.

Tuning

Tuning should only be turned on if you are actively tuning your IVR. Leaving the tuning feature on causes immense bloat and stress on the server, as each interaction creates a new audio file.

Tuning is designed to target a specific menu so that you receive data on how the ASR Studio actions in it are performing. The basics of tuning entail identifying what your callers are commonly saying (or how they're saying it), listening to recordings to understand the commonalities in the interactions, potentially amending your grammar file(s), and potentially adjusting confidence values. You can turn tuning on with a Voiceparams action by setting the ASRTuningEnabled property to True. Doing so enables tuning on all ASR actions after that point; likewise, setting that property to False disables (or "turns off") tuning. You can adjust an action's confidence values directly in the action's properties, or you can set script-specific confidence values through a global variable with an Assign action (see below).

The ASR tuning report displays response rates for ASR actions that fire in a script. This report is broken down by action and by each confidence branch setting. If you have tuning turned on, you can expand these sections and listen to recorded audio files from that segment.

If any PII (personally identifiable information) might be captured in the IVR, you may want to tailor which sections of your IVR are recorded to avoid issues with capturing personal data.

Assign script-specific tuning parameters for your Nuance ASR actions.

In a Snippet action, set a global dynamic data variable named nuanceTuningParamsJson. Its value must be a valid JSON string containing the Nuance parameters that you want to change from their defaults. For example:

DYNAMIC asrParams
ASSIGN asrParams.sensitivity = "87"
ASSIGN asrParams.Speech_Complete_Timeout = "1000"
ASSIGN asrParams.Speech_Incomplete_Timeout = "1000"
ASSIGN asrParams.No_Input_Timeout = "1000"
ASSIGN global:nuanceTuningParamsJson = "{asrParams.asjson()}"

If any parameters are set to invalid values, the invalid value will be replaced with the default for that parameter, and a variable called invalidParamsList will be returned listing those values that were changed. The following tables provide the possible tuning parameters:

ASR Parameters Supported in Studio

Nuance Parameter	Description	Studio Support
Speech_Complete_Timeout	How long to wait before concluding that a caller is finished speaking.	Supported using nuanceTuningParamsJson. Default: "Speech-Complete-Timeout" : "0"
Speech_Incomplete_Timeout	Duration of silence to determine that callers have finished speaking.	Supported using nuanceTuningParamsJson. Default: "Speech-Incomplete-Timeout": "1500"
No_Input_Timeout	How long to wait for speech after a prompt ends.	Supported using nuanceTuningParamsJson. Default: "No-Input-Timeout": "7000"
sensitivity	Sensitivity of the speech detector when looking for speech.	Default: 50 (scale of 0-100)
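
The fallback behavior described above (invalid values replaced by defaults, with the affected parameters reported in invalidParamsList) can be sketched in Python as follows. This is only an illustration, not CXone's implementation: the defaults come from the table, but the validity rules (digits only, sensitivity between 0 and 100), the handling of unknown names, and the function names are assumptions made for the example.

```python
import json

# Defaults taken from the table above.
DEFAULTS = {
    "Speech_Complete_Timeout": "0",
    "Speech_Incomplete_Timeout": "1500",
    "No_Input_Timeout": "7000",
    "sensitivity": "50",
}

def is_valid(name, value):
    # Assumed rule: values must be non-negative whole numbers.
    if not str(value).isdigit():
        return False
    # The table gives a 0-100 scale for sensitivity.
    if name == "sensitivity":
        return 0 <= int(value) <= 100
    return True

def apply_defaults(params):
    """Return (effective_params, invalid_params_list)."""
    effective = dict(DEFAULTS)
    invalid = []
    for name, value in params.items():
        if name in DEFAULTS and is_valid(name, value):
            effective[name] = str(value)
        else:
            # Assumption: unknown or invalid entries fall back to the default.
            invalid.append(name)
    return effective, invalid

requested = {"sensitivity": "187", "No_Input_Timeout": "1000"}
effective, invalid_params_list = apply_defaults(requested)
tuning_json = json.dumps(effective)  # shape of a valid JSON string value
```

Here the out-of-range sensitivity falls back to "50" and is reported in invalid_params_list, while the valid No_Input_Timeout override is kept.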

Unsupported Nuance Parameters

The following Nuance parameters are not supported in CXone because the Studio Asr action, not Nuance, plays the prompts.

Nuance Parameter	Description	Default Value
swiep_suppress_barge_in_time	Disables barge-in briefly at the beginning of a prompt.	0 (no delay)
swiep_in_prompt_sensitivity_percent	Controls how loudly callers must speak to interrupt prompts (barge-in) and detect speech.	50 (percent)
swirec_barge_in_mode	Sets special recognition modes in the recognizer.	normal

Grammar Files

A grammar file is one of the most effective methods of increasing the accuracy of your ASR-infused IVR. DTMF (the signaling tones generated when a user presses or taps a key on their telephone keypad) offers only 12 options (12 tones). ASR analyzes actual human interactions, thereby greatly increasing the number of options that your system recognizes. For example, if you attempt to capture an identification number, a common response from a contact would be "My member number is 123456789". An ASR-enhanced script would recognize the entire "My member number is 123456789" utterance and specifically capture the number. Other scripts would fail when the contact began with "My member number is...". This enhanced customer experience, however, requires some customization and tailoring in the background, which a grammar file provides.

Grammar files allow you to list the variety of different possible utterances that a contact might speak in response to a prompt. The Nuance engine would then match the contact's response with an entry of the grammar file. Since the ASR engine must find a match for a full utterance (otherwise it will fail), using a grammar file gives the Nuance engine a focused list of utterances from which to choose.

This focused list enhances accuracy by taking into consideration "extraneous" additions to a response (such as "my member number is..." or "I think it's...") and also limits the number of possible matches that the engine can make. Grammar files help limit the number of permutations in utterances. Longer strings exponentially increase the possible responses you might capture (especially with alphanumeric strings). If your IVR asks for a 2-character alphanumeric string, you might receive roughly 1,300 possible responses (36 × 36 = 1,296). If your IVR asks for an 8-character alphanumeric string, you might receive roughly 3 trillion possible responses. That many responses is unrealistic to manage properly, so grammar files are the way to go. You can significantly limit the scope of exponential possibilities and focus them into the list of responses that you determine are acceptable (and possible).
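
The figures above follow from simple counting. As a quick check, assuming 26 letters plus 10 digits and no distinction between upper and lower case (an assumption, since spoken input has no case):

```python
# 26 letters + 10 digits; assumes spoken input is case-insensitive.
ALPHABET_SIZE = 36

def possible_strings(length):
    """Number of distinct alphanumeric strings of the given length."""
    return ALPHABET_SIZE ** length

two_chars = possible_strings(2)    # 1,296
eight_chars = possible_strings(8)  # 2,821,109,907,456
```

A grammar file replaces that open-ended space with only the entries you list, which is why it narrows the engine's choices so sharply.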

An important distinction to make is that while grammar files combined with ASR create a result that is similar to a natural language processor (NLP), they are not an NLP. ASR is like a bridge between DTMF and NLP: it's not meant to capture everything, but it is meant to capture most things. This is why grammar files are especially important and effective.

Most of the ASR actions contain built-in grammars, excluding Asrsql, Asrcompile, Asr, and Asrmenu. You can still create and use your own grammar files in conjunction with a built-in grammar.

Grammar File Tips & Tricks

Grammar files should be used for most ASR Studio actions.
Grammars are language-specific; you can reference a language in the header of the file so that the engine specifically looks for utterances in the specified language. If you do reference a language, the entries must use the same alphabet, sentence structure, and so forth as the referenced language. For example, if you were to use the word "piñata" in a Spanish-specific grammar, your entry must use the tilde (~) over the "n"; the entry must be "piñata" and not "pinata".
Symbols cannot be used in the utterance of a grammar file, but can be returned with the value.

Grammar File Examples

Below are 3 different grammar files for you to download. These examples illustrate the "rule approach" for creating the structure of a grammar file, using 3 rules: a prefix, the main grammar, and a suffix. Prefixes are utterances people often say before giving the main body of info, like "it is", "um", or "I think it is". Suffixes are little additions at the end of an utterance, like "I guess" or "maybe". The middle rule is the actual grammar, where you define all of the possible entries for the data that you want to collect, like colors, numbers, or models.

Color_Grammar_Example.grxml

Digits_Grammar_Example.grxml

Format_Grammar_Example.grxml
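
To see how the three-rule approach behaves, the following Python sketch imitates it with a regular expression: an optional prefix, a required main entry, and an optional suffix. It is not GRXML and does not reflect how the Nuance engine matches internally; the word lists and function names are hypothetical.

```python
import re

# Hypothetical word lists; a real grammar defines these as GRXML rules.
PREFIXES = ["it is", "um", "i think it is"]
COLORS = ["red", "green", "blue"]
SUFFIXES = ["i guess", "maybe"]

def alternation(phrases):
    # Longest phrases first so "i think it is" is tried before "it is".
    ordered = sorted(phrases, key=len, reverse=True)
    return "|".join(re.escape(p) for p in ordered)

# Optional prefix, required main entry, optional suffix; the whole
# utterance must match, mirroring the full-utterance requirement.
PATTERN = re.compile(
    rf"^(?:(?:{alternation(PREFIXES)})\s+)?"
    rf"(?P<main>{alternation(COLORS)})"
    rf"(?:\s+(?:{alternation(SUFFIXES)}))?$"
)

def match_utterance(utterance):
    """Return the main value if the full utterance matches, else None."""
    m = PATTERN.match(utterance.strip().lower())
    return m.group("main") if m else None
```

With these lists, "I think it is blue maybe" yields "blue", while "my favorite is red" yields no match because "my favorite is" is not a listed prefix.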

Important Parameters

Confidence Parameters

Confidence parameters indicate the accuracy with which the Nuance engine recognized and matched a contact's utterance. Utterances can fall into 3 different confidence categories: High, Minimum, and No Confidence. ASR actions also contain a branch for each confidence category so that you can customize the user experience and deal with variability in accuracy. Confidence variables are system variables and therefore do not appear in a script trace unless you enable system variables to appear in the trace.
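
The three categories can be pictured as two thresholds applied to a 0-100 confidence score. The sketch below is illustrative only: the threshold values and the inclusive comparisons are assumptions, and in practice the thresholds come from each ASR action's properties.

```python
# Hypothetical thresholds for illustration only.
MIN_CONFIDENCE = 50
HIGH_CONFIDENCE = 75

def confidence_branch(score, min_conf=MIN_CONFIDENCE, high_conf=HIGH_CONFIDENCE):
    """Map a 0-100 confidence score to one of the three categories."""
    if not 0 <= score <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if score >= high_conf:
        return "High"
    if score >= min_conf:
        return "Minimum"
    return "No Confidence"
```

Raising the thresholds sends more utterances to the lower branches, where a script would typically confirm or re-prompt; lowering them accepts more matches at the cost of more misrecognitions.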

Confidence values are affected by factors like background noise or conversations, accents, or spelling of grammar file entries.

If an agent is assigned a Personal Connection skill, MAX offers a method of sensitivity customization through the voice threshold setting, which assists in measuring and filtering out levels of background noise, the agent's voice detection, and so forth.

As mentioned in the Tuning section above, part of your tuning process should include analyzing your caller-base to understand common idiosyncrasies of speech, like pronunciation. One method of increasing accuracy is to add phonetic spelling entries in your grammar file(s). For example, if one of your entries is "fungi", you could also add "fun guy", "fun jee", or "fun gee" as entries to handle the different possible pronunciations of the word "fungi".
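
Conceptually, each phonetic entry is another utterance that returns the same value. A minimal Python sketch of that idea, using the variants suggested above (the dictionary and function name are illustrative, not part of any grammar format):

```python
# Each spoken variant maps to the value the script should receive.
VARIANTS = {
    "fungi": "fungi",
    "fun guy": "fungi",
    "fun jee": "fungi",
    "fun gee": "fungi",
}

def canonical_value(utterance):
    """Return the canonical value for a recognized variant, else None."""
    return VARIANTS.get(utterance.strip().lower())
```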

Timeout Setting

The length of time that the action listens for an utterance and attempts to find a match. The default duration is 10 seconds.

Intervoice Timeout Setting

An amount of time that the system will wait after a contact stops speaking to make sure that the contact does not continue speaking (like InterDigitTimeout). For example, when providing an account number, people tend to speak groups of numbers at a time: "123 <pause> 456 <pause> 789 <pause>". The pauses in the preceding example represent the intervoice timeout. The default value is 3 seconds. When creating or tuning a script, remember to account for the time for a contact to speak, the intervoice timeout, and a small amount of time for processing; too many timeout settings might stack on top of each other and result in a failed action.
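
A rough way to reason about stacked timeouts is to add up the pieces and compare the total with the action's Timeout. The sketch below uses the defaults stated on this page (10 seconds and 3 seconds); the additive model, the 0.5-second processing allowance, and the function names are simplifying assumptions, not a description of how the platform's timers actually interact.

```python
TIMEOUT_SECONDS = 10.0            # default Timeout stated above
INTERVOICE_TIMEOUT_SECONDS = 3.0  # default intervoice timeout stated above

def estimated_duration(speech_seconds, pauses, processing_seconds=0.5,
                       intervoice=INTERVOICE_TIMEOUT_SECONDS):
    """Speaking time + mid-utterance pauses + final intervoice wait + processing."""
    return speech_seconds + sum(pauses) + intervoice + processing_seconds

def check_budget(speech_seconds, pauses, timeout=TIMEOUT_SECONDS,
                 intervoice=INTERVOICE_TIMEOUT_SECONDS):
    # A pause that reaches the intervoice timeout would end the utterance early.
    if any(p >= intervoice for p in pauses):
        return "cut off mid-utterance"
    if estimated_duration(speech_seconds, pauses, intervoice=intervoice) > timeout:
        return "exceeds timeout"
    return "ok"
```

For example, 4 seconds of speech with two 1-second pauses comes to an estimated 9.5 seconds, which just fits within the 10-second default, while 6 seconds of speech with the same pauses does not.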

Errors

Error	Description
ASR Initialization Failed	The media server is unable to contact the ASR server. This could be caused by several reasons, including the ASR service not running or ports that are not open.
Grammar File Error: Grammar could not be compiled. Please check your grammar for syntax errors.	Typically caused by XML issues in the grammar file.
URL Failure. Recognizer was unable to access the specified URL	Grammar does not exist, was not referenced correctly, or the file server could not be reached.
ASRRESULT	Determines if ASR was detected or not.
ASRCONF	The resulting ASR confidence value, 0-100.
ASRCOMPLETIONCAUSECODE	Indicates ASR completion.
ASRERRORMESSAGE	A textual description of the error as reported by Nuance.
ASRSTATUSCODE	Indicates the status with one of the following values:
ASR_STATUS_WAITING = 100 (TCP open is still waiting)
ASR_STATUS_OK = 200
ASR_STATUS_DTMF = 298
ASR_STATUS_RECOGNITION_FAILED = 299
ASR_STATUS_MALFORMED_CONFIDENCE_RESULT = 300
ASR_STATUS_CLIENT_ERROR = 400
ASR_STATUS_SERVER_ERROR = 500
ASR_STATUS_SERVER_ESTABLISHMENT_FAILED = 590
ASR_STATUS_SERVER_SELECT_WSAEINTR = 591
ASR_STATUS_SERVER_CLOSED_TCP_CONNECTION = 592
ASR_STATUS_SERVER_TCP_RECV_FAILED = 593
ASR_STATUS_NO_RELAY_LINE_AVAILABLE = 594
ASR_STATUS_SERVER_TCP_OPEN_TIMED_OUT = 595
ASR_STATUS_SERVER_RESPONSE_TIMED_OUT = 596
ASR_STATUS_MAX_SESSIONS_EXCEEDED = 597
ASR_STATUS_DUPLICATE_ENABLE_REQUEST_ERROR = 598
ASR_STATUS_INTERNAL_ERROR = 599
ASR_STATUS_STOPPED_BY_MEDIA_CHANNEL = 998
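
When reviewing logs or traces outside of Studio, it can help to translate a numeric ASRSTATUSCODE back to its name. The following Python lookup is a convenience sketch built from the values listed above; the class and function names are hypothetical and are not part of CXone.

```python
from enum import IntEnum

class AsrStatus(IntEnum):
    # Values copied from the ASRSTATUSCODE list above.
    WAITING = 100
    OK = 200
    DTMF = 298
    RECOGNITION_FAILED = 299
    MALFORMED_CONFIDENCE_RESULT = 300
    CLIENT_ERROR = 400
    SERVER_ERROR = 500
    SERVER_ESTABLISHMENT_FAILED = 590
    SERVER_SELECT_WSAEINTR = 591
    SERVER_CLOSED_TCP_CONNECTION = 592
    SERVER_TCP_RECV_FAILED = 593
    NO_RELAY_LINE_AVAILABLE = 594
    SERVER_TCP_OPEN_TIMED_OUT = 595
    SERVER_RESPONSE_TIMED_OUT = 596
    MAX_SESSIONS_EXCEEDED = 597
    DUPLICATE_ENABLE_REQUEST_ERROR = 598
    INTERNAL_ERROR = 599
    STOPPED_BY_MEDIA_CHANNEL = 998

def status_name(code):
    """Return the ASR_STATUS_* name for a code, or "UNKNOWN" if unlisted."""
    try:
        return "ASR_STATUS_" + AsrStatus(int(code)).name
    except ValueError:
        return "UNKNOWN"
```

The function accepts either a number or a numeric string, since values read from logs are often text.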