Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) allows contacts to respond to IVR prompts by speaking, either instead of or in addition to pressing keys on their telephone. (An IVR is an automated phone menu that allows callers to interact through voice commands, key inputs, or both, to obtain information, route an inbound voice call, or both.) CXone offers ASR as an optional feature using the industry-leading Nuance ASR engine (version 11), which enhances the accuracy of your system's voice recognition and also allows you to record in stereo.

ASR is meant to simplify and speed up your callers' experience with your IVR. An ASR-enabled IVR should recognize not only words but also phrases, match them with values you have pre-defined, and route or answer calls accordingly.

Terminology

You should be familiar with ASR-specific usage of the following terms:

Utterance — Words or phrases spoken by a caller in response to IVR prompts.
Grammar file — Provides rules for the ASR engine. It covers the words or phrases callers can be expected to say in response to a prompt, then assigns content to variables based on those responses. This makes the recognition process much more efficient, and gives much higher rates of accuracy. Many ASR Studio actions have built-in grammar files. You can also use custom grammar files, or grammars, for some actions. These are typically written in XML and saved as .grxml files. They should be compiled prior to use in your CXone system.
Phrase list — Provides a simple list of phrases that callers can be expected to say in response to a prompt, one per line. Phrase lists are typically entered using the PhraseList property of a Studio action.
Confidence percentage — Also referred to as recognition percentage. When the ASR engine recognizes a phrase spoken by a caller, it also returns a percentage that indicates how confident it is in its interpretation, that is, in matching the utterance to the phrase list or grammar file. The confidence percentage can be used to route calls to different branches in your ASR-enabled IVR script. Confidence levels used in CXone are:
- High — Confidence percentage is high; typically, 75% or greater. The contact can be routed through the OnHighConfidence branch without any further confirmation of the utterance.
- Medium — Confidence percentage is mid-range; that is, somewhere between high and minimum. The contact can be routed through the OnMedConfidence branch and asked to confirm the utterance.
- Minimum — Confidence percentage is at the minimum acceptable level. This value is typically used to set a bottom number for the OnMedConfidence branch.
- No Confidence — The utterance was unrecognizable and the ASR engine cannot interpret it. The contact can be routed through the OnNoConfidence branch and asked to repeat the utterance.
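The routing logic described by these confidence levels can be sketched as follows. This is a minimal illustration, not CXone code: the function name and the minimum threshold of 50 are hypothetical, and sending everything below the minimum to OnNoConfidence is an assumption. Only the "75% or greater" figure for high confidence comes from the description above.

```python
# Thresholds are illustrative. The text above only states that "high" is
# typically 75% or greater; the minimum value here is an assumption.
HIGH_CONFIDENCE = 75
MIN_CONFIDENCE = 50


def choose_branch(confidence, high=HIGH_CONFIDENCE, minimum=MIN_CONFIDENCE):
    """Return the name of the branch to follow for a confidence percentage."""
    if confidence >= high:
        # Confident enough to continue without confirming the utterance.
        return "OnHighConfidence"
    if confidence >= minimum:
        # Mid-range: ask the contact to confirm what was heard.
        return "OnMedConfidence"
    # Assumed here: anything below the minimum is treated as unrecognized.
    return "OnNoConfidence"


for score in (92, 60, 20):
    print(score, choose_branch(score))
```

Raising the minimum threshold sends more callers to the repeat prompt, while lowering it sends more callers to the confirmation prompt, so both values are worth tuning against real call data.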

ASR Actions

For production IVR scripts, Studio offers seven ASR actions designed for specific types of prompts, as well as two more general actions. All of these actions allow you to capture and interpret an utterance, populate a variable based on the utterance, and route the contact based on the variable value, the confidence percentage, or both. Choosing the best action for each prompt will help your scripts process speech effectively. To view ASR actions in Studio, ASR must be enabled in your role. The following is a list of each ASR action:

Asr — Accepts any type of utterance and interprets it based on a custom phrase list or grammar file you provide. This action offers a great deal of flexibility but is also more complicated to set up.
Asralphanum — Accepts an utterance of a combination of letters, numbers, or both (for example, a password or email address). This action comes with a built-in grammar file.
Asrcurrency — Accepts an utterance of a monetary value (for example, a payment amount). This action comes with a built-in grammar file for one or more currencies, based on the language pack for your tenant.
Asrdate — Accepts a variety of utterances related to dates, based on its built-in grammar file. This includes full dates, days of the week, relative date references (such as yesterday), and more.
Asrdigits — Accepts an utterance of a string of digits (for example, a phone number or social security number). This action comes with a built-in grammar file.
Asrmenu — Accepts utterances that you define to create a speech-enabled menu. This action can use a custom phrase list or grammar file, or you can use the branch variables you create for the menu itself as a basis for interpreting the caller's utterance.
Asrnumber — Accepts an utterance of a numeric value. For example, an utterance of "five six" would be interpreted by this action as "fifty-six", whereas Asrdigits would interpret the same utterance as two separate digits, "five" and "six". This action comes with a built-in grammar file.
Asrtime — Accepts a variety of utterances related to time, based on its built-in grammar file. This includes durations (such as "twelve hours") in addition to specific times (such as "three p m").
Asryesno — Accepts positive or negative utterances based on its built-in grammar file. For example, there are multiple variations on how a caller might say "yes" (yes, yeah, yep, yup, okay, and so on). This action recognizes such variations.
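The difference between Asrdigits and Asrnumber noted in the list above can be illustrated with a small sketch. It does not reproduce the built-in grammar files. The word table and both function names are hypothetical, and the code only models the distinction between a string of digits and a single numeric value.

```python
# Illustrative word-to-digit table; a built-in grammar would cover far more
# variants than these ten words.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def interpret_as_digits(utterance):
    """Asrdigits-style reading: every spoken digit stays a separate character."""
    return "".join(DIGIT_WORDS[word] for word in utterance.lower().split())


def interpret_as_number(utterance):
    """Asrnumber-style reading: the spoken digits form one numeric value."""
    return int(interpret_as_digits(utterance))


print(repr(interpret_as_digits("five six")))  # a digit string
print(repr(interpret_as_number("five six")))  # a single number
```

A digit string preserves leading zeros and digit order, which matters for phone or account numbers, whereas a numeric value suits quantities the script will calculate with.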

Studio also offers two actions that can be used to build a custom grammar file from an existing database. For example, your IVR might ask a caller for a part number. Or you might want to let the caller select an extension by giving an employee's name. In either case, you likely already have a database that contains the possible values a caller might utter, and it makes sense to build your file using the data you already have. The two actions used for this purpose are:

Asrcompile — Used to compile custom grammar files into the .gram format used by the Nuance ASR engine. This action is used in scripts that are run once, or at most, on an occasional basis. The script can be used to process existing .grxml files or in combination with Asrsql to create a new custom grammar file.
Asrsql — Works with the DB Connector feature to pull a file of values from an existing database. This file can then be formatted and compiled into a grammar file for your ASR-enabled IVR.
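As a rough picture of what formatting database values into a custom grammar involves, the sketch below turns a list of values into grammar XML in the W3C SRGS format that .grxml files commonly use. It is an assumption-laden illustration rather than the output of Asrsql: the function name, the rule name, and the sample values are hypothetical, and the markup that assigns content to variables is omitted. The result would still need to be compiled before use.

```python
from xml.sax.saxutils import escape

# Namespace defined by the W3C Speech Recognition Grammar Specification.
SRGS_NAMESPACE = "http://www.w3.org/2001/06/grammar"


def build_grxml(phrases, lang="en-US", rule_id="main"):
    """Return grammar XML with one <item> per non-empty phrase."""
    items = "\n".join(
        f"      <item>{escape(phrase.strip())}</item>"
        for phrase in phrases
        if phrase.strip()  # skip blank rows from the source data
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<grammar xmlns="{SRGS_NAMESPACE}" version="1.0"\n'
        f'         xml:lang="{lang}" root="{rule_id}" mode="voice">\n'
        f'  <rule id="{rule_id}" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


# Hypothetical values standing in for rows pulled from an existing database.
employee_names = ["Alice Smith", "Bob Jones", "R&D hotline"]
grxml_text = build_grxml(employee_names)
print(grxml_text)
```

Escaping each value matters because database fields often contain characters such as "&" that would otherwise make the XML invalid.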

Best Practices

As you develop ASR-enabled IVR scripts, keep the following in mind:

Familiarize yourself with the ASR actions so you can choose the right action for each prompt.
Several actions offer a choice between spoken and DTMF input (DTMF refers to the signaling tones generated when a user presses or taps a key on their telephone keypad). In some cases, DTMF might actually provide a better caller experience. For example, keying a social security number is just as easy as speaking it, and may be easier for the system to interpret.
Languages available for speech recognition vary depending on where your tenant is housed, but can be set using the Voiceparams Studio action. Ask your account manager for more information.
You can also use phonetic spellings in your phrase lists or grammar files to increase accuracy. This can be especially helpful if the prompt may elicit responses that are often mispronounced, such as "fungi" (plural of fungus). Alongside the "fungi" entry, you could add phonetic entries such as "fun guy", "fun gee", and "fun jee". Language and pronunciation are not completely standard across cultures, so extra entries with phonetic spellings can improve accuracy. This highlights the importance of understanding your callers and tuning your IVR accordingly.
You can fine-tune the ASR settings for each script (or even before/after individual ASR actions) by setting a nuanceTuningParamsJson variable with a Snippet action.
Scripts should include routing in case there is a failure in the ASR functionality, such as reverting to DTMF-only mode or playing a failure message before terminating the interaction.
You can engage Professional Services to assist you in developing ASR-enabled IVR scripts and their components, such as custom grammar files built from your existing database. Contact your account manager to learn more.
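The phonetic-spelling practice above can be sketched as a small helper that produces phrase list text, one phrase per line, and maps a recognized variant back to its intended word. The dictionary layout and function names are hypothetical. How the resulting text is supplied to a Studio action is not shown here.

```python
# Intended word mapped to extra phonetic spellings (example from the text above).
PHONETIC_VARIANTS = {
    "fungi": ["fun guy", "fun gee", "fun jee"],
}


def build_phrase_list(variants):
    """Return phrase list text: each word and its variants, one per line."""
    lines = []
    for canonical, alternates in variants.items():
        for phrase in [canonical, *alternates]:
            if phrase not in lines:  # avoid duplicate entries
                lines.append(phrase)
    return "\n".join(lines)


def canonical_for(recognized, variants):
    """Map a recognized phrase back to the word it stands for, if any."""
    for canonical, alternates in variants.items():
        if recognized == canonical or recognized in alternates:
            return canonical
    return None


print(build_phrase_list(PHONETIC_VARIANTS))
print(canonical_for("fun guy", PHONETIC_VARIANTS))
```

Keeping the variants in one structure makes it easier to add entries as you learn how your callers actually pronounce each term.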

Localization and ASR

If your organization plans to use ASR to support more than one language, keep the following in mind:

Throughout parsing, "english" is hard-coded.
In parsing money, only "$" is supported.
In parsing money, '.' is always used to check for fractional values. ',' is not supported.
In pronouncing money, "dollars" and "cents" are hard-coded.
In pronouncing numbers, "negative" is hard-coded.
In pronouncing numbers, "point" is hard-coded.
ReadString is not localized (it reads English words).
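To make the money-parsing limits above concrete, the following sketch mimics them in a hypothetical parser: only a leading "$" is accepted, and only "." separates the fractional part. It is not the platform's parsing code. The function name is invented, and rejecting any input that contains a comma is an assumption based on "',' is not supported".

```python
from decimal import Decimal, InvalidOperation


def parse_money(text):
    """Return a Decimal for input such as "$12.50", or None if unsupported."""
    text = text.strip()
    if not text.startswith("$"):
        return None  # only the "$" symbol is supported
    if "," in text:
        return None  # assumption: commas are rejected outright
    try:
        value = Decimal(text[1:])
    except InvalidOperation:
        return None
    # Reject special values such as NaN or Infinity.
    return value if value.is_finite() else None


for sample in ("$12.50", "$12,50", "€12.50"):
    print(sample, parse_money(sample))
```

Amounts written in a European style, such as "12,50 €", fail both checks, which is why input for non-US locales may need to be normalized before it reaches this kind of parsing.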

Supported Languages for ASR and TTS
US Region (PCI, Non-PCI, and FedRAMP)
Brazilian Portuguese, Canadian French, English: US, English: UK, French, German, Italian, Spanish: European, Spanish: US
Canada
Canadian French, English: US, English: UK, French, German, Italian, Spanish: European, Spanish: US
Europe (Includes South Africa)
Dutch: NE and BE, English: UK, English: US, French, German, Italian, Portuguese, Spanish: European, Spanish: US
United Kingdom (UK)
Dutch: NE and BE, English: UK, English: US, French, German, Italian, Portuguese, Spanish: European, Spanish: US
Australia
English: AUS, English: UK, English: US, French, German, Italian, Spanish: European, Spanish: US
Japan
English: US, Japanese, Korean, Mandarin Chinese