How Amazon Alexa works with AVS

How Alexa skills work internally and how Alexa responds to users’ instructions.

In this blog, we will discuss the following points:

1. How does Alexa work?

2. Alexa Voice Service (AVS)

3. Use of Natural Language Understanding (NLU) in Alexa


1. How does Alexa work?

Amazon Alexa comes with many capabilities, such as playing music, reading the news, setting alarms, and reporting the weather.

Suppose a user says: “Alexa, ask scoutfoto about ScoutFoto photography.”

The above request has 3 main parts:

  • Wake word: ‘Alexa’ is the wake word. It wakes the Amazon device, puts Alexa into listening mode, and readies it to take the user’s request.
  • Invocation name: the keyword used to activate a specific skill, here ‘scoutfoto’. The invocation name works like the name of the skill and identifies which Alexa skill enabled on the device should handle the request. Every custom skill needs an invocation name to start it.
  • Utterance: in the above example, ‘ScoutFoto photography’ is the utterance. Alexa identifies the user’s intent from the utterance and responds accordingly, so the utterance determines what the user wants Alexa to do.
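To make the three parts concrete, here is a minimal Python sketch that splits an already-transcribed request into wake word, invocation name, and utterance. It assumes the request follows the “<wake word>, ask <invocation name> about <utterance>” phrasing of the example; the regular expression, `parse_request`, and `ParsedRequest` are hypothetical names for illustration. Real Alexa works on audio and accepts many other launch phrasings, so this is not how Amazon implements it.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the "<wake word>, ask <invocation name> about <utterance>"
# phrasing used in the example. Real Alexa supports many more launch phrasings.
REQUEST_PATTERN = re.compile(
    r"^(?P<wake_word>alexa),?\s+ask\s+(?P<invocation_name>\w+)\s+about\s+(?P<utterance>.+?)[.!?]?$",
    re.IGNORECASE,
)


class ParsedRequest(NamedTuple):
    wake_word: str
    invocation_name: str
    utterance: str


def parse_request(text: str) -> Optional[ParsedRequest]:
    """Split a transcribed request into wake word, invocation name and utterance."""
    match = REQUEST_PATTERN.match(text.strip())
    if match is None:
        return None  # not in the assumed "ask ... about ..." form
    return ParsedRequest(
        wake_word=match.group("wake_word"),
        invocation_name=match.group("invocation_name").lower(),
        utterance=match.group("utterance"),
    )


if __name__ == "__main__":
    print(parse_request("Alexa, ask scoutfoto about ScoutFoto photography"))
```

For the example request it returns `ParsedRequest(wake_word='Alexa', invocation_name='scoutfoto', utterance='ScoutFoto photography')`; a request without the wake word and invocation name, such as “Play some music”, returns `None`.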

Alexa relies on speech recognition and Natural Language Processing (NLP): speech is first converted into words, and those words are then interpreted.

  • Amazon takes your speech and splits it into individual sounds. It then uses a database of word pronunciations to find the words that most closely correspond to that combination of sounds.
  • It then identifies the important words to make sense of the request and calls the corresponding functions. For instance, if Alexa detects words like “sport” or “cricket”, it might open a sports app.
  • Amazon’s servers then send a response back to your device, and Alexa speaks it.
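The first two steps can be illustrated with a toy Python sketch. It assumes the audio has already been split into one group of sounds per word, uses a tiny hypothetical pronunciation dictionary with ARPAbet-style phoneme symbols, picks the closest word by edit distance, and then routes on keywords as in the “sport”/“cricket” example. The dictionary, `KEYWORD_ROUTES`, and the app names are made up; Amazon’s actual recognizer uses large statistical models rather than a lookup like this.

```python
from typing import Dict, List, Sequence, Tuple

# Toy pronunciation dictionary (ARPAbet-style phonemes), purely illustrative.
PRONUNCIATIONS: Dict[str, Tuple[str, ...]] = {
    "play": ("P", "L", "EY"),
    "the": ("DH", "AH"),
    "cricket": ("K", "R", "IH", "K", "AH", "T"),
    "score": ("S", "K", "AO", "R"),
    "sport": ("S", "P", "AO", "R", "T"),
    "weather": ("W", "EH", "DH", "ER"),
}

# Hypothetical keyword -> app routing table.
KEYWORD_ROUTES: Dict[str, str] = {
    "sport": "sports_app",
    "cricket": "sports_app",
    "weather": "weather_app",
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def closest_word(sounds: Sequence[str]) -> str:
    """Pick the dictionary word whose pronunciation is closest to the sounds."""
    return min(PRONUNCIATIONS, key=lambda w: edit_distance(sounds, PRONUNCIATIONS[w]))


def route(words: List[str]) -> str:
    """Return the app for the first routing keyword found, else a fallback."""
    for word in words:
        if word in KEYWORD_ROUTES:
            return KEYWORD_ROUTES[word]
    return "fallback"


if __name__ == "__main__":
    # Sounds already split per word; the second group is a slightly off "cricket".
    heard = [("S", "K", "AO", "R"), ("K", "R", "IH", "K", "IH", "T")]
    words = [closest_word(s) for s in heard]
    print(words, route(words))
```

Running it prints `['score', 'cricket'] sports_app`: the slightly mispronounced “cricket” still maps to the nearest dictionary entry, and that keyword selects the sports app.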

2. Alexa Voice Service (AVS)

The Alexa Voice Service permits you to access cloud-based Alexa capabilities.

Alexa-enabled devices send the user’s request to a cloud-based service called the Alexa Voice Service (AVS). AVS is a core part of Alexa: it works like the brain of Alexa devices and is responsible for the complex operations, such as Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU).

  • Automatic Speech Recognition

Automatic Speech Recognition (ASR) takes place in the cloud. SpeechRecognizer is the core interface of the Alexa Voice Service (AVS). It exposes directives and events for capturing user speech and for prompting a client when Alexa needs additional speech input.
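As a rough illustration of what a client exchanges with SpeechRecognizer, the Python sketch below builds the JSON metadata for a `Recognize` event and checks whether a directive from AVS is an `ExpectSpeech` prompt for more input. The field names (`namespace`, `name`, `messageId`, `dialogRequestId`, `profile`, `format`, `timeoutInMilliseconds`) follow the AVS SpeechRecognizer reference as we understand it and should be checked against the current documentation. The device context, the initiator object, the audio attachment, authorization, and the HTTP/2 connection are deliberately left out.

```python
import json
import uuid
from typing import Any, Dict


def build_recognize_event(profile: str = "NEAR_FIELD",
                          audio_format: str = "AUDIO_L16_RATE_16000_CHANNELS_1") -> Dict[str, Any]:
    """Build the JSON metadata for a SpeechRecognizer.Recognize event (sketch).

    Omitted here: the initiator object, the audio part that follows the
    metadata, and the transport that sends it to AVS.
    """
    return {
        "context": [],  # real clients report the state of their other interfaces here
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": str(uuid.uuid4()),
            },
            "payload": {
                "profile": profile,
                "format": audio_format,
            },
        },
    }


def needs_more_speech(directive: Dict[str, Any]) -> bool:
    """True if the directive asks the client to capture more user speech."""
    header = directive.get("directive", {}).get("header", {})
    return header.get("namespace") == "SpeechRecognizer" and header.get("name") == "ExpectSpeech"


if __name__ == "__main__":
    print(json.dumps(build_recognize_event(), indent=2))
    expect = {"directive": {"header": {"namespace": "SpeechRecognizer", "name": "ExpectSpeech"},
                            "payload": {"timeoutInMilliseconds": 8000}}}
    print(needs_more_speech(expect))
```

The `ExpectSpeech` check mirrors the “prompting a client when Alexa needs additional speech input” case: on such a directive, the client would reopen the microphone and send another `Recognize` event.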

3. Use of Natural Language Understanding (NLU) in Alexa

Natural Language Understanding (NLU) interprets the words recognized from speech so that Alexa can respond appropriately to a request. NLU is a core system that powers Alexa, and it can enhance voice recognition capabilities for developers who want to build conversational interfaces.

With Alexa, users need the wake word and a skill’s invocation name to get a response from a custom skill. Amazon provides application programming interfaces (APIs), such as those of AVS, to run Alexa on devices.
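To give a feel for how an utterance is mapped to an intent, here is a toy Python sketch. `INTERACTION_MODEL` is a hypothetical model for the ScoutFoto example, loosely shaped like an Alexa Skills Kit interaction model (invocation name, intents, sample utterances); the intent names `PhotographyInfoIntent` and `BookShootIntent` are made up. Token-overlap (Jaccard) scoring stands in for Alexa’s real NLU, which is statistical and far more robust.

```python
from typing import Any, Dict, Set

# Hypothetical interaction model for the ScoutFoto skill.
INTERACTION_MODEL: Dict[str, Any] = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "scoutfoto",
            "intents": [
                {"name": "PhotographyInfoIntent",
                 "samples": ["scoutfoto photography", "tell me about scoutfoto photography"]},
                {"name": "BookShootIntent",
                 "samples": ["book a photo shoot", "schedule a photo session"]},
                {"name": "AMAZON.HelpIntent", "samples": []},
            ],
        }
    }
}


def tokens(text: str) -> Set[str]:
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def match_intent(utterance: str, model: Dict[str, Any] = INTERACTION_MODEL,
                 threshold: float = 0.5) -> str:
    """Return the intent whose sample utterance best overlaps the user's words."""
    words = tokens(utterance)
    best_name, best_score = "AMAZON.FallbackIntent", 0.0
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        for sample in intent["samples"]:
            sample_words = tokens(sample)
            # Jaccard similarity: shared words / all distinct words.
            score = len(words & sample_words) / len(words | sample_words)
            if score > best_score:
                best_name, best_score = intent["name"], score
    return best_name if best_score >= threshold else "AMAZON.FallbackIntent"


if __name__ == "__main__":
    print(match_intent("ScoutFoto photography"))
```

With this model, “ScoutFoto photography” resolves to `PhotographyInfoIntent`, while an unrelated request such as “what is the weather” shares no words with any sample and falls back to `AMAZON.FallbackIntent`.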

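  • Techniques to understand text: one technique is to parse a sentence into a tree. In bracket notation, the parse tree of the sentence “The thief robbed the apartment” is:

[S [NP [Det The] [N thief]] [VP [V robbed] [NP [Det the] [N apartment]]]]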

If we look one level above the part of speech of each word (noun, verb, or determiner), we see a hierarchical grouping of words into phrases. For example, in “The thief robbed the apartment”, “the thief” is a noun phrase, “robbed the apartment” is a verb phrase, and together they make a sentence.
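The grouping of words into phrases can be sketched in Python by storing the parse tree as nested tuples and collecting every noun phrase (NP) and verb phrase (VP). The tuple encoding, `PARSE`, and the helper names are our own illustrative choices; NLP libraries such as NLTK provide proper tree types for this.

```python
from typing import List, Tuple, Union

# A node is (label, *children); a leaf child is a plain word string.
Tree = Union[str, tuple]

PARSE = ("S",
         ("NP", ("Det", "The"), ("N", "thief")),
         ("VP", ("V", "robbed"),
                ("NP", ("Det", "the"), ("N", "apartment"))))


def leaves(node: Tree) -> List[str]:
    """Words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]


def phrases(node: Tree) -> List[Tuple[str, str]]:
    """(label, text) for every NP and VP node, in pre-order."""
    if isinstance(node, str):
        return []
    found = [(node[0], " ".join(leaves(node)))] if node[0] in {"NP", "VP"} else []
    for child in node[1:]:
        found.extend(phrases(child))
    return found


if __name__ == "__main__":
    for label, text in phrases(PARSE):
        print(label, "->", text)
```

It prints `NP -> The thief`, `VP -> robbed the apartment`, and `NP -> the apartment`, one per line; the last one is the object noun phrase nested inside the verb phrase.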

A phrase is a group of one or more words built around a noun or a verb, together with related words such as descriptive words, objects, or adverbs. The idea is to group nouns (and verbs) with the words that relate to them.

The parse tree above gives us information about the grammatical relationships between the words through the structure of its representation. For example, the structure shows that “the thief” is the subject of “robbed”.

Concretely, the verb “robbed” is labeled “V”, with a “VP” node above it. The VP is joined at the root “S” to the subject of the sentence, “the thief”, which is labeled “NP”. This NP and VP pair under S is the structural representation of a subject-verb relationship.
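The subject-verb relationship can be read off the same structure: the NP directly under the root S is the subject, and the V inside the VP is the verb. Below is a minimal Python sketch, using a hypothetical nested-tuple encoding of the tree and helper names of our own; it only handles the simple S → NP VP shape discussed here.

```python
from typing import List, Optional, Tuple, Union

# A node is (label, *children); a leaf child is a plain word string.
Tree = Union[str, tuple]

PARSE = ("S",
         ("NP", ("Det", "The"), ("N", "thief")),
         ("VP", ("V", "robbed"),
                ("NP", ("Det", "the"), ("N", "apartment"))))


def leaves(node: Tree) -> List[str]:
    """Words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]


def child_with_label(node: tuple, label: str) -> Optional[tuple]:
    """First direct child of `node` carrying `label`, or None."""
    for child in node[1:]:
        if not isinstance(child, str) and child[0] == label:
            return child
    return None


def subject_verb(tree: tuple) -> Optional[Tuple[str, str]]:
    """Return (subject, verb) from an S -> NP VP structure, else None."""
    if tree[0] != "S":
        return None
    subject_np = child_with_label(tree, "NP")
    vp = child_with_label(tree, "VP")
    verb = child_with_label(vp, "V") if vp is not None else None
    if subject_np is None or verb is None:
        return None
    return " ".join(leaves(subject_np)), " ".join(leaves(verb))


if __name__ == "__main__":
    print(subject_verb(PARSE))
```

For the example sentence it returns `('The thief', 'robbed')`, matching the subject-verb relationship described above.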
