navigatorcros.blogg.se - Duolingo incubator blog


    from boto3 import client

    # text and voice_id are supplied by the caller
    polly = client('polly', region_name='us-east-1')
    response = polly.synthesize_speech(Text=text, VoiceId=voice_id, OutputFormat='mp3')

You can alternatively write a microservice that handles distribution of the audio files, as detailed later in this post.
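As a minimal sketch of what might come right after the synthesize_speech call, the function here reads the returned audio stream and writes it to an MP3 file. The helper name synthesize_to_file and its out_path parameter are hypothetical and not from the original post; the Polly client is passed in so the function does not create one itself.

```python
from pathlib import Path


def synthesize_to_file(polly, text, voice_id, out_path):
    """Synthesize `text` with Amazon Polly and write the MP3 bytes to `out_path`.

    `polly` is expected to be a boto3 Polly client, for example
    boto3.client('polly', region_name='us-east-1').
    """
    response = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat='mp3'
    )
    # AudioStream is a streaming body; read() returns the raw audio bytes.
    audio = response['AudioStream'].read()

    out_path = Path(out_path)
    # Create the target directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(audio)
    return out_path
```

A call such as synthesize_to_file(polly, 'Yo leo libros', 'Penelope', 'audio/yo_leo_libros.mp3') would then produce one file per sentence; the voice ID and path are only examples.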


Recording sentences with a voice actor and exposing them in the production environment involves the following steps:

1. Find a company that records audio in the language. The company must find a voice actor who not only speaks the language but also speaks with good pronunciation and clarity.
2. Find someone to evaluate the quality of pronunciation. We need a party independent from the recording company to create a small sample of sentences, which this party uses to evaluate the pronunciation quality of the recordings.
3. Record and evaluate the quality of the sample sentences.
4. Set up a contract with the recording company.
5. Evaluate recordings, providing a data quality assurance check. For example, we need to check if all files are in the proper format and correctly separated. This step is necessary because the industry standard is to record all sentences in a single session and separate them later.

On the other hand, the process when using TTS is much simpler. It involves evaluating the quality of the voice and adding code to your data pipeline so that audio generation is handled automatically. With Amazon Polly and boto3 (in the AWS SDK for Python), just a few lines of code do the trick in a basic scenario.
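To illustrate what "adding code to your data pipeline" could look like, here is a hedged sketch of a batch step that generates audio only for sentences that do not have a file yet. Everything in it is an assumption for illustration, not Duolingo's actual pipeline: the function names audio_key and generate_missing_audio, the hash-based file naming, and the synthesize callable, which is expected to take a sentence and return audio bytes (for example by wrapping a Polly synthesize_speech call).

```python
import hashlib
from pathlib import Path


def audio_key(text, voice_id):
    """Build a stable relative file name from the voice and the sentence text."""
    digest = hashlib.sha256(f"{voice_id}\n{text}".encode("utf-8")).hexdigest()
    return f"{voice_id}/{digest}.mp3"


def generate_missing_audio(synthesize, sentences, voice_id, out_dir):
    """Generate audio for every sentence that has no file yet.

    `synthesize` is a hypothetical callable: text -> audio bytes.
    Returns the list of files created during this run.
    """
    out_dir = Path(out_dir)
    created = []
    for text in sentences:
        path = out_dir / audio_key(text, voice_id)
        if path.exists():
            # Already generated in an earlier run (or earlier in this batch).
            continue
        audio = synthesize(text)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(audio)
        created.append(path)
    return created
```

Keying files by voice and text means that rerunning the step after new sentences are added only synthesizes the new ones, which keeps the step safe to run automatically.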

Several arguments exist for and against using TTS over human recordings, some of which we discuss in more detail in this post. In summary, TTS has a clear edge when it comes to operationalizing the audio creation process, and it’s not far behind human recordings for voice quality. In fact, we have observed that current state-of-the-art TTS voices are as good as natural human speech for the purpose of language learning.

The process of recording audio with a voice actor tends to be slow and cumbersome. Recording sentences and exposing them in the production environment takes many steps.

Test how well they can distinguish them.

When this exercise shows up, the audio for “Yo leo libros” automatically plays. The user can play the audio again by tapping on the speaker icon. The repeated audio serves as auditory reinforcement for the written text, giving learners the necessary information about pronunciation. This type of multisensory instruction is an effective way of teaching languages because human memory works better when several senses are engaged than when it is stimulated by a single sensory modality.

Other exercises test users on their ability to reproduce sentences read by the audio, targeting listening comprehension specifically. In one such exercise, the user listens to a sentence, with the option of listening to a slow version, and is then prompted to type what is heard. The slow version is geared toward beginner learners, who still find it difficult to grasp phonemes in the language that they’re learning.

Finally, there are exercises that test pronunciation. In these, users have to correctly pronounce the sentence they are given. After a user speaks the sentence, we do speech recognition to check whether the user has pronounced it correctly. Although speech recognition is not within the scope of this blog post, this example illustrates all aspects involved in the learning experience for pronunciation.
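The post does not say how the slow version of a sentence is produced. One possible approach, shown here purely as an assumption, is to ask Amazon Polly for slower speech through SSML: synthesize_speech accepts TextType='ssml', and the SSML prosody tag takes a rate such as "slow". The helper names slow_ssml and synthesize_slow are hypothetical.

```python
from xml.sax.saxutils import escape


def slow_ssml(text, rate="slow"):
    """Wrap a sentence in SSML that asks for a slower speaking rate."""
    # escape() protects &, < and > so the sentence cannot break the markup.
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'


def synthesize_slow(polly, text, voice_id):
    """Return MP3 bytes for a slowed-down reading of `text`.

    `polly` is expected to be a boto3 Polly client.
    """
    response = polly.synthesize_speech(
        Text=slow_ssml(text),
        TextType="ssml",  # tell Polly the input is SSML, not plain text
        VoiceId=voice_id,
        OutputFormat="mp3",
    )
    return response["AudioStream"].read()
```

With this approach the normal and slow versions come from the same sentence text and voice, so no second recording is needed.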


In the example following, the user practices how to use the word “leo” (which means “I read” in Spanish) by being exposed to the sentence “I read books.”


In their own words, “Duolingo is the most popular language-learning platform and the most downloaded education app in the world, with more than 170 million users.”

When teaching a foreign language, accurate pronunciation matters. If exposed to incorrect pronunciation, learners develop their listening and speaking skills poorly, which compromises their ability to communicate effectively. Duolingo uses text-to-speech (TTS) to provide high-quality language education. To some, this approach might seem counterintuitive: shouldn’t people learn by listening to a native speaker? In this post, we outline why we chose to use TTS instead of human recordings. We also describe our quantitative and qualitative framework for choosing voices to ensure high-quality material for our users. Using this framework, we show that Amazon Polly has provided a superior experience. Finally, we provide an overview of the infrastructure we built for reliably serving audio to millions of language learners every day.

The learning experience for pronunciation

Users learn languages on Duolingo through gamified, bite-sized lessons. Each lesson is composed of a series of exercises, which target different linguistic skills and concepts.


This is a guest post by André Kenji Horie, a software engineer on Duolingo’s Learning Team.