NEW NOAH UP .15
Rate the quality of synthesized speech samples
Your task is to evaluate the quality of the speech from short (2-4 second) audio files, using the evaluation criteria defined in
these instructions. This task typically requires approximately 120 seconds to complete.
IMPORTANT: After clicking on the audio clip link, wait a few seconds until you see the audio player box.
IMPORTANT: Do NOT work on this HIT without carefully reading the instructions first, even if you have worked on our HITs in the past, as instructions often change between experiments.
Instructions for speech quality evaluation
Introduction
Your task is to evaluate the subjective quality of the speech from short (2-4 second) audio files. Each HIT can be completed in 120 seconds.
Payment
We have methods that check how consistent your answers are with each other, with those of your fellow workers, and with references we know to be accurate. We will use these methods to rank the submitted assignments according to quality.
For this experiment we will pay a base reward of $0.10/HIT for every accepted HIT. We have made available a set of 17 different HITs. You will receive a bonus of:
- $0.05/HIT (for a total of $0.15/HIT) if you submit 12 or more HITs or
- $0.15/HIT (for a total of $0.25/HIT) if you submit 12 or more HITs and your results are among the top 50% or
- $0.25/HIT (for a total of $0.35/HIT) if you submit 12 or more HITs and your results are among the top 10%.
Bonuses will be paid up to 7 days after submission, because we can only rank the submissions once we have a statistically significant number of answers. The base reward will always be paid within 24 hours of submission.
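The bonus tiers above can be worked through with a short sketch. It is only an illustration of the arithmetic, not the requesters' actual payment code. The function name per_hit_payment and the parameter top_fraction (your quality rank as a fraction from the top, so 0.10 means top 10%) are hypothetical, and the sketch assumes every submitted HIT is accepted.

```python
from decimal import Decimal

BASE_REWARD = Decimal("0.10")   # base reward per accepted HIT
MIN_HITS_FOR_BONUS = 12         # minimum submitted HITs for any bonus


def per_hit_payment(hits_submitted, top_fraction=None):
    """Return the total payment per HIT (base reward plus bonus).

    top_fraction is a hypothetical input: the worker's quality rank as a
    fraction from the top (0.10 = top 10%), or None if no rank is known.
    """
    bonus = Decimal("0.00")
    if hits_submitted >= MIN_HITS_FOR_BONUS:
        bonus = Decimal("0.05")              # 12 or more HITs
        if top_fraction is not None:
            if top_fraction <= 0.10:
                bonus = Decimal("0.25")      # top 10%
            elif top_fraction <= 0.50:
                bonus = Decimal("0.15")      # top 50%
    return BASE_REWARD + bonus


def total_payment(hits_submitted, top_fraction=None):
    """Total payment, assuming all submitted HITs are accepted."""
    return per_hit_payment(hits_submitted, top_fraction) * hits_submitted
```

For example, under these assumptions a worker who submits all 17 HITs and ranks in the top 10% would earn $0.35 per HIT, while a worker who submits 11 HITs would earn only the base reward of $0.10 per HIT regardless of rank.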
Instructions
You will hear samples of computer generated speech created using different methods. The purpose of this test is to evaluate the quality of each file, so that we (the researchers) can compare the methods and know which ones sound better to a general audience.
Please keep in mind that speech can be unnatural in many ways, and the examples below are only specific cases. In addition, some methods may fail completely and produce silent audio.
Each file should be given a score according to the following scale, known as the MOS (mean opinion score) scale:
Score   Quality of the Speech   Level of Distortion
5       Excellent               Imperceptible
4       Good                    Just perceptible, but not annoying
3       Fair                    Perceptible and slightly annoying
2       Poor                    Annoying, but not objectionable
1       Bad                     Very annoying and objectionable. Totally silent audio.
IMPORTANT: Note that some of the sentences truncate unexpectedly, which is OK.
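As background on the name of the scale: a mean opinion score is the average of the individual 1 to 5 ratings that listeners give to the same file. The sketch below shows that averaging under simple assumptions. The function name mean_opinion_score and the sample ratings are hypothetical, and the researchers' actual analysis may differ.

```python
# Labels for the five points of the scale described above.
QUALITY_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average a list of integer ratings (1 to 5) for one audio file."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in QUALITY_LABELS:
            raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from four listeners for the same file.
example_ratings = [4, 3, 4, 3]
example_mos = mean_opinion_score(example_ratings)
```

You only need to give one whole-number score per file. The averaging across listeners happens afterwards.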
Example
The following recording represents clean speech with imperceptible noise or distortion, which is given a reference score of 5.0.
This file has synthesized speech with a reference score of 3.0.
This is an example of significantly distorted speech, with a reference score of 1.0.
Silent audio files are another example with a reference score of 1.0.
Approval/Rejection Policy
To obtain accurate results, we strongly recommend that you wear headphones, work in a quiet environment, and turn up the volume. Otherwise, you might not be able to discriminate between files with clearly different features. Our experience shows that it is very difficult to land in the top 50% or top 10% and get a bonus for quality without wearing headphones.
Your results will be collected and evaluated for consistency. We (the requesters) have an estimate of each file's subjective quality that conforms with the references above. Thus, we can detect if someone submits random scores or does not rate according to these instructions, which can lead to work being rejected. You can rest assured that your work will be approved if you rate according to the instructions above.
Answers will be either reviewed or automatically approved within 24 hours.