The challenge: Multi-language, diverse data for ML training
Voice-controlled virtual assistant apps simplify life and work. They give us quick, hands-free access to the information we need, and make it easier to automate repetitive tasks. But, like all technologies that rely on Natural Language Processing (NLP) and Machine Learning (ML), they’re only as good as the data that’s used to train them.
Our client, a leading global hardware and software provider, needed high-quality audio data to train and strengthen their virtual assistant. Because of its best-in-class expertise in gathering data at speed and scale, Firstsource was asked to conduct in-person research and to record and annotate audio data across:
- 10 markets: Australia, mainland China, Hong Kong, France, Germany, India, Japan, Spain, UK, USA
- 8 languages: English, Cantonese, French, German, Hindi, Japanese, Mandarin, Spanish
The client wanted real-world recordings from 100 people in each market, and the recruitment criteria were specific: a diverse range of respondents, spanning multiple ages and ethnicities, so the assistant could be trained to understand and respond to different accents, dialects, and expressions. All respondents needed to be potential rather than existing users of the client’s technologies.
The scope of the project was as challenging as the scale. Because the client’s virtual assistant is accessible across a range of hardware, they wanted to capture audio data on six different prototype devices. The brief was for 30 recordings per participant, each between 2 and 2.5 minutes long, covering defined conversational themes. And these conversations needed to be recorded in person, both in a room and in a vehicle.
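To put the scale of the brief in numbers, the short sketch below multiplies out the figures given above. It assumes every planned recording is completed and falls within the stated duration range; the variable and function names are illustrative only.

```python
# Figures taken from the brief: 10 markets, 100 people per market,
# 30 recordings per participant, each 2 to 2.5 minutes long.
MARKETS = 10
PARTICIPANTS_PER_MARKET = 100
RECORDINGS_PER_PARTICIPANT = 30
MIN_MINUTES, MAX_MINUTES = 2.0, 2.5


def project_volume():
    """Return (total recordings, minimum hours, maximum hours) for the brief.

    Assumption: every planned recording is completed and lasts between
    MIN_MINUTES and MAX_MINUTES.
    """
    recordings = MARKETS * PARTICIPANTS_PER_MARKET * RECORDINGS_PER_PARTICIPANT
    min_hours = recordings * MIN_MINUTES / 60
    max_hours = recordings * MAX_MINUTES / 60
    return recordings, min_hours, max_hours


recordings, min_hours, max_hours = project_volume()
print(f"{recordings:,} recordings, roughly {min_hours:,.0f} to {max_hours:,.0f} hours of audio")
```

On these assumptions, the brief amounts to 30,000 recordings and somewhere between 1,000 and 1,250 hours of audio.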
The solution: Rigorous recruitment, moderation, planning, and recording
The success of the project was determined by the way Firstsource approached recruitment, moderation, planning, recording and annotation:
- Rigorous recruitment: Using its extensive global database, Firstsource identified 100 participants, plus reserves, in each market. The Firstsource team managed pre- and post-project logistics, including consent forms, travel, accommodation, and incentives, all to a demanding schedule.
- Meticulous moderation: Each conversation needed to be expertly moderated to get to key conversational themes fast while engaging the respondent in a range of scenarios. Tapping into its network of trusted facilitators, Firstsource selected a talented team of 22 bilingual (native language and English) moderators. These moderators were briefed and trained using prompt scripts agreed with the client and challenging role-play scenarios.
- Perceptive planning: To get the best possible data, Firstsource put respondents at ease in realistic settings. It rented facilities with a homely rather than conference-room feel and chose locations with car parking so the virtual assistant could be tested in vehicles too.
- Robust recording: Project timing and budget did not allow for re-recordings or editing, so everything had to be right first time. A dedicated recording analyst supported the moderators with real-time quality control. When issues occurred with the prototype devices, Firstsource’s 24/7 technical team resolved them quickly, avoiding delays to the recording schedule.
- Audio annotation: 10+ attributes were captured for each recording (e.g., age, gender, ethnicity, accent/dialect, recording device type, and smartphone/non-smartphone user). Each audio clip was expertly annotated in JSON format.
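As an illustration of what a per-recording JSON annotation could look like, here is a minimal sketch. The actual schema used on the project is not published, so every field name, allowed value, and sample value below is a hypothetical placeholder. Only the six attributes named above come from the project description; market, language, and setting are assumed extras drawn from the brief.

```python
import json
from dataclasses import asdict, dataclass

# Assumption: the two recording settings described in the brief.
ALLOWED_SETTINGS = {"room", "vehicle"}


@dataclass
class RecordingAnnotation:
    """Hypothetical annotation record for one audio clip."""
    recording_id: str      # assumed identifier, not from the source
    market: str            # assumed extra attribute
    language: str          # assumed extra attribute
    setting: str           # assumed extra attribute: "room" or "vehicle"
    age: int
    gender: str
    ethnicity: str
    accent_dialect: str
    device_type: str
    smartphone_user: bool

    def __post_init__(self):
        # Reject records whose setting is not one of the two in the brief.
        if self.setting not in ALLOWED_SETTINGS:
            raise ValueError(f"unknown setting: {self.setting!r}")


def to_json(annotation: RecordingAnnotation) -> str:
    """Serialize one annotation record as a JSON document."""
    return json.dumps(asdict(annotation), indent=2)


# Placeholder values for illustration only, not real project data.
example = RecordingAnnotation(
    recording_id="example-0001",
    market="Australia",
    language="English",
    setting="vehicle",
    age=34,
    gender="female",
    ethnicity="placeholder",
    accent_dialect="placeholder",
    device_type="prototype-device-1",
    smartphone_user=True,
)
print(to_json(example))
```

Validating each record at creation time, as `__post_init__` does here for the setting, is one way to catch annotation errors early when re-recording is not an option.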
The research was conducted simultaneously across all 10 markets to accelerate results and meet the client’s tight deadline.
The results: Quality data in record time
- Quality AI/ML training data ahead of the client’s schedule: This enabled the client to launch their new voice assistant faster than they had expected.
- 30,000 quality audio recordings: The client was able to train their virtual assistant using data that was more than 95% accurate.