The MELI Corpus
The Mandarin–English Language Interview (MELI) Corpus is an open-access corpus of interviews with 51 Mandarin–English bilinguals. The corpus serves two primary purposes: (1) providing high-quality acoustic data for the analysis of Mandarin–English bilingual speech, and (2) providing interview data for qualitative analysis of language ideologies within the same group of bilinguals. To support these goals, the MELI corpus contains approximately 15 hours of high-quality recordings in each language. All recordings were captured on separate channels, and the interviewee’s speech has been fully transcribed and force-aligned.
💐 As you explore the corpus, I hope you’ll take a moment to appreciate the people behind the data: their willingness to share their voices, perspectives, and stories. Please engage with the material with care and curiosity.
Detailed information about the corpus is available on the Design, Procedures, Transcription, and Download pages.
Contributors
- All MELI participants, for sharing their perspectives and voices.
- Suyuan Liu — get in touch at suyuan97@student.ubc.ca
- Molly Babel
- Angelina Yuan
- Dlorah Lyne Reyes Agama
- Jeff Li
- Sarah W.Y. Ong
- and members of the Speech-in-Context Lab
Citing the corpus
When referring to the MELI Corpus, please spell out its full name, "the Mandarin–English Language Interview Corpus", at least once. After that, it can be referred to as "the MELI Corpus".
You can cite the corpus directly (preferred):
@data{liubabel2026meli,
  author    = {Liu, Suyuan and Babel, Molly},
  publisher = {Scholars Portal Dataverse},
  title     = {{MELI}: {Mandarin-English} Language Interview},
  year      = {2026},
  version   = {Version 1},
  doi       = {10.5683/SP3/5WMRUO},
  url       = {https://doi.org/10.5683/SP3/5WMRUO}
}
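The minimal LaTeX sketch below shows one way to use the entry above while following the naming convention: the full name is spelled out at first mention, and the short name is used afterwards. It assumes that the entry has been saved in a file named `references.bib` (a hypothetical filename) and that the `plain` BibTeX style is used. Standard styles such as `plain` do not define an `@data` entry type, so BibTeX may warn about it and format the entry as a generic (misc) item.

```latex
% Minimal sketch. Assumes the @data entry above is saved in a file
% named references.bib (hypothetical filename) next to this .tex file.
\documentclass{article}

\begin{document}

% First mention: spell out the full corpus name.
Our data come from the Mandarin--English Language Interview Corpus
\cite{liubabel2026meli}.

% Later mentions can use the short name.
The MELI Corpus contains interviews with 51 Mandarin--English bilinguals.

% plain.bst has no @data type, so BibTeX falls back to its default (misc) format.
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```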