The MELI Corpus

The Mandarin–English Language Interview (MELI) Corpus is an open-access corpus of interviews with 51 Mandarin–English bilinguals. The corpus serves two primary purposes: (1) providing high-quality acoustic data for the analysis of Mandarin–English bilingual speech, and (2) providing interview data for qualitative analysis of language ideologies within the same group of bilinguals. To support these goals, the MELI Corpus contains approximately 15 hours of high-quality recordings in each language. All recordings were captured on separate channels, and the interviewees' speech has been fully transcribed and force-aligned.

💐 As you explore the corpus, I hope you’ll take a moment to appreciate the people behind the data: their willingness to share their voices, perspectives, and stories. Please engage with the material with care and curiosity.

Detailed information about the corpus is available on the Design, Procedures, Transcription, and Download pages.

Contributors

All MELI participants, for sharing their perspectives and voices.
Suyuan Liu — get in touch at suyuan97@student.ubc.ca
Molly Babel
Angelina Yuan
Dlorah Lyne Reyes Agama
Jeff Li
Sarah W.Y. Ong
and members of the Speech-in-Context Lab

Citing the corpus

When referring to the MELI Corpus, spell out the name as "the Mandarin–English Language Interview Corpus" at least once. After that, you can refer to it as the "MELI Corpus".

You can cite the corpus directly (preferred):

@data{liubabel2026meli,
    author = {Liu, Suyuan and Babel, Molly},
    publisher = {Scholars Portal Dataverse},
    title = {MELI: Mandarin-English Language Interview},
    year = {2026},
    version = {Version 1},
    doi = {10.5683/SP3/5WMRUO},
    url = {https://doi.org/10.5683/SP3/5WMRUO}
}