Cantonese Audio Dataset
Cantonese is the traditional prestige variety and standard form of Yue Chinese. Its spoken form differs substantially from Mandarin. In other words, even though there are many packages for written Chinese and Mandarin, few resources cater to Cantonese.
Today, I want to introduce a public Cantonese audio dataset that was collected in 1997 and 1998. It contains 93 audio recordings and around 230k words in total. More importantly, the POS-tagged transcripts are also available for download.
You can find the paper describing the corpus here:
http://compling.hss.ntu.edu.sg/hkcancor/data/LukeWong_Hong-Kong-Cantonese-Corpus.pdf
If you are interested in audio preprocessing, you can find more details at the following link:
http://compling.hss.ntu.edu.sg/hkcancor/