5

I heard that there are roughly 400 sounds in Mandarin plus four tones. Are all combinations of those used in Chinese vocabulary, making it 1600 possibilities per single-sound word? If not, how many such combinations are actually used in any piece of a word?

d33tah
  • See https://chinese.yabla.com/chinese-pinyin-chart.php for the possible sounds without tones. In dictionaries arranged first alphabetically and then by tone, it is easy to find syllables that do not occur in all four tones; e.g. 现代汉语词典 (1983, 1581p.) has no cong3, cong4, hun3, que3, reng3, reng4, or ru1. – user6065 Aug 31 '18 at 19:35
  • Use software to extract all syllables that have at least one missing tone from 小马词典; this should cover at least all 6,763 GB2312 characters. – user6065 Sep 01 '18 at 00:35
  • @user6065 Why does your first link have the ending i three times? – Rodrigo Sep 03 '18 at 03:24
  • Apparently this is justified by the different pronunciation of the "vowel" after the initial consonants c, z, s and ch, zh, sh, r. – user6065 Sep 03 '18 at 05:46
  • 1
  • Nobody else seems to have mentioned 儿化音 érhuàyīn which from a phonological perspective adds another set of distinct sounds eg 门儿 ménr,会儿 huìr/huǐr,脸盘儿 liǎnpánr. This is also related to questions of dialect/regional variations; erhuayin is very present in Beijing (北京话). – goPlayerJuggler Mar 01 '22 at 08:48

4 Answers

3

I wrote a program to search through an unofficial online version of 现代汉语词典. The results show 1345 possible combinations.

Here is a list of the 1345 sounds, each with one example Chinese character (not most representative or most frequently used).

Note that I also included the neutral tone (for example ba0:吧) and a few strange ones (e.g. hm:噷, hng:哼, m1:姆, m2:呣).

Using the above list, you can easily find some "missing" combinations, e.g. an2, ang3, ban2, bang2, bei2, ben2, bian2, biao2, bin2, bin3, bing2, ca2, ca4, cang3, cang4, ce1, ce2, ce3, ...
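
As a rough illustration of how such "missing" combinations can be derived from a list like the one above, here is a minimal sketch. It is not the program used for this answer. It assumes the list has been saved as plain text with one numbered-pinyin syllable per line (e.g. ba1); the function name missing_tones and the file name syllables.txt are made up for the example. Neutral-tone and toneless entries (ba0, hm, hng) are ignored.

```shell
# Hypothetical helper: print every base+tone (tones 1-4) combination that is
# absent from the file given as $1 (one numbered-pinyin syllable per line).
missing_tones() {
  awk '
    # Only consider syllables that end in a tone digit 1-4.
    /^[a-z:]+[1-4]$/ {
      seen[$0] = 1                              # e.g. "cong1"
      bases[substr($0, 1, length($0) - 1)] = 1  # e.g. "cong"
    }
    END {
      # For each base seen at least once, report the tones it never takes.
      for (b in bases)
        for (t = 1; t <= 4; t++) {
          key = b t
          if (!(key in seen)) print key
        }
    }
  ' "$1" | LC_ALL=C sort
}

# Usage (assumed file name):
#   missing_tones syllables.txt
```

Run on a file containing ba1 to ba4 plus cong1 and cong2, this prints cong3 and cong4. A base that never appears with any of the four tones is not reported at all, so the output only covers gaps within syllables that are otherwise attested.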

user12075
  • could you post the program as well? – user5389726598465 Sep 02 '18 at 07:08
  • Some dictionaries (e.g. perapera add-on, chineseetymology.org, pleco app) give the sound mǔ for 姆 (though pleco app also gives m). – Rodrigo Sep 03 '18 at 03:31
  • @user5389726598465 You could at least upvote him for his effort, before asking for the program. – Rodrigo Sep 03 '18 at 03:33
  • @Rodrigo You are right, 姆 is almost always pronounced as mu3. The other pronunciation listed is m1 rather than m (I have updated my answer), and it is a special pronunciation used in some dialects. – user12075 Sep 03 '18 at 04:36
2

I use this one. But it shows all combinations, including ones that aren't words. http://www.quickmandarin.com/chinesepinyintable/

ETA: It has tones, if you want to hear how they sound.

user20347
1

The list of the 1345 sounds was helpful. I used it to create a conversion program. I just want to point out that it includes some erroneous entries:
qianwa13 瓩
yingmu13
kekao31 1
u2 h
uan2 H
uan4 h
uang2 h
ui1 h

onetime
  • There are some characters with strange readings, like 瓩 (qian1wa3) and 兛 (qian1ke4), but most of them are no longer used. When counting sounds, they should still be split into two syllables. – tsh Feb 28 '20 at 18:56
  • 瓩 (qian1wa3), 兛 (qian1ke4) I had not come across these characters before. Fascinating - as if the Chinese language were not complex enough, we have characters which are abbreviations of two characters. It is one thing to do that informally. If formalized, it would constitute a completely new paradigm (as opposed to the one character - one syllable paradigm). – onetime Feb 29 '20 at 20:09
  • 1
    I am not Chinese. My college-educated Chinese friends tell me that they have never seen 瓩 or 兛 (kilogram) used, and one was a physics major at Beida. He says this is a borrowing from Japanese, where such innovations are not unusual, but it is not standard Chinese. – onetime Feb 29 '20 at 21:14
1

The following *nix command will get all the pinyin sounds from the CC-CEDICT dictionary:

sed 's/#.*$//;/^$/d'  cedict_1_0_ts_utf-8_mdbg_20220301_011354.txt | cut -d "[" -f2 | cut -d "]" -f1 | tr " " "\n" | tr '[:upper:]' '[:lower:]' | grep -e '[12345]' | grep -e '[aeiouv]' | sort | uniq -c | sort -bgr

It:

  • extracts all the strings in the "pinyin" column
  • puts each syllable on its own line
  • puts everything to lower case
  • removes everything that doesn't have a tone marking (neutral tones should have "5")
  • removes everything that doesn't have a vowel
  • counts and orders by most frequent

There are a few strange cases with "u:", namely lu:e4 and nu:e4, but that only appears twice in the results.

There are 1535 unique results as of the date of this post. We might think of this as a lower bound, at least for what the CC-CEDICT editors think is possible. ~1500 is a LOT more than ~1300 though, so I'm not sure what is going on here. I notice a few like:

覅 覅 [fiao4] /contraction of 勿要/must not/please don't/

Which I guess are non-standard or dialectal.
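
To turn the frequency table into plain counts, the same pipeline can be wrapped in small shell functions. This is only a sketch: the function names (cedict_syllables, count_toned, count_toneless) are invented here, and the dictionary file is passed as an argument. count_toned counts distinct syllables including the tone digit; count_toneless strips the trailing tone digit first, which gives a figure to compare with the "roughly 400 sounds" from the question.

```shell
# Print every numbered-pinyin syllable in a CC-CEDICT file ($1), one per line.
# Same filtering steps as the one-liner above, minus the final counting.
cedict_syllables() {
  sed 's/#.*$//;/^$/d' "$1" |
    cut -d '[' -f2 | cut -d ']' -f1 |
    tr ' ' '\n' | tr '[:upper:]' '[:lower:]' |
    grep -e '[12345]' | grep -e '[aeiouv]'
}

# Number of distinct syllables, tone digit included (e.g. hao3 and hao4 differ).
count_toned() {
  cedict_syllables "$1" | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Number of distinct syllables once the trailing tone digit is removed.
count_toneless() {
  cedict_syllables "$1" | sed 's/[1-5]$//' | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Usage (file name as in the command above):
#   count_toned cedict_1_0_ts_utf-8_mdbg_20220301_011354.txt
#   count_toneless cedict_1_0_ts_utf-8_mdbg_20220301_011354.txt
```

Dividing the first count by the second gives a rough average of how many tones each base syllable takes in this dictionary, non-standard entries such as fiao4 included.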