
I'm curious as to whether anyone has attempted to come up with a quantitative measure of information conveyed by written words, symbols or graphs, and particularly curious if anyone has studied Japanese kanji or Chinese hanzi in this context.

It's often said that written Chinese and Japanese are more "information dense", which seems intuitively true given that tweets in Japanese, Chinese and Korean are on average much shorter than tweets in Western languages. But I don't know of any attempt to put actual numbers on the amount of information conveyed by the logographs used in Japanese and Chinese. (I'm not discussing Korean as I have no knowledge of the language.)

Intuitively this seems like a near-impossible task, given the amount of ambiguity in, for example, the character 上, which in Japanese can mean "above", "up", "elder", "before", "previous", "based on", "since" and so on, depending on the context. Some of these meanings are shared with Chinese, where it can also mean "to attend a class", "to climb up" or "to go up".

Nonetheless, I wonder if anyone has ever attempted such a quantification for Chinese/Japanese, or indeed for any written language. I haven't managed to find any studies through Google Scholar or through quick googling.

Sir Cornflakes
Lou
  • This question is a duplicate. Search the existing questions. – vectory Jul 14 '19 at 13:50
  • That level of ambiguity is almost nothing compared to the English word "set"! – curiousdannii Jul 14 '19 at 14:02
  • Character counts in tweets don't mean what you think they mean. For example, modern Korean has 11,172 possible "characters" (which are syllables). Modern English has 52, if you count uppercase/lowercase separately. So of course a single Korean character conveys more information than a single English letter. It also usually takes twice as much space. Similar for Japanese/Chinese. – jick Jul 14 '19 at 16:12
  • @vectory I have searched existing questions and was not able to find a similar one, except this: https://linguistics.stackexchange.com/questions/30408/density-of-information-semantic-of-chinese-and-korean-language-versus-european-l which approaches the question from a spoken language perspective. I'm interested in the written perspective. If you have found a duplicate, by all means feel free to link it. – Lou Jul 14 '19 at 16:48
  • @jick We're agreed on that point - my interest is whether there have been attempts to quantify the information conveyed by hanzi/kanji/hanja, as opposed to the information conveyed in other non-logographic writing systems. – Lou Jul 14 '19 at 16:50
  • I have no studies to link, unfortunately, but you might find the term "information density" useful in your search. – Draconis Jul 14 '19 at 17:50
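
The inventory-size point raised in the comments can be made a little more concrete with Shannon's measure of information. What follows is only a rough sketch, not an established method from any study: the function names are made up for illustration, and it assumes that a character's information can be approximated from inventory size or from raw character frequencies, ignoring context and meaning entirely. The first function gives an upper bound in bits per character if every symbol in an inventory were equally likely (using the 11,172 and 52 figures from the comment above); the second estimates bits per character from the character frequencies in whatever sample text you give it.

```python
import math
from collections import Counter


def max_bits_per_symbol(inventory_size: int) -> float:
    """Upper bound on bits per symbol, assuming all symbols are equally likely."""
    return math.log2(inventory_size)


def unigram_entropy(text: str) -> float:
    """Shannon entropy in bits per character, from character frequencies alone.

    This ignores context, so it overstates what each character contributes
    in running text.
    """
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Inventory sizes quoted in the comment above (not independently derived here).
hangul_bound = max_bits_per_symbol(11_172)  # about 13.4 bits
latin_bound = max_bits_per_symbol(52)       # about 5.7 bits

print(f"Hangul syllable block, upper bound: {hangul_bound:.2f} bits")
print(f"Latin letter (both cases), upper bound: {latin_bound:.2f} bits")
```

Real text would come in well under both bounds, because characters are far from equally likely. Running `unigram_entropy` over comparable Japanese, Chinese and English corpora would give a first, context-free estimate of bits per character, though it says nothing about the semantic ambiguity raised in the question.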

0 Answers