
Looking through the Chinese dictionary on my phone, I noticed that 我是 (I am) is listed as a word, whereas I, as a non-native speaker, would consider it two separate words. Whether 我是 appears as a word seems fairly random, since several other dictionaries don't list it, but it got me thinking about the status of words in Chinese in general. This is of course a heated debate even in languages that put explicit spaces between their tokens, but I suspect it is an even larger problem in languages such as Chinese.

I guess a few of my questions are:

  • Do Chinese speakers generally agree on how a sentence should be segmented into words?
  • Is a 词 different from a word in any sense?
  • Do native Chinese speakers actually split characters into words (词), or is the concept of a word something we have forced onto Chinese because it makes sense in Western languages?

  • I would not rely on a smartphone dictionary app; the data for such apps may come from dubious sources. Just because something is listed in a dictionary doesn't mean it's a word: dictionaries also list common expressions, which are frequent combinations of words. – imrek May 12 '15 at 12:10
  • As Drunken Master suggested, many smartphone dictionary apps have some sort of associative function that "guesses" what you are trying to type next, so the words or terms shown are not necessarily words. – Alex May 12 '15 at 14:14
  • Word boundaries are clear from the spoken language, as they are in many Western languages. English is an analytic language that often writes compounds as separate components (as in "ice hockey" rather than "icehockey"), but this is not true of most other Germanic or Romance languages. Similarly, when Chinese is written in Pinyin, word formation follows the spoken language: qingwa, daxuesheng, yidianr, keke-banban. See http://en.wikipedia.org/wiki/Pinyin#Orthography – May 15 '15 at 19:19
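To make the point about dictionary entries concrete, here is a minimal sketch of greedy forward maximum matching, one simple dictionary-based way of segmenting Chinese text. It is not necessarily what any particular app or dictionary does, and the two toy dictionaries are hypothetical. It only shows that the output depends on which entries the lexicon happens to contain: adding 我是 changes how 我是大学生 is split.

```python
def max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching: at each position, take the longest
    dictionary entry that matches; fall back to a single character."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches
        # (a single character is always accepted as a fallback).
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionaries; real segmenters use far larger lexicons.
base = {"我", "是", "学生", "大学", "大学生"}
with_wo_shi = base | {"我是"}

sentence = "我是大学生"
print(max_match(sentence, base))         # ['我', '是', '大学生']
print(max_match(sentence, with_wo_shi))  # ['我是', '大学生']
```

The same sentence comes out as three tokens or two depending solely on the lexicon, which is why a dictionary listing 我是 says more about the dictionary's design than about whether native speakers treat it as one word.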