
If you expand the "Dialectal data" link on the Chinese Wiktionary, you see 40+ of what I'll call "varieties", grouped into two categories:

  • Variety (parent group, like Mandarin)
  • Location (child group, like Beijing)
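One way to read that two-level grouping is as a tree in which each variety points to its parent. A minimal sketch, assuming an adjacency list (every record stores a `parent_id`) rather than a fixed Variety/Location pair of columns. The class names, fields, and IDs (`Variety`, `VarietyTree`, `"mandarin-beijing"`) are all hypothetical, not taken from Wiktionary or any standard:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Variety:
    """One node in a variety tree: a language, a dialect group, or a local dialect."""
    id: str                          # hypothetical internal identifier
    name: str
    parent_id: Optional[str] = None  # None for a top-level variety


class VarietyTree:
    """Adjacency-list tree, so the depth is not fixed at two levels."""

    def __init__(self) -> None:
        self._nodes: dict[str, Variety] = {}

    def add(self, variety: Variety) -> None:
        # Require the parent to exist first so every path ends at a root.
        if variety.parent_id is not None and variety.parent_id not in self._nodes:
            raise KeyError(f"unknown parent: {variety.parent_id}")
        self._nodes[variety.id] = variety

    def path(self, variety_id: str) -> list[str]:
        """Names from the root down to the given variety."""
        names: list[str] = []
        node = self._nodes.get(variety_id)
        while node is not None:
            names.append(node.name)
            node = self._nodes.get(node.parent_id) if node.parent_id else None
        return names[::-1]

    def children(self, variety_id: str) -> list[str]:
        """Names of the varieties directly under the given one."""
        return [v.name for v in self._nodes.values() if v.parent_id == variety_id]


# Example mirroring the Variety > Location grouping (IDs are made up).
tree = VarietyTree()
tree.add(Variety("chinese", "Chinese"))
tree.add(Variety("mandarin", "Mandarin", parent_id="chinese"))
tree.add(Variety("mandarin-beijing", "Beijing", parent_id="mandarin"))
print(tree.path("mandarin-beijing"))  # ['Chinese', 'Mandarin', 'Beijing']
```

Because depth is open-ended, a case like Dari (a variety of Persian that has its own dialects) fits without changing the schema; it only adds one more level of parent links.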

Vietnamese is a little more clear-cut on the surface, with Northern, Central, and Southern dialects, but I imagine there is regional variation within each of those too.

Dari is said to be a "variety of the Persian language spoken in Afghanistan". However, Dari itself has several dialects: "the principal dialects spoken in Afghanistan are Herati Dari, Tajiki Dari, Kabuli Dari, Khorasani Dari, and Parsiwan."

Tamil has many dialects too, as does Yoruba ("Ekiti, Igbomina, Ijebu, Ijesa, Oyo, Ondo, Owo, Ikale, Ilaje, Ikare, Yagba, Gbede, Ijumu, Ife, Ikiri, Isabe, Ijo, and Irun. The standard Yoruba is a blend of two closely related dialects, Oyo and Lagos.")

Question: What is the standard way to represent dialects in structured data, such as a database?

At first (after seeing the Chinese example) I was thinking of representing dialects by location. If I remember correctly, Wiktionary does this a lot: for example, Korean is "Seoul", Japanese is "Tokyo", English is "US" or "Britain", etc. But in reality, I don't think dialects are tied to a contiguous region. For example, in the US, the "TV broadcaster" speech heard in major cities seems to be a dialect, which is one small example of a geographically non-contiguous dialect.
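If a dialect is not tied to one contiguous region, a single location column does not fit. A minimal sketch, assuming a many-to-many link between dialects and places plus an optional link to social groups. The `Dialect` record, its fields, and the example names ("Broadcast English", "Chicago English") are hypothetical illustrations, not survey data:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dialect:
    """A dialect linked to any number of places and/or social groups (hypothetical schema)."""
    name: str
    regions: set[str] = field(default_factory=set)        # need not be contiguous
    social_groups: set[str] = field(default_factory=set)  # e.g. an occupation or community


def dialects_in_region(dialects: list[Dialect], region: str) -> list[str]:
    """Names of all dialects recorded for a region; one region may host several."""
    return [d.name for d in dialects if region in d.regions]


# Illustrative records only; names and memberships are made up for the example.
dialects = [
    Dialect("Broadcast English", regions={"New York", "Chicago", "Los Angeles"},
            social_groups={"TV broadcasters"}),
    Dialect("Chicago English", regions={"Chicago"}),
]
print(dialects_in_region(dialects, "Chicago"))  # ['Broadcast English', 'Chicago English']
```

In a relational database the two sets would typically become join tables (dialect-to-region, dialect-to-group), so the name of a dialect no longer has to encode where it is spoken.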

Maybe that is why dialects get seemingly arbitrary names like "Mandarin" or "Dari". Even language names are fairly arbitrary (English and Spanish were originally named after places).

So should dialects be named after places? Do all dialects fit this pattern? And what about ancient languages like Egyptian, Sumerian, Akkadian, or Sanskrit: are their dialects named after places, time periods, or something else?

What, then, is a standard way of representing dialects? What is the state of the art, or the latest research and thinking on the topic?

Lance
Comment: I think you'll find more inconsistency and variation than anything else in this type of representation. – Graham H. Sep 12 '23 at 15:27

1 Answer


It depends on who is assigning the name. For example, the language name Llogoli is a self-designation of the VaLogoli. It derives from the assumed name of the ancestor of those people, Mulogoli. Some unknown colonial dude introduced the name Maragoli, which was a widely used name for the location and the language, but nowadays people refer to the language and the people as Logori (or Logoori) when speaking English or Swahili. As you probably know, Sioux, Eskimo, Lapp, German, Finn and Berber are exonyms, assigned by other people and picked up by Westerners. Every so often, one learns of the indigenous names of languages. Linguists generally use whatever the existing name seems to be, which may be the name used by the speakers, or may be the name used by some neighbors or local rulers.

Linguists tend to be responsive to strong native objections to language terms, so as far as I know no linguist would refer to any form of Saami as Lapp (certainly not in public). In the case of Saami languages, the standard names are geography-based, referring to the historically original area where the language was spoken (e.g. "North Saami" as opposed to "Lule Saami", which refers to a river in Sweden, though the Lule Saami root is Jule, not Lule). That language is indigenously known as Julevsámegiella, which translates to English as "Lule Saami language", and North Saami is indigenously known as Davvisámegiella, which translates to English as "North Saami language". The language formerly known as Saanich is now known as SENĆOŦEN (capitalization is obligatory), though non-linguists may still use Saanich.
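Since names change (Saanich to SENĆOŦEN) and some become unacceptable (Lapp), a database would need to keep several names per variety rather than one. A minimal sketch, assuming each name carries a status so only the preferred one is displayed while older names still match searches. `NameStatus`, `Name`, `Language`, and the ID `"senchoten"` are hypothetical; a term like Lapp would be stored as `DEPRECATED`:

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class NameStatus(Enum):
    PREFERRED = "preferred"    # current name, ideally the speakers' own term
    ALTERNATE = "alternate"    # acceptable variant name or spelling
    FORMER = "former"          # superseded, but still found in older sources
    DEPRECATED = "deprecated"  # objected to by speakers; never displayed


@dataclass(frozen=True)
class Name:
    text: str
    status: NameStatus


@dataclass
class Language:
    id: str  # hypothetical internal identifier
    names: list[Name]

    def display_name(self) -> str:
        """Return the first name marked PREFERRED."""
        for n in self.names:
            if n.status is NameStatus.PREFERRED:
                return n.text
        raise ValueError(f"{self.id} has no preferred name")

    def matches(self, query: str) -> bool:
        # Every recorded name, including former and deprecated ones, stays searchable.
        q = query.casefold()
        return any(n.text.casefold() == q for n in self.names)


senchoten = Language("senchoten", [
    Name("SENĆOŦEN", NameStatus.PREFERRED),
    Name("Saanich", NameStatus.FORMER),
])
print(senchoten.display_name())      # SENĆOŦEN
print(senchoten.matches("saanich"))  # True
```

Keeping the old names searchable matters because older literature will still use them, even after linguists stop doing so.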

Language differences usually relate to some current or historical geographical factor. For example, New Julfa Armenian is spoken in New Julfa, which is an area of Isfahan, Iran. It was originally populated by refugees from "Old" Julfa (Jugha) in Armenia. Parsiwan is a geographical term (related to "Persia") apparently rooted in Old Persian Pârsa which may have originally referred to certain people rather than the territory that they lived in (just as "American" has an intermediate etymology in an Italian name which may derive from a Germanic compound meaning "Home Ruler").

The question of what name should be used, presumably by linguists, is basically a matter of personal opinion. Linguists' standards vary over time. Originally, we would speak of Swahili, Zulu, Kongo or Kamba, which are Bantu languages. At some point in the 1960s there was a move to adopt the indigenous noun class prefix, hence Kiswahili. One then has to know whether the prefix is Ki, Iki, Ke, Chi, Tsi, Shi, Si... and also whether it is Lu (Ru, Ulu etc.) or one of various other possibilities. In some languages, there is variation (ki- or lu-). About 20 years ago, the linguists (the editors and publishers) decided that class prefixes should be omitted unless omission would be felt to be "wrong" (as in (Lu)Ganda or (Li)Ngala).

user6726