If you expand the "Dialectal data" link here on the Chinese Wiktionary, you see 40 or more of what I'll call "varieties". They are grouped at two levels:
- Variety (parent group, like Mandarin)
- Location (child group, like Beijing)
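To make that grouping concrete, here is a minimal sketch of how I picture it as data. The class and field names (`Variety`, `Location`) are my own invention, and the Cantonese/Guangzhou entry is only an illustrative assumption; this is not Wiktionary's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Location:
    """Child level: a place where a variety is spoken (e.g. Beijing)."""
    name: str


@dataclass
class Variety:
    """Parent level: a named group of locations (e.g. Mandarin)."""
    name: str
    locations: List[Location] = field(default_factory=list)


# Illustrative data only; hypothetical, not copied from Wiktionary.
varieties = [
    Variety("Mandarin", [Location("Beijing")]),
    Variety("Cantonese", [Location("Guangzhou")]),
]

# Flatten the two levels into (variety, location) pairs.
pairs = [(v.name, loc.name) for v in varieties for loc in v.locations]
```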
Vietnamese is a little more clear-cut on the surface, with Northern/Central/Southern dialects, but I imagine there are regional variations within each of those categories too.
Dari is said to be a "variety of the Persian language spoken in Afghanistan". However, Dari has several dialects itself: "the principal dialects spoken in Afghanistan are Herati Dari, Tajiki Dari, Kabuli Dari, Khorasani Dari, and Parsiwan."
Tamil has a bunch of dialects too, as does Yoruba ("Ekiti, Igbomina, Ijebu, Ijesa, Oyo, Ondo, Owo, Ikale, Ilaje, Ikare, Yagba, Gbede, Ijumu, Ife, Ikiri, Isabe, Ijo, and Irun. The standard Yoruba is a blend of two closely related dialects, Oyo and Lagos.")
Question: What is the standard way of representing dialects in structured data or in a database?
At first, after seeing the Chinese data, I thought of representing dialects by location. Wiktionary does this a lot if I remember correctly: Korean is "Seoul", Japanese is "Tokyo", English is "US" or "Britain", etc. But in reality, I don't think dialects are always tied to a contiguous region. For example, in the US, the "TV broadcaster" speech heard in major cities seems to be a dialect of its own, which is one small example of a dialect being geographically non-contiguous.
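To illustrate the worry, here is a hypothetical sketch in which location is just one optional attribute of a dialect rather than its identity. All names (`Dialect`, `regions`, `register`, `is_place_named`) are made up for illustration, the city list is an arbitrary placeholder, and "TV broadcaster" is modeled only as I described it, not according to any standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dialect:
    """Hypothetical record: a dialect identified by name, not by place."""
    language: str
    name: str
    regions: List[str] = field(default_factory=list)  # may be empty or scattered
    register: Optional[str] = None  # e.g. a social or professional setting


def is_place_named(d: Dialect) -> bool:
    """True when the dialect label is simply the one place it is tied to."""
    return d.regions == [d.name]


dialects = [
    Dialect("Korean", "Seoul", regions=["Seoul"]),
    Dialect("Japanese", "Tokyo", regions=["Tokyo"]),
    # Placeholder cities; the point is only that they are not contiguous.
    Dialect("English", "TV broadcaster",
            regions=["New York", "Los Angeles", "Chicago"],
            register="broadcast"),
]

place_named = [d.name for d in dialects if is_place_named(d)]
not_place_named = [d.name for d in dialects if not is_place_named(d)]
```

With a shape like this, the place-named labels and the "TV broadcaster" case can live in the same table, but I don't know whether that matches how anyone actually does it.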
Maybe that is why we give dialects fairly arbitrary names like "Mandarin" or "Dari". Even language names work that way (English and Spanish, for example, were originally named after places).
So should dialects be named after places? Do all dialects fit this pattern? And what about ancient languages like Egyptian, Sumerian/Akkadian, or Sanskrit: are their dialects named after places, time periods, or something else?
What is a standard way of representing dialects, then? What is the state of the art, or the latest research and thinking on the topic?