I'm trying to create a static list of words that can be used for encoding large numbers. The words are selected on a variety of factors and I would like to include a measure of the word's stability over time.

I understand that all languages mutate, but are there metrics or models that I can use to help predict which words are likely to decay over time?

The best I can come up with is searching through Google's ngram database, as this would measure spelling and usage. However, this strikes me as fairly naive.
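
To make the n-gram idea concrete, here is a minimal sketch of one possible usage-stability score: the coefficient of variation of a word's relative frequency across years. The function name `frequency_stability` and the sample numbers are assumptions made up for illustration; they are not real n-gram counts, and loading the actual dataset is left out.

```python
from statistics import mean, pstdev


def frequency_stability(yearly_freq):
    """Coefficient of variation of a word's relative frequency.

    `yearly_freq` maps year -> relative frequency. Lower values mean
    steadier usage over the period covered.
    """
    values = list(yearly_freq.values())
    avg = mean(values)
    if avg == 0:
        return float("inf")  # a word that never occurs has no stable usage
    return pstdev(values) / avg


# Hypothetical relative frequencies, invented for this example.
sample = {
    "water": {1900: 4.0e-4, 1950: 4.1e-4, 2000: 3.9e-4},
    "telegraph": {1900: 6.0e-5, 1950: 2.0e-5, 2000: 4.0e-6},
}

# Words with the steadiest usage come first.
ranked = sorted(sample, key=lambda w: frequency_stability(sample[w]))
print(ranked)
```

A score like this only captures how steady the usage frequency is; it says nothing about changes in pronunciation or meaning.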

Indolering

2 Answers

For orthographically spelled words, Levenshtein distance is a commonly used metric. It can be extended to spoken language by applying it to IPA transcriptions of the spoken forms.
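
As a minimal sketch of the metric, the function below computes Levenshtein distance with the standard dynamic-programming recurrence. It accepts any sequence, so it works on spelled words (strings) and on lists of IPA segments alike. The IPA segmentations in the usage lines are rough, illustrative transcriptions, not authoritative ones.

```python
def levenshtein(a, b):
    """Edit distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    """
    # prev[j] is the distance between a[:i-1] and b[:j] (the previous row).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y (or match)
            ))
        prev = curr
    return prev[-1]


# Spelling: one deleted letter.
print(levenshtein("colour", "color"))  # → 1

# Spoken forms as lists of segments (illustrative transcriptions only).
older = ["k", "n", "i", "x", "t"]
newer = ["n", "aɪ", "t"]
print(levenshtein(older, newer))  # → 3
```

Passing lists of segments rather than raw IPA strings keeps multi-character symbols such as diphthongs from being counted as several edits.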

I am not aware of any metric (in the mathematical sense) of "meaning".

Sir Cornflakes

Going from the body of your question:

but are there metrics or models that I can use to help predict which words are likely to decay over time?

No, none, since words do not generally change much individually: what happens is that sequences of sounds undergo joint mutation, and generally all words containing those sequences will undergo the changes together. This covers strictly the phonetic part of language change (even setting aside how ill-defined "decay" is in a linguistic context).
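
The joint nature of such changes can be shown with a toy sketch: one rule rewrites a sound sequence, and every word containing that sequence shifts at once, while words without it are untouched. The helper `apply_sound_change`, the lexicon, and its phonetic forms are all made up for illustration and are not real transcriptions.

```python
def apply_sound_change(lexicon, old, new):
    """Rewrite every occurrence of the sequence `old` as `new` in all forms."""
    return {word: form.replace(old, new) for word, form in lexicon.items()}


# Toy lexicon: word -> simplified phonetic form (hypothetical forms).
lexicon = {
    "knight": "knixt",
    "knot": "knot",
    "take": "tak",
}

# A single rule (kn -> n) affects every word that contains the sequence.
changed = apply_sound_change(lexicon, "kn", "n")
print(changed)
```

Because the rule targets the sequence rather than any single word, a per-word "decay" score has nothing specific to latch onto.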

Further changes of word properties may be morphological and paradigmatic (morphological levelling, analogy, etc.), as well as individual (or of a narrow group) and semantic (changes in meaning of a word or several words in one of several directions, including amelioration and generalisation). This too is generally unpredictable.

The clearest tendencies (read: the only nearly accurate predictions) can be drawn up for languages with taboo word replacement: in such systems, words considered taboo are replaced on a regular basis. This applies to names of certain predatory animals, diseases, and dead relatives (among other possible taboo categories). Even this is generally poorly quantifiable and hard to predict, just like the rest of these processes.

Darkgamma
  • I'm not looking for absolutes, just correlations. For example, I didn't see any unfamiliar words while skimming the General Service List, the 2K most frequent words circa 1953. I'm guessing that more thoughtful analysis (historical frequency patterns, phoneticity, etc.) would yield better results. – Indolering May 16 '17 at 23:03
  • If you had read my answer in its entirety, you'd have seen that I already address this: "since words do not generally change much individually: what happens is that sequences of sounds undergo joint mutation, and generally all words containing those sequences will undergo the changes together". Beyond that, it's unclear what you're asking for. – Darkgamma May 17 '17 at 18:14