This is from the CMU dictionary, and the numbers reflect degree of stress. It is generally assumed (though not entirely without controversy) that English has a phonemic distinction between primary and secondary stress, in addition to unstressed vowels. For example, "latex" has primary stress on the first syllable and secondary stress on the second syllable. The word "latest" has initial primary stress and no stress on the second syllable.
The book The Sound Pattern of English gives an extensive phonological analysis of stress in English, and is based on the phonetic values in Kenyon & Knott. It may or may not be useful in understanding the nature of stress in English. The CMU dictionary treats stress as a property of vowels.
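To make the notation concrete, here is a minimal Python sketch that reads the stress digits off a CMU-style transcription. The function name `stress_pattern` and the label table are my own, not part of any CMU tool. The "Snohomish" transcription is the one quoted from the dictionary further down; the transcriptions for "latex" and "latest" are assumed for illustration, written to match the stress description above, and should be checked against the dictionary itself.

```python
import re

# Stress digits as used in CMU-style transcriptions:
# 1 = primary stress, 2 = secondary stress, 0 = unstressed.
STRESS_LABELS = {"1": "primary", "2": "secondary", "0": "unstressed"}


def stress_pattern(transcription):
    """Return a list of (vowel, stress label) pairs.

    `transcription` is a space-separated string of phone symbols.
    Only vowels carry a trailing digit, so consonants are skipped.
    """
    pattern = []
    for phone in transcription.split():
        match = re.fullmatch(r"([A-Z]+)([012])", phone)
        if match:
            vowel, digit = match.groups()
            pattern.append((vowel, STRESS_LABELS[digit]))
    return pattern


# Hypothetical mini-lexicon: only the "snohomish" entry is quoted
# from the dictionary; the other two are assumed for illustration.
examples = {
    "latex": "L EY1 T EH2 K S",
    "latest": "L EY1 T AH0 S T",
    "snohomish": "S N AA1 HH AH0 M IH0 SH",
}

for word, phones in examples.items():
    print(word, stress_pattern(phones))
```

Because the digit is attached to the vowel symbol rather than to a syllable, the sketch never needs to work out syllable boundaries, which is what treating stress as a property of vowels amounts to in practice.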
It should be noted that vowel reduction is related to stress in English, so there is a chicken-and-egg question: does the difference between "latest" and "latex" have to do with degrees of stress or with different vowel qualities? One can write [ˈletɛks] and [ˈletəst], using vowel quality as a substitute for degree of stress; if you include detail on how "t" is pronounced ([ˈletʰɛks] and [ˈleɾəst]) you have additional bases for distinguishing the words without appeal to degrees of stress. The argument that it is phonologically a matter of degrees of stress is not seriously in doubt, but in terms of simple-minded recording of data, secondary stress is not strictly necessary.
I found this file regarding possible inputs for the dictionary:
- a 20k+ general English dictionary, built by hand at Carnegie Mellon
(extensively proofed and used).
- a 200k+ UCLA-proofed version of the shoup dictionary.
- a 32k subset of the Dragon dictionary.
- a 53k+ dictionary of proper names, synthesiser-generated, unproofed.
- a 200k dictionary generated with Orator, unproofed.
- a 200k dictionary generated with Mitalk, unproofed.
They comment that:
All of the above sources were preprocessed and the transcriptions in
the current cmudict.0.1 were selected from the transcriptions in the
sources or a combination thereof. We have removed some potentially
unreliable transcriptions from this dictionary, including those based
on only one source, and will reintroduce them once we have verified
the transcriptions.
It seems to reduce to a judgment by the editor, with no indication of what principles were followed or what the actual sources were (versions? what dictionary?).
What is not questionable is that one cannot compute the pronunciation of an English word from its spelling, so any computer dictionary of English has to be based on some other pronouncing dictionary, of which there are many. Most people disagree with some pronunciation claims of every dictionary ("that's not how I pronounce it"), and there is no pretense that the CMU dictionary derives from processing a massive corpus of naturalistic speech from some location. They claim that "Snohomish" is "S N AA1 HH AH0 M IH0 SH" = [ˈsnɑhəmɪʃ], but that is not even a possible pronunciation in US English (it's [snoˈhomɪʃ ~ snəˈhomɪʃ]).
It is mentioned as lexical stress here, but I don't get the idea behind it.
– Mohan Singh Feb 02 '23 at 12:56