First and foremost, syllables are units of spoken language and have absolutely nothing to do with spelling.
So what is a syllable?
Syllable
A syllable is a unit of speech consisting of a vowel, a diphthong or a syllabic consonant with or without preceding or following consonants. The essential part of a syllable is therefore a vowel or a diphthong or a syllabic consonant.
A syllable does not need a vowel, because syllabic consonants can form syllables on their own. For instance, the /m/ in the word 'rhythm' is syllabic: [rɪ.ðm̩] (a syllabic consonant is marked with a small vertical line below it). Note, however, that any vowel can form a syllable on its own, whereas ordinary (non-syllabic) consonants cannot.
Syllable structures
Typically, a syllable consists of three constituents, grouped into two main components:
- Onset: the consonant(s) before the nucleus (usually a vowel)
- Rhyme: the part of the syllable after the onset. It is further divided into:
  - Nucleus: a vowel, diphthong or syllabic consonant that forms the peak of sonority
  - Coda: the consonant(s) after the nucleus, at the end of the syllable
For example, the word 'sit', pronounced /sɪt/, can be analysed as onset /s/ + rhyme /ɪt/, where the rhyme consists of the nucleus /ɪ/ and the coda /t/.
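To make the onset/rhyme split concrete, here is a minimal Python sketch that divides a one-syllable word into onset, nucleus and coda. It assumes the word is already given as a list of phonemes and that the nucleus is a vowel (no syllabic consonants); the function name `split_syllable` and the small `VOWELS` set are hypothetical, chosen only for the examples in this answer.

```python
# Hypothetical vowel inventory: only the vowels needed for the examples here.
VOWELS = {"ɑ", "ɔ", "ɪ", "i", "e", "ʌ"}


def split_syllable(phonemes):
    """Split a one-syllable phoneme list into (onset, nucleus, coda).

    Assumes exactly one vowel nucleus and no syllabic consonants.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    n = vowel_positions[0]
    # Everything before the nucleus is the onset, everything after is the coda.
    return phonemes[:n], phonemes[n], phonemes[n + 1:]


onset, nucleus, coda = split_syllable(["s", "ɪ", "t"])
rhyme = [nucleus] + coda  # the rhyme is the nucleus plus the coda
print("onset:", onset, "nucleus:", nucleus, "coda:", coda, "rhyme:", rhyme)
```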
Both 'twelfths' and 'grudged' are monosyllables (one-syllable words). Writing C for a consonant and V for a vowel, their structures are:
- twelfths → /twelfθs/: CCVCCCC
- grudged → /ɡrʌd͡ʒd/: CCVCC
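The C/V notation above can be generated mechanically. The sketch below assumes a hypothetical `cv_skeleton` helper and a small vowel set; note that the affricate /d͡ʒ/ is stored as a single phoneme, which is why 'grudged' ends in only two Cs.

```python
# Hypothetical vowel inventory covering the example words.
VOWELS = {"ɑ", "ɔ", "ɪ", "i", "e", "ʌ"}


def cv_skeleton(phonemes):
    """Map each phoneme to V (vowel) or C (consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


# Each list element is one phoneme, so the affricate /d͡ʒ/ counts as one C.
twelfths = ["t", "w", "e", "l", "f", "θ", "s"]
grudged = ["ɡ", "r", "ʌ", "d͡ʒ", "d"]
print(cv_skeleton(twelfths))  # CCVCCCC
print(cv_skeleton(grudged))   # CCVCC
```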


Sonority
Sonority refers to the relative loudness of a sound compared with other sounds. The sonority of English phonemes can be depicted on a sonority scale (sonority hierarchy), which ranks speech sounds by their sonority. A typical ordering, from most to least sonorous, is:
Vowels [ɑ, ɔ, ɪ, i, etc.] > Glides [j, w] > Liquids [ɹ, l] > Nasals [m, n, ŋ] > Fricatives [s, f, θ, ð, z, ʃ, etc.] > Affricates [d͡ʒ, t͡ʃ] > Plosives [p, b, t, d, k, ɡ]
Vowels are the most sonorous whereas plosives are the least sonorous sounds.
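A sonority scale can be modelled as a simple ranking. In this sketch the numbers 7 (vowels) down to 1 (plosives) are arbitrary; only their order reflects the scale above. A few phonemes not listed in the scale (/e/, /ʌ/, /eɪ/, /r/, /v/) are added as assumptions so that the same table covers the words used in this answer.

```python
# Hypothetical numeric ranks: only their order matters, following
# vowel > glide > liquid > nasal > fricative > affricate > plosive.
SONORITY = {
    **dict.fromkeys(["ɑ", "ɔ", "ɪ", "i", "e", "ʌ", "eɪ"], 7),  # vowels
    **dict.fromkeys(["j", "w"], 6),                            # glides
    **dict.fromkeys(["ɹ", "r", "l"], 5),                       # liquids
    **dict.fromkeys(["m", "n", "ŋ"], 4),                       # nasals
    **dict.fromkeys(["s", "f", "θ", "ð", "z", "ʃ", "v"], 3),   # fricatives
    **dict.fromkeys(["d͡ʒ", "t͡ʃ"], 2),                         # affricates
    **dict.fromkeys(["p", "b", "t", "d", "k", "ɡ"], 1),        # plosives
}

# Order a handful of sounds from most to least sonorous.
sounds = ["p", "s", "l", "ɑ", "n"]
print(sorted(sounds, key=SONORITY.get, reverse=True))  # ['ɑ', 'l', 'n', 's', 'p']
```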
So what does 'sonorous' mean?
Singing is a nice way to illustrate this. Try singing the vowel [ɑ]: it is quite easy to prolong. Now try [s]: you can prolong it too. However, you cannot prolong plosives; try singing [p] or [t].
The importance of sonority in syllables is captured by the Sonority Sequencing Principle (SSP), which states that sonority rises from the onset to the nucleus and falls from the nucleus to the coda of a syllable.
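The SSP can be checked with a short function. This is a sketch under hypothetical numeric ranks (only their order matters): `obeys_ssp` finds the most sonorous segment and checks that sonority never falls before it and never rises after it. Equal neighbours (a plateau, as in the final /fθs/ of 'twelfths') are allowed here; a stricter reading that requires sonority to fall at every step would reject 'twelfths'.

```python
# Hypothetical ranks: vowel 7 > glide 6 > liquid 5 > nasal 4 > fricative 3
# > affricate 2 > plosive 1.
SONORITY = {
    **dict.fromkeys(["ɑ", "ɔ", "ɪ", "i", "e", "ʌ", "eɪ"], 7),
    **dict.fromkeys(["j", "w"], 6),
    **dict.fromkeys(["ɹ", "r", "l"], 5),
    **dict.fromkeys(["m", "n", "ŋ"], 4),
    **dict.fromkeys(["s", "f", "θ", "ð", "z", "ʃ", "v"], 3),
    **dict.fromkeys(["d͡ʒ", "t͡ʃ"], 2),
    **dict.fromkeys(["p", "b", "t", "d", "k", "ɡ"], 1),
}


def obeys_ssp(phonemes):
    """True if sonority never falls before the peak and never rises after it.

    Plateaus (equal neighbouring values) are allowed in this sketch.
    """
    values = [SONORITY[p] for p in phonemes]
    peak = values.index(max(values))
    rising = all(a <= b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a >= b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


print(obeys_ssp(["t", "w", "e", "l", "f", "θ", "s"]))  # twelfths: True
print(obeys_ssp(["ɡ", "r", "ʌ", "d͡ʒ", "d"]))          # grudged: True
print(obeys_ssp(["l", "p", "eɪ"]))                     # *lpay: False
```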
Sonority graphs/curves
The SSP can be illustrated with sonority graphs (curves):
- /twelfθs/: /t/ is a plosive, /w/ a glide, /e/ a vowel (sonority peak), /l/ a liquid and /f θ s/ fricatives (sonority plateau)

- /ɡrʌd͡ʒd/: /ɡ/ is a plosive, /r/ a liquid, /ʌ/ a vowel (sonority peak), /d͡ʒ/ an affricate and /d/ a plosive
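A sonority curve can be approximated in plain text. The sketch below uses a hypothetical `sonority_graph` function and assumed numeric ranks; it prints one horizontal bar per phoneme, so the curve reads from top to bottom. For /twelfθs/ the bars grow up to the peak at /e/ and then level off into the fricative plateau.

```python
# Hypothetical ranks: vowel 7 > glide 6 > liquid 5 > nasal 4 > fricative 3
# > affricate 2 > plosive 1.
SONORITY = {
    **dict.fromkeys(["ɑ", "ɔ", "ɪ", "i", "e", "ʌ", "eɪ"], 7),
    **dict.fromkeys(["j", "w"], 6),
    **dict.fromkeys(["ɹ", "r", "l"], 5),
    **dict.fromkeys(["m", "n", "ŋ"], 4),
    **dict.fromkeys(["s", "f", "θ", "ð", "z", "ʃ", "v"], 3),
    **dict.fromkeys(["d͡ʒ", "t͡ʃ"], 2),
    **dict.fromkeys(["p", "b", "t", "d", "k", "ɡ"], 1),
}


def sonority_graph(phonemes):
    """Return one text line per phoneme: a bar of '#' as long as its rank."""
    # The bar is padded to 8 columns so the phoneme labels line up.
    return [f"{'#' * SONORITY[p]:<8}/{p}/" for p in phonemes]


for line in sonority_graph(["t", "w", "e", "l", "f", "θ", "s"]):
    print(line)
```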

The vowel is the sonority peak (the nucleus). The number of sonority peaks usually corresponds to the number of syllables, so a word like 'very' /ˈveri/ has two peaks and hence two syllables.
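Counting peaks can be sketched as counting local maxima of the sonority values. In this hypothetical `count_peaks`, with assumed numeric ranks, runs of equal values are collapsed first so that a plateau counts only once. Like the text's "usually", this is a heuristic, not a full syllabification algorithm.

```python
# Hypothetical ranks: vowel 7 > glide 6 > liquid 5 > nasal 4 > fricative 3
# > affricate 2 > plosive 1.
SONORITY = {
    **dict.fromkeys(["ɑ", "ɔ", "ɪ", "i", "e", "ʌ", "eɪ"], 7),
    **dict.fromkeys(["j", "w"], 6),
    **dict.fromkeys(["ɹ", "r", "l"], 5),
    **dict.fromkeys(["m", "n", "ŋ"], 4),
    **dict.fromkeys(["s", "f", "θ", "ð", "z", "ʃ", "v"], 3),
    **dict.fromkeys(["d͡ʒ", "t͡ʃ"], 2),
    **dict.fromkeys(["p", "b", "t", "d", "k", "ɡ"], 1),
}


def count_peaks(phonemes):
    """Count sonority peaks (local maxima) in a phoneme list."""
    values = [SONORITY[p] for p in phonemes]
    # Collapse plateaus so that a run of equal values counts once.
    collapsed = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    # Pad with 0 so the first and last segments can also be peaks.
    padded = [0] + collapsed + [0]
    return sum(
        1 for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


print(count_peaks(["v", "e", "r", "i"]))  # very
print(count_peaks(["r", "ɪ", "ð", "m"]))  # rhythm, with syllabic /m/
```

With these assumed ranks, 'very' and 'rhythm' [rɪ.ðm̩] both give two peaks; in 'rhythm' the second, smaller peak is the syllabic /m/.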
The SSP also explains why onset clusters like /pl-/, /kl-/ and /sl-/ occur in English while */lp-/, */lk-/ and */ls-/ don't: sonority must rise towards the nucleus, and /l/ is more sonorous than /p/, /k/ or /s/. For example, an English word can start with /pl-/ (play), but can't start with */lp-/ (*lpay). I have explained that in this answer.
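The onset-cluster argument can be tested with assumed numeric ranks. In this sketch, a hypothetical `onset_rises` accepts a cluster only if sonority strictly increases from one consonant to the next, as the SSP requires on the way to the nucleus.

```python
# Hypothetical ranks for the consonants used here (only the order matters):
# liquid 5 > fricative 3 > plosive 1.
SONORITY = {"l": 5, "s": 3, "p": 1, "k": 1}


def onset_rises(cluster):
    """True if sonority strictly rises through the onset cluster."""
    values = [SONORITY[p] for p in cluster]
    return all(a < b for a, b in zip(values, values[1:]))


for cluster in (["p", "l"], ["k", "l"], ["s", "l"], ["l", "p"], ["l", "k"], ["l", "s"]):
    verdict = "rises" if onset_rises(cluster) else "does not rise"
    print("/" + "".join(cluster) + "-/", verdict)
```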