The "ripples on waves" interaction (Bolinger 1964, Chao 1968) observed in Mandarin is only one of several ways that lexical tone and utterance-level intonation may interact.
I ran a series of production experiments in which I elicited sets of utterances from native speakers of tone languages. The two main variables I manipulated were the tone on the final word or syllable and the intonation type of the utterance (declarative or echo question). The following summarizes what I found for echo-question intonation, as opposed to declarative intonation, in each language:
Mandarin: The local contour of the lexical tone on the final syllable was largely preserved but shifted upward in the speaker's vocal range. So a falling lexical tone was still falling but from a higher starting pitch.
Cantonese: The first half of the final syllable preserved information about its lexical tone, but starting somewhere in the middle of the syllable the pitch shot steeply upward. So a falling tone began to fall, but after this initial dip the pitch shot up in the second half of the syllable. The syllable was not substantially lengthened to accommodate both trajectories.
Shiga Japanese: The tone and the intonation were expressed "in series", so that the fall associated with the lexical tone was completed and then the syllable was extended to accommodate the subsequent rise associated with the echo question intonation. In some cases this lengthening nearly tripled the duration of the final syllable.
North Kyeongsang Korean: If the final word was monosyllabic, the falling contour observed in the declarative case was completely "overwritten" by a rising contour. If the final word was disyllabic (with the peak on the first syllable and the fall continuing to the end of the second syllable in the declarative case), the fall was preserved but started higher in the speaker's range on the first syllable (much as in Mandarin). In some instances there was a slight rise toward the end of the final syllable (as in Cantonese, but nowhere near the magnitude observed in Cantonese or in the monosyllabic case).
Interestingly, in each language there were tone-specific idiosyncrasies that made it impossible to formally characterize the mapping from the declarative version of a lexical tone contour to its echo-question version beyond the descriptive generalizations above. In other words, for a given language it was not possible to come up with an algorithm that could accurately predict the F0 contour of an echo question given only the F0 contour of its declarative counterpart. For example, there are three tones in Cantonese that are basically level in the declarative environment. In two of the three cases (tone 3 and tone 6), the rising trajectory in the echo-question environment was curved (i.e., more exponential-looking), but in the third case (tone 1) it was straight (i.e., more linear-looking).
@jlawler's comment reminded me of another interesting case: Shingazidja, a Bantu language spoken in the Comoros. Patin (2008) presents data suggesting that yes-no questions in Shingazidja are formed by inserting a "superhigh" tone on the penultimate syllable (phonetically, the pitch on this syllable is higher than that on lexically H-toned syllables). This tone overwrites whatever was there in the declarative version of the utterance: it is realized at the same pitch regardless of the tone on the penultimate syllable in the declarative version. Strikingly, when the final syllable of the declarative version bears a lexical H tone, the superhigh intonational tone is placed on the antepenultimate syllable instead!
Bolinger, D. (1964). "Intonation: around the edge of language." Harvard Educational Review 34: 282-296.
Chao, Y. R. (1968). A Grammar of Spoken Chinese. Berkeley, CA, University of California Press.
Patin, C. (2008). "Tone and Intonation's Waltz in Shingazidja Polar Questions." 3rd Conference on Tone and Intonation (TIE3). Lisbon.