
After a long time fighting with an \uppercase inside a \csname ... \endcsname, I came across this post, which mentions the argument being “regurgitated”. According to it, to capitalize the first letter of a control sequence name you need to do:

\uppercase{\expandafter\gdef\csname #1}#2\endcsname

which makes no sense to me.

What is the regurgitation process, and how does it fit into the expansion process? Is it specific to \uppercase (and its companion \lowercase)?



The term is used by Knuth in The TeXbook (Chapter 24):

We shall study TeX’s digestive processes, i.e., what TeX does with the lists of tokens that arrive in its “stomach.” Chapter 7 has described the process by which input files are converted to lists of tokens in TeX’s “mouth,” and Chapter 20 explained how expandable tokens are converted to unexpandable ones in TeX’s “gullet” by a process similar to regurgitation. When unexpandable tokens finally reach TeX’s gastro-intestinal tract, the real activity of typesetting begins, and that is what we are going to survey in these summary chapters.

What does this refer to? Primarily to macro expansion: the replacement text of a macro is stored in memory, and when the macro is expanded, those tokens, which were already absorbed at definition time, go back to the “mouth” for further expansion.
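A minimal plain TeX sketch of this kind of regurgitation, using a made-up macro \greet: when TeX expands \greet{world}, the stored replacement text, with #1 filled in, is put back in front of the remaining input and read again. Wrapping the call in \edef just makes the result visible with \show.

```latex
% Hypothetical macro: its replacement text is stored at definition time.
\def\greet#1{Hello, #1!}
% \edef expands \greet{world}: the stored tokens, with #1 replaced by
% "world", are pushed back into the input and read again.
\edef\x{\greet{world}}
\show\x   % > \x=macro:
          % ->Hello, world!.
\bye
```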

A similar process happens with \uppercase and \lowercase, which are unexpandable primitives. They send their argument, without expanding it, to the “stomach”, where each character token whose \uccode (respectively \lccode) is nonzero is replaced by the character with that code, keeping its category code. The current values of the \uccode and \lccode arrays are used. Character tokens whose code is zero, and tokens that aren't character tokens (control sequences, for instance), stay exactly the same. After this replacement the tokens are “regurgitated” to the mouth for further processing, which might involve macro expansion, definitions or anything else. In other words, when \uppercase{<tokens>} is found, TeX suspends its activity and sends down <tokens>; the replacement forms a token list <TOKENS>, which is sent back to the mouth, and TeX resumes by examining <TOKENS> as if it had been there to begin with.
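To see which tokens change, here is a plain TeX sketch (the names \word, \x and \y are made up for the illustration). The control sequence \word passes through \uppercase untouched because the argument is not expanded, the digit 1 is left alone because its \uccode is zero, and the mapping is whatever \uccode holds at the moment \uppercase is executed.

```latex
\def\word{abc}
% Only the letters x, y, z are mapped; \word is a control sequence,
% is not expanded, and comes back unchanged.
\uppercase{\def\x{xyz\word}}
\show\x   % > \x=macro:
          % ->XYZ\word .
% Locally map `a' to `z': \uppercase uses the current \uccode values.
% The digit 1 has \uccode 0, so it is not changed.
\begingroup
  \uccode`\a=`\z
  \uppercase{\gdef\y{a1}}
\endgroup
\show\y   % > \y=macro:
          % ->z1.
\bye
```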

Thus if you say

\uppercase{\expandafter\gdef\csname a}bc\endcsname

the process will present TeX with the token list

\expandafter\gdef\csname Abc\endcsname

and so it will proceed to do the equivalent of

\gdef\Abc
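The line in the question is presumably the body of a macro with two parameters. A minimal plain TeX sketch, with a hypothetical name \defcap: #1 is the letter to capitalize, #2 the rest of the name, and the braced text that follows the call becomes the body of the new macro.

```latex
% Hypothetical helper: \defcap{a}{bc}{<body>} globally defines \Abc.
% #1 is the first letter of the name, #2 the rest of the name.
\def\defcap#1#2{%
  \uppercase{\expandafter\gdef\csname #1}#2\endcsname
}
% \uppercase regurgitates "\expandafter\gdef\csname A"; TeX then reads
% "bc\endcsname{Hello}" after it, which amounts to \gdef\Abc{Hello}.
\defcap{a}{bc}{Hello}
\show\Abc % > \Abc=macro:
          % ->Hello.
\bye
```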

You can't do

\expandafter\gdef\csname\uppercase{a}bc\endcsname

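because \uppercase, being unexpandable, is illegal inside \csname...\endcsname: TeX stops with “Missing \endcsname inserted”. While looking for \endcsname, TeX expands everything it finds, and the only unexpandable tokens that may appear are character tokens and \endcsname itself.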
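If you prefer to keep the uppercasing and the \csname construction as separate steps, one workaround, sketched here assuming a scratch macro (the hypothetical \firstletter) is acceptable, is to run \uppercase first, store its result, and only then build the name. A macro like \firstletter is expandable, so it is allowed inside \csname.

```latex
% Step 1: \uppercase regurgitates \def\firstletter{A}.
\uppercase{\def\firstletter{a}}
% Step 2: \firstletter expands to A inside \csname, giving \Abc.
\expandafter\gdef\csname\firstletter bc\endcsname{Hello}
\show\Abc % > \Abc=macro:
          % ->Hello.
\bye
```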

Once you understand that \uppercase is unexpandable and how it works, the construction should make sense.


With expl3 you can do the same in a more direct way:

\documentclass{article}

\ExplSyntaxOn

\NewDocumentCommand{\mycommand}{m}
 {
  \cs_new:cpn
   { \str_uppercase:f { \str_head:n { #1 } } \str_tail:n { #1 } }
   { #1 }
 }

\ExplSyntaxOff

\mycommand{abc}

\show\Abc

This will show

> \Abc=\long macro:
->abc.

which is possibly what you want.
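The expl3 version works because \str_uppercase:f, unlike \uppercase, is expandable: the name in the c argument of \cs_new:cpn is built with \csname...\endcsname. A sketch of the same idea written directly with \csname, mirroring the line that fails with \uppercase; the body Hello is only a placeholder.

```latex
\documentclass{article}
\ExplSyntaxOn
% \str_uppercase:n is expandable, so it may appear between
% \csname and \endcsname (spaces are ignored under \ExplSyntaxOn).
\expandafter\gdef\csname \str_uppercase:n { a } bc \endcsname {Hello}
\ExplSyntaxOff
\show\Abc % > \Abc=macro:
          % ->Hello.
\begin{document}
\end{document}
```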

egreg
  • Thanks. The difference between “regurgitation” and expansion is still somewhat unclear, although intuitively it feels as if regurgitation is some sort of special handling by primitives (here the \lccode/\uccode translation, which must be hard-wired into TeX's code) that doesn't play well with expansion; the key is to understand the timing in which things are processed, regurgitation vs. expansion. Is there any other primitive that triggers this “regurgitation” process, or is it really specific to \lowercase / \uppercase? – Sébastien Le Callonnec Mar 10 '24 at 09:04
  • @SébastienLeCallonnec Don't take the “digestive” description too literally. – egreg Mar 10 '24 at 09:20