Why do we use functional composition in the order we do?

Question

Function composition means, roughly, taking the output of a function and applying it to the input of another function. If we define an object C to represent this operation, we could say $C(f,g) = f∘g$ or $C(f,g)(x) = f(g(x))$.

Say we had a similar but slightly different operation $F$ (for “followed by”) such that $F(f,g) = g∘f$ or $F(f,g)(x) = g(f(x))$. We could make a symbol for this operation, maybe $→$, so that what we wrote before reads as $F(f,g) = f → g$. This could read as “$f$ followed by $g$” or “$f$ and then $g$”.

My suspicion is that $F$ is a much more natural and intuitive operation than $C$, since (in English) we read from left to right, so seeing $f*g$ feels more like $f$-then-$g$ than $g$-then-$f$. My second suspicion is that $F$ and $C$ are entirely equivalent and translating between use of one and the other is trivial. To my knowledge and experience, composition is a widely used operation, but followed-by is not. Why?

Are there certain branches of maths, specific proofs, or particular applications that benefit by using $C$ rather than $F$? Is anything known about the origins of this notation? Is followed-by actually used in some context I might not be familiar with? As followed-by is much more intuitive to me and makes certain problems a lot more obvious, might it be a good idea, every time I see $f∘g$ to replace it with $g→f$ or are there cases where that could lead me astray or make things more complicated?

I think of it more as working from the inside outward than as a left or right direction. — Sue VanHattum, Feb 22 '22 at 02:20
Maybe what I mean to say is that to me the notion of "after" is easier to wrap my head around than the notion of "inside", and I suspect that they are equivalent formulations — David Lalo, Feb 22 '22 at 02:39
Indeed. Reasonable question! And, indeed, some decades ago, some people (e.g., Israel Herstein) proposed to write the function symbols on the right, as in $((x)f)g$, so that our left-to-right reading would match the inside-out reading. But it didn't catch on... There are many more left-versus-right and outside-in-versus-inside-out parsing issues in math. Mostly of not-so-much mathematical content, but, now-and-then, a bit. — paul garrett, Feb 22 '22 at 02:39
... and, in fact, as in the situation of looking at group actions on sets, and then the induced action on functions on the sets, there is an inescapable left/right $g$-or-$g^{-1}$ issue. Similarly with linear algebra over non-commutative rings: (notationally) "right" modules over $R$ are "left" modules over "the opposite ring"... And Hom spaces and tensor products have such "problems". Mercifully, we seem not to have to worry so much about "top", "bottom", "front", "back", or other physical-configuration modules... though these are not entirely fake, I gather! Don't know, personally. :) — paul garrett, Feb 22 '22 at 02:44
Another little point that may clarify some aspects of this is to declare explicitly that the notation is a choice of language/notation, for a narrative about some mathematics that is itself not just whatever the notation is. The notation is "a book-keeping system", that is surely not unique, and could have deficiencies, etc., but those problems need not reflect on the underlying mathematical reality (if we believe there is one). :) — paul garrett, Feb 22 '22 at 02:54
Some examples of F: Mathematica has the notation x // f // g for $g(f(x))$. It also has the RightComposition operator (RightComposition[f, g, h] denotes $h \circ g \circ f$). A textbook I once had wrote $xfg$ for $g(f(x))$, function application being denoted by juxtaposition of an element followed by a function. This is something like Reverse Polish Notation, which some calculators and the PostScript language use. I like the // notation and RPN in programming; I did not like $xfg$ because it looked like multiplication. — user1815, Feb 22 '22 at 18:13
The Haskell programming language uses '.' for the usual function composition, but their are multiple symbols to write the argument-swapped version. Documentation for the latter includes phrases like "This function usually isn't necessary, but it can be more readable than some alternatives when used with higher-order functions." For some examples of this, you can look here: https://hoogle.haskell.org/?hoogle=(a-%3Eb)-%3E(b-%3Ec)-%3E(a-%3Ec) — Simon, Feb 23 '22 at 09:48
In Krohn & Rhodes "Algebraic Theory of Machines. I. Prime Decomposition Theorem for Finite Semigroups and Machines" (1965), they use the notation $xgf$ for $f(g(x))$. The notation also extends to relations which are composed as if they were non-deterministic functions! Not being used to the notation, it was a nightmare to read. — TomKern, Feb 23 '22 at 12:50
While English is read left-to-right, it isn’t necessarily true that this leads to a logical “binding” (for want of a better term) that proceeds from left to right. You could compare the inside-outward order to the “of” possessive: the father of the girlfriend of Xavier is structured like $f(g(x))$. (And then the right-composition order resembles the “’s” possessive: Xavier’s girlfriend’s father is structured like $((x)g)f)$.) — Tim Pederick, Feb 24 '22 at 15:57
In my first year of math we had two professors using the two different conventions: Abstract Algebra was left to right and Linear Algebra was right to left. It was a nightmare. — Nicola Ciccoli, Feb 28 '22 at 18:17

score 14 · Answer 1 · answered Feb 22 '22 at 04:33

14

In general applications, defining the evaluation rule of $f\circ g$ by $(f\circ g)(x)=f(g(x))$ has a lower extraneous cognitive load than the way you suggest. As a sample of how inconveniencing that can be, go ahead and proofread the two following sequences of equations to see if there is a mistake in my exposition.

$$((f\circ g)\circ h)(x)=h((f\circ g)(x))=h(g(f(x)))=(g\circ h)(f(x))=(f\circ(g\circ h))(x)$$

$$((f\circ g)\circ h)(x)=(f\circ g)(h(x))=f(g(h(x)))=f((g\circ h)(x))=(f\circ(g\circ h))(x)$$

Esoteric rules of symbolic manipulation aren't bad in and of themselves, especially if it ultimately leads to a clearer understanding of the underlying mathematical structures. But I don't think that's the case here: as far as I can see, all you get is something that is easier to translate into English step-by-step. But IMO, the strength of mathematical symbolic language is that it offers efficiency and clarity away from the vagaries of natural language.

That being said, there are subfields of mathematics where that cognitive load is more intrinsic than extraneous and therefore. For instance, when studying permutation groups, it is unclear whether the product $fg$ of two permutations is most naturally understood as "do $f$ first and then $g$" or symbolically as $(fg)(x)=f(g(x))$. As the comments suggest, I am given to believe that some old-school authors in abstract and linear algebra used postfix functional notation to establish the clarity.

answered Feb 22 '22 at 04:33

Matthew Daly

5,619
1
12
45

Could You elaborate a bit on the examples above? It's not super obvious to me what each line is supposed to demonstrate. Is the first line using regular composition, and the second taking the ∘ symbol to mean followed-by? – David Lalo Feb 22 '22 at 07:17
In any case, I don't find either example simple or obvious, and I find a proof using "followed-by" is actually easier to follow:
1. ((f → g) → h)(x)
2. h((f → g)(x))
3. h(g(f(x)))
|| 4. (f → (g → h))(x) 5. (g → h)(f(x)) 6. h(g(f(x)))

I find the logic in this example easy to follow and double-check since I can reason about it. In English:
1. f-then-g and then h all taking in x
2. the input of h is f-then-g of x
3. we replace "f-then-g of x" with "f(x) is input to g"
|| 4. f then g-then-h all taking in x 5. the input to g-then-h is f(x) 6. g(f(x)) is the input to h
– David Lalo Feb 22 '22 at 08:00
Are there simple symbolic rules used to produce the examples above? And if so, shouldn't there be equivalent symbolic rules using → instead? – David Lalo Feb 22 '22 at 08:05
1

I guess this example was helpful in that making me work through an example, shows that there is an inconsistency since function application (ie the putting of inputs into a function) happens on the right whereas thinking of nested function application as a series of sequential steps we'd expect input on the left – David Lalo Feb 22 '22 at 08:11
Thinking on this further the above proof is sort of just based on the fact that the way we write functions is so often written in this nested way that we take for granted their properties. The major assumption above is that
h[(g(f(x)))] = h(g(f(x))) -- it's a bit hard to notate. But using a "then" notation as our base notation, we could write it as x → (f → (g → h)) = x → ((f→ g) → h) and base our composition notation as the opposite of that
– David Lalo Feb 22 '22 at 08:18
1

@DavidLalo that notation would be inconsistent: does $f\to g$ mean you use $f$ as the argument to $g$, or compose it before $g$? However, you could use something like $x\to(f;(g;h)) \equiv x\to ((f;g);h)$. In functional programming languages you often have both directions available as different operators – in Haskell, h(g(f(x))) can also be written as h $ g $ f x or (h . g . f) x (the . corresponds to the \circ composition operator), or x & f>>>g>>>h. – leftaroundabout Feb 23 '22 at 15:33
1

hmmm interesting. I think I was thinking of something similar to currying a partial function and the order of operations would be determined by parens (although I think the associativity of function application, would make these unnecessary?) My thinking is it's contextual -- f → g is inherently unevaluable (whereas x → f → g, is an element in codomain of g) but represents the map that starts in the domain of f and ends in the codomain of g. But I need to chew on Your point a bit more to really get it. – David Lalo Feb 23 '22 at 16:28
1

In a book on intro to category theory I once started reading they treat inputs x to functions as maps from a one-element set to domain of x. Thinking of it this way I think might remove some problems of "an element of a set and a function are fundamentally different things so why are we using the same operation on both in the same way" – David Lalo Feb 23 '22 at 16:31
1

@DavidLalo well, allowing ambiguous notation if one of the interpretations is "inherently unevaluable" is definitely common enough in applied maths, physics etc.. IMO it's not a good idea though, because in more complicated contexts it may not be obvious at all which version is meant. Again, functional programming takes this to the extreme: functions are values just as any other, and you consider any expression as potentially valid (what's needed is to typecheck it). - Reg. "inputs...as maps from a one-element set": the term is global element. – leftaroundabout Feb 24 '22 at 09:38
2

In programs, functions are usually procedures to execute. Left-to-right evaluation is easier to follow. In javascript some propose an operator |>. x |> f means f(x). If you replace f(x) by x▹f and f◦g with g▹f, it becomes beautiful. (x▹f)▹g = x▹(f▹g). Consider x to be the parameterless function that returns the value of x (as David mentions). – Florian F Feb 24 '22 at 17:36
@leftaroundabout Thanks for the info! As per Your point about ambiguity, I totally agree -- when we do math on paper we don't have compilers that can point this stuff out to us. It's a good idea to use notation that makes it hard to write nonsense, so as to avoid mistakes – David Lalo Feb 24 '22 at 21:38
@FlorianF Sounds like yet another thing to add to my endless list of side projects... – David Lalo Feb 24 '22 at 21:39

Xander Henderson · Accepted Answer · 2022-02-24T20:28:41.743

The answer to the lede question "Why do we use function composition?" is "Because we do." It is the way that, for historical reasons, the notation developed. It is a convention which developed before a lot of set theory, modern algebra, and category theory came along, and this convention was deeply ingrained long before anyone thought to question it.

From a naïve point of view, I also think that it makes a good deal of sense: if $f(x)$ denotes the application of a function $f$ to an input $x$, then $g(f(x))$ must denote the application of a function $g$ to an input $f(x)$. Reading from left-to-right, we have "$g$ of $f$ of $x$", which we might abbreviate by $(g\circ f)(x)$.

From a more set theoretic point of view, this becomes a little harder to reason about. A typical schematic for functional composition in undergraduate texts looks something like the following:

Reading from left-to-right, we might expect the composite function to be "$f$ and then $g$", which we might want to denote by $f \ast g$ or something similar. But the convention is that this map is $g \circ f$—for historical reasons, maps are applied from right-to-left.

For the most part, this doesn't really cause any problems, but there are some places where you might see some difficulties or a different choice of notation:

Category Theory

In chalk-talks, I have seen one or two category theorists use the notation $(x)f$ to denote the image of $x$ under the map $f$. In this setting, $$ (x)(f\circ g) = ((x)f)g, $$ which is what we would usually write at $g(f(x))$. So, if you want to compose on the right (as seems "natural"), you should also apply functions on the right.

From one point of view, I suppose that this is a perfectly reasonable convention to adopt (and I kind of wish that we had adopted this convention in the first place), but it is probably a very bad idea to teach this convention to a neophyte.

Differential Operators

Folk who study differential equations will often use notation like $\partial_{xy}$ in order to mean $$ \partial_{xy} = \partial_y \circ \partial_x, $$ where $$ \partial_{x} f(x,y) = \frac{\partial}{\partial x} f(x,y). $$ Or should that be $$ \partial_{xy} = \partial_x \circ \partial_y?$$ There is room for ambiguity here, and authors are well-served to be explicit about what they mean. On the other hand, in most "nice" cases, the derivatives commute, so maybe it doesn't really matter.

Iterated Function Systems

In my own area of research, a common object of study is an iterated function system. Such a system is simply a collection of maps $\{\varphi_j\}_{j=1}^n$ which map some space into itself. The maps are often composed with each other over and over again (hence the "iterated" part of the term), which leads to all kinds of wild notations for composition. One common notation is to take $\mathbf{j} = (j_1, j_2, \dots, j_k)$ to be a tuple of indices, and then write $$ \varphi^{\mathbf{j}} = \varphi_{j_k} \circ \varphi_{j_{k-1}} \circ \dotsb \circ \varphi_{j_2} \circ \varphi_{j_1}. $$ That is, first apply the map $\varphi_{j_1}$, then apply the map $\varphi_{j_2}$, and so on. However, people mess up this convention all the time. I cannot tell you the number of times that I have seen the convention defined one way, and then applied the other. This doesn't usually cause problems, because the underlying reasoning is generally correct—it is just a problem of transcription—but it can be a concern.

Reverse Polish Notation

I grew up using an HP48 series calculator. These calculators were somewhat unique, in that they implemented computation via reverse Polish notation. The idea is that the arguments of a function are placed onto a "stack", and functions apply to the items on the stack in a "last in, first out" order. So, for example, if I wanted to compute $\sin(\log(5) + 3)$, I might enter the keystrokes

[5]     <- this puts 5 on the bottom of the stack
[log]   <- the "eats" the 5, and puts log(5) on the bottom of the
           stack
[3]     <- this "bumps" log(5) up a spot, and puts 3 on the stack
           below log(5)
[+]     <- this "eats" the bottom two elements of stack, adds
           them together, and puts the result on the bottom of
           the stack
[sin]   <- this "eats" the bottom element of the stack, takes its
           sine, and places the result on the bottom of the stack

Written out, the set of keystrokes is "composition in the reverse order", i.e.

5 log 3 + sin

roughly corresponds to writing $(((5)\log, 3){+})\sin$, which kind of looks like backwards composition. Using the standard notation we might write $\sin( {+}(\log(5), 3))$. In both cases, we think of $+$ as a function $+: \mathbb{R}^2 \to \mathbb{R}$.

This may look arcane at first glance, but once you get used to it, it is a really convenient way to enter computations into a calculator or computer. It eliminates a lot of errors caused by missing parentheses.

TL;DR

The current convention exists for historical reasons. If we were to start all over again, we might adopt a different convention, and there are good reasons to want to do so. However, the convention exists, is well-established, and you would be doing your students a disservice to introduce an alternative notation.

I would add a "folk history" to this: probably we had particular functions first, like sine. Since we would natural write "sine of x", abbreviating this $\sin(x)$ would make sense. Then this convention would naturally lead to $f(x)$ for general functions. As soon as we are writing $f(x)$ rather than $(x)f$, it makes sense to write function composition in the usual order. This is my best guess as to how we ended up where we are. I do wish Herstein's attempted notational revolution had caught on. — Steven Gubkin, Feb 23 '22 at 11:29
Great answer, love these examples! And not just because I've also studied Iterated Function Systems :-) — Brendan W. Sullivan, Feb 23 '22 at 19:58
For the slightly less adventurous category theorists, it's also somewhat common to write $f(x)$ as usual but use a semicolon, as in $f;g$, to denote composition "the other way around". — Daniel Wagner, Feb 24 '22 at 20:07

score 5 · Answer 3 · answered Feb 22 '22 at 21:59

Put in natural language terms, if your language has verb-object order, then verb1-verb2-object naturally represents verb1 (verb2-object). That is, apply verb2 first, then verb1. For instance, if you're writing a recipe, "cook the chopped onion" means "chop the onions, then cook them". If you want someone to peel the onion, then chop it, then cook it, then serve it, that would be "serve the cooked chopped peeled onion". If your language has object-verb order, then it's natural for composition to go the other way. Whatever direction your object to verb order is, that's your composition order, because the verb+object becomes the object for the next verb. Math is somewhat inconsistent as to which order is used; "x squared plus two" is basically object-verb order, while "f of x" is verb-object order.

I think in English, in everyday use, those prenominal modifiers would normally (if at all) be in temporal order, thus: "serve the peeled, chopped (and then) cooked onion(s)". — Pablo H, Feb 23 '22 at 15:27

score 4 · Answer 4 · answered Feb 24 '22 at 21:04

I think it's worth noting that $g\,;f$ is sometimes used instead of $f\circ g$ in certain areas that have connections to computer science:

In category theory: $f\,;g$ will sometimes be used for the composition of morphisms $f: A \to B$, $g: B \to C$, see for example this introductory text
In computer science proper: Again, $g\,;f$ might be used for composition of functions / partial functions / relations. For example: $[\mathit{-}]c[\mathit{-}] := post(c)\,;\supseteq$ (from here).

The main reason why it's not used more widely is simply due to convention: Had the mathematicians assumed some kind of postfix / "method notation" (e.g. $x.f.g.h$) instead of $h(g(f(x)))$, then the order $fst ; snd$ would probably seem more natural than $snd \circ fst$.

Regarding the prefix notation for functions, I believe that we can safely blame that on the early physicists. This is because in physics, one often has to work with some "important" quantities $F$, which depend on the "less important" parameters $a, b, \dots, z$, so that the $F$ comes first in an expression $F(a, b, \dots, z)$. You can see that the parameters really are "less important" from the fact that we often encounter abuse of notation where we are freely jumping between e.g. $F(x, y)$ and $F(r,\phi)$, where the "important quantity" $F$ is denoted by the same symbol, whereas the "less important" parameters don't even range over the same domain.