How to explain what's wrong with this application of the chain rule?

Question

Yesterday a student in my calculus class attempted something like this:

Problem statement: Find the derivative of $3^{(5x+1)}$ with respect to $x$.

Proposed solution:

Let the inner function be given by $g(x)=3,$ and the outer by $f(z)=z^{(5x+1)}$, so that $$f(g(x))=3^{(5x+1)}.$$
$f'(z)=(5x+1)\cdot z^{(5x)}$ and $g'(x)=0$, so by the chain rule, $$\frac{d\left(3^{(5x+1)}\right)}{dx} = f'(g(x))g'(x)=0.$$

I had difficulties explaining what's wrong with this, and basically just told the student "the" right way to do it. Although I now have a rough idea of what's wrong, I'd like to hear from others:

Have you seen similar attempts?
How would you explain to a beginning calculus student what's wrong with this specific solution?

see https://math.stackexchange.com/questions/401122/explaining-the-derivative-of-xx and the answers there for a similar issue — Matthew Towers, Mar 22 '19 at 12:48
Thanks @MatthewTowers. That's indeed almost the same question. Maybe a moderator can mark this as duplicate? — Michael Bächtold, Mar 22 '19 at 15:52

Kevin · Answer 1 · 2019-03-21T21:22:59.823

30

The root of the difficulty is that $x$ appears free in $f(z)$, but we are trying to "capture" it with $g(x)$, which is illegal. When we substitute $g(x)$ into $f(g(x))$, we have a variable clash: $$ f(g(\color{red} x)) = 3^{5\color{blue}x + 1} $$

The red (first) $x$ is a different variable from the blue (second) $x$. This is clearer if we rename the bound variable: $$ f(g(\color{red} y)) = 3^{5\color{blue}x + 1} $$

The original expression had $x$ bound to the $\mathrm d x$, so by unbinding it, we have changed the meaning of the expression: $$ \frac{\mathrm d}{\mathrm d \color{blue} x} f(g(\color{red}y)) \ne \frac{\mathrm d}{\mathrm d \color{red}y} f(g(\color{red}y)) $$

(Incidentally, this is one reason I dislike the notation $f'(x)$, because it hides the variable of differentiation. Students must still be taught it, unfortunately, because Leibniz's notation is verbose in some contexts, but it should only be used as shorthand. Students should understand that it is a shorthand, and that there is still a variable of differentiation, even if it is not shown.)
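
The clash can also be mimicked numerically. The following is only a sketch, not part of the original argument: the helper names `central_difference` and `student_derivative` and the sample point `x0 = 0.2` are assumptions chosen for the demonstration. Inside `student_derivative`, the exponent of `f` silently reuses `x0` (the blue $x$), while the argument of `g` plays the role of the red variable.

```python
import math

def h(x):
    """The function we actually want to differentiate: h(x) = 3**(5x + 1)."""
    return 3.0 ** (5.0 * x + 1.0)

def central_difference(func, at, step=1e-6):
    """Symmetric finite-difference estimate of func'(at)."""
    return (func(at + step) - func(at - step)) / (2.0 * step)

def student_derivative(x0):
    """Replay the student's computation at the point x0."""
    # Outer "function" of the student: the exponent depends on x0 (the blue x),
    # so this is really a different function f for every choice of x0.
    def f(z):
        return z ** (5.0 * x0 + 1.0)

    # Inner function of the student: constant, so its derivative is 0.
    def g(y):
        return 3.0

    # f'(g(x0)) * g'(x0), exactly as in the proposed solution.
    return central_difference(f, g(x0)) * central_difference(g, x0)

x0 = 0.2
true_slope = central_difference(h, x0)      # slope of h measured directly
exact_slope = 5.0 * math.log(3.0) * h(x0)   # 5 ln(3) 3**(5x + 1)
student_slope = student_derivative(x0)      # what the proposed solution yields
```

Here `student_slope` comes out as zero while `true_slope` agrees with `exact_slope`: differentiating `f` with respect to `z` never sees the dependence on `x0` hidden in the exponent.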

edited Mar 21 '19 at 21:22

answered Mar 21 '19 at 21:17

Kevin

I like this answer, and agree with it in essence. But to make it precise one needs to be a bit more careful. For instance: the variable $x$ is not really bound in the equation $h(x)=3^{5x+1}$. It would be bound if we wrote $\forall x\in \mathbb{R}\colon h(x)=3^{5x+1}$, and then one might argue that variable capture happens in the composition. Alternatively we use mathematicians' lambda calculus notation $h=(x\mapsto 3^{5x+1})$ and declare that $x$ is bound therein. Also: the matter of whether $dx$ binds $x$ is not so simple as you might think and (cont.) – Michael Bächtold Mar 22 '19 at 08:36
(cont.) I don't quite understand what you don't like about $f'$, since in fact it needs to hide the name of the variable, since it is bound in $f$. – Michael Bächtold Mar 22 '19 at 08:38
@Michael In common usage, $h(x) = 3^{5x+1}$ really does mean $h = (x\mapsto 3^{5x+1})$ most of the time, doesn’t it? It’s just the same widespread abuse of notation as writing “the function $h(x)$” when you really mean “the function $h$”, only milder. (Alternatively, you could say that it’s a different widespread abuse of notation, omitting universal quantification over apparently-unbound variables.) – Alex Shpilkin Mar 22 '19 at 13:42
@AlexShpilkin I guess you are right. Still, I wouldn't explicitly encourage people to think that $\exp(x)=e^x$ literally means the same as $\exp=(x\mapsto e^x)$, unless we also want them to conclude that $\exp(0)=1$ means the same as $\exp=(0\mapsto 1)$. – Michael Bächtold Mar 22 '19 at 16:34
@MichaelBächtold: I have no objection to $f'$, and in principle it ought to be OK to write $f'(x)$ if you can write $f'$. But $f'(x)$ is prone to the kind of error that your student made, unless you are also proposing to spend the extra class time to explain the difference between $f$ and $f(x)$ (because by default, your students will have no idea there is a difference). I would not recommend that at the introductory calculus level, however. It will likely shed more heat than light, and possibly leave your students more confused than they were to begin with. – Kevin Mar 22 '19 at 16:43
(cont) By contrast, Leibniz's notation lends itself easily to the $f(x)$ syntax that your students already know and use. It also makes for a very intuitive chain rule, and primes them for manipulating differentials in more exotic ways in future courses. – Kevin Mar 22 '19 at 16:48
@Kevin: I agree with you that for introductory purposes it is useful to use Leibniz notation; see this for examples. But I'm against the suggestion of treating "$f'(x)$" notation as short-hand for "$\frac{d(f(x))}{dx}$" when $x∈dom(f)$. To see why, consider that "$f'(2x)$" is equal to "$\frac{d(f(2x))}{d(2x)}$", but would students understand it? Worse still, consider that "$f'(2)$" is not equal to "$\frac{d(f(2))}{d(2)}$" because the latter is simply meaningless. – user21820 Mar 24 '19 at 08:50
There appear to be two main ways one can escape this issue: (1) Define $f'$ in the standard way, namely as a function (or partial-function if you prefer) whose value is the pointwise derivative of $f$. (2) Define "$f'(E)$" as short for "$\left. \frac{d(f(x))}{dx} \right|_{x:=E}$" where $x$ is an unused variable, after defining the notation for the evaluation of an expression by substitution. $\ \ $ I would guess that for non-math-majors, (2) is a good approach, but the teachers using it had better know what they are doing... – user21820 Mar 24 '19 at 08:55
@user21820: Yes, that's the difficulty. You can't teach them functions-as-functions unless you want to blow a very large amount of class time on a rather fine distinction which they don't yet need, and you can't just treat $f'(x)$ as a "mechanical" shorthand (for all of the reasons you explain). So the next best thing is that $f'(x)$ is a "magical" shorthand that "means the right thing" in any given context, which is obviously a terrible explanation. That's why I dislike $f'(x)$. – Kevin Mar 24 '19 at 15:53
Agreed; if the syllabus dictates that they must learn differentiation at that level, then the best choice is probably the Leibniz-style calculus as per my linked post. The symmetry of the chain rule is also excellent as it reflects the underlying reason for its truth. – user21820 Mar 24 '19 at 16:36

score 13 · Answer 2 · answered Mar 21 '19 at 16:30

f is not a function of (only) z - f here is a function of x as well as z. I think this explanation is intelligible to a calc 1 student, and gets at the heart of the matter.

Henry Towsner

Hmm, so the student should reply: as soon as $f(x)$ contains parameters other than $x$ I am not allowed to apply the chain rule? – Michael Bächtold Mar 21 '19 at 16:37
By the way: to my mind $f$ is not a function of $z$ at all. Maybe you meant $f(z)$? – Michael Bächtold Mar 21 '19 at 16:43
@MichaelBächtold: The student should know that if f(x) contains variables other than x then the chain rule doesn't apply. (This might be an opportunity to mention that there is a variant of the chain rule to be learned later for covering such situations.) I decline to get into a pedantic discussion of the distinction between f and f(z). – Henry Towsner Mar 21 '19 at 18:14
I'm quite sure you use the chain rule to differentiate things that contain more than just $x$ in your calculus class, like $\sqrt{x^2+k}$. You might say: that's ok if we treat $k$ as a constant and not as a variable. But then a student might ask: why am I not allowed to treat $x$ as a constant in the definition of $f$? (And a mathematician might add: what's the difference between a variable and a constant?). Apologies if my pedantry offends you. – Michael Bächtold Mar 21 '19 at 18:49
@MichaelBächtold: In general, the difference between a variable and a constant is contextual and tricky to make precise, but for purposes of the chain rule in this case, x is a variable because we're taking the derivative with respect to it. I find that students don't usually have difficulty with this point (for instance, one could imagine students getting confused about the difference between the derivative of f(x)=c and f(x)=x, but that's not a particularly common issue), because it's a clean syntactic rule and it's backed up by the notion (x and z are conventionally variables, k isn't). – Henry Towsner Mar 21 '19 at 19:28
@MichaelBächtold: Another way to think of it: In the student's proposed solution, $x$ is a free variable within $f$, but the expression was originally $h(x) = 3^{5x+1}$, i.e. $x$ was originally a bound variable. You can't transform a bound variable into a free variable. – Kevin Mar 21 '19 at 20:59

score 8 · Accepted Answer · edited Mar 23 '19 at 00:34

$$ \frac{d (3^{5x+1})}{dx} = f'(g(x))g'(x)= \frac{d \left(3^{5x+1}\right)}{d(3)} \times \frac{d (3)}{dx}. $$

However $\dfrac{d (3^{5x+1})}{d(3)}$ is undefined.

amWhy

answered Mar 22 '19 at 11:14

Taemyr

There were many good answers and I found it difficult to decide. I opted for this one since it is so succinct. – Michael Bächtold Mar 23 '19 at 08:57
Hence by definition, any substitution ()=constant will blow up and fail for the same reason. (and that's before we get to the other issue with the substitution which the other answers point out: namely that we didn't really do a substitution, because there are still dangling references to x, as well as z) – smci Mar 23 '19 at 15:05
@smci Yes, but this explanation also works in the example of $x^x$, which appears in a very similar question. There a student attempted to use $x$ as inner function (i.e. not a constant). – Michael Bächtold Mar 23 '19 at 15:19
This is not a valid explanation. Surely, the chain rule notation works when $u=g(x)$ is a constant! – user52817 Mar 23 '19 at 16:01
@MichaelBächtold: It's ironic that you picked the totally wrong explanation on Math Educators SE. If you read a proper rigorous statement of the chain rule, it can be applied only under certain conditions, one of which is that the 'component' derivatives exist... – user21820 Mar 24 '19 at 07:50
@user52817: You're indeed correct that this answer is not at all a valid explanation of what is the error in the question; please see my answer for the correct explanation. – user21820 Mar 24 '19 at 08:31
@user21820 Sorry to disagree: you have not understood the answer of Taemyr. – Michael Bächtold Mar 24 '19 at 09:45
@user52817: it depends on what you mean by chain rule notation. There are at least 3 ways to write down the chain rule: $(f\circ g)'=(f'\circ g)\cdot g'$, $d(f(g(x)))/dx=f'(g(x))\cdot g'(x)$ and $dz/dx=dz/dy \cdot dy/dx$. If you are talking about the first one then I agree. If you take the perspective of the last one then it doesn't work when $y$ is a constant. – Michael Bächtold Mar 24 '19 at 10:20
@Michael Bächtold: Leibnitz chain rule notation $dz/dx=dz/dy\cdot dy/dx$ works in the case of a constant variable. Say $z=y^2$ and $y=c$. More explicitly, $z(y)=y^2$ and $y(x)=3$. Then $\frac{dz}{dx}=\frac{dz}{dy}\cdot\frac{dy}{dx}=2y\cdot 0$ – user52817 Mar 24 '19 at 13:35
@user52817 What you wrote there might be translated into real Leibniz notation as: $dy^2/d3=dy^2/dy⋅dy/d3$. It already breaks down on the left hand side. – Michael Bächtold Mar 24 '19 at 20:49
@user21820 I've tried to address your concerns with this answer in my post below. – Michael Bächtold Mar 25 '19 at 18:04
@MichaelBächtold: I still disagree, and I think I've adequately explained why in my comment on your answer. If you still don't get my objection, I'm afraid you're going to have to get down into a more rigorous formalization before the issues will become clear. I wrote in chat that the error in this answer is the same as asserting "1/(1/0) = 0" and then saying "but 1/0 is undefined". No, one cannot even write "1/0" anywhere in a mathematical statement because "1/0" is ill-defined (in a context where 1,0 are ordinary reals). – user21820 Mar 25 '19 at 18:39
@user21820 You just wrote 1/0 in a mathematical argument and the world didn't implode. But more seriously: Taemyr wrote $d(3^{5x+1})/d3$ in order to point out to the student that it was undefined. I don't understand all the fuss you are trying to make. – Michael Bächtold Mar 26 '19 at 08:29
@MichaelBächtold: You're making a basic logical error. I wrote "1/0" within quotes, not "1/0" without. The string "Michael" is not the same as the entity Michael. One is not justified in writing "1/0" in any mathematical argument, but one can definitely write the quotation of "1/0" when discussing why it is syntactically ill-defined. – user21820 Mar 26 '19 at 09:01
@user21820 actually you wrote 1/0 within quotes not "1/0" within quotes. That would have been ""1/0"". – Michael Bächtold Mar 26 '19 at 10:27
@MichaelBächtold: Again you misinterpret, and I'm using standard notation. "Michael" is a string, and that string does not contain quotation marks. In my above comment I wrote the string "Michael" in quotes, just like I wrote "1/0" in quotes in the preceding comments. Please, stop assuming I'm the one making mistakes here. – user21820 Mar 26 '19 at 10:33

kcrisman · Answer 4 · 2019-03-23T01:09:12.460

This is a VERY VERY typical problem. In fact, it's a problem even for $\frac{d}{dx}3^x$, let alone your example.

I try to deal with this in one of two ways.

What has to happen first? To evaluate $3^{5x+1}$, you have to evaluate $5x+1$ first. So that is the inside function in the chain rule, just like in $\sin(x^2)$ you have $x^2$ to evaluate first, so it is the inside function.
You could rethink how we notate or talk about exponential functions. In particular, Excel has $e^x$ written as exp(x) (I think as an option). So one can ask what the "input" is here.
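
As a worked sketch of the first approach (the name $u$ for the inside function is introduced here only for illustration), take $u = 5x+1$ as the part that is evaluated first and $3^u$ as the part that is evaluated last:

$$ \begin{align}
\frac{d}{dx}3^{5x+1} &= \left.\frac{d}{du}3^{u}\right|_{u=5x+1}\cdot\frac{d}{dx}(5x+1)\\
&= 3^{5x+1}\ln(3)\cdot 5\\
&= 5\ln(3)\,3^{5x+1}
\end{align} $$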

However, on the plus side the student does seem to have the chain rule down; it's just the exponential notation that is causing trouble. So there is definitely hope here. And again, you should not be surprised at encountering this, so it is worth your time to come up with several possible responses for it in the long run. Good luck!

score 5 · Answer 5 · edited Jun 18 '20 at 08:32

The other answers have completely missed the mistake. $ \def\rr{\mathbb{R}} $

Your student's error has nothing to do with exponentiation. Consider the following based on exactly the same error:

$\color{red}{\text{Let (???)}}$ $f(y) = x$ and $g(x) = 1$.

Then $1 = \frac{dx}{dx} = (f∘g)'(x) = f'(g(x))·g'(x) = f'(1)·0 = 0$.

The error lies in the very first line! It is extremely obvious once you actually attempt to make it rigorous. Recall that to define a function you must provide a domain as well as a rule that specifies the output for each input in the domain. And of course the rule has to be meaningful in the context where you want to define the function. So see what you get:

$\color{red}{\text{Let (???)}}$ $f : \rr→\rr$ such that $f(y) = x$ for each $y∈\rr$.

Let $g : \rr→\rr$ such that $g(x) = 1$ for every $x∈\rr$.

The definition of $g$ is fine. The definition of $f$ is not fine! What on earth is $x$? The rule has to specify the output for each input $y∈\rr$, so where did $x$ pop up from?

As explained above, the error has nothing to do with differentiation. Rather, it is in the illegal definition of the function!
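
A rough programming analogy, offered only as a sketch and not as a formal account of function definitions: in the Python snippet below, `f` mirrors the faulty rule $f(y) = x$ and `g` mirrors $g(x) = 1$. The names `outcome` and `constant_outputs` and the value `7.0` are assumptions made for the illustration. With no `x` declared in the surrounding context, calling `f` fails outright. If one instead fixes `x` beforehand, `f` is well-defined but constant, so it cannot be the function whose derivative is $1$.

```python
def f(y):
    # Mirrors the faulty rule f(y) = x: the body uses a name that the
    # definition itself never introduces.
    return x

def g(x):
    # Mirrors g(x) = 1 for every x: the rule only uses its own input.
    return 1

# First reading: no x exists in the surrounding context yet.
try:
    f(g(0))
    outcome = "f returned a value"
except NameError as error:
    outcome = f"f is ill-defined: {error}"

# Second reading: x is one fixed number in the context where f is defined.
x = 7.0
constant_outputs = {f(y) for y in (-1.0, 0.0, 2.5)}  # every input maps to 7.0
```

Note that the parameter `x` of `g` is local to `g`, so defining `g` does not supply the missing `x` for `f` in either reading.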

Community

answered Mar 24 '19 at 08:06

user21820

Furthermore, it is misleading to bring in Leibniz notation when the question is about ordinary functions. Even if we do, the answer given by Taemyr is simply wrong. Given any real/complex variables $x,y,z$, the proper chain rule asserts $\frac{dz}{dx} = \frac{dz}{dy} ·\frac{dy}{dx}$ if $\frac{dz}{dy}$ and $\frac{dy}{dx}$ are both defined. If you cannot prove that the two derivative expressions are defined, then you're not allowed to even write down the so-called chain rule, because it doesn't apply! – user21820 Mar 24 '19 at 08:17
I'm afraid you have not understood Leibniz notation. The accepted answer also works in your example since you would be computing $dx/dx=dx/d1\cdot d1/dx$. The reason this breaks down is that $dx/d1$ is undefined. Most probably your confusion arises from the fact that you have not understood that there are two different notions of function that mathematicians use: the official modern one (which you are trying to use) and the original one of Leibniz, Bernoulli etc. – Michael Bächtold Mar 24 '19 at 09:44
And by the way: claiming that the other answers have completely missed the mistake shows that you have not read the other answers. For instance Kevin's answer is correct and goes in the same direction as yours. But he seems to be able to explain it more clearly than you can. – Michael Bächtold Mar 24 '19 at 09:48
@MichaelBächtold: That's totally ridiculous; you cannot write something that is ill-defined. And we're talking about modern mathematical pedagogy here, not any inconsistent stuff that ancient mathematicians used. I think your assumption that I am confused is just rude. – user21820 Mar 24 '19 at 09:49
@MichaelBächtold: Your second comment is also wrong; Kevin's answer goes in a similar direction, but fails to point out that it has nothing to do with differentiation. – user21820 Mar 24 '19 at 09:49
It's not inconsistent stuff. And if you believe it is inconsistent then you should not use it in your answer – Michael Bächtold Mar 24 '19 at 09:50
@MichaelBächtold: I did not use Leibniz notation in my answer! Did you even read it? – user21820 Mar 24 '19 at 09:50
dx/dx is Leibniz notation. Did you even write this? – Michael Bächtold Mar 24 '19 at 09:51
That was explicitly given as an example of error. Please read carefully before talking. – user21820 Mar 24 '19 at 09:52
@MichaelBächtold: And I'm a logician and know very well what is consistent and what is not. Don't assume that what you cannot understand is wrong. – user21820 Mar 24 '19 at 09:58
Congratulations on false sarcasm. And still not admitting your mistakes. – user21820 Mar 24 '19 at 10:40
If it makes you feel better: I admit to my mistakes. (Even if I don't know which ones you mean.) And what I wrote was true sarcasm. – Michael Bächtold Mar 24 '19 at 10:42
@MichaelBächtold: It's not about me. It's about your students, whom you are presumably going to teach wrongly because you haven't understood the true error that I'm pointing out to you over here. Your mistakes: (1) The student's error has nothing to do with differentiation. (2) I didn't use Leibniz notation in my explanation, but only in a sample of error elucidating the true error clearer. (3) You still think that Taemyr's answer is correct, despite me explaining in my comment that you cannot write down the chain-rule unless both derivatives in the product are defined. – user21820 Mar 24 '19 at 10:48
The failure to understand how to rigorously define functions (and other mathematical objects) is actually the source of many conceptual errors, not just the one in your question. For instance, the liar paradox arises from the failure to understand what "definition" truly means, and how to validly define new statements. – user21820 Mar 24 '19 at 10:57
I upvoted your answer, because it contributes something mathematically to the discussion, even if it completely lacks any tact. I would be curious to know in what sense you think my answer "completely misses the mark": it points out the same error as yours while being slightly more generous to the student (assuming that they were actually defining a function of two variables, rather than just writing something completely meaningless as you suppose). It then shows how the mistake (under this assumption) can be corrected through correct use of the multivariable chain rule. – Steven Gubkin Mar 24 '19 at 14:24
@StevenGubkin: Thanks, and I'd be glad to explain to you what I meant by "missed the mistake", which is precisely and only that the existing answers did not pinpoint the true error. In my many years of teaching, I have noticed that all mathematical misconceptions stem solely from a fundamental lack of logical reasoning. I'm sure you've seen the phenomenon where students just mimic the phrases the instructor uses without any true understanding, and where students write things like "$\sqrt{x·y} = \sqrt{x}·\sqrt{y}$". Showing a correct solution does not solve the problem; insisting on proof would. – user21820 Mar 24 '19 at 15:03
By "proof" I do not mean an ambiguous one, nor a symbolic one, but one that is logically structured. This logical structure includes a rigorous format in which new objects are defined, including functions. By failing to point out the true error (the invalid definition of the claimed function $f$), one will not actually make the student have a 100% solid grasp of the error. I don't assume that students are writing things that they actually understand; that's generally false. The easiest way to prove my point is to probe this student to find out what exactly he/she meant; it likely is wrong. – user21820 Mar 24 '19 at 15:08
Based on this viewpoint, including details of the correct differentiation is just distracting and obscures the logical error, so much so that the student will fail to recognize the same error in other completely different topics. That would be pedagogically bad. In my comments I even mentioned the liar paradox as an example, which I'm sure most students and teachers don't realize is indeed an instance of the same error. Why? Because we didn't teach them the core logical reasoning. – user21820 Mar 24 '19 at 15:12
@user21820 I can make the accepted answer of Taemyr perfectly rigorous, by interpreting variables like $x,y$ as scalar functions on a manifold and $dx, dy$ as differential forms. I can also make your answer perfectly rigorous, but if I were your student and you told me "What on earth is $x$? The rule has to specify the output for each input $y\in\mathbb{R}$, so where did $x$ pop up from?" I'd answer $x$ is a real number and $f$ is a map that associates to every input the number $x$. Now, try to explain to me why this is wrong (cont.) – Michael Bächtold Mar 24 '19 at 20:40
(cont.) without going into free/bound variables, $\alpha$-conversion and variable capture. That's what Kevin did. Not that I find it wrong to go into this stuff, but I don't think it's so useful for calculus students coming from engineering. Instead, telling them that $3^{5x+1}$ is not a function of $3$ and hence I cannot write $d3^{5x+1}/d3$ seems closer to what they'll need. It's a pity that a hundred years since Frege, logicians have not been able to properly formalise the surrounding notions of variables, constants, and functions of things. – Michael Bächtold Mar 24 '19 at 20:45
@MichaelBächtold: I've taught many students and have no trouble in explaining this. If you assume that I would go into free/bound variables, α-conversion or variable capture, then you again make wrong assumptions. You're also wrong that logicians are unable to formalize variables, constants and functions of things, because I have done so a long time ago. If you really want to understand my approach, you cannot keep adding extra stuff that I didn't. – user21820 Mar 25 '19 at 06:27
If you don't understand what "context" means, or how to teach it at a low level, or why it's the same error as the liar paradox, and you sincerely want to know, come to Basic Mathematics Chat and I'll explain in detail. – user21820 Mar 25 '19 at 06:28
I'll come if you can explain in a few lines why the answer of the student "$x$ is a real number and $f$ is the map that associates to every input that $x$" is wrong. And if you tell me that you have read the discussion on the link in my previous comment. – Michael Bächtold Mar 25 '19 at 07:19
@MichaelBächtold: I've read the linked thread, and my response is that your question is a good one to ask but is actually the 'wrong' question; the issue has nothing to do with the specific choice of foundational system (i.e. the underlying system could be ZFC or CoC or HOL or something else that is sufficiently powerful). You're however correct in your impression that it's not well-known how exactly to rigorously formalize variables that captures closely the intuition. I think the main reason is that most logicians are less concerned with practical logic than theoretical. – user21820 Mar 25 '19 at 07:38
I can give a short explanation why your 'student answer' is invalid, but you will need to understand contexts first, so it will be two comments. In a logical proof (on paper or in the mind), every statement must be made within a clearly identifiable context. One can consider subcontexts (such as under some assumptions), or move between contexts, but one must always be clear on the context one is working in. If $x$ is a single real number in the context where you define $f$ (in the faulty example in my post), then $f$ is clearly a constant function, which is not what you want. – user21820 Mar 25 '19 at 07:45
Every object (including every function) is fixed in its context once declared. There are many constant functions, one for each constant $x$. But each one is constant. If you want the function $f$ to map every input to itself, then you obviously cannot define it via $f(y) = x$. If you want the constant function, then there would be no contradiction because $f'$ would be zero everywhere. Secondly, once $x$ is a real number in the context where you define $f$, then you cannot use $x$ to define $g$ in the same context, because $x$ already refers to something. – user21820 Mar 25 '19 at 07:57
I think I understand contexts well enough to see what you are trying to say. To my mind, the student introduces the context as soon as he says "$x$ is a real number". Now, it is true that we don't want $f$ to be a constant function, and that might be easy to explain in this example. But I don't think you can convince the student so easily if he did the same mistake with $x^x$ as discussed here. cont. – Michael Bächtold Mar 25 '19 at 08:11
Your explanation that "you cannot use $x$ to define $g$" is not correct, since $x$ is bound in $g$. It doesn't matter that $x$ was already in the context. I still don't see how you can really get to the bottom of the issue without discussing variable capture and alpha equivalence. But I'm still intrigued by your claim that you have formalized the idea of variables/constants and functions of things long ago – Michael Bächtold Mar 25 '19 at 08:12
@MichaelBächtold: I'll respond in chat. This is getting way too long for a comment thread. – user21820 Mar 25 '19 at 08:13
I'll go to chat as soon as I have some time – Michael Bächtold Mar 25 '19 at 08:14
For future readers, the discussion is continued in depth starting from here. – user21820 Mar 25 '19 at 09:54

score 4 · Answer 6 · answered Mar 22 '19 at 12:34

This idea is fine, and you can use the multivariable chain rule to do it this way.

Say we want to differentiate $h(x) = f(x)^{g(x)}$ with respect to $x$. Notice that we can write $h$ as the composite of $p: \mathbb{R} \to \mathbb{R}^2$ defined by $p(t) = (f(t),g(t))$ with the function $E: \mathbb{R}^2 \to \mathbb{R}$ defined by $E(u,v) = u^v$.

By the multivariable chain rule,

$$ \begin{align}
Dh\big|_{x} &= DE\big|_{p(x)} \circ Dp\big|_{x}\\
&= \left.\begin{bmatrix} \frac{\partial E}{\partial u} & \frac{\partial E}{\partial v} \end{bmatrix} \right|_{(u,v) = (f(x),g(x))} \circ \left.\begin{bmatrix} \frac{\partial f}{\partial t} \\ \frac{\partial g}{\partial t}\end{bmatrix}\right|_{t = x}\\
&= \left.\begin{bmatrix} vu^{v-1} & \ln(u) u^v \end{bmatrix} \right|_{(u,v) = (f(x),g(x))} \circ \left.\begin{bmatrix} f'(t) \\ g'(t)\end{bmatrix}\right|_{t = x}\\
&= \begin{bmatrix} g(x)(f(x))^{g(x)-1} & \ln(f(x)) (f(x))^{g(x)} \end{bmatrix} \circ \begin{bmatrix} f'(x) \\ g'(x)\end{bmatrix}\\
&=f'(x)g(x)(f(x))^{g(x)-1}+g'(x)\ln(f(x))(f(x))^{g(x)}
\end{align} $$

Applying this to the problem in question, we see that $f'(x) =0$, so the first term disappears.
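
As a quick numerical sanity check (a sketch only; the function names, the sample points and the finite-difference step are assumptions made for this illustration), the two-term result $f'(x)g(x)f(x)^{g(x)-1}+g'(x)\ln(f(x))f(x)^{g(x)}$ can be compared with a directly measured slope for $f(x)=3$, $g(x)=5x+1$:

```python
import math

def power_rule_derivative(f, df, g, dg, x):
    """Derivative of f(x)**g(x) via the two-term formula (needs f(x) > 0)."""
    base, exponent = f(x), g(x)
    first_term = df(x) * exponent * base ** (exponent - 1.0)  # from dE/du
    second_term = dg(x) * math.log(base) * base ** exponent   # from dE/dv
    return first_term + second_term

def numeric_derivative(func, x, step=1e-6):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)

# The problem in question: constant base 3, exponent 5x + 1, at x = 0.2.
formula_value = power_rule_derivative(
    lambda x: 3.0, lambda x: 0.0,            # f and f'
    lambda x: 5.0 * x + 1.0, lambda x: 5.0,  # g and g'
    0.2,
)
numeric_value = numeric_derivative(lambda x: 3.0 ** (5.0 * x + 1.0), 0.2)
expected_value = 45.0 * math.log(3.0)        # 5 ln(3) 3**2

# The same formula applied to x**x at x = 2, where both terms contribute.
xx_value = power_rule_derivative(
    lambda x: x, lambda x: 1.0, lambda x: x, lambda x: 1.0, 2.0
)
xx_expected = 4.0 * (1.0 + math.log(2.0))    # x**x (1 + ln x) at x = 2
```

With a constant base only the second term survives, which is the usual single-variable answer $5\ln(3)\,3^{5x+1}$.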

So, in a sense, the student was trying to apply the multivariable chain rule (using the two variables $z$ and $x$), but didn't know how to do that yet. So you could tell them it is a good approach, but they will learn how to properly execute that approach in calc 3.

score 3 · Answer 7 · answered Mar 22 '19 at 10:52

I had a similar problem with a student last week and could not succinctly explain why she could not select $e$ as the 'inner' function here: $$f(x) = e^{8x +4}$$

The best explanation I have seen thus far, Paul's Notes, explains it in this way:

Recall that the 'outside' function is the last operation that we would perform in an evaluation. In this case if we were to evaluate this function the last operation would be the exponential. Therefore, the outside function is the exponential function and the inside function is its exponent.
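
The "last operation" idea has a loose analogue in how a programming language parses the expression. The sketch below uses Python's standard `ast` module purely as an illustration (the variable names are assumptions, and a parse tree is only an analogy for order of evaluation): the root of the tree is the operation performed last, and its operands are what must be evaluated before it.

```python
import ast

# Python spelling of e^(8x + 4); parsing does not require e or x to have values.
expression = "e ** (8 * x + 4)"
root = ast.parse(expression, mode="eval").body

last_operation = type(root.op).__name__  # operator at the root of the tree
base_node = root.left                    # left operand of that operator
exponent_node = root.right               # right operand: the whole exponent
```

Here the root is the power operation, its left operand is the bare name `e`, and its right operand is the sum `8 * x + 4`, which matches the reading that the exponent is the inside function.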

Bionic Buffulo

Because 'e' is a constant? How would you explain it if instead of e, the base were '2'? – JTP - Apologise to Monica Mar 22 '19 at 15:27
Yes this is because 'e' is a constant. Therefore, the explanation would also work when the base is '2'. – Bionic Buffulo Mar 25 '19 at 12:09
Agreed, that was what I was trying to suggest. The f(x) you shared has no ‘inner function’. – JTP - Apologise to Monica Mar 25 '19 at 12:21

score 2 · Answer 8 · answered Mar 25 '19 at 15:41

Some users have expressed doubt at the validity of the accepted answer, so let me make it rigorous. To do that, we first need to make sense of the notation $\frac{dy}{dx}$. (Which is not a trivial task.)

We start by changing perspective: instead of thinking of variables $x$ and $y$ as numbers, we think of them as smooth real valued functions on some manifold $M$. So $x\colon M\to \mathbb{R}$ and $y\colon M\to \mathbb{R}$. You might rightly ask: why would we do that and which manifold $M$ are you talking about? The answer to the first question is: because I'd like to talk about the differentials $dy$, $dx$ and say everyday stuff like "$y$ is a constant" or "$y$ is a function of $x$". All of that is impossible inside first-order logic + ZFC if we simply interpret $x$ and $y$ as elements of $\mathbb{R}$. Concerning the second question: think of $M$ as the physical state space underlying the problem we are trying to model and $x$ and $y$ as observables. If that sounds too unfamiliar: it's similar to how people in probability theory assume an underlying space of outcomes $\Omega$, in order to talk about random variables. (Which are the things we really care about and historically came before $\Omega$, just like $dx,dy$ historically came before manifolds, but I'm drifting off.)

So, having fixed the background manifold $M$, whenever you hear me say variable, what I mean is a thing of type $M\to \mathbb{R}$.

Definition. Given two variables $x$ and $y$, call $y$ a function of $x$ if $dx$ is not zero almost everywhere and there exists a variable $q\colon M \to \mathbb{R}$ such that $$ dy = q \cdot dx. $$

Intuitively, the equation $ dy = q \cdot dx $ says that the change of $y$ is determined by the change of $x$, i.e. that $y$ depends on $x$.

It's not hard to show that $q$ is uniquely determined by $x$ and $y$, hence we decide to denote it with $\frac{dy}{dx}$ and call it the derivative of $y$ with respect to $x$. It was originally called the differential coefficient, because that's what it is.

According to this definition, $3^{5x+1}$ is a function of $x$ (assuming $x$ is a true variable, i.e. $dx\neq 0$), but it's also a function of $5x+1$. On the other hand, $3^{5x+1}$ is not a function of $3$ since $d3=0$ (what we call a constant). In particular $\frac{d3^{5x+1}}{d3}$ is undefined, as user21820 has been pointing out emphatically.

We can now state the chain rule in Leibniz form

Theorem. If $z$ is a function of $y$ and $y$ is a function of $x$, then $z$ is also a function of $x$ and their differential coefficients satisfy $$ \frac{dz}{dx}=\frac{dz}{dy}\cdot \frac{dy}{dx} $$

The proof of this is trivial.
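One way to spell out the step, as a sketch that relies only on the definition above (and on $dx$ being nonzero almost everywhere, which is part of the hypothesis that $y$ is a function of $x$): substitute $dy=\frac{dy}{dx}\cdot dx$ into $dz=\frac{dz}{dy}\cdot dy$ to get

$$ dz = \frac{dz}{dy}\cdot dy = \left(\frac{dz}{dy}\cdot\frac{dy}{dx}\right)\cdot dx. $$

So $q=\frac{dz}{dy}\cdot\frac{dy}{dx}$ is a variable with $dz=q\cdot dx$, which makes $z$ a function of $x$, and by the uniqueness of $q$ it must equal $\frac{dz}{dx}$.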

From this perspective, what the student in my question was trying to do was to let $z=3^{5x+1}$ and $y=3$. But the theorem does not apply, since $3^{5x+1}$ is not a function of $3$. The same point of view can be used in the example discussed here, where a student attempted to differentiate $x^x$ by taking $x$ as inner function. Although that's allowed, it just leads to $$ \frac{dx^x}{dx}=\frac{dx^x}{dx}\cdot \frac{dx}{dx} $$ which is not of much use.
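For comparison, here is a choice of intermediate variable that the theorem does cover, worked as a sketch under the assumption $dx\neq 0$: take $y=5x+1$ and $z=3^{y}$. Then $dy=5\cdot dx$ and $dz=\ln(3)\,3^{y}\cdot dy$, so $y$ is a function of $x$, $z$ is a function of $y$, and

$$ \frac{d3^{5x+1}}{dx}=\frac{d3^{y}}{dy}\cdot\frac{dy}{dx}=\ln(3)\,3^{5x+1}\cdot 5. $$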

This is not to say that I don't appreciate the other answers (including user21820's). Taemyr's is just one of the three perspectives that have been proposed. It might seem like it needs a lot of background to make it rigorous. But consider that mathematicians understood this stuff for at least 200 years without requiring manifolds to formalize it. And consider that the other approaches also require quite some background to make them rigorous (like quantifiers, variable bindings, the idea of dummy/bound variables etc. or derivatives of functions of multiple variables). Each has its advantages and disadvantages and none seems more right than the others, methinks.

It seems you are finally agreeing with me that the accepted answer is not completely correct, because it literally wrote an equation involving what your answer states to be ill-defined. That from the beginning was my objection to it (see my first comment). I did not say that anything else was amiss with that answer, except that it fails to pinpoint the original student's error (regarding functions, not differentials). — user21820, Mar 25 '19 at 18:34
@user21820 You have a strange sense of humor. But thanks for making me laugh. Unfortunately you also made me give up my last hope of leading an honest conversation with you. — Michael Bächtold, Mar 26 '19 at 08:33
It's your choice, but you've been repeatedly misinterpreting whatever I say, and then blame me for it instead of considering that perhaps you are the one who is wrong. Your last comment insinuates that I am dishonest. That's false. — user21820, Mar 26 '19 at 09:05

score 1 · Answer 9 · answered Mar 26 '19 at 22:31

Applying the naive approach of a non-mathematician, to me the expression $z^{(5x+1)}$ points to a bivariate function,

$$f(z,x) = z^{(5x+1)}$$

(because "I see two variables in here"), and with $g(x) = 3$ we have defined

$$f(g(x), x) = 3^{(5x+1)}$$

Then

$$\frac {df(g(x),x)}{dx} = \frac {\partial f(g(x),x)}{\partial g(x)}\cdot \frac {dg(x)}{dx} + \frac {\partial f(g(x),x)}{\partial x}\cdot \frac {dx}{dx}$$

$$=\frac {\partial f(g(x),x)}{\partial g(x)}\cdot 0 + \frac {\partial f(g(x),x)}{\partial x}\cdot 1 = \frac {\partial f(g(x),x)}{\partial x} $$

$$=\frac {\partial}{\partial x} \left(3^{(5x+1)}\right) $$

This appears to be correct, although not useful, since we ended up back in the beginning. Am I doing something wrong here?

Alecos Papadopoulos


It's unclear whether you know exactly what you're doing. You can't just say "I see two variables in here"! Given any real $x$, the function $f:\mathbb{R}→\mathbb{R}$ defined via $f(y) = x·y$ for $y∈\mathbb{R}$ is a one-input function, not a bivariate function. Yes, the expression "$z^{5x+1}$" has two variables, so it is meaningful only in a context where both $x,z$ are defined (e.g. $x,z∈\mathbb{R}$ and $z>0$), but that is precisely the true error in the asker's question (see my answer); it did not define $x$, and once you define $x$ you can't reuse it in defining $g$. – user21820 Mar 27 '19 at 07:49
Also, it is actually incorrect to write "$\frac{∂f(g(x),x)}{∂g(x)}$". What you want is $\left. \frac{∂f(t,x)}{∂t} \right|_{t:=g(x)}$. To prove that it is incorrect, consider that if you had wanted $\frac{d(f(g(x),g(x)))}{dx}$ instead your 'proof' would have included the term "$\frac{∂f(g(x),g(x))}{∂g(x)}$", which makes the mistake obvious. – user21820 Mar 27 '19 at 07:53
@user21820 On your first comment, let $x \in \mathbb{R},\; z>0$ and define the bivariate function $f(z,x) = z^{(5x+1)}$. Further, define $z\equiv g(x)$ and also define $g(x) =3$ with $x$ defined as previously. Is there any problem with these definitions? This is not what the OP's student did of course, my answer was a reflection on why $f(z) = z^{5x+1}$ is not a correct expression, and that the moment you write $z^{5x+1}$ treating $z$ as a variable, what you can have is a bivariate function since $x$ is already defined (even if implicitly) as a variable. – Alecos Papadopoulos Mar 27 '19 at 09:34
@user21820 Regarding your second comment, I was under the impression that in the left-hand side expression we include a variable of a function only once inside the parenthesis, since the $()$ in $f()$ just lists the variables of the function. When do we want or need to write something like $f(y,y)$, what purpose does it serve? – Alecos Papadopoulos Mar 27 '19 at 09:49
You didn't understand my first comment; please read the second sentence again, which is a counter-example to your "I see two variables" thinking. You also don't seem to understand rigorous notation in your last comment. You say "the () in f() just lists the variables of the function", but you didn't even do that; you wrote "$f(g(x),x)$"! Furthermore, you suggest (falsely) that there is no purpose in writing something like "$f(y,y)$". No, it is not only legitimate but also very useful to be able to evaluate a two-input function along its 'diagonal'. – user21820 Mar 27 '19 at 11:25
If you don't understand rigorous definition of functions, feel free to come to the Basic Mathematics chat-room and inquire. The mistake in the question was completely a matter of illegal function definition (or illegal reuse of a variable), and any logician will agree with me. It is misleading to bring in partial derivatives, and your attempt is not even right, as I've pointed out clearly in my second comment. If you sincerely want to understand what you're doing wrong, come to the chat-room where I can explain in detail. – user21820 Mar 27 '19 at 11:44
@user21820: writing "$\left.\frac{\partial f(t,x)}{\partial t}\right|_{t:=g(x)}$" is actually as incorrect as writing "$\frac{\partial f(g(x),x)}{\partial g(x)}$", since the notation $\frac{\partial y}{\partial x}$ doesn't make explicit which variables are to be held constant. Jacobi understood this, but his suggestion for fixing this unfortunately caused even more confusion. I'm sure you understand this, since you gave a good answer here. – Michael Bächtold Mar 31 '19 at 10:04
@MichaelBächtold: There are two ways to deal with the error that I pointed out. One way is indeed via the notation in the post you linked to, but the other way that I stated in the comments is actually correct, but you didn't understand it. In this approach, you can write a partial derivative only with a variable name at the denominator, so there is zero ambiguity what it means. But by doing this, the result of the partial differentiation yields an expression, so we then need a syntactic substitution to 'evaluate' that expression. As before, don't assume I'm wrong. – user21820 Mar 31 '19 at 10:17
@user21820 I would still consider this wrong, since in practice we want to substitute mathematically equals (and not syntactically equals) for $y$ in $\partial {y}/\partial{x}$. In particular, your suggested use of the notation would either make $\partial {y}/\partial{x}$ undefined, or yield $\partial {y}/\partial{x}=0$ even if we had previously assumed something like $y=x^2$. – Michael Bächtold Mar 31 '19 at 10:58
@MichaelBächtold: Of course you can't anyhow combine notations and approaches since they may be incompatible. I gave one possible way to accommodate the partial derivative notation in a fully rigorous formalization. A lot of your misunderstandings of what I say seem to stem from unnaturally permitting variable shadowing. Observe that if you forbid reusing declared variables, then you are forbidden to have "$y = x^2$" and yet write "$∂y / ∂x$" because (as in defining functions) you must use an unused variable... Seriously, almost all mathematical texts do not have variable shadowing. – user21820 Mar 31 '19 at 11:23
In case it's unclear, when you have "$y = x^2$" you must have the variable "$x$" declared as some type of object, and so (to do it properly) you cannot reuse it as the variable in the denominator of a partial derivative. I am well aware that many multivariable calculus texts are chock-full of ambiguous notation or abuse of notation, but that's... not my notation. – user21820 Mar 31 '19 at 11:30
@user21820 Sorry, I don't see where variable shadowing happens in this example. Maybe you can tell me where exactly after I spell it out: "Let $x$ and $y$ be reals and assume $x^2=y$. Then $dy/dx=d(x^2)/dx$..." I wrote "$x^2=y$" instead of $y=x^2$ to emphasise that I don't think of it as variable assignment like most programming languages. I think of it as assuming an equality. Most programming languages would demand something like "$x=3$" before I'm even allowed to write "$y=x^2$". But under such a dogma of "all variables are constants" I see little chance of making sense of $dy/dx$. – Michael Bächtold Mar 31 '19 at 17:08
@MichaelBächtold: I never once said that it's legitimate to write "$dy/dx$" or "$∂y/∂x$" if your $x,y$ are declared as real numbers. If $x,y$ are instead variables varying with some parameter $t$ in my framework (which can be fully formalized), then under certain conditions "$dy/dx$" is defined, and if furthermore $y = x^2$ then $dy/dx = 2x$. But still "$∂y/∂x$" is forbidden because "$∂(\cdots)/∂x$" is syntactically valid only if "$x$" is an unused variable name. Here it has already been used. And, again, you changed my words; I never said "constant", instead I said "declared". – user21820 Mar 31 '19 at 17:18
@user21820 I never claimed you said "constant", but I was trying to guess where you might see a problem. You didn't answer where exactly shadowing happens in the sentence I wrote, but that's ok, let's leave it at that. – Michael Bächtold Mar 31 '19 at 17:31
@MichaelBächtold: I did answer it; the shadowing occurs once you write "$∂y/∂x$" after having already declared $x,y$ as variables even in my framework. It's possible to still handle it consistently, but one must be very careful. – user21820 Mar 31 '19 at 17:37
@user21820 since I don't know your framework, this discussion seems futile. Write it down somewhere so others can read and judge it. Until then have fun accusing people of making mistakes in your framework. – Michael Bächtold Mar 31 '19 at 17:52
@MichaelBächtold: You're totally wrong again. Regardless of my framework, I can correctly criticize other people for using inconsistent notation. I said very clearly right here why this answer uses inconsistent notation, which has nothing to do with my framework. You were the one who falsely claimed that there was no consistent alternative. – user21820 Apr 01 '19 at 06:39
This is the second time I'm saying that I can explain my framework if you fully understand contexts, but you seem to think you know better and didn't even want to learn what I tried to explain to you in chat. Read what I already explained in chat and clarify if need be. Until then, there's no evidence you sincerely want to learn. – user21820 Apr 01 '19 at 06:44
@user21820 I understand contexts and what you wrote in chat. But feel free to assume whatever you want about me. – Michael Bächtold Apr 01 '19 at 08:17
Fine, come to the chat-room if you want me to explain my framework. – user21820 Apr 01 '19 at 10:03

score 1 · Answer 10 · answered Jan 08 '24 at 16:33

This answer may not be a lot different from the other answers, but here is how I would phrase it:

When we use the chain rule to compute $\tfrac{d}{dx} f(g(x))$, the dependence on $x$ should only be through the inner function $g$. In your function, $f(z) = z^{5x+1}$, the function $f$ depends on $x$ not only through $z$, but also through the $5x+1$ in the exponent. It would be clearer to write it as $f(x,z)$, which makes the extra dependence on $x$ explicit. When you use the chain rule, your $g$ should be big enough to include everything that depends on $x$ inside it.

(Optional second paragraph) If you need to compute $\tfrac{d}{dx} f(x,g(x))$, there is a formula for that: $$\frac{d}{dx} f(x,g(x)) = \frac{\partial f}{\partial x} (x,g(x)) +\frac{dg}{dx}(x) \cdot \frac{\partial f}{\partial z} (x,g(x)).$$
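The formula can be checked numerically for the student's example. The following is only a rough sketch using central finite differences from the Python standard library; the helper names `central_diff` and `total_derivative`, the step size, and the sample point `x0` are illustrative choices, not part of the answer above.

```python
import math

def f(x, z):
    """Two-variable function f(x, z) = z**(5x + 1), defined for z > 0."""
    return z ** (5 * x + 1)

def g(x):
    """The student's constant inner function g(x) = 3."""
    return 3.0

def central_diff(h, t, eps=1e-6):
    """Central finite-difference approximation of h'(t) (illustrative helper)."""
    return (h(t + eps) - h(t - eps)) / (2 * eps)

def total_derivative(x):
    """Right-hand side of the formula: partial in the first slot
    plus g'(x) times the partial in the second slot, both at (x, g(x))."""
    z = g(x)
    df_first = central_diff(lambda s: f(s, z), x)   # second argument held fixed
    df_second = central_diff(lambda s: f(x, s), z)  # first argument held fixed
    dg = central_diff(g, x)                         # zero, since g is constant
    return df_first + dg * df_second

x0 = 0.3  # arbitrary sample point
lhs = central_diff(lambda s: f(s, g(s)), x0)   # d/dx of f(x, g(x)) directly
rhs = total_derivative(x0)                     # formula from the paragraph above
exact = 5 * math.log(3) * 3 ** (5 * x0 + 1)    # known closed form 5 ln(3) 3^(5x+1)

print(lhs, rhs, exact)
```

The term involving $g'$ vanishes because $g$ is constant, so everything comes from the partial derivative in the first slot, which is exactly the part the student's one-variable $f(z)$ left out.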

user182601 · Answer 11 · 2024-01-09T00:49:29.003

In writing $f(z)=z^{5x+1}$, the student is (incorrectly) taking $x$ to be a constant.

Now, given also $g(z)=3$ (for all $z \in \mathbb R$), we have $f(g(z))=f(3)=3^{5x+1}$ (where again $x$ is a constant).

Observe that given a small change in $a$, $f(g(a))$ doesn't change. That is, $(f\circ g)^\prime(a)=0$.

And this is indeed what we get when we apply the Chain Rule:

Since $f^\prime(z)=(5x+1)z^{5x}$ and $g^\prime(z)=0$, we have $$(f\circ g)^\prime(z)=f^\prime(g(z))g^\prime(z)=(5x+1)[g(z)]^{5x}\times0=0.$$
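To make the distinction concrete, here is a small numerical sketch (standard library only; `make_f`, `central_diff`, and the sample values are hypothetical names and choices introduced for illustration). It freezes $x$ as a constant, as the student's definition implicitly does, and compares the derivative of $z \mapsto f(g(z))$ with the derivative of $x \mapsto 3^{5x+1}$ that the problem actually asks for.

```python
import math

def make_f(x):
    """Return f(z) = z**(5x + 1) with x frozen as a constant parameter."""
    return lambda z: z ** (5 * x + 1)

def g(z):
    """Constant inner function g(z) = 3."""
    return 3.0

def central_diff(h, t, eps=1e-6):
    """Central finite-difference approximation of h'(t) (illustrative helper)."""
    return (h(t + eps) - h(t - eps)) / (2 * eps)

x_fixed = 0.3          # arbitrary value at which x is frozen
f = make_f(x_fixed)

# Derivative of the composite z -> f(g(z)): the composite is constant in z.
d_wrt_z = central_diff(lambda z: f(g(z)), 2.0)

# Derivative the problem asks for: vary x itself in 3**(5x + 1).
d_wrt_x = central_diff(lambda x: make_f(x)(3.0), x_fixed)

print(d_wrt_z, d_wrt_x)
```

The first number is zero, matching the Chain Rule computation above, while the second agrees with $5\ln(3)\,3^{5x+1}$ at the chosen point. The two derivatives answer different questions.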
