Most probability books start by introducing four very different things: mean, variance, skewness, and kurtosis. Then, miraculously, a common thread appears: these are all so-called "moments" of a distribution. To make matters even more amazing, there is a mysterious function, the moment-generating function (mgf), which lets one find the r-th moment of a distribution simply by taking the r-th derivative of the mgf and evaluating it at zero. Magic! A student might imagine a time when the concepts of mean, variance, skewness, and kurtosis already existed, and some big prize was announced for whoever could discover a function that would produce all of these and more (the higher moments). Whoever discovered the mgf would then deserve to be a household name today, since the discovery would seem nothing short of miraculous: up there with penicillin, perhaps.
How did early probability pioneers stumble upon the concept of the mgf? Why should it be an integral of an exponential function? Kind of random, isn't it? Why would anyone even think that a function exists that miraculously gives the four unrelated things we just learned about (mean, variance, skewness, and kurtosis), and more? It all seems too good to be true, too glorious and incredible a discovery.

Or did people discover the moment-generating function first and then decide to name its first four moments the mean, variance, skewness, and kurtosis? Shouldn't the moment-generating function be taught first, with the mean, variance, skewness, and kurtosis introduced afterwards simply as names for the first four moments, to parallel the likely order of discovery? As things are currently taught, I imagine some students telling themselves: "How in the world could someone discover the mgf, which magically produces the four unrelated things I just learned (mean, variance, skewness, and kurtosis)? How would anyone even think such a thing exists?" Hopefully the student doesn't follow up with: "Perhaps this field is not for me, as I can never see myself making such a discovery" (which, I suspect, in reality was never much of a discovery to begin with).
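To make the derivative-at-zero property concrete, here is a minimal Python sketch, assuming a finite discrete distribution (a fair six-sided die is used purely as an example, and the helper names `differentiate`, `evaluate_at_zero`, `moment_from_mgf` and `summarize` are hypothetical). For such a distribution M(t) = Σ p_k e^(x_k t), so differentiating r times multiplies each term by x_k^r, and setting t = 0 leaves E[X^r]. Note that the derivatives give raw moments E[X^r]; only the mean is one of these directly, while the variance, skewness, and kurtosis are computed from the first four raw moments via the central-moment formulas.

```python
from fractions import Fraction

# Assumed representation for this sketch: the mgf of a finite discrete
# distribution, M(t) = sum_k p_k * exp(x_k * t), stored as a dict that maps
# each exponent x_k to its coefficient p_k.

def differentiate(terms):
    """d/dt of sum_k c_k * exp(a_k * t): each term becomes (c_k * a_k) * exp(a_k * t)."""
    return {a: c * a for a, c in terms.items()}

def evaluate_at_zero(terms):
    """exp(a * 0) = 1, so the function's value at t = 0 is the sum of its coefficients."""
    return sum(terms.values())

def moment_from_mgf(dist, r):
    """r-th raw moment E[X**r]: differentiate the mgf r times, then set t = 0."""
    terms = dict(dist)
    for _ in range(r):
        terms = differentiate(terms)
    return evaluate_at_zero(terms)

def summarize(dist):
    """Mean, variance, skewness and kurtosis built from the first four raw moments.

    Assumes a non-degenerate distribution (variance > 0).
    """
    m1, m2, m3, m4 = (moment_from_mgf(dist, r) for r in range(1, 5))
    var = m2 - m1**2                                      # 2nd central moment
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3                    # 3rd central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4   # 4th central moment
    return {
        "mean": m1,
        "variance": var,
        "skewness": float(mu3) / float(var) ** 1.5,       # standardized 3rd moment
        "kurtosis": mu4 / var**2,                         # standardized 4th moment (exact)
    }

# Example: a fair six-sided die, each face with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
stats = summarize(die)
print(stats["mean"])      # 7/2
print(stats["variance"])  # 35/12
print(stats["skewness"])  # 0.0
print(stats["kurtosis"])  # 303/175
```

The same two steps (differentiate, then set t = 0) apply to any distribution whose mgf exists in a neighbourhood of zero; for a continuous distribution one would differentiate a closed-form M(t) instead, for example with a computer algebra system.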
I don't know the answer, but this thread might help: https://hsm.stackexchange.com/questions/3420/what-is-the-history-of-moment-generating-functions-and-the-more-general-charact
– Matt Krause Feb 17 '20 at 15:49