I'm having trouble grasping the viability/value of taking the log of a dataset. This post mentioned that it's used to normalize (read: shrink the extremes of) a dataset and make it easier to fit a curve. But doesn't that just distort the data? Is this a trade-off that warps the outliers in exchange for a more generally useful curve?

Please forgive any misused vocabulary. I'm an engineer, not a statistician.

1 Answer

The question you link to doesn't seem to say anything about fitting a curve to data. The answer there says "fit them on the same curve", which is actually about displaying the data (see the reference to tick marks) and about the graph under discussion in the linked article.

[I wouldn't have used the word 'curve' there, myself.]

When it's made clear that the data are on a log scale (and that should be made clear on the plot, not just in the text), what exactly is being distorted?

From what I've seen, log-log plots are used widely in engineering applications, often for much the same reasons they might be used here.

What they are useful for is making power-relationships linear (the power becomes a slope) and facilitating comparisons in percentage terms (on both variables).
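To make "the power becomes a slope" concrete, here is a minimal sketch using only the Python standard library. The values `a = 3.0` and `p = 0.85` and the helper `fit_line` are made up for illustration; nothing here comes from the linked article.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical power relationship y = a * x**p (illustrative values only).
a, p = 3.0, 0.85
xs = [10.0 ** k for k in range(7)]   # x spans six orders of magnitude
ys = [a * x ** p for x in xs]

# Take logs of both variables, then fit a straight line.
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]
slope, intercept = fit_line(log_xs, log_ys)

print(f"fitted slope   = {slope:.4f}  (approximately the power p)")
print(f"10**intercept  = {10 ** intercept:.4f}  (approximately the constant a)")
```

With noise-free data the fit is essentially exact; with real measurements the slope would only estimate the power.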

Besides putting relationships into a simpler form, which may be more easily discussed, logs help when data span several orders of magnitude: a few large values can make it very hard to perceive the broader relationship at all, and a log transformation may then reveal what would otherwise be hidden.

Note that the text discussed the effect of a certain percentage increase in terms of a certain percentage increase in the other variable ("1 percent increase in population generates a slightly smaller (between 0.8 and 0.9 percent) increase in emissions"); this information would correspond to the slope of a line on the log-log graph.
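The link between a log-log slope and percentage changes can be checked numerically. The sketch below assumes an exact power relationship $y = a x^b$, in which case multiplying $x$ by $1 + r$ multiplies $y$ by $(1 + r)^b$ regardless of $a$. The function name `pct_change_in_y` is hypothetical, and the slopes 0.8 and 0.9 are simply the endpoints of the range quoted above.

```python
def pct_change_in_y(pct_change_in_x, slope):
    """Percent change in y for a given percent change in x, assuming y = a * x**slope.

    The constant a cancels out, so it is not needed.
    """
    ratio = (1 + pct_change_in_x / 100) ** slope
    return (ratio - 1) * 100

# A 1% increase in x, for slopes at either end of the quoted range.
for slope in (0.8, 0.9):
    print(f"slope {slope}: 1% more x -> {pct_change_in_y(1, slope):.3f}% more y")

# The 'slope = percent per percent' reading is a small-change approximation:
# for a large change such as doubling x, the increase in y falls short of slope * 100%.
print(f"slope 0.85: 100% more x -> {pct_change_in_y(100, 0.85):.2f}% more y")
```

For small changes the percent increase in $y$ is close to the slope times the percent increase in $x$, which is why the slope can be read directly as "percent per percent".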

Rather than distorting, it makes it easier to reveal a particular kind of information. One reason why is discussed in the answer to the question you linked to.

Glen_b
  • (1) That's very interesting. It's hard to divine the usefulness of logs in the greater scheme of things based on naive Internet searches. The power becomes slope? That's fantastic (and amazing, to the ignorant). (2) How can you tell if something is "log scale"? It seems like you're implying that there is a form that may or may not resemble log(x). (3) Those types of plots are probably somewhat more familiar to optimization personnel (whether engineering or not). We might [infrequently] generate the data, but we'd rarely be their target audience. – Dustin Oprea Jan 12 '15 at 04:37
  • $y=ax^p$ $\implies$ $\log y = \log a + p \log x$ ... I've seen such a transformation used in mechanical engineering, chemical engineering, and electrical engineering many times (to name only a few). Do engineers not calculate any more? Indeed, in the days of slide rules and log tables and nomograms there was really no other practical way to deal with such power relationships, and the list of relationships of that form would fill books. [Indeed, it does fill books; I have books on my shelf where page after page there's another new equation with powers in it.] – Glen_b Jan 12 '15 at 04:58
  • Just an example grabbed at random via an internet search - see here (pdf), figure 9 page 7. Looks like MIT still teach taking logs of power relationships (and using log-log plots to display them) to materials engineers, for example. – Glen_b Jan 12 '15 at 05:05
  • Another: Standard Handbook of Petroleum and Natural Gas engineering (Lyons & Plisga), p6-219 "The primary diagnostic tool for this [the pumping] period is the slope of the log-log plot of net pressure [...] versus pumping time". Indeed a search turns up many more across a wide variety of engineering areas (and none of them directly related to optimization as far as I can see). – Glen_b Jan 12 '15 at 05:20
  • Engineering Flow and Heat Exchange, by Levenspiel, p113, just below eqn 5.18: "Experimentally all we need do is measure the torque at a number of N values, plot on a log-log scale, and evaluate the slope at various N values." ... well, three examples is enough; it looks like log-log plots are clearly still a thing in engineering. – Glen_b Jan 12 '15 at 05:31
  • Further clarification: I'm a computer scientist that isn't involved with image-processing or machine-learning to any significant degree. "Log" generally only comes up when considering computational-complexity. I went back to finish the last couple of classes for my degree, and my current "computational statistics" course is the only class of its kind required. – Dustin Oprea Jan 12 '15 at 05:36
  • Any comment on (2)? Is "log scale" anything that looks like a logarithmic curve when log'd? – Dustin Oprea Jan 12 '15 at 06:45
  • Sorry, it's not 100% clear what you mean. When there are strongly curving relationships where big values are very big, or when you anticipate a power relationship, or you're interested in describing things in percentage changes, you might use a log-log transformation. In the power-function case you'll get a linear relationship after taking logs, but in the other two situations it might simply make the relationship clearer rather than actually linear. – Glen_b Jan 12 '15 at 07:15
  • That's exactly what I was asking. Thanks, @Glen_b. – Dustin Oprea Jan 12 '15 at 08:31