0

What is the difference between (a) taking the logarithm of a set of numbers (b) dividing the set of numbers by an integer. Both appear to reduce the scale of set of numbers, so can they be used interchangeably?

shaifali Gupta
  • 420
  • 4
  • 17

2 Answers2

1

Both of these are feature transformation techniques but they do different things due to their different mathematical properties. In general it is good to normalize the data we are working with, this is usually done with a linear transformation. However, the logarithm is useful when the range of the data is too large.

For example [1, 10, 100, 1000, 10000, 100000], these data points have too large of a range. If we normalize this data with linear transformation using the max, we get: [1.e-05, 1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00]. You can see that the lower values in this data are all essentially 0. However, with the logarithm of this vector we would get [0, 2.303, 4.605, 6.908, 9.210, 11.513]. We can see that these values now have a much better spread.

x = [1, 10, 100, 1000, 10000, 100000]
x_trans_lin = [t/max(x) for t in x]
x_trans_log = [np.log(t) for t in x]

plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.scatter(range(len(x)), x)

plt.subplot(1,2,2)
plt.scatter(range(len(x)), x_trans_lin, label='linear')
plt.scatter(range(len(x)), x_trans_log, label='log')
plt.legend()
plt.show()

enter image description here

JahKnows
  • 8,866
  • 30
  • 45
0

In short, no: those operations are not equivalent, even if they seem to have the same effect. Just have a look at the Wikipedia page for Logarithm, where there is a nice short comparison of the basic mathematical operations:

enter image description here

n1k31t4
  • 14,858
  • 2
  • 30
  • 49