Why do we need three ways to perform the same operation?

(I use multiplication for examples)

First way:

df['a'] * 5

Second way:

df['a'].mul(5)

Third way:

df['a'].__mul__(5)

Aren't two ways enough? There seems to be no need for mul. I was wondering whether it could work the normal way, as it does for an integer:

First way:

3 * 5

Second way:

(3).__mul__(5)

But on a regular integer:

(3).mul(5)

Would break.

I am just curious: why do we need this much in pandas? It's the same with addition, subtraction, and division.

U12-Forward

3 Answers

* and mul do the same thing, but __mul__ is different.

* and mul perform some checks before delegating to __mul__. There are two things that you should know about.

  1. NotImplemented

There is a special singleton value NotImplemented that is returned by a class's __mul__ in cases where it cannot handle the other operand. This then tells Python to try __rmul__. If that fails too, then a generic TypeError is raised. If you use __mul__ directly, you won't get this logic. Observe:

class TestClass:

    def __mul__(self, other):
        return NotImplemented

TestClass() * 1

Output:

TypeError: unsupported operand type(s) for *: 'TestClass' and 'int'

Compare that with this:

TestClass().__mul__(1)

Output:

NotImplemented

This is why, in general, you should avoid calling the dunder (magic) methods directly: you bypass certain checks that Python does.
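To see the fallback actually succeed rather than end in a TypeError, here is a minimal sketch. The Meters class is hypothetical and exists only for illustration; it defines __rmul__ so that an int on the left can be multiplied by it.

```python
class Meters:
    """Hypothetical class, used only to illustrate the __rmul__ fallback."""

    def __init__(self, value):
        self.value = value

    def __rmul__(self, other):
        # Called for `other * self` when other's __mul__ returns NotImplemented.
        if isinstance(other, (int, float)):
            return Meters(other * self.value)
        return NotImplemented


# int.__mul__ cannot handle a Meters, so Python falls back to Meters.__rmul__.
result = 3 * Meters(2)
print(result.value)

# Calling the dunder directly skips the fallback and exposes the sentinel.
direct = (3).__mul__(Meters(2))
print(direct)
```

With the operator, the multiplication goes through; with the direct call, you are left holding NotImplemented and have to deal with it yourself.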

  2. Derived class operator handling

When you evaluate something like Base() * Derived(), where Derived inherits from Base, you might expect Base().__mul__(Derived()) to be called first. This can pose problems, since Derived is more likely to know how to handle such situations.

Therefore, when you use *, Python checks whether the right operand's type is a subclass of the left's that provides its own __rmul__, and if so, calls the right operand's __rmul__ method first.

Observe:

class Base:

    def __mul__(self, other):
        print('base mul')

class Derived(Base):

    def __rmul__(self, other):
        print('derived rmul')

Base() * Derived()

Output:

derived rmul

Notice that even though Base.__mul__ does not return NotImplemented and can clearly handle an object of type Derived, Python doesn't even look at it first; it delegates to Derived.__rmul__ immediately.

For completeness, there is one difference between * and mul, in the context of pandas: mul is a function, and can therefore be passed around in a variable and used independently. For example:

import pandas as pd

pandas_mul = pd.DataFrame.mul
pandas_mul(pd.DataFrame([[1]]), pd.DataFrame([[2]]))

On the other hand, this will fail:

*(pd.DataFrame([[1]]), pd.DataFrame([[2]]))
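Outside pandas, the standard library's operator module offers a function form of *. As a minimal sketch using plain integers (the sample values are arbitrary), operator.mul(a, b) is equivalent to a * b, so it goes through the same operator protocol while still being an ordinary function you can pass around:

```python
import operator
from functools import reduce

# operator.mul(a, b) is the function form of a * b.
print(operator.mul(3, 5))

# Being a plain function, it can be handed to map without a lambda.
times_five = list(map(operator.mul, [1, 2, 3], [5, 5, 5]))
print(times_five)

# It also works as the combining function for reduce.
product = reduce(operator.mul, [1, 2, 3, 4])
print(product)
```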
gmds

The "magic method" __mul__ and the operator * are essentially the same in the underlying Python (* ends up calling __mul__), and as you pointed out, that is the standard way Python handles things. The other method, mul, is one you can pass around as a function, for example to map, which avoids writing a lambda such as lambda x, y: x * y. Yes, you could still use __mul__, but the dunder methods (__x__) are usually not meant to be used as normal functions, and a simple mul makes the code clearer.

So you don't really "need" it, but it is nice to have and use.

Netwave

First off, the third way (df['a'].__mul__(5)) should not be used in normal code, since it's an internal method that Python itself calls on the class. In general, users don't touch any of the "dunder" methods.

Regarding the other two ways, the first way is obvious; you just multiply the thing. It's standard math.

The second way gets a bit more interesting. One example of how I've used that method is when the function you want to apply is a variable.

For example:

def pandas_math(series, func, val):
    return getattr(series, func)(val)

pandas_math(df['a'], 'mul', 5) will give the same result as df['a'].mul(5) but now you can pass mul as a variable, or whatever other function you want to use. It's much easier than hard-coding all the symbols.
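As a rough sketch of that pattern in use, assuming a small made-up DataFrame (the values in column 'a' are sample data, not from the question), the same helper can loop over several pandas method names:

```python
import pandas as pd


def pandas_math(series, func, val):
    # Look up the Series method by name (e.g. 'mul') and call it with val.
    return getattr(series, func)(val)


# Hypothetical sample data for illustration.
df = pd.DataFrame({'a': [1, 2, 3]})

# One helper covers several operations, selected by name at runtime.
for name in ('add', 'sub', 'mul', 'div'):
    print(name, pandas_math(df['a'], name, 5).tolist())
```

The method name could just as well come from a config file or user input, which is where the named methods are more convenient than the symbols.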

Sam
  • You could use the `operator` module if `df['a'].mul` did not exist, so it doesn't really serve a purpose in that regard. – iz_ Apr 19 '19 at 04:34
  • Right, but for some reason Pandas decided to have it built in. I understood the main point being "why would I do `df['a'].mul(5)` instead of `df['a']*5`?". A similar question to "why would one use the `operator` module in the first place?" – Sam Apr 19 '19 at 04:38