I'm looking at this piece of code in a book:

import hashlib

def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))
    return data.loc[~in_test_set], data.loc[in_test_set]

I have never seen loc[~<..>] before. I think I understand what it does, but I want to be sure. Also, does it work only in pandas, or in Python in general?

Henry Ecker
zzHQzz

1 Answer

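I saw some great comments above, but wanted to make sure this is clear for a beginner. ~ is Python's bitwise NOT operator: it flips every bit, turning 1s into 0s and 0s into 1s. It exists in Python in general (on a plain integer, ~x equals -x - 1), but pandas overloads it so that on a boolean Series it negates each element, turning True into False and False into True. In your example, ~in_test_set means "not in_test_set" for every row, so data.loc[~in_test_set] selects the rows that are not in the test set. The advantage of ~ is that it works element-wise on a whole Series, whereas Python's not expects a single truth value and raises a ValueError on a Series. See the Python wiki on bitwise operators.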
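To make this concrete, here is a minimal sketch. The DataFrame, its column names, and the hand-written mask are made up for illustration; in the book's function the mask would come from test_set_check instead.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
data = pd.DataFrame({"id": [1, 2, 3, 4], "value": ["a", "b", "c", "d"]})

# A hand-written boolean mask standing in for in_test_set from the question.
in_test_set = pd.Series([False, True, False, True], index=data.index)

# On a boolean Series, ~ negates every element.
not_in_test_set = ~in_test_set

train = data.loc[not_in_test_set]  # rows where the mask is False
test = data.loc[in_test_set]       # rows where the mask is True

# ~ is a regular Python operator; on a plain int it is bitwise NOT.
plain_int = ~5  # equals -5 - 1

# Python's "not" cannot negate a whole Series: it asks for one truth value.
raised = False
try:
    not in_test_set
except ValueError:
    raised = True
```

So ~ itself is plain Python, but the element-wise "not" behaviour on a mask comes from pandas (NumPy boolean arrays behave the same way).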

Polkaguy6000