
I'm trying to prepare some data for machine learning. I want one column per feature, with the value set to 1 if that feature appears in the row's 'Tag' column and 0 otherwise. (Output of df.head(): image not included.)

I have a separate list, tags_features, that contains all the features. I compare the tags in the 'Tag' column against the tags in tags_features and flip the 0 to 1 in the respective column for each matching tag. At least, that is what I am trying to do, but I am running into a few issues.

for i in range(df.shape[0]):    
    for j in range(len(tags_features)):
        if(df['Tag'][i][0] or df['Tag'][i][1] or df['Tag'][i][2] == tags_features[j]):
            df.iat[i, j] = 1

Running this, I get an error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\envs\myenv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~\anaconda3\envs\myenv\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\envs\myenv\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_24872/659974376.py in <module>
      1 for i in range(df.shape[0]):
      2     for j in range(len(tags_features)):
----> 3         if(df['Tag'][i][0] or df['Tag'][i][1] or df['Tag'][i][2] == tags_features[j]):
      4             df.at[i, j, 1]
      5

~\anaconda3\envs\myenv\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    940
    941         elif key_is_scalar:
--> 942             return self._get_value(key)
    943
    944         if is_hashable(key):

~\anaconda3\envs\myenv\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
   1049
   1050         # Similar to Index.get_value, but we do not fall back to positional
-> 1051         loc = self.index.get_loc(label)
   1052         return self.index._get_values_for_loc(self, loc, label)
   1053

~\anaconda3\envs\myenv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 0
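
For reference, here is a minimal sketch of one way a KeyError: 0 of this shape can arise. It assumes the DataFrame's index does not contain the label 0 (for example, because rows were filtered out or dropped earlier), which may or may not be the case here. The data is made up for illustration. df['Tag'][i] looks a row up by index label, whereas .iloc looks it up by position.

```python
import pandas as pd

# Hypothetical data: the index labels are 5, 6, 7, so there is no label 0.
df = pd.DataFrame(
    {"Tag": [["a", "b", "c"], ["b", "c", "d"], ["a", "d", "e"]]},
    index=[5, 6, 7],
)

try:
    df["Tag"][0]  # label-based lookup: fails because label 0 is absent
    raised = False
except KeyError:
    raised = True

# Position-based lookup does not depend on the index labels.
first_by_position = df["Tag"].iloc[0]

print(raised, first_by_position)
```

If this is the cause, df['Tag'].iloc[i] or resetting the index with df.reset_index(drop=True) would be two possible ways around it.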

The weird part, however, seems to be in the string comparison once I add "or".

For example, running

if(df['Tag'][2][0] or df['Tag'][2][1] == "somethingnotrighthere"):
    print("yep")

"yep" is printed.

Removing the "or" makes it run correctly. I'm new to Python and can't see what I'm missing, even though it's a fairly simple line.
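
A small sketch of why "yep" can print, assuming df['Tag'][2][0] is a non-empty string: == binds more tightly than or, so the condition is read as a or (b == c), and a non-empty string is truthy on its own. The tag values and the tags_features list below are invented for illustration. The last part shows one possible way to fill the 0/1 columns with a membership test, assuming each 'Tag' cell holds a list of tags; it may not match the real data layout.

```python
import pandas as pd

# Made-up tags standing in for df['Tag'][2][0] and df['Tag'][2][1].
first, second = "python", "pandas"

# Parsed as: first or (second == "somethingnotrighthere").
# `first` is a non-empty string, so the whole expression is truthy.
as_written = bool(first or second == "somethingnotrighthere")

# Comparing each tag explicitly, or using `in`, gives the intended test.
explicit = first == "somethingnotrighthere" or second == "somethingnotrighthere"
membership = "somethingnotrighthere" in (first, second)

# Hypothetical data: each 'Tag' cell holds a list of tags.
tags_features = ["python", "pandas", "numpy"]
df = pd.DataFrame({"Tag": [["python", "numpy"], ["pandas"], ["numpy", "pandas"]]})

# One 0/1 column per feature, filled by a membership test on each row's tags.
for feature in tags_features:
    df[feature] = df["Tag"].apply(lambda tags, f=feature: int(f in tags))

print(as_written, explicit, membership)
print(df[tags_features].values.tolist())
```

Addressing the columns by name rather than by position j also avoids overwriting whatever column happens to sit at position j, such as 'Tag' itself.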
