I am working with a large CSV file (1.3 GB), which gives me a memory error when I load it on my laptop.
My data looks something like this:
Col1 Col2 Col3
item1 a1 b1
item2 a2 b2
item3 a3 b3
item1 a1 b1!
item1 a1 b1
item3 a3 b3
I basically want to find out whether any of an item's rows (e.g. item1's) contain an anomaly, and to perform a couple of operations on the rows that belong to the same item.
For this input, the output should be:
Col Is_Anomaly
item1 Yes
item2 No
item3 No
Currently, I loop over the file twice to build smaller data frames and work on those. I also tried the Dask library, but I could not get all the operations to work and could not find suitable documentation.
Is there a more efficient way to do this?
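
To make the goal concrete, here is a minimal sketch of a single-pass, streaming version of the check, using only Python's standard csv module. It rests on several assumptions that are not stated above: an "anomaly" is taken to mean that an item's rows disagree on their (Col2, Col3) values, the file is comma-delimited with a header row, and the number of distinct items is small enough to fit in memory. The function name find_anomalies is made up for illustration.

```python
import csv
import io


def find_anomalies(lines, delimiter=","):
    """Read rows once; flag an item when its rows disagree on (Col2, Col3).

    Assumption: "anomaly" means a later row for an item differs from the
    first row seen for that item. Only one entry per distinct item is kept
    in memory, never the whole file.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to be present)
    first_seen = {}  # item -> first (Col2, Col3) pair seen
    is_anomaly = {}  # item -> "Yes" / "No", in order of first appearance
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        item, values = row[0], (row[1], row[2])
        if item not in first_seen:
            first_seen[item] = values
            is_anomaly[item] = "No"
        elif values != first_seen[item]:
            is_anomaly[item] = "Yes"
    return is_anomaly


# The sample data from above, written as comma-separated text.
sample = io.StringIO(
    "Col1,Col2,Col3\n"
    "item1,a1,b1\n"
    "item2,a2,b2\n"
    "item3,a3,b3\n"
    "item1,a1,b1!\n"
    "item1,a1,b1\n"
    "item3,a3,b3\n"
)
result = find_anomalies(sample)
for item, flag in result.items():
    print(item, flag)
```

For the real file, the same function could be called on an open file handle instead of the in-memory sample, e.g. inside `with open(path, newline="") as f:`, so that rows are read one at a time. If the per-item operations need all rows of an item at once, this sketch would not be enough on its own.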