I am working with a large CSV file (1.3 GB), which gives me a memory error when I load it on my laptop.

My data looks something like this:

Col1 Col2 Col3
item1  a1   b1
item2  a2   b2
item3  a3   b3
item1  a1   b1!
item1  a1   b1
item3  a3   b3

I basically want to find out whether any of the values for a given item (for example, item1) has an anomaly, and to perform a couple of operations on the rows belonging to the same item.

For this input, the output should be:

Col  Is_Anomaly
item1   Yes
item2   No
item3   No

Currently, I loop over the file twice to build smaller data frames and work on those. I also tried the Dask library, but I could not get all the operations to work and could not find suitable documentation.

Is there another way to do this efficiently?
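
For illustration, here is a minimal sketch of a single-pass, chunked read with pandas, which keeps only one chunk in memory at a time. It rests on several assumptions that are not stated above: the file is comma-delimited with the header `Col1,Col2,Col3`, and an item counts as anomalous when its rows do not all share the same (`Col2`, `Col3`) pair. The function name `find_anomalies` and the `chunksize` value are hypothetical.

```python
import io

import pandas as pd


def find_anomalies(source, chunksize=100_000):
    """Return one row per item with a Yes/No anomaly flag.

    Assumption: an item is anomalous if more than one distinct
    (Col2, Col3) pair is seen for it anywhere in the file.
    """
    seen = {}  # item -> up to two distinct (Col2, Col3) pairs
    # Read the file piece by piece instead of loading it all at once.
    for chunk in pd.read_csv(source, chunksize=chunksize, dtype=str):
        # Drop repeated rows inside the chunk to shorten the Python loop.
        rows = chunk[["Col1", "Col2", "Col3"]].drop_duplicates()
        for item, col2, col3 in rows.itertuples(index=False, name=None):
            pairs = seen.setdefault(item, set())
            # Two distinct pairs are enough to decide, so stop storing more.
            if len(pairs) < 2:
                pairs.add((col2, col3))
    return pd.DataFrame(
        {
            "Col": list(seen),
            "Is_Anomaly": ["Yes" if len(p) > 1 else "No" for p in seen.values()],
        }
    )


# Small in-memory stand-in for the real file (comma-delimited by assumption).
sample = io.StringIO(
    "Col1,Col2,Col3\n"
    "item1,a1,b1\n"
    "item2,a2,b2\n"
    "item3,a3,b3\n"
    "item1,a1,b1!\n"
    "item1,a1,b1\n"
    "item3,a3,b3\n"
)
result = find_anomalies(sample, chunksize=2)
print(result)
```

With the real file, a path would be passed in place of `sample`. The dictionary grows with the number of distinct items rather than with the number of rows, so this only helps if the set of items itself fits in memory.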

Aklank Jain
