I have a fairly large file (Excel 120 MB / CSV 90 MB) with data for different companies; each company has a company code. I need to develop a tool that creates indexes for one company (chosen by the user). I am aware of methods to filter a DataFrame after loading (e.g. `.query()`), but I cannot find a way to filter the data during the import itself, so that only the required company's data is loaded. The data looks like this:

EID CompCode    V2          Age     
1   1234        1           19
2   1234        3           34
1   2222        4           44

Here CompCode is the company code, EID is the employee ID (a consecutive count), and V2 is a question with answers from 1 to 5.

I can use excel or csv.

Can anyone help me out here?

ruedi
    I don't think it's possible with a CSV. You could create a `parquet` file and partition it by your key columns, or just use a database. – Umar.H Jul 05 '21 at 14:09
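
For illustration only, here is a rough sketch of filtering at read time: a CSV can be streamed row by row so that only the rows of the requested company are kept in memory. It assumes a comma-delimited file with a header row containing `CompCode`. The function name `load_company_rows` and the in-memory sample are hypothetical and not taken from the question.

```python
import csv
import io


def load_company_rows(csv_file, comp_code, code_column="CompCode"):
    """Stream a CSV and keep only the rows belonging to one company."""
    reader = csv.DictReader(csv_file)
    # Rows are read lazily; non-matching rows are discarded right away,
    # so the full file is never held in memory at once.
    return [row for row in reader if row[code_column] == str(comp_code)]


# Small in-memory sample mirroring the table in the question
# (assumption: the real file is comma-delimited).
sample = io.StringIO(
    "EID,CompCode,V2,Age\n"
    "1,1234,1,19\n"
    "2,1234,3,34\n"
    "1,2222,4,44\n"
)
rows = load_company_rows(sample, 1234)
print(len(rows))
```

The result is a list of dicts with string values, which can be passed to `pandas.DataFrame(rows)` and converted to numeric types afterwards. `pandas.read_csv` also accepts a `chunksize` argument, which allows the same kind of filtering chunk by chunk. For a real file, open it with `open(path, newline="")` and pass the file object in place of the sample.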

0 Answers