I have about 8000 text files, each containing CSV data like:
CustomerID,Gender,Day,SaleAmount
18,Male,Monday,71.55
24,Female,Monday,219.66
112,Male,Friday,150.44
My code loops through all the files and appends each one to final.txt:
import shutil

with open('final.txt', 'wb') as outfile:
    for filename in files:
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)  # stream raw bytes, no parsing
The problem is that each file has its own header, i.e.
+------------+--------+-----+------------+
| CustomerID | Gender | Day | SaleAmount |
+------------+--------+-----+------------+
My final content looks like this -
+------------+--------+--------+------------+
| CustomerID | Gender | Day    | SaleAmount |
+------------+--------+--------+------------+
| 18         | Male   | Monday | 71.55      |
| 24         | Female | Monday | 219.66     |
| 112        | Male   | Friday | 150.44     |
| CustomerID | Gender | Day    | SaleAmount |
| 28         | Male   | Monday | 7.55       |
| 34         | Female | Monday | 19.66      |
| 12         | Female | Friday | 150.44     |
| CustomerID | Gender | Day    | SaleAmount |
| 28         | Male   | Monday | 7.55       |
| 34         | Female | Monday | 19.66      |
| 12         | Female | Friday | 150.44     |
+------------+--------+--------+------------+
Is there a way to merge all 8000 txt files into one, keeping just one header, while still using shutil.copyfileobj?
I've tried using pd.read_csv, but copyfileobj is twice as fast. Are there any other, faster ways to do this?
EDIT - I am reading directly from txt files and not dataframes.
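For reference, this is the direction I've been considering, though I'm not sure it's the right approach (a rough sketch: consume the header line of every file after the first before copying the rest; the glob call is just a stand-in for however the 8000 paths get collected):
import glob
import shutil

files = sorted(glob.glob('*.txt'))  # stand-in: however the file list is actually built

with open('final.txt', 'wb') as outfile:
    for i, filename in enumerate(files):
        with open(filename, 'rb') as readfile:
            if i != 0:
                readfile.readline()  # read and discard this file's header line
            shutil.copyfileobj(readfile, outfile)  # copy the remaining bytes as-is
I don't know whether the extra readline per file adds meaningful overhead at this scale, or whether there's a cheaper way to skip the first line.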