
I have about 200 large CSV datasets, all in the same format. About half of them are over 100 MB. I need a way of consolidating them into one database (or a few).

I had a go at getting them into an Azure database using Data Factory, but it took a lot of time to upload just one CSV, never mind 200.

Can anyone recommend a quick and relatively pain-free way of getting these files into a database of some sort? (Ideally one that supports SQL extraction.)

Gangrel
  • Pretty much any database? For example mysql https://dev.mysql.com/doc/refman/8.0/en/load-data.html – hidden_4003 Feb 21 '20 at 10:15
  • 1
The best program for opening large text files (in my experience) is the not-free TextPad, though the fanatics are against product recommendations, so take what you've got and delete the question before you get sacrificed with the goats. – John Feb 21 '20 at 10:16
  • Do you have MS SQL SERVER? – Besarion Feb 21 '20 at 10:16
  • Also try PowerBI which is free and uses mssql internally. – hidden_4003 Feb 21 '20 at 10:17
  • I would write a bash script which imports the files one by one into MySQL, as @hidden_4003 suggests. – Paflow Feb 21 '20 at 10:26
  • @Besarion I have SQL Server Management studio 17 ? – Gangrel Feb 21 '20 at 10:26
  • @hidden_4003 Thanks, but ironically that's where the data will end up, although performance issues are preventing me from connecting directly to all of the CSV files and then using the binary extract function on each file. – Gangrel Feb 21 '20 at 10:27
  • @Ryan Benthan What might work then is an SSIS package: create a loop over all the files, then import them into a table. Just create a For Loop container, then you can create an ODBC import from a CSV file to the database, and because they're the same format it won't need changing. – Besarion Feb 21 '20 at 10:28
  • @Paflow Thanks. What is the performance like on this? – Gangrel Feb 21 '20 at 10:29
  • @Besarion I've also read about using python to move the data. Do you think an SSIS package would be more suitable than a python script for example? – Gangrel Feb 21 '20 at 10:31
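Several of the comments above converge on the same pattern: loop over the files and bulk-insert each one into a single table. A rough sketch of that loop in Python, using the standard-library sqlite3 as a stand-in for whichever database you end up on (the function, table, and file names here are assumptions, not from the thread):

```python
import csv
import sqlite3

def load_csvs(csv_paths, db_path, table="consolidated"):
    """Append every CSV in csv_paths into one table, creating the
    table from the first file's header. Assumes all files share
    the same format, as in the question."""
    con = sqlite3.connect(db_path)
    first = True
    for path in csv_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)  # every file has the same header
            if first:
                cols = ", ".join('"%s"' % c for c in header)
                con.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, cols))
                first = False
            marks = ", ".join("?" for _ in header)
            # reader is an iterator, so rows stream into the insert
            # without loading a >100 MB file into memory at once
            con.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), reader)
    con.commit()
    con.close()
```

The same loop shape works against SQL Server or MySQL by swapping the sqlite3 connection for a pyodbc or MySQL connection, which is essentially what the SSIS For Loop container would do for you.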

1 Answer


This would be a basic outline for your Python script:

fout = open("out.csv", "w")  # "w" so re-running doesn't append duplicates
# first file: copy everything, including the header
with open("sh1.csv") as f:
    for line in f:
        fout.write(line)
# now the rest: skip each file's header line
for num in range(2, 201):
    with open("sh" + str(num) + ".csv") as f:
        next(f)  # skip the header (f.next() is Python 2 only)
        for line in f:
            fout.write(line)
fout.close()

Another example which should work for you
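As a sketch, one variant of the same merge driven by glob, so the 200 file names are not hard-coded (the sh*.csv naming is assumed from the snippet above; note that a plain lexical sort orders sh10 before sh2):

```python
import glob

def merge_csvs(pattern, out_path):
    """Concatenate all CSV files matching pattern into out_path,
    keeping only the first file's header line."""
    with open(out_path, "w") as fout:
        for i, path in enumerate(sorted(glob.glob(pattern))):
            with open(path) as f:
                if i > 0:
                    next(f)  # skip the header of every file after the first
                for line in f:
                    fout.write(line)
```

For example, merge_csvs("sh*.csv", "out.csv") produces the same out.csv as the loop above, and the merged file can then be bulk-loaded into the database in one pass.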

Besarion