1

I'm looking for a way to concat numbers stored in a variable and using it as variable names in python (or interpolated as a string). Something like this:

for NUMBER in range(2000,2020,1):
    df_NUMBER = pd.read_csv('data_NUMBER.csv')
    df_NUMBER['year'] = NUMBER 

is it posible? In stata would be like:

forvalues NUMBER = 2000(1)2019 {
    import delimited using data_`NUMBER'.csv
    generate year = `NUMBER'
}
diwenzo
  • 13
  • 2
  • 2
    Technically it can be done (in a very convoluted way), but what's wrong with just using a dictionary?! – Błotosmętek Aug 21 '20 at 21:01
  • Are you asking how to dynamically generate variable names? This is a very bad idea. Can you show your intended output and also explain why you want to do it? Or are you asking how to create a string with a changing number in it in order to reference different file names? If that latter, it is trivial with string interpolation. – Dan Aug 21 '20 at 21:02
  • 1
    Does this answer your question? [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – AMC Aug 21 '20 at 21:09
  • 2
    Don't dynamically create variables. You can only really do it in global scopes, not local scopes, and it is **almost certainly** a poor design decision. – juanpa.arrivillaga Aug 21 '20 at 21:09
  • poor design but if you want to do this and learn the hard way, I think that falls on you! here is what you'd put inside your loop: `exec("df_%d = pd.read_csv(\'data_%d.csv\')" % (NUMBER, NUMBER))` – d_kennetz Aug 21 '20 at 21:17

2 Answers2

0

Don't try create variable names dynamically, it is a very bad idea. Rather just store the dataframes in a collection. In this case a dictionary might make sense. Otherwise just a single longform dataframe which will make analysis easier:

option 1:

dfs = {}
for year in range(2000, 2020):
    dfs[year] = pd.read_csv(f"data_{year}.csv")
    dfs[year]["year"] = year

option 2:

dfs = []
for year in range(2000, 2020):
    df = pd.read_csv(f"data_{year}.csv")
    df["year"] = year
    dfs.append(df)
df_all = pd.concat(dfs).reset_index(drop=True)

You probably want option 2 in this case as you can then go on to perform your analyses on all the years at once using .groupby("year").

Dan
  • 44,224
  • 16
  • 81
  • 148
0

I suggest going the dictionary way:

df = {}
for year in range(2000,2020):
    filename =  'data_{}.csv'.format(year)
    df[year] = pd.read_csv(filename)
    df[year]['year'] = year
Błotosmętek
  • 12,365
  • 18
  • 29