89

How do I get the name of a DataFrame and print it as a string?

Example:

boston (var name assigned to a csv file)

import pandas as pd
boston = pd.read_csv('boston.csv')

print('The winner is team A based on the %s table.' % boston)
jpp
leo
  • 1
    Do you mean variable name? – Anand S Kumar Jul 30 '15 at 15:03
  • 3
    It's worth reading [this](http://stackoverflow.com/questions/544919/can-i-print-original-variables-name-in-python) and [this](http://stackoverflow.com/questions/592746/how-can-you-print-a-variable-name-in-python), and the comments and links therein. – kwinkunks Jul 30 '15 at 15:07

6 Answers

70

You can name the dataframe with the following, and then call the name wherever you like:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.ones([4, 4]))
df.name = 'Ones'

print(df.name)
>>>
Ones
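
One caveat with this approach, also raised in the comments on this answer: the name is an ordinary Python attribute on one particular object, so methods that return a new DataFrame do not carry it over. A minimal sketch, assuming the same 4x4 frame as above (the variable `smaller` is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.ones([4, 4]))
df.name = 'Ones'  # plain Python attribute set on this one object

# drop() returns a new DataFrame; the custom attribute is not copied over.
smaller = df.drop(columns=[0])

print(hasattr(df, 'name'))       # True
print(hasattr(smaller, 'name'))  # False
```

So the attribute is only reliable while you keep working with the original object.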

Hope that helps.

ajsp
  • 3
    I need to have the name as a variable. `import pandas as pd; df = pd.DataFrame(data=np.ones([4,4])); df.name = 'df'; print df.name` gives `df` – leo Jul 30 '15 at 16:19
  • 11
    For posterity, as of v 0.18.1 this does [not survive pickling](https://github.com/pandas-dev/pandas/issues/447#issuecomment-10949838) (for v 0.18.1 use `to_pickle`/ `read_pickle` instead of `save`/`load` if trying to reproduce the GitHub comment). – tmthydvnprt Jan 05 '17 at 16:28
  • 6
    A workaround I found is to place your `DataFrame`'s name in the index's name attribute (e.g. `df.index.name = 'Ones'`). This is maintained during pickling. This only works if your `DataFrame`'s index is not already named something useful... – tmthydvnprt Jan 05 '17 at 16:33
  • FYI, this was found while using `DataFrame`s inside `multiprocessing.Pool()` workers. The attributes were not maintained during `.map()` because of the pickling it uses. – tmthydvnprt Jan 05 '17 at 16:50
  • 9
    This is a poor idea because if you as much as `drop` something, the returned object will no longer have a `name` attribute. It's tempting, but will create inexplicable errors down the line. – sapo_cosmico Aug 01 '18 at 15:06
  • 7
    Really a very bad idea. If the DataFrame already has a column called `name`, `df.name = 'Ones'` is the same as `df['name'] = 'Ones'`, which means every value in that column will be 'Ones'. So it is not a correct answer. You can store your DataFrames in a dictionary and use the key to identify them –  Apr 01 '19 at 15:01
40

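Sometimes df.name doesn't work.

You might get an error message:

'DataFrame' object has no attribute 'name'

Try the function below:

def get_df_name(df):
    # Return the first global variable name bound to this exact object.
    # Raises IndexError if no global variable refers to df.
    name = [x for x in globals() if globals()[x] is df][0]
    return name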
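
As a hedged variation on the function above, the lookup can take the namespace as an explicit argument and return None instead of raising when nothing matches. This is only a sketch: `find_df_name` and the sample frame are hypothetical names chosen for illustration, and the identity check (`is`) is the same idea the answer uses.

```python
import pandas as pd

def find_df_name(df, namespace):
    """Return the first name in `namespace` bound to `df`, or None if absent."""
    for name, value in namespace.items():
        if value is df:  # identity check, not equality
            return name
    return None

boston = pd.DataFrame({'price': [1, 2]})  # hypothetical sample data

# Pass any mapping of names to objects, e.g. globals() or locals().
found = find_df_name(boston, {'boston': boston})
missing = find_df_name(pd.DataFrame(), {'boston': boston})
print(found)    # boston
print(missing)  # None
```

Passing `locals()` from inside a function covers the local-variable case that a `globals()` lookup misses.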
otmezger
Min
  • 4
    It will throw `'DataFrame' object has no attribute 'name'` when no name has been assigned – Mohamed Thasin ah Nov 20 '18 at 07:33
  • 4
    Just to make sure people aren't confused: what the snippet here does is to find the dataframe in all currently defined global variables and return its variable name. This is **NOT** guaranteed to work (e.g. your DF is a local variable) and there are no error handling mechanisms in place. You should only use this if you're sure what you're doing! – Zecong Hu Dec 08 '20 at 15:43
23

In many situations, a custom attribute attached to a pd.DataFrame object is not necessary. In addition, note that custom attributes on pandas objects may not serialize, so pickling can lose this data.

Instead, consider creating a dictionary with appropriately named keys and accessing the dataframe via dfs['some_label'].

import pandas as pd

df = pd.DataFrame()

dfs = {'some_label': df}
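
To connect this back to the question, here is a minimal sketch of printing a table's label as a string. The two small frames are hypothetical stand-ins for data loaded with pd.read_csv, and the labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for tables loaded with pd.read_csv(...)
dfs = {
    'boston': pd.DataFrame({'score': [3, 1]}),
    'chicago': pd.DataFrame({'score': [2, 2]}),
}

messages = []
for label, frame in dfs.items():
    # The label is an ordinary string, so it can be formatted directly.
    messages.append('The %s table has %d rows.' % (label, len(frame)))

print('\n'.join(messages))
```

Because the label lives in the dictionary rather than on the DataFrame, it is unaffected by operations that return a new DataFrame.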
jpp
13

From what I understand of the pandas documentation, DataFrames are:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.

And Series are:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

Series have a name attribute which can be accessed like so:

 In [27]: s = pd.Series(np.random.randn(5), name='something')

 In [28]: s
 Out[28]: 
 0    0.541
 1   -1.175
 2    0.129
 3    0.043
 4   -0.429
 Name: something, dtype: float64

 In [29]: s.name
 Out[29]: 'something'

EDIT: Based on OP's comments, I think OP was looking for something like:

 >>> df = pd.DataFrame(...)
 >>> df.name = 'df' # making a custom attribute that DataFrame doesn't intrinsically have
 >>> print(df.name)
 df
aznbanana9
1

Here is a sample function. Note `df.name = file` on the sixth line of the code below:

def df_list():
    filename_list = current_stage_files(PATH)  # helper and PATH are not shown here
    dfs = []
    for file in filename_list:
        df = pd.read_csv(PATH + file)
        df.name = file  # tag each DataFrame with its source file name
        dfs.append(df)
    return dfs
dcurrie27
Arjjun
0

I am working on a module for feature analysis and had the same need: I wanted to generate a report that includes the name of the pandas.DataFrame being analyzed. To solve this, I used the solution presented by @scohe001 and @LeopardShark, originally in https://stackoverflow.com/a/18425523/8508275, implemented with the inspect library:

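import inspect

def aux_retrieve_name(var):
    # Look two frames up: past this function and past the helper that calls it.
    callers_local_vars = inspect.currentframe().f_back.f_back.f_locals.items()
    names = [var_name for var_name, var_val in callers_local_vars if var_val is var]
    # Return the first matching name, or None if nothing in that frame refers to var.
    return names[0] if names else None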

Note the additional .f_back term since I intend to call it from another function:

def header_generator(df):
    print('--------- Feature Analyzer ----------')
    print('Dataframe name: "{}"'.format(aux_retrieve_name(df)))
    print('Memory usage: {:03.2f} MB'.format(df.memory_usage(deep=True).sum() / 1024 ** 2))
    return

Running this code with a given dataframe, I get the following output:

header_generator(trial_dataframe)

--------- Feature Analyzer ----------
Dataframe name: "trial_dataframe"
Memory usage: 63.08 MB

tbnsilveira