Get the name of a pandas DataFrame

Question

How do I get the name of a DataFrame and print it as a string?

Example:

boston (var name assigned to a csv file)

import pandas as pd
boston = pd.read_csv('boston.csv')

print('The winner is team A based on the %s table.' % boston)

It's worth reading [this](http://stackoverflow.com/questions/544919/can-i-print-original-variables-name-in-python) and [this](http://stackoverflow.com/questions/592746/how-can-you-print-a-variable-name-in-python), and the comments and links therein. — kwinkunks, Jul 30 '15 at 15:07

score 70 · Answer 1 · answered Jul 30 '15 at 15:09

You can name the dataframe with the following, and then call the name wherever you like:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.ones([4, 4]))
df.name = 'Ones'  # a custom attribute, not part of the DataFrame API

print(df.name)
# Ones

Hope that helps.

answered Jul 30 '15 at 15:09

ajsp

I need to have the name as a variable: `import pandas as pd; df = pd.DataFrame(data=np.ones([4,4])); df.name = 'df'; print df.name` prints `df`. – leo Jul 30 '15 at 16:19

For posterity, as of v 0.18.1 this does [not survive pickling](https://github.com/pandas-dev/pandas/issues/447#issuecomment-10949838) (for v 0.18.1 use `to_pickle`/ `read_pickle` instead of `save`/`load` if trying to reproduce the GitHub comment). – tmthydvnprt Jan 05 '17 at 16:28

A workaround I found is to place your `DataFrame`'s name in the index's name attribute (e.g. `df.index.name = 'Ones'`). This is maintained during pickling. This only works if your `DataFrame`'s index is not already named something useful... – tmthydvnprt Jan 05 '17 at 16:33
FYI, this was found while using `DataFrame`s inside `multiprocessing.Pool()` workers. The attributes were not maintained during `.map()` because of the pickling it uses. – tmthydvnprt Jan 05 '17 at 16:50

This is a poor idea because if you as much as `drop` something, the returned object will no longer have a `name` attribute. It's tempting, but will create inexplicable errors down the line. – sapo_cosmico Aug 01 '18 at 15:06

Really a very bad idea. If the DataFrame already has a column called `name`, `df.name = 'Ones'` is the same as `df['name'] = 'Ones'`, which means every value in that column will be 'Ones'. So it is not a correct answer. You can store your dataframes in a dictionary and use the key to identify them. – Apr 01 '19 at 15:01
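
The two pitfalls raised in the comments above can be reproduced with a short sketch. The DataFrames `df` and `people` and their contents are made up for illustration; the behaviour shown is what I would expect from current pandas releases, not something the answer itself demonstrates.

```python
import numpy as np
import pandas as pd

# Pitfall 1: the custom attribute lives only on this one object.
df = pd.DataFrame(data=np.ones([4, 4]))
df.name = 'Ones'
dropped = df.drop(columns=[0])  # drop returns a new DataFrame
name_survived = hasattr(dropped, 'name')
print(name_survived)

# Pitfall 2: a column called "name" already exists, so the assignment
# overwrites the column values instead of creating an attribute.
people = pd.DataFrame({'name': ['ann', 'bob'], 'age': [31, 45]})
people.name = 'People'
print(people['name'].tolist())
```

The first check shows the label is not carried over to the object returned by `drop`; the second shows the original column values being replaced.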

score 40 · Answer 2 · edited Sep 28 '18 at 18:47

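Sometimes df.name doesn't work.

You might get an error message:

'DataFrame' object has no attribute 'name'

Try the function below:

def get_df_name(df):
    # Return the first global variable name bound to this exact object.
    # Raises IndexError if no global variable refers to df.
    name = [x for x in globals() if globals()[x] is df][0]
    return name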

edited Sep 28 '18 at 18:47

otmezger

answered May 31 '18 at 08:42

Min

It will throw `'DataFrame' object has no attribute 'name'` when no name has been assigned. – Mohamed Thasin ah Nov 20 '18 at 07:33

Just to make sure people aren't confused: what the snippet here does is to find the dataframe in all currently defined global variables and return its variable name. This is **NOT** guaranteed to work (e.g. your DF is a local variable) and there are no error handling mechanisms in place. You should only use this if you're sure what you're doing! – Zecong Hu Dec 08 '20 at 15:43
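
A minimal sketch of a more defensive variant of the lookup, following the caveats in the comment above. The function name `find_df_names` and the explicit `namespace` argument are assumptions made for illustration; in practice one could pass `globals()` or `locals()` as the namespace.

```python
import pandas as pd

def find_df_names(df, namespace):
    """Return every name in `namespace` that is bound to exactly this object."""
    return [name for name, value in namespace.items() if value is df]

boston = pd.DataFrame({'price': [1.0, 2.0]})
alias = boston                               # second name for the same object
other = pd.DataFrame({'price': [1.0, 2.0]})  # equal contents, different object

# Hypothetical namespace; globals() or locals() would work the same way.
scope = {'boston': boston, 'alias': alias, 'other': other}
print(find_df_names(boston, scope))
print(find_df_names(pd.DataFrame(), scope))
```

Returning a list instead of indexing `[0]` avoids the `IndexError` when nothing matches, and makes it visible when several names point at the same DataFrame.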

score 23 · Answer 3 · answered May 31 '18 at 08:49

In many situations, a custom attribute attached to a pd.DataFrame object is not necessary. In addition, note that custom attributes on pandas objects may not serialize, so pickling can lose this data.

Instead, consider creating a dictionary with appropriately named keys and accessing the dataframe via dfs['some_label'].

df = pd.DataFrame()

dfs = {'some_label': df}
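
As a rough illustration of how the dictionary approach covers the question's use case, here is a sketch in which the key doubles as the printable name. The labels, the `score` column and the "highest total wins" rule are invented for the example.

```python
import pandas as pd

dfs = {
    'boston': pd.DataFrame({'score': [3, 5]}),
    'chicago': pd.DataFrame({'score': [4, 1]}),
}

# The key is the name, so it is always available next to the data.
totals = {label: int(frame['score'].sum()) for label, frame in dfs.items()}
best = max(totals, key=totals.get)
message = 'The winner is team A based on the %s table.' % best
print(message)
```

Because the name is stored outside the DataFrame, it is unaffected by operations such as `drop` or pickling of the individual frames.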

score 13 · Answer 4 · edited Jul 30 '15 at 18:04

From what I understand, DataFrames are:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.

And Series are:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

Series have a name attribute which can be accessed like so:

 In [27]: s = pd.Series(np.random.randn(5), name='something')

 In [28]: s
 Out[28]: 
 0    0.541
 1   -1.175
 2    0.129
 3    0.043
 4   -0.429
 Name: something, dtype: float64

 In [29]: s.name
 Out[29]: 'something'

EDIT: Based on OP's comments, I think OP was looking for something like:

 >>> df = pd.DataFrame(...)
 >>> df.name = 'df' # making a custom attribute that DataFrame doesn't intrinsically have
 >>> print(df.name)
 df
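
A related option not mentioned in this answer: pandas 1.0 and later expose a `DataFrame.attrs` dictionary intended for this kind of metadata. The pandas documentation marks it as experimental, so treat the sketch below as an assumption about your installed version rather than a guaranteed behaviour.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
df.attrs['name'] = 'df'  # attrs is a plain dict attached to the object
print(df.attrs['name'])
```

Unlike `df.name`, this cannot collide with a column called `name`. Whether the entry is carried over to derived DataFrames depends on the operation and the pandas version.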

edited Jul 30 '15 at 18:04

answered Jul 30 '15 at 15:11

aznbanana9

I need the name to be a variable, somewhat like name= – leo Jul 30 '15 at 16:18

What do you mean variable? Like calling `df` prints the name `"df"` instead of printing the dataframe? – aznbanana9 Jul 30 '15 at 16:30

Yes. That's what I meant. – leo Jul 30 '15 at 16:36
But how do you want the data frame to be called? – aznbanana9 Jul 30 '15 at 17:53

Say the name of the file is apple.csv. I want it to be printed like: The file came from apple. Only, apple has to be dynamic, depending on the name of the CSV file. – leo Jul 31 '15 at 09:50

@leo, any solution to this? did you get the dataframe name without the quotes? – IndigoChild Feb 23 '18 at 14:30
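
One way to get what leo describes is to take the name from the file path at load time instead of from the variable. The sketch below is an assumption about the intended workflow: the helper `read_named_csv` is hypothetical, and the temporary `apple.csv` exists only to make the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

def read_named_csv(path):
    """Read a CSV and return (name, DataFrame), where name is the file stem."""
    path = Path(path)
    return path.stem, pd.read_csv(path)

# Create a throwaway apple.csv so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / 'apple.csv'
    csv_path.write_text('a,b\n1,2\n')
    name, df = read_named_csv(csv_path)

message = 'The file came from %s.' % name
print(message)
```

Keeping the name as a separate value returned alongside the DataFrame avoids attaching anything to the DataFrame object itself.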

score 1 · Answer 5 · edited Oct 07 '21 at 13:12

Here is a sample function. The key line is `df.name = file`, the sixth line of the code below:

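import os
import pandas as pd

def df_list():
    # PATH and current_stage_files() are not shown in this answer; they are
    # assumed to be a directory path and a helper returning its CSV file names.
    filename_list = current_stage_files(PATH)
    frames = []
    for file in filename_list:
        df = pd.read_csv(os.path.join(PATH, file))
        df.name = file  # remember which file this DataFrame came from
        frames.append(df)
    return frames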

edited Oct 07 '21 at 13:12

dcurrie27

answered May 25 '20 at 18:33

Arjjun

score 0 · Answer 6 · answered May 17 '22 at 00:39

I am working on a module for feature analysis and I had the same need as you, as I would like to generate a report with the name of the pandas.DataFrame being analyzed. To solve this, I used the same solution presented by @scohe001 and @LeopardShark, originally in https://stackoverflow.com/a/18425523/8508275, implemented with the inspect library:

import inspect

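def aux_retrieve_name(var):
    # f_back.f_back is the frame of whoever called the function that called this helper.
    callers_local_vars = inspect.currentframe().f_back.f_back.f_locals.items()
    # Return every name in that frame bound to this exact object (possibly several, or none).
    return [var_name for var_name, var_val in callers_local_vars if var_val is var]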

Note the additional .f_back term since I intend to call it from another function:

def header_generator(df):
    print('--------- Feature Analyzer ----------')
    # aux_retrieve_name returns a list of matching names, so join them for display
    print('Dataframe name: "{}"'.format(', '.join(aux_retrieve_name(df))))
    print('Memory usage: {:03.2f} MB'.format(df.memory_usage(deep=True).sum() / 1024 ** 2))

Running this code with a given dataframe, I get the following output:

header_generator(trial_dataframe)

--------- Feature Analyzer ----------
Dataframe name: "trial_dataframe"
Memory usage: 63.08 MB
