Value Label in pandas?

Question

I am fairly new to pandas and come from a statistics background, and I am struggling with a conceptual problem: pandas has columns, which contain values. But sometimes values have a special meaning; in statistical programs like SPSS or R these are called "value labels".

Imagine a column rain with two values, 0 (meaning: no rain) and 1 (meaning: raining). Is there a way to assign these labels to those values?

Is there a way to do this in pandas, too? Mainly for plotting and visualisation purposes.

Do you want to store the values as strings or assign some special meaning later? i.e. use a lookup or add a new column that maps the values to human friendly values? Or do you just want this information in the legend of your plot? — EdChum, Mar 19 '14 at 08:31
@EdChum Ideally, I want no new column at all; e.g. in SPSS the label is frequently used for displaying data in tables, plots etc., but you can use the numeric value for conditionals. At my work, I often have variables with 30+ different "labels" per column, so having the associated strings visible would be a huge help (e.g. avoiding the "what was the meaning of 21?" question) — Christian Sauer, Mar 19 '14 at 08:38
You could add it as an attribute, which is general to Python and not specific to pandas, and access it for your plots; see related: http://stackoverflow.com/questions/14688306/adding-meta-information-metadata-to-pandas-dataframe — EdChum, Mar 19 '14 at 08:42
That would probably not be used by any normal procedure, but thanks for the suggestion! — Christian Sauer, Mar 19 '14 at 09:36
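A minimal sketch of EdChum's attribute idea, assuming pandas 1.0 or later, where `DataFrame.attrs` provides a dict for arbitrary metadata. `attrs` is documented as experimental and is not guaranteed to survive every operation, so this is an illustration rather than a robust solution. The `value_labels` key is a hypothetical name chosen here.

```python
import pandas as pd

df = pd.DataFrame({"rain": [0, 1, 1, 0]})

# Hypothetical metadata key; DataFrame.attrs is experimental (pandas >= 1.0)
df.attrs["value_labels"] = {"rain": {0: "no rain", 1: "raining"}}

# The column stays numeric, so conditions still use the raw codes
rainy_days = df[df["rain"] == 1]

# Labels are looked up only when displaying or plotting
labels = df.attrs["value_labels"]["rain"]
print(df["rain"].map(labels).value_counts())
```

This keeps the data itself untouched, which matches the wish for "no new column", but every display step has to remember to do the lookup.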

score 5 · Accepted Answer · answered Sep 23 '15 at 20:14

There's no need to use a map anymore. Since version 0.15, pandas supports a categorical data type for its columns. The stored data takes less space, many operations on it are faster, and you can use labels.

I'm taking an example from the pandas docs:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
# Recast grade as a categorical variable
df["grade"] = df["raw_grade"].astype("category")

df["grade"]

#Gives this:
Out[124]: 
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
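To check the space claim on your own data, a quick sketch compares `memory_usage(deep=True)` for the same values stored as strings and as a category; the repeated grade list is made up for illustration. With few distinct values, the categorical version stores small integer codes plus one copy of each category.

```python
import pandas as pd

# Made-up data: the six grades from above repeated 10,000 times
raw = pd.Series(["a", "b", "b", "a", "a", "e"] * 10_000)
cat = raw.astype("category")

# deep=True counts the Python string objects held by the object column
print("object:  ", raw.memory_usage(deep=True), "bytes")
print("category:", cat.memory_usage(deep=True), "bytes")

# Each row is stored as a small integer code pointing into cat.cat.categories
print(cat.cat.codes.dtype)
```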

You can also rename categories and add missing categories.
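Applied to the rain column from the question, a sketch might look like this. It assumes the codes are 0, 1, ..., n-1, which is what `pd.Categorical.from_codes` expects (each code is a position in the `categories` list); for sparse SPSS-style codes such as 21, mapping through a dict first is simpler. The second part uses `rename_categories` and `add_categories` on the grade example; the label strings are made up.

```python
import pandas as pd

# Codes are positions into `categories`: 0 -> "no rain", 1 -> "raining"
rain = pd.Series(
    pd.Categorical.from_codes([0, 1, 1, 0], categories=["no rain", "raining"]),
    name="rain",
)
print(rain.tolist())            # labels, used for display and plots
print(rain.cat.codes.tolist())  # the original numeric codes
raining = rain[rain.cat.codes == 1]  # conditions can still use the numbers

# Replace the one-letter grades with descriptive labels (illustrative strings)
grade = pd.Series(["a", "b", "b", "a", "a", "e"]).astype("category")
grade = grade.cat.rename_categories({"a": "very good", "b": "good", "e": "very bad"})

# Add categories that do not occur in the data yet
grade = grade.cat.add_categories(["medium", "bad"])
print(grade.cat.categories.tolist())
```

Because the numeric codes stay available through `.cat.codes`, this comes close to the SPSS behaviour described in the question: labels for display, numbers for conditions.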

Thank you for the update. I will accept it, since it is more correct for new readers — Christian Sauer, Sep 24 '15 at 08:29

score 4 · Answer 2 · answered Mar 19 '14 at 09:27


You could have a separate dictionary which maps values to labels:

 d={0:"no rain",1:"raining"}

and then you could access the labelled data by doing

 df.rain_column.apply(lambda x:d[x])
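A self-contained version of this answer, with a made-up `rain_column`, showing both the `apply` call above and the `map` alternative suggested in the comments. One behavioural difference worth knowing: `map` with a dict turns codes missing from the dict into NaN, while the `lambda x: d[x]` lookup raises a `KeyError`.

```python
import pandas as pd

d = {0: "no rain", 1: "raining"}
df = pd.DataFrame({"rain_column": [0, 1, 1, 0]})

# Element-wise lookup with apply, as in the answer
labels_apply = df.rain_column.apply(lambda x: d[x])

# Same result with map, which accepts the dict directly
labels_map = df.rain_column.map(d)
print(labels_map.tolist())

# A code without a label: map yields NaN, the apply lookup would raise KeyError
unlabelled = pd.Series([0, 1, 2]).map(d)
print(unlabelled.isna().tolist())
```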

grasshopper


`map` might be better for this simple case – EdChum Mar 19 '14 at 09:30
What is the difference in this case? – grasshopper Mar 19 '14 at 09:34

It is better mainly for its simpler syntax, `df.rain_column.map(d)`, and possibly for performance, depending on data size and type: for a DataFrame with 100 rows `apply` is marginally faster (apply 228 µs vs map 287 µs), while for one with 10,000 rows `map` is 26 times faster (map 512 µs vs apply 13 ms) – EdChum Mar 19 '14 at 10:10
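The timings in this comment come from a 2014 pandas version and will likely differ on current releases. A sketch for re-measuring them, assuming random 0/1 codes; the row counts mirror the comment, and the printed numbers depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

d = {0: "no rain", 1: "raining"}
rng = np.random.default_rng(0)
repeats = 20

for n in (100, 10_000):
    # Random 0/1 codes standing in for a rain column
    s = pd.Series(rng.integers(0, 2, size=n))
    t_apply = timeit.timeit(lambda: s.apply(lambda x: d[x]), number=repeats)
    t_map = timeit.timeit(lambda: s.map(d), number=repeats)
    # Report the mean time per call in microseconds
    print(f"{n} rows: apply {t_apply / repeats * 1e6:.0f} us, map {t_map / repeats * 1e6:.0f} us")
```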
Alright, this makes a lot of sense, since apply is more general purpose than map. – grasshopper Mar 19 '14 at 10:12
I will accept cd98's answer, which is better for newer versions of pandas, if that's OK with you. – Christian Sauer Sep 24 '15 at 08:30
