I'm trying to convert a string column in a dataframe to int. Each distinct string should be replaced with an integer that serves as its key.

Data:

user_id site_id 
100     url1.com 
100     url2.com 
100     url1.com 
101     url2.com 
101     url2.com 
101     url2.com

Wanted output:

user_id site_id 
100     1 
100     2 
100     1 
101     2 
101     2 
101     2

I tried to get all unique urls with:

names = pd.unique(df.site_id.ravel()) 
urls = pd.Series(np.arange(len(names)), names) 

and then

df["site_id"] = df.applymapp(urls.get)
Julien Marrec
Duesentrieb

1 Answer


You want pd.factorize to encode the values as ints:

In [52]:
df['site_id'] = pd.factorize(df['site_id'])[0] + 1
df

Out[52]:
   user_id  site_id
0      100        1
1      100        2
2      100        1
3      101        2
4      101        2
5      101        2
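
For a self-contained version of the step above, here is a minimal sketch that rebuilds the sample data from the question. The DataFrame construction is an assumption about how the data was loaded; only the factorize line comes from the answer.

```python
import pandas as pd

# Sample data copied from the question (construction is assumed).
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# factorize numbers the distinct values in order of first appearance,
# starting at 0; adding 1 makes the keys start at 1.
df["site_id"] = pd.factorize(df["site_id"])[0] + 1
print(df)
```

Because codes follow the order of first appearance, url1.com gets 1 and url2.com gets 2 here.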

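Here factorize returns a tuple of (codes, uniques). The uniques shown below are already 1 and 2 because site_id was overwritten in the previous step: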

In [53]:
pd.factorize(df['site_id'])

Out[53]:
(array([0, 1, 0, 1, 1, 1], dtype=int64), Int64Index([1, 2], dtype='int64'))

We want the encoded values, which are the first element of the tuple, and add 1 to each so that the keys start at 1:

pd.factorize(df['site_id'])[0] + 1
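
If the mapping between URLs and keys is needed later, the second element of the tuple can be kept as well. The sketch below runs on the original strings from the question; the names sites, url_to_id and id_to_url are hypothetical and used only for illustration.

```python
import pandas as pd

# The original string column from the question.
sites = pd.Series(["url1.com", "url2.com", "url1.com",
                   "url2.com", "url2.com", "url2.com"])

# Unpack both parts of the tuple: integer codes and the distinct values.
codes, uniques = pd.factorize(sites)
site_ids = codes + 1  # shift so keys start at 1

# Hypothetical lookup tables built from the uniques, using the same
# 1-based numbering as site_ids.
url_to_id = {url: i for i, url in enumerate(uniques, start=1)}
id_to_url = {i: url for url, i in url_to_id.items()}

print(site_ids)
print(url_to_id)
```

Keeping uniques around makes it possible to translate the integer keys back to the original URLs.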
EdChum