9

Following in the spirit of this answer, I attempted the following to convert a DataFrame column of datetimes to a column of seconds since the epoch.

df['date'] = (df['date']+datetime.timedelta(hours=2)-datetime.datetime(1970,1,1))
df['date'].map(lambda td:td.total_seconds())

The second command causes the following error which I do not understand. Any thoughts on what might be going on here? I replaced map with apply and that didn't help matters.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-99-7123e823f995> in <module>()
----> 1 df['date'].map(lambda td:td.total_seconds())

/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in map(self, arg, na_action)
   1932             return self._constructor(new_values, index=self.index).__finalize__(self)
   1933         else:
-> 1934             mapped = map_f(values, arg)
   1935             return self._constructor(mapped, index=self.index).__finalize__(self)
   1936 

/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:43628)()

<ipython-input-99-7123e823f995> in <lambda>(td)
----> 1 df['date'].map(lambda td:td.total_seconds())

AttributeError: 'float' object has no attribute 'total_seconds'
Community
  • 1
  • 1
Chris
  • 2,911
  • 6
  • 27
  • 36
  • It seems the 'date' column may not be a datetime64 in the first place? – Zeugma Oct 23 '13 at 22:31
  • 1
    Does the column contain any missing values? Missing values usually cause pandas Series to be cast to floats, causing weirdness when you try to interpret them as datetimes. – Abe Oct 23 '13 at 23:04
  • @Abe Indeed you were right. Turns out there is some missing data in some of the records. Not what I anticipated... Ack. – Chris Oct 24 '13 at 01:15

1 Answers1

15

Update:

In 0.15.0 Timedeltas became a full-fledged dtype.

So this becomes possible (as well as the methods below)

In [45]: s = Series(pd.timedelta_range('1 day',freq='1S',periods=5))                         

In [46]: s.dt.components
Out[46]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        0             0             0            0
1     1      0        0        1             0             0            0
2     1      0        0        2             0             0            0
3     1      0        0        3             0             0            0
4     1      0        0        4             0             0            0

In [47]: s.astype('timedelta64[s]')
Out[47]: 
0    86400
1    86401
2    86402
3    86403
4    86404
dtype: float64

Original Answer:

I see that you are on master (and 0.13 is coming out very shortly), so assuming you have numpy >= 1.7. Do this. See here for the docs (this is frequency conversion)

In [5]: df = DataFrame(dict(date = date_range('20130101',periods=10)))

In [6]: df
Out[6]: 
                 date
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00

In [7]: df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)
Out[7]: 
0   15706 days, 02:00:00
1   15707 days, 02:00:00
2   15708 days, 02:00:00
3   15709 days, 02:00:00
4   15710 days, 02:00:00
5   15711 days, 02:00:00
6   15712 days, 02:00:00
7   15713 days, 02:00:00
8   15714 days, 02:00:00
9   15715 days, 02:00:00
Name: date, dtype: timedelta64[ns]

In [9]: (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)) / np.timedelta64(1,'s')
Out[9]: 
0    1357005600
1    1357092000
2    1357178400
3    1357264800
4    1357351200
5    1357437600
6    1357524000
7    1357610400
8    1357696800
9    1357783200
Name: date, dtype: float64

The contained values are np.timedelta64[ns] objects, they don't have the same methods as timedelta objects, so no total_seconds().

In [10]: s = (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1))

In [11]: s[0]
Out[11]: numpy.timedelta64(1357005600000000000,'ns')

You can astype them to int, and you get back a ns unit.

In [12]: s[0].astype(int)
Out[12]: 1357005600000000000

You can do this as well (but only on an individual unit element).

In [18]: s[0].astype('timedelta64[s]')
Out[18]: numpy.timedelta64(1357005600,'s')
Jeff
  • 117,982
  • 20
  • 211
  • 179
  • 1
    Perfect Jeff, thank you! Once I got rid of the rows with missing data, I could directly apply this to the new problem. ;-) – Chris Oct 24 '13 at 01:26
  • you don't need to remove missing values (NaT); they will be returned as nan – Jeff Oct 24 '13 at 01:29
  • I don't have time right now to edit (and re-verify) the answer, but in fact version 0.13 now allows 'astype' operations to be applied to an entire Series. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-deltas-conversions, or http://pandas.pydata.org/pandas-docs/stable/whatsnew.html and do a find-on-page for 'astype'. – TimStaley Mar 11 '14 at 19:01
  • I'm searching for a method to convert `Series` of `Timedelta`s to `seconds` for plotting. Your approach with `/ pd.Timedelta(seconds=1)` is a nice one. Stiil, I wonder if there is a better method. What do you think? – soupault May 22 '15 at 14:29
  • 1
    this answer actually is pretty outdated; Timedeltas became fully fledged types in 0.15.0. I will add another method. – Jeff May 22 '15 at 14:41