406

I have one field in a pandas DataFrame that was imported as string format. It should be a datetime variable. How do I convert it to a datetime column and then filter based on date.

Example:

  • DataFrame Name: raw_data
  • Column Name: Mycol
  • Value Format in Column: '05SEP2014:00:00:00.000'
Mazdak
  • 100,514
  • 17
  • 155
  • 179
Chris
  • 11,450
  • 11
  • 37
  • 62

7 Answers7

698

Use the to_datetime function, specifying a format to match your data.

raw_data['Mycol'] =  pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
atwalsh
  • 3,452
  • 1
  • 17
  • 38
chrisb
  • 44,957
  • 8
  • 61
  • 62
  • 136
    Note: the `format` argument isn't required. `to_datetime` is smart. Go ahead and try it without trying to match your data. – samthebrand Apr 22 '17 at 18:54
  • 8
    In order to avoid the `SettingWithCopyWarning` use the @darth-behfans https://stackoverflow.com/a/42773096/4487805 – Álvaro Loza Oct 16 '17 at 10:41
  • 4
    What if you just want time and not date? – FaCoffee Oct 30 '17 at 14:45
  • 13
    Not terribly smart. Even if some of the column is unambiguously in dayfirst=True format, it will still default to dayfirst=False for the others in the same column. So, safer to use an explicit format specification or at least the dayfirst parameter. – CPBL Apr 22 '18 at 20:36
  • 21
    Omitting the format string can cause this operation to be slow with lots of records. [This answer](https://stackoverflow.com/a/32034914/2759780) discusses why. Looks like `infer_datetime_format=True` could also increase parsing speed up to ~5-10x (according to pandas docs) if you don't include a format string. – atwalsh May 05 '18 at 19:39
  • To add to what @samthebrand said: `to_datetime` will replace any missing info with the first of the period. `'2019'` becomes 12 am of Jan 1, 2019; `'2019-2'` becomes 12 am of Feb 1, 2019; `'2019-3-2'` becomes 12am of March 2, 2019. Etc. – BallpointBen Jan 13 '19 at 05:02
  • Also setting the `cache=True` when calling `to_datetime` can have speed ups when duplicate dates and timezone offsets. See [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) – dustindorroh May 30 '19 at 12:13
  • @samthebrand I don't recommend letting the 'to_datetime' function infer the date schema. When I did it, it misjudge month by day. I believe It is safer to pass the date mask yourself. Here is the link that helps with the symbols used for this task: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior – lfvv Feb 11 '21 at 16:27
  • I have a problem with this solution. The converted column becomes a Timestamp. Is there a direct way to convert it to datetime.date (that is, only the date portion of the Timestamp)? – O. Mohsen Aug 07 '21 at 06:04
86

If you have more than one column to be converted you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)
Vlad Bezden
  • 72,691
  • 22
  • 233
  • 168
  • 3
    I needed to do the following to specify format ```states_df[['from_datetime','to_datetime','timestamp']].apply(lambda _: pd.to_datetime(_,format='%Y-%m-%d %H:%M:%S.%f', errors='coerce'))``` – Rudiger Wolf May 30 '21 at 05:30
64

You can use the DataFrame method .apply() to operate on the values in Mycol:

>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'],columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05
mechanical_meat
  • 155,494
  • 24
  • 217
  • 209
  • 2
    Thanks! This is nice because it is more broadly applicable but the other answer was more direct. I had a hard time deciding which I liked better :) – Chris Nov 05 '14 at 17:56
  • 7
    I like this answer better, because it produces a datetime object as opposed to a pandas.tslib.Timestamp object – wesanyer Dec 07 '15 at 18:51
34

Use the pandas to_datetime function to parse the column as DateTime. Also, by using infer_datetime_format=True, it will automatically detect the format and convert the mentioned column to DateTime.

import pandas as pd
raw_data['Mycol'] =  pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)
RobC
  • 20,007
  • 20
  • 62
  • 73
Prateek Sharma
  • 1,003
  • 11
  • 10
  • 1
    combine two or more sheets can be pain in the neck, especially when datetime involved. this infer_datetime_format saved me big time. thx chief! – Mike_Leigh Jun 02 '21 at 15:29
  • Happy to help @Mike_Leigh !! Also, according to the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html), setting `infer_datetime_format=True` can increase the parsing speed by ~5-10x, in some cases. – Prateek Sharma Jun 05 '21 at 18:47
21
raw_data['Mycol'] =  pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')

works, however it results in a Python warning of A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

I would guess this is due to some chaining indexing.

Petter Friberg
  • 20,644
  • 9
  • 57
  • 104
Darth BEHFANS
  • 361
  • 4
  • 10
  • 10
    Took me a few tries, yet this works: **raw_data.loc[:,'Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')** – pinegulf Jan 21 '20 at 07:14
  • This worked for me: raw_data.loc[:,'Mycol'] = pd.to_datetime(raw_data.loc[:,'Mycol'], format='%d%b%Y:%H:%M:%S.%f') – Josh Janjua Jun 30 '20 at 16:54
  • df2.loc[:,'datetime'] = pd.to_datetime(df2['datetime']) /usr/lib/python3/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy self.obj[item] = s – Arrow_Raider Jul 01 '20 at 21:23
  • Or just reset index on df copy – JessicaRyan May 27 '22 at 08:58
10

Time Saver:

raw_data['Mycol'] =  pd.to_datetime(raw_data['Mycol'])
Gil Baggio
  • 11,049
  • 3
  • 46
  • 33
-1

It is important to note that pandas.to_datetime will almost never return a datetime.datetime. From the docs

Blockquote

Returns datetime
If parsing succeeded. Return type depends on input:

list-like: DatetimeIndex
Series: Series of datetime64 dtype
scalar: Timestamp

In case when it is not possible to return designated types (e.g. when any element 
of input is before Timestamp.min or after Timestamp.max) return will have 
datetime.datetime type (or corresponding array/Series).

Blockquote

hotplasma
  • 39
  • 3
  • 1
    This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/29994044) – Jan Wilamowski Oct 05 '21 at 07:54