0

I have a table, that looks like this:

date                     id
0   11:09:27 Nov. 26 2020   94857
1   10:49:26 Okt. 26 2020   94853
2   10:48:24 Sept. 26 2020  94852
3   9:26:33 Aug. 26 2020    94856
4   9:26:33 Jul. 26 2020    94851
5   9:24:38 Dez. 26 2020    94850
6   9:24:38 Jan. 26 2020    94849
7   9:09:08 Jun. 27 2019    32148
8   9:02:41 Mai 27 2019 32145
9   9:02:19 Apr. 27 2019    32144
10  9:02:05 Mrz. 27 2019    32143
11  9:02:05 Feb. 27 2019    32140

(initial table)

the date column format now is 'object', I'm trying to change it to 'datetime' using

df['date'] = pd.to_datetime(df['date'], format ='HH:MM:SS-%mm-%dd-%YYYY', errors='coerce')

and receive only NaT as a result. The problem is that the names of the months here are not standart. For example, Mai comes without a dot in the end. What's the best way to convert its format?

Tallie
  • 3
  • 2

3 Answers3

0

The following format works for most of your data:

format="%H:%M:%S %b. %d %Y"

H stands for Hours, M for minutes, S for seconds, b for abbreviated months, and Y for year.

As said by Justin in the comments, your month abbreviations are off. These four characters abbreviations are unconventional, you should format your string to remove the last character of the month if it is 4 characters long. If it is 3 characters long then leave it like it is.

EDIT:

Note that in your dataset, the abbreviations are ended by a ".", hence the dot in the string format.

robinood
  • 442
  • 3
  • 9
  • This failed because of 'Sept.' does not match %b. - Strange, yet it works by default is date precedes time??? – frankr6591 Nov 27 '20 at 15:12
0

Your date column has a complicated format, so just change the format of your pd.to_datetime function:

# 11:09:27 Nov.26 2020 ---> '%I:%M:%S %b.%d %Y'
df['date'] = pd.to_datetime(df['date'], format ='%I:%M:%S %b. %d %Y', errors='coerce')

 
 output: 2020-11-26 11:09:27
ljuk
  • 610
  • 2
  • 10
0

This works for me... even with inconsistency

pd.to_datetime(df.date.str[9:]+' '+df.date.str[0:8])

Input (random generated dates, changed 7 to have Sept.)

                     date
0   19:06:04 Mar. 19 2020
1   17:27:11 Mar. 05 2020
2   07:17:04 May. 05 2020
3   04:53:50 Sep. 23 2020
4   03:43:20 Jun. 23 2020
5   17:35:00 Mar. 06 2020
6   06:04:48 Jan. 15 2020
7  12:26:14 Sept. 18 2020
8   03:21:10 Jun. 03 2020
9   17:37:00 Aug. 26 2020

output

0   2020-03-19 19:06:04
1   2020-03-05 17:27:11
2   2020-05-05 07:17:04
3   2020-09-23 04:53:50
4   2020-06-23 03:43:20
5   2020-03-06 17:35:00
6   2020-01-15 06:04:48
7   2020-09-18 12:26:14
8   2020-06-03 03:21:10
9   2020-08-26 17:37:00

The following works for all months MMM, but fails due to 'Sept.' month. Strange because if date precedes time, by default it parses it correctly (ie. coerce code seems to work when precedes) ???

pd.to_datetime(df['date'].astype(str), format ="%H:%M:%S %b. %d %Y", 

errors='coerce')

frankr6591
  • 1,103
  • 1
  • 7
  • 14