49

Is there an easy way to parse HTTP date strings in Python? According to the standard, HTTP date strings can be formatted in several ways; the method should be able to handle all of them.

In other words, I want to convert a string like "Wed, 23 Sep 2009 22:15:29 GMT" to a Python time structure.

Sridhar Ratnakumar
Troels Arvin

4 Answers

61
>>> import email.utils as eut
>>> eut.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)

If you want a datetime.datetime object, you can do:

import datetime

def my_parsedate(text):
    return datetime.datetime(*eut.parsedate(text)[:6])
tzot
  • 5
    Yep, parsedate's probably the best compromise, though its "tolerant RFC 2822 parsing" is not 100% compatible with RFC 2616's demanding "MUST" -- e.g., epic fail on RFC 850 format with two-digit years, such as `Sunday, 06-Nov-94 08:49:37 GMT`, yet 2616 says a client MUST be able to parse RFC 850 dates (sigh). – Alex Martelli Sep 24 '09 at 15:19
  • email.Utils.parsedate seems sufficient, thanks. But it's confusing that it's sometimes called email.utils, and sometimes email.Utils. I guess that the email.Utils version is an old legacy variant which has been deprecated(?) – Troels Arvin Sep 24 '09 at 20:43
  • 1
    `email.utils.parsedate is email.Utils.parsedate -> True` It seems that *U*tils is a lazy loader. – jfs Sep 24 '09 at 22:24
  • 3
    Also note that email.utils.parsedate() returns a tuple that can be passed directly to time.mktime() (this gives you an int of seconds from the epoch on your computer (local time, not UTC)). – driax Jun 15 '10 at 04:00
  • 2
    @driax: seconds since the Epoch doesn't depend on the local timezone, e.g., `0` means `1970-01-01T00:00:00Z` -- it is the same time instant around the world (local clocks show different values but the timestamp is exactly the same). Unless the input time string is in UTC (GMT), you should [use `mktime_tz(parsedate_tz())` instead](http://stackoverflow.com/a/26435566/4279) -- otherwise the info about the timezone is lost. – jfs Oct 22 '14 at 04:19
  • @J.F.Sebastian you're absolutely right. Not sure what I was trying to say with "local time". I was probably frustrated that I hadn't found a mktime_tz function (what the heck is that doing in email.utils). Oh well :) – driax Oct 23 '14 at 01:32
  • 8
    In more recent versions of python you can use `email.utils.parsedate_to_datetime` – mgilbert Oct 19 '18 at 17:30
  • 1
    Let's keep code readable. If I found 'eut' in code, I'd have to dig around to find out what it is. I suggest you simply do `from email.utils import parsedate` (or, now that I've read the previous comment by @mgilbert `from email.utils import parsedate_to_datetime`). – Michael Scheper Oct 23 '18 at 20:21
  • Also see https://stackoverflow.com/a/8339750/14558 for the version that includes timezone parsing – andrewdotn Mar 07 '19 at 18:09
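The `mktime_tz(parsedate_tz())` suggestion from the comments could look roughly like the sketch below. The helper name `http_date_to_epoch` is made up for illustration; only `parsedate_tz` and `mktime_tz` come from `email.utils`.

```python
from email.utils import mktime_tz, parsedate_tz


def http_date_to_epoch(text):
    """Hypothetical helper: seconds since the Epoch for a date string, or None."""
    parsed = parsedate_tz(text)  # 10-tuple; the last item is the UTC offset in seconds
    if parsed is None:           # parsedate_tz() returns None when it cannot parse
        return None
    return mktime_tz(parsed)     # applies the offset, so the result is a true UTC timestamp


# The same wall-clock time with a -0400 offset is four hours later in UTC.
gmt = http_date_to_epoch('Wed, 23 Sep 2009 22:15:29 GMT')
edt = http_date_to_epoch('Wed, 23 Sep 2009 22:15:29 -0400')
print(edt - gmt)
```

Unlike `time.mktime(parsedate(...))`, this keeps the offset carried in the string, which matters as soon as the input is not GMT.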
13

Since Python 3.3 there's email.utils.parsedate_to_datetime, which can parse RFC 5322 timestamps (a.k.a. IMF-fixdate, the Internet Message Format fixed-length format, a subset of HTTP-date from RFC 7231).

>>> from email.utils import parsedate_to_datetime
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> parsedate_to_datetime(s)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)

There's also the undocumented http.cookiejar.http2time, which can achieve the same as follows:

>>> from datetime import datetime, timezone
>>> from http.cookiejar import http2time
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> datetime.fromtimestamp(http2time(s), tz=timezone.utc)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)

It was introduced in Python 2.4 as cookielib.http2time for dealing with the Cookie Expires directive, which is expressed in the same format.
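When the header value comes from an untrusted server, it may help to guard the call. The wrapper below is only a sketch: `try_parse_http_date` is a made-up name, and catching both exception types is an assumption meant to cover different Python 3 releases.

```python
from email.utils import parsedate_to_datetime


def try_parse_http_date(text):
    """Hypothetical wrapper: return a datetime, or None if the string is unparseable."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Depending on the Python version, malformed input has surfaced as
        # either of these exceptions, so both are caught here.
        return None


print(try_parse_http_date('Sun, 06 Nov 1994 08:49:37 GMT'))
print(try_parse_http_date('not a date'))
```

A string ending in `GMT` yields an aware datetime in UTC; a `-0000` offset yields a naive one, so callers that compare datetimes may want to check `tzinfo` first.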

Community
saaj
8
>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
SilentGhost
  • Yes, and it's fairly easy to extend to handle any format. While `email.utils.parsedate` is more robust, it's less transparent as well. – SilentGhost Sep 24 '09 at 16:42
  • +1 and thanks. because it said to avoid such comments. much clearer than "utils"-named modules – user237419 Feb 19 '14 at 17:54
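Extending the strptime approach to the several formats the question mentions could look like the sketch below. The helper `parse_http_date` and the `HTTP_DATE_FORMATS` tuple are hypothetical names; the three layouts are the ones RFC 7231 lists for HTTP-date. It assumes an English (C) locale, since `%a`, `%A` and `%b` are locale-dependent.

```python
from datetime import datetime, timezone

# The three HTTP-date layouts from RFC 7231; all of them denote GMT.
HTTP_DATE_FORMATS = (
    '%a, %d %b %Y %H:%M:%S GMT',   # IMF-fixdate: Sun, 06 Nov 1994 08:49:37 GMT
    '%A, %d-%b-%y %H:%M:%S GMT',   # RFC 850:     Sunday, 06-Nov-94 08:49:37 GMT
    '%a %b %d %H:%M:%S %Y',        # asctime():   Sun Nov  6 08:49:37 1994
)


def parse_http_date(text):
    """Hypothetical helper: try each layout in turn, return an aware UTC datetime."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            naive = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match; try the next one
        return naive.replace(tzinfo=timezone.utc)
    raise ValueError('unrecognised HTTP date: %r' % (text,))


for sample in ('Sun, 06 Nov 1994 08:49:37 GMT',
               'Sunday, 06-Nov-94 08:49:37 GMT',
               'Sun Nov  6 08:49:37 1994'):
    print(parse_http_date(sample))
```

Note that `%y` maps two-digit years with strptime's fixed pivot (69 to 99 become 19xx), which is not quite the "more than 50 years in the future" rule the RFC describes, so treat the RFC 850 branch as an approximation.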
2
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
  • if you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. it may offer additional help when querying the response object for information
  • if you are using urllib2, you already have an HTTPMessage object hidden in the filehandler returned by urlopen
  • it can probably parse many date formats
  • httplib is in the core

NOTE:

  • had a look at the implementation: HTTPMessage inherits from mimetools.Message, which inherits from rfc822.Message. two module-level functions may be of interest to you, parsedate and parsedate_tz (both in rfc822)
  • parsedate(_tz) from email.utils has a different implementation, although it looks kind of the same.

you can do this, if you only have that piece of string and you want to parse it:

>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
>>> 

but let me exemplify through mime messages:

>>> import mimetools
>>> import StringIO
>>> m = mimetools.Message(
...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

or via http messages (responses)

>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

right?

>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)

there, now we know more about date formats, mime messages, mime tools and their pythonic implementation ;-)

whatever the case, looks better than using email.utils for parsing http headers.

user237419
  • 2
    It seems that as of now (Dec. 2016) rfc822 is deprecated; the email package is the preferred approach per the documentation. https://docs.python.org/2/library/rfc822.html – StanleyZ Dec 29 '16 at 03:24
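For reference, a rough Python 3 counterpart of the mimetools example using the email package might look like this. The header block is a made-up stand-in for a real response, and the variable names are illustrative only.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up raw header block, standing in for a real HTTP response.
raw_headers = 'Date: Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'

headers = message_from_string(raw_headers)       # email.message.Message
when = parsedate_to_datetime(headers['Date'])    # aware datetime in UTC
print(when.isoformat())
```

In Python 3 the object behind `urllib.request.urlopen(url).headers` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`, so the same `['Date']` lookup should apply there as well.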