
I'm using Python and pandas to try to download the data from the FDA's 'Adverse drug event reports since 2004' dataset through the openFDA API.

Here is my code:

import pandas as pd

my_api_key = '..(my API code which I requested)...'
from_date = '20040101'
to_date = '20041231'
url = 'https://api.fda.gov/drug/event.json?api_key=' + my_api_key + \
    '&search=receivedate:[' + from_date + '+TO+' + to_date + ']'
print(url)
json_df = pd.read_json(url)  # raises the SSL error shown below
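
As an aside, the same query URL can be built with Python 3's urllib.parse.urlencode instead of string concatenation, which also escapes any special characters in the key. This is a sketch: the helper name build_event_url is hypothetical, and safe=":[]" is chosen so the result matches the hand-built URL above character for character (the spaces in the search term become '+'). It only changes how the URL is assembled; it does not affect the SSL error.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_event_url(api_key, from_date, to_date):
    """Hypothetical helper: openFDA adverse-event query URL for a date range."""
    params = {
        "api_key": api_key,
        # urlencode turns the spaces into '+'; safe=":[]" leaves ':', '[' and ']' unescaped
        "search": "receivedate:[{} TO {}]".format(from_date, to_date),
    }
    return BASE_URL + "?" + urlencode(params, safe=":[]")


print(build_event_url("MY_KEY", "20040101", "20041231"))
```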

Here is the full traceback:

https://api.fda.gov/drug/event.json?api_key=.....&search=receivedate:[20040101+TO+20041231]
---------------------------------------------------------------------------
URLError                                  Traceback (most recent call last)
<ipython-input-47-dcb3adcfb254> in <module>()
      3 #TODO: This is not working.
      4 # URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>
----> 5 json_df = pd.read_json(url)

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    185     """
    186 
--> 187     filepath_or_buffer, _, _ = get_filepath_or_buffer(path_or_buf)
    188     if isinstance(filepath_or_buffer, compat.string_types):
    189         try:

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/common.pyc in get_filepath_or_buffer(filepath_or_buffer, encoding, compression)
    306 
    307     if _is_url(filepath_or_buffer):
--> 308         req = _urlopen(str(filepath_or_buffer))
    309         if compression == 'infer':
    310             content_encoding = req.headers.get('Content-Encoding', None)

/Users/billtubbs/anaconda/lib/python2.7/urllib2.pyc in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    152     else:
    153         opener = _opener
--> 154     return opener.open(url, data, timeout)
    155 
    156 def install_opener(opener):

/Users/billtubbs/anaconda/lib/python2.7/urllib2.pyc in open(self, fullurl, data, timeout)
    429             req = meth(req)
    430 
--> 431         response = self._open(req, data)
    432 
    433         # post-process response

/Users/billtubbs/anaconda/lib/python2.7/urllib2.pyc in _open(self, req, data)
    447         protocol = req.get_type()
    448         result = self._call_chain(self.handle_open, protocol, protocol +
--> 449                                   '_open', req)
    450         if result:
    451             return result

/Users/billtubbs/anaconda/lib/python2.7/urllib2.pyc in _call_chain(self, chain, kind, meth_name, *args)
    407             func = getattr(handler, meth_name)
    408 
--> 409             result = func(*args)
    410             if result is not None:
    411                 return result

/Users/billtubbs/anaconda/lib/python2.7/urllib2.pyc in https_open(self, req)
   1238         def https_open(self, req):
   1239             return self.do_open(httplib.HTTPSConnection, req,
-> 1240                 context=self._context)
   1241 
   1242         https_request = AbstractHTTPHandler.do_request_

/Users/billtubbs/anaconda/lib/python2.7/urllib2.pyc in do_open(self, http_class, req, **http_conn_args)
   1195         except socket.error, err: # XXX what error?
   1196             h.close()
-> 1197             raise URLError(err)
   1198         else:
   1199             try:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>

Any ideas on what this is caused by or how to fix it?
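
CERTIFICATE_VERIFY_FAILED typically means the client could not validate the server's certificate chain against the CA certificates it trusts. The traceback shows pandas handing the URL to urllib2.urlopen, which relies on Python's ssl module. As a first diagnostic, a sketch (Python 3.6+, helper name hypothetical) that prints where that ssl module looks for trusted CA certificates; a cafile of None means the default CA bundle file was not found on this machine.

```python
import ssl


def ca_search_paths():
    """Hypothetical helper: where this Python's OpenSSL looks for trusted CA certs."""
    paths = ssl.get_default_verify_paths()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,            # None if the default CA file does not exist
        "capath": paths.capath,            # None if the default CA directory does not exist
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


if __name__ == "__main__":
    for key, value in ca_search_paths().items():
        print(f"{key}: {value}")
```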

Asked by Bill (edited by philshem)

3 Answers


I'm not sure which HTTP library pandas uses under the hood, but I found that with its default settings, the requests package in Python also fails SSL certificate verification for this site:

import requests
url = 'https://api.fda.gov/drug/event.json?'
print(requests.get(url))

gives an error like:

requests.exceptions.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

But if you pass verify=False, the request succeeds (the certificate is simply not checked):

import requests
url = 'https://api.fda.gov/drug/event.json?'
print(requests.get(url, verify=False).text)

From the Requests documentation: "Requests can also ignore verifying the SSL certificate if you set verify to False."
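
With verify=False, urllib3 (which requests uses underneath) emits an InsecureRequestWarning for each unverified HTTPS request. If you accept that trade-off, here is a minimal sketch that silences only that warning category, and only around the calls that need it. It assumes a recent requests/urllib3 where the warning class lives in urllib3.exceptions; the context-manager name is hypothetical. urllib3.disable_warnings(InsecureRequestWarning) is the process-wide alternative.

```python
import contextlib
import warnings

from urllib3.exceptions import InsecureRequestWarning  # urllib3 is installed with requests


@contextlib.contextmanager
def insecure_requests_allowed():
    """Hypothetical helper: ignore InsecureRequestWarning inside the with-block only."""
    with warnings.catch_warnings():
        # catch_warnings restores the previous warning filters on exit
        warnings.simplefilter("ignore", InsecureRequestWarning)
        yield


# Usage (needs network access, so shown as a comment):
# with insecure_requests_allowed():
#     response = requests.get(url, verify=False)
```

Note that warnings.catch_warnings changes interpreter-wide state, so this is not thread-safe.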


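To get the data into pandas, you can then fetch it with requests and pass the response text to pd.read_json. Note that read_json expects a JSON string, path, or file-like object, not the dict that .json() returns:

import io
json_df = pd.read_json(io.StringIO(requests.get(url, verify=False).text))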
philshem

Comment from Bill (Jul 13 '16 at 17:40): Thanks for this. It works. Now all I get is a warning about not verifying the certificate, which I assume I can ignore: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html InsecureRequestWarning)
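
The warning's advice can be followed instead of silenced. A common remedy for CERTIFICATE_VERIFY_FAILED is to give Python an up-to-date CA bundle. Below is a minimal sketch for the standard-library urllib path (the one the question's traceback shows pandas using), assuming the third-party certifi package is installed and falling back to the system trust store if it is not. The helper names are hypothetical and this has not been tested against api.fda.gov here. With requests, the equivalent is requests.get(url, verify=certifi.where()).

```python
import json
import ssl
import urllib.request

try:
    import certifi  # third-party CA bundle: pip install certifi
    CA_FILE = certifi.where()
except ImportError:
    CA_FILE = None  # None: create_default_context loads the system's default CAs


def make_verified_context(cafile=CA_FILE):
    """Hypothetical helper: SSL context that keeps certificate and hostname checks on."""
    return ssl.create_default_context(cafile=cafile)


def fetch_json(url, cafile=CA_FILE, timeout=30):
    """Hypothetical helper: download and parse JSON over HTTPS with verification left on."""
    context = make_verified_context(cafile)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return json.load(response)


# Usage (needs network access and a real API key):
# data = fetch_json(url)
```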

I would try fetching the data with requests first and then loading it into pandas:

import io
import pandas as pd
import requests

json_df = pd.read_json(io.StringIO(requests.get(url).text))
Hans Nelsen
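
Since requests' .json() already returns a Python dict, another option is to skip pd.read_json entirely. Here is a sketch assuming the openFDA response keeps its records in a list under a 'results' key (alongside a 'meta' object); pd.json_normalize flattens nested fields into dotted column names. The helper name is hypothetical, and the sample payload is hand-made for illustration, not real FDA data.

```python
import pandas as pd


def results_to_frame(payload):
    """Hypothetical helper: flatten the 'results' list of a parsed response into a DataFrame."""
    # Nested objects become dotted columns, e.g. {"patient": {"patientsex": ...}} -> "patient.patientsex"
    return pd.json_normalize(payload.get("results", []))


# Hand-made payload for illustration only (not real FDA data)
sample = {
    "meta": {"results": {"skip": 0, "limit": 1, "total": 1}},
    "results": [
        {"safetyreportid": "0000001", "receivedate": "20040105",
         "patient": {"patientsex": "1"}},
    ],
}
df = results_to_frame(sample)
print(df.columns.tolist())
```

With a live response, the call would be results_to_frame(requests.get(url).json()).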

You can try this, which is similar to Hans Nelsen's answer:

import io
import json

import pandas as pd
import requests

my_api_key = '..(my API code which I requested)...'
from_date = '20040101'
to_date = '20041231'
url = 'https://api.fda.gov/drug/event.json?api_key=' + my_api_key + \
    '&search=receivedate:[' + from_date + '+TO+' + to_date + ']'

response = requests.get(url)
data = response.json()            # parse the response body into a dict
json_results = json.dumps(data)   # serialize it back to a JSON string
# pandas 2.1+ deprecates passing a literal JSON string to read_json, so wrap it in StringIO
resultsDf = pd.read_json(io.StringIO(json_results))
sauf