
My aim is to extract the HTML from all the links on the first page of Google results for a search term. I work behind a proxy, so this is my approach:

1. I first used mechanize to enter the search term in the form. I've set the proxies and robots handling correctly.

2. After extracting the links, I used an opener built with urllib2.ProxyHandler, installed globally, to open the URLs individually.

However, this gives me the following error, and I can't figure out why:

```
urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
```
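For reference, here is a minimal Python 3 sketch of step 2. It is not the asker's actual code: `urllib.request` stands in for Python 2's `urllib2`, and the proxy address and the helper names `install_proxy_opener` and `fetch_all` are hypothetical. It installs a global proxy opener and then fetches each link separately, recording failures such as the SSL error above instead of stopping at the first one.

```python
import urllib.error
import urllib.request

# Hypothetical proxy address; replace with your own.
PROXY_URL = "http://proxy.example.com:8080"


def install_proxy_opener(proxy_url):
    """Build an opener that routes HTTP and HTTPS through proxy_url and
    install it globally, so plain urllib.request.urlopen() uses it."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    urllib.request.install_opener(opener)
    return opener


def fetch_all(urls, timeout=10):
    """Fetch each URL in turn; collect the HTML of successes and the error
    reason of failures instead of aborting on the first bad link."""
    pages, errors = {}, {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                pages[url] = resp.read().decode("utf-8", errors="replace")
        except urllib.error.URLError as exc:
            # A failed SSL handshake also surfaces as URLError, with
            # exc.reason holding the underlying exception.
            errors[url] = exc.reason
    return pages, errors
```

Call `install_proxy_opener(PROXY_URL)` once at startup, then pass the extracted links to `fetch_all()`.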
Cœur
Manoj

2 Answers


Instead of copying and editing Python library modules, you can monkey-patch `ssl.wrap_socket()` in the `ssl` module by overriding its `ssl_version` keyword argument. The following code can be used as-is. Put it at the start of your program, before making any requests.

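```python
import ssl
from functools import wraps


def sslwrap(func):
    @wraps(func)
    def bar(*args, **kw):
        # Force TLSv1, overriding any ssl_version the caller passed.
        kw['ssl_version'] = ssl.PROTOCOL_TLSv1
        return func(*args, **kw)
    return bar


# Replace the module-level function so later calls to
# ssl.wrap_socket() go through the patched wrapper.
ssl.wrap_socket = sslwrap(ssl.wrap_socket)
```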
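The patch above depends on the module-level `ssl.wrap_socket()`, a Python 2-era API that was removed in Python 3.12. A minimal sketch of the same idea for Python 3, assuming you pass an `SSLContext` to `urllib.request` instead of patching the `ssl` module: the context is pinned to a single TLS version. The helper names and the TLS 1.2 default are assumptions for illustration; you can pass another `ssl.TLSVersion` member if a server needs it, though old versions such as TLSv1 are deprecated and may be refused by current OpenSSL builds.

```python
import ssl
import urllib.request


def make_pinned_context(version=ssl.TLSVersion.TLSv1_2):
    """Create a client SSLContext that negotiates exactly one TLS version,
    a Python 3 analogue of forcing ssl_version in wrap_socket()."""
    ctx = ssl.create_default_context()  # keeps certificate and hostname checks
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def build_opener_with_context(ctx, proxy_url=None):
    """Return an opener whose HTTPS requests use ctx, optionally via a proxy."""
    handlers = [urllib.request.HTTPSHandler(context=ctx)]
    if proxy_url:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)
```

Usage: `opener = build_opener_with_context(make_pinned_context(), proxy_url=...)`, then `opener.open(url)`. Unlike the monkey-patch, this affects only requests made through that opener.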
chnrxn

It's a known bug. However, some workarounds are mentioned in the comments on the bug report, which may be helpful to you: bug url.

NIlesh Sharma
  • Thank you, NIlesh. I found [this](https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/965371/comments/9) to be quite helpful, despite the fact that it might not be the best solution to just abandon TLS2. – Nick Merrill Feb 03 '13 at 08:13