Python Hacking – urlopen timeout issue

While playing with Python's urllib2 recently, I found that the timeout parameter of urlopen() sometimes does not work. This interesting issue pushed me deep into the Python source code for debugging. The debugging eventually pointed, without much surprise, to a bug in the Python socket module implementation. This post walks through the whole story: the issue, the debugging, the fix and some discussion. May it help your Python hacking :)

0. Issue

Since Python 2.6, urllib2's urlopen() takes a timeout parameter. You can use it like this: urlObj = urllib2.urlopen('an_url', timeout=20). Ideally, this times out the underlying socket if an operation takes too long, as shown below:

Traceback (most recent call last):
File “”, line 1, in
File “/Users/daveti/Python-2.7.8/Lib/”, line 127, in urlopen
return, data, timeout)
File “/Users/daveti/Python-2.7.8/Lib/”, line 404, in open
response = self._open(req, data)
File “/Users/daveti/Python-2.7.8/Lib/”, line 422, in _open
‘_open’, req)
File “/Users/daveti/Python-2.7.8/Lib/”, line 382, in _call_chain
result = func(*args)
File “/Users/daveti/Python-2.7.8/Lib/”, line 1217, in http_open
return self.do_open(httplib.HTTPConnection, req)
File “/Users/daveti/Python-2.7.8/Lib/”, line 1190, in do_open
r = h.getresponse(buffering=True)
File “/Users/daveti/Python-2.7.8/Lib/”, line 1076, in getresponse
File “/Users/daveti/Python-2.7.8/Lib/”, line 413, in begin
version, status, reason = self._read_status()
File “/Users/daveti/Python-2.7.8/Lib/”, line 369, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File “/Users/daveti/Python-2.7.8/Lib/”, line 487, in readline
data = self._sock.recv(self._rbufsize)
socket.timeout: timed out
>>> >>>
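
In calling code, a timeout like the one above can be caught and handled. The following is only a minimal sketch: fetch_or_none is a hypothetical helper name, not part of urllib2, and the import fallback is an assumption added so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
import socket

try:  # Python 2, as used in this post
    from urllib2 import urlopen, URLError
except ImportError:  # Python 3 equivalent (assumption, not from the post)
    from urllib.request import urlopen
    from urllib.error import URLError


def fetch_or_none(url, timeout=20):
    """Return the response body, or None if the request timed out.

    Hypothetical helper for illustration only.
    """
    try:
        response = urlopen(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except socket.timeout:
        # Raised directly when reading the response takes too long,
        # as in the traceback above.
        return None
    except URLError as err:
        # A timeout while connecting arrives wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return None
        raise
```

Note that this only helps when the timeout is actually raised, which is exactly what failed to happen in the case described next.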

However, I ran into a URL that defeated the timeout parameter of urlopen(): the call got stuck, seemingly forever. Fortunately, Ctrl-C still worked like a charm:

>>> obj=urllib2.urlopen(url, timeout=20)
^CTraceback (most recent call last):
File “”, line 1, in
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 127, in urlopen
return, data, timeout)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 410, in open
response = meth(req, response)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 523, in http_response
‘http’, request, response, code, msg, hdrs)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 442, in error
result = self._call_chain(*args)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 382, in _call_chain
result = func(*args)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 629, in http_error_302
return, timeout=req.timeout)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 404, in open
response = self._open(req, data)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 422, in _open
‘_open’, req)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 382, in _call_chain
result = func(*args)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 1187, in do_open
r = h.getresponse(buffering=True)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 1045, in getresponse
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 441, in begin
self.msg = HTTPMessage(self.fp, 0)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 25, in __init__
rfc822.Message.__init__(self, fp, seekable)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 108, in __init__
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 280, in readheaders
line = self.fp.readline(_MAXLINE + 1)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 476, in readline
data = self._sock.recv(self._rbufsize)
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 241, in recv
File “/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/”, line 160, in read

The last two modules in the trace are ssl and socket, so we could argue that the urlopen() timeout may not work for SSL connections. It looks like either the ssl module or the socket module does not have the timeout information. Let's go deeper by adding some print statements to these modules to make it clear.

1. Python Hacking

Download the Python 2.7.8 source code. Building Python follows the traditional three GNU steps: configure, make, make install. We do not want to install this hacked version and mess with the official one, so we stop at configure and make. As this issue involves the ssl module, please make sure you have OpenSSL installed on your machine. If the build complains about a failure in the _ssl module, go ahead and hack the build script that sets up the prerequisites for the different modules, such as the header directory and library directory. If you have to build Python on Mac OS Yosemite, like poor me, pull my version from the code repo (listed at the very end), which should save some time.

The last traced call is read() in the ssl module, which is implemented in Modules/_ssl.c. The read() function eventually calls the real OpenSSL wrapper, PySSL_SSLread(). Reading through this function, we can see timeout control there. Hmm, the ssl module looks innocent. So let's go one level up and check the socket module under Lib/. Jumping directly to line 476, we find the traced call. Looking around, we also find that the traced call sits inside a 'while True' loop. Yep, timeout control within a 'while True' loop? So...

Here is what happened when urlopen() got stuck: _sock.recv() tried to receive something, and PySSL_SSLread() returned nothing without timing out. Then _sock.recv() was called again, and so was PySSL_SSLread()... Thanks to the 'while True', the code kept running.

2. Fix

Now we have the root cause: within the 'while True' loop, the socket layer expects to receive either data or an exception, such as a timeout, from the SSL socket layer. Unfortunately, in our case neither data nor an error was reported to the socket layer to break the loop. The fix is straightforward: add a timeout check inside this 'while True' loop. All the debugging code and the fix are available in the puuth code repo at the bottom.
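
To make the idea concrete, here is a minimal sketch of a read loop with an elapsed-time check on every iteration. It is not the actual patch from the repo: read_line_with_deadline is a hypothetical helper written for illustration, and it assumes the caller has already set a per-operation timeout on the socket so that each recv() either returns or raises.

```python
import socket
import time

# time.monotonic() does not exist on Python 2.7; fall back to time.time() there.
_now = getattr(time, "monotonic", time.time)


def read_line_with_deadline(sock, timeout, bufsize=8192):
    """Read until a chunk containing a newline arrives or the peer closes.

    Gives up with socket.timeout once `timeout` seconds have passed in
    total, no matter how many individual recv() calls succeeded.
    Hypothetical helper, for illustration only.
    """
    deadline = _now() + timeout
    chunks = []
    while True:
        # The extra check: bound the whole loop, not just one recv().
        if _now() >= deadline:
            raise socket.timeout("timed out")
        data = sock.recv(bufsize)
        if not data:  # peer closed the connection
            break
        chunks.append(data)
        if b"\n" in data:
            break
    return b"".join(chunks)
```

With this check in place, a peer that keeps the loop spinning without ever completing a line can no longer hold the caller past the deadline.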

3. Discussion

So we have fixed the urlopen timeout issue, right? Well, for my case, yes; for yours, maybe not. If you check the last traced call within the httplib module, under Lib/, you will find that the call in question, self.fp.readline(_MAXLINE + 1), is also within a 'while True' loop... We could consider adding a timeout check here as well. Unfortunately, by then the socket object has been converted into a file object (self.fp), so there is no graceful way to get at the timeout information within this class.

Actually, I am not going to blame these 'while True' loops, considering all the 'while 1' loops I have written myself when playing with sockets. It is common practice to use such a loop to make sure everything has been received or sent. And overall, Python does pretty well on timeout control. Back in system programming, different socket operations, such as connect(), recv() and send(), have different timeouts, and setsockopt() is used to set the desired timeout value for the desired call. In Python, the timeout from urlopen() is passed from urllib2 down to the C implementation wrappers, covering all the socket operations. This is amazing and convenient indeed. However, the 'while True' loop counts on the layer below to deliver either some input or a timeout, and here there was neither input nor an exception. I can live with the socket layer returning nothing. But why no timeout?
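
As a small illustration of one timeout value covering different socket operations, the sketch below sets a timeout with settimeout() and lets both accept() and recv() run into it. The helper name times_out and the 0.1 second value are made up for this example.

```python
import socket


def times_out(blocking_call):
    """Return True if the call raises socket.timeout, False if it completes."""
    try:
        blocking_call()
    except socket.timeout:
        return True
    return False


# accept(): no client ever connects to this listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
listener.settimeout(0.1)
accept_timed_out = times_out(listener.accept)

# recv(): the peer never sends anything.
left, right = socket.socketpair()
left.settimeout(0.1)
recv_timed_out = times_out(lambda: left.recv(1))

for s in (listener, left, right):
    s.close()
```

Both calls give up after roughly the same 0.1 seconds, because the timeout is stored once on the socket object and applied to each blocking operation.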

When a timeout is passed to urlopen(), the value is also saved in the socket object, where the lower layers can see it as well. For each socket operation, the time consumed is computed and then compared to the timeout value to decide whether a timeout error should be raised. You see? Python checks the timeout for each socket operation using the UNCHANGED timeout saved in the socket object. Even if no single socket operation times out, the 'while True' loop may take us far beyond the original deadline! We do need to save the user's timeout value and keep it unchanged for use by different socket operations. But for repeated calls of the same operation, such as recv() within a 'while True' loop, we may need a second timeout that decreases after each recv(). In my opinion, that would be a helpful solution in the long term.
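
One way to picture such a decreasing timeout is sketched below. This is my illustration of the idea, not code from Python or from the repo: recv_exactly is a hypothetical helper that gives each recv() only the time still left in the overall budget, and then restores the timeout the user originally set on the socket.

```python
import socket
import time

# time.monotonic() does not exist on Python 2.7; fall back to time.time() there.
_now = getattr(time, "monotonic", time.time)


def recv_exactly(sock, nbytes, total_timeout):
    """Receive up to nbytes, spending at most total_timeout seconds overall.

    Hypothetical helper: returns fewer bytes only if the peer closes early,
    and raises socket.timeout once the overall budget is used up.
    """
    deadline = _now() + total_timeout
    original = sock.gettimeout()  # the user's value, kept unchanged
    buf = bytearray()
    try:
        while len(buf) < nbytes:
            remaining = deadline - _now()
            if remaining <= 0:
                raise socket.timeout("timed out")
            # Each recv() gets only what is left of the overall budget.
            sock.settimeout(remaining)
            data = sock.recv(nbytes - len(buf))
            if not data:  # peer closed the connection
                break
            buf.extend(data)
    finally:
        sock.settimeout(original)
    return bytes(buf)
```

The per-socket timeout stays available for other operations, while the loop as a whole can never run longer than total_timeout plus the overhead of one iteration.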

4. Code


About daveti

Interested in kernel hacking, compilers, machine learning and guitars.

4 Responses to Python Hacking – urlopen timeout issue

  1. xcoder says:

    Hi, doing penatests with some script on kali linux to my own web site
    command: ./ -url -cf class.phpmailer.php -d cache -ip
    Traceback (most recent call last):
    File “./”, line 175, in
    File “/usr/lib/python2.7/”, line 154, in urlopen
    return, data, timeout)
    File “/usr/lib/python2.7/”, line 421, in open
    protocol = req.get_type()
    File “/usr/lib/python2.7/”, line 283, in get_type
    raise ValueError, “unknown url type: %s” % self.__original
    ValueError: unknown url type:
    can u advice me how to fix it ? its a problem with python?

  2. xcoder says:

    hello, how to use urllib2 ? is it not by default?
    its allready installen in my distro

  3. xcoder says: doing all the same as at this video, but get error with python
