How to perform a time-limited response download with Python requests?

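When downloading a large file with Python, I want to put a time limit not only on the connection process but also on the download itself.

I tried the following Python code: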

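import requests

# stream=True (called prefetch=False before requests 1.0) defers the body download.
# timeout only bounds the connect and each individual socket read, not the whole download.
r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', timeout=0.5, stream=True)

print(r.headers['content-length'])

print(len(r.raw.read()))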

This does not work (the download is not time-limited), as correctly noted in the docs at https://requests.readthedocs.org/en/latest/user/quickstart/timeouts.

It would be great if something like this were possible:

r.raw.read(timeout = 10)

The question is: how do I put a time limit on the download?


The answer is: don't use requests, because it is blocking. Use non-blocking network I/O, for example eventlet:

import eventlet
from eventlet.green.urllib.request import urlopen  # Python 3 counterpart of eventlet.green.urllib2
from eventlet.timeout import Timeout

url5 = 'http://ipv4.download.thinkbroadband.com/5MB.zip'
url10 = 'http://ipv4.download.thinkbroadband.com/10MB.zip'

urls = [url5, url5, url10, url10, url10, url5, url5]

def fetch(url):
    response = bytearray()
    # Timeout(60, False) silently leaves the block after 60 seconds,
    # so response stays empty if the download did not finish in time.
    with Timeout(60, False):
        response = urlopen(url).read()
    return url, len(response)

pool = eventlet.GreenPool()
for url, length in pool.imap(fetch, urls):
    if not length:
        print("%s: timeout!" % url)
    else:
        print("%s: %s" % (url, length))

This produces the expected results:

http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
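
For comparison, the same idea of putting one timer around the whole operation can be sketched with the standard library's asyncio instead of eventlet. This swaps in a different technique from the answer above: asyncio.wait_for cancels the awaited coroutine once the limit passes. The names fetch_with_limit and fake_download are hypothetical, and fake_download only sleeps to stand in for a real non-blocking download.

```python
import asyncio


async def fake_download(seconds, size):
    """Stand-in for a non-blocking download: wait, then return `size` zero bytes."""
    await asyncio.sleep(seconds)
    return bytes(size)


async def fetch_with_limit(coro, limit_seconds):
    """Await coro; return None if it does not finish within limit_seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=limit_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the coroutine at this point.
        return None


async def main():
    # The first call finishes well inside its limit; the second one does not.
    fast = await fetch_with_limit(fake_download(0.01, 5), 1.0)
    slow = await fetch_with_limit(fake_download(1.0, 10), 0.05)
    return fast, slow


if __name__ == "__main__":
    fast, slow = asyncio.run(main())
    print(len(fast), slow)
```

A real download would need an asynchronous HTTP client in place of fake_download; which client to use is left open here.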


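When you use requests with stream=True (prefetch=False in versions before 1.0), you can pull the response in chunks of any size rather than all at once.

What you need to do is tell requests not to preload the entire response, then keep your own timer that tracks how long you have spent reading so far while fetching small chunks one at a time. You can fetch a chunk with r.raw.read(CHUNK_SIZE). Overall, the code looks like this: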

import requests
import time

CHUNK_SIZE = 2**12  # Bytes
TIME_EXPIRE = time.time() + 5  # Seconds

# stream=True (prefetch=False before requests 1.0) leaves the body unread.
r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', stream=True)

data = b''
buffer = r.raw.read(CHUNK_SIZE)
while buffer:
    data += buffer
    buffer = r.raw.read(CHUNK_SIZE)

    if TIME_EXPIRE < time.time():
        # Quit after 5 seconds, keeping the chunk that was just read.
        data += buffer
        break

# Close the connection; the body may have been only partly read.
r.close()

print("Read %s bytes out of %s expected." % (len(data), r.headers['content-length']))

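Note that this can sometimes take a little longer than the allotted 5 seconds, because the final r.raw.read(...) may block for an arbitrary amount of time. But at least it does not depend on multithreading or socket timeouts.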
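
The deadline check described above can be pulled out into a small helper so the timing logic can be tried without a network. This is only a sketch and not part of the original answer: read_until_deadline and its parameters are hypothetical names, it uses time.monotonic() rather than time.time(), and it accepts any iterable of byte chunks.

```python
import time


def read_until_deadline(chunks, limit_seconds, clock=time.monotonic):
    """Collect byte chunks until the iterable ends or the time limit is hit.

    Returns (data, finished); finished is False if the limit was hit while
    reading. Hypothetical helper, for illustration only.
    """
    deadline = clock() + limit_seconds
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        # The check runs after each chunk arrives, so one slow chunk
        # can still push the total past the limit.
        if clock() >= deadline:
            return b"".join(parts), False
    return b"".join(parts), True


# Possible use with requests (assumes a version that accepts a
# (connect, read) timeout tuple):
#   r = requests.get(url, stream=True, timeout=(3, 2))
#   data, finished = read_until_deadline(r.iter_content(chunk_size=4096), 5)
```

With a read timeout set on the request, a stalled read raises an exception instead of blocking indefinitely, so the overshoot should stay within roughly one read timeout.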


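Run the download in a thread, which you can then abandon if it does not finish on time. The thread is not killed; the download itself keeps running in the background.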

import requests
import threading

URL = 'http://ipv4.download.thinkbroadband.com/1GB.zip'
TIMEOUT = 0.5

def download(return_value):
    return_value.append(requests.get(URL))

return_value = []
# daemon=True lets the program exit even though the abandoned download keeps running.
download_thread = threading.Thread(target=download, args=(return_value,), daemon=True)
download_thread.start()
download_thread.join(TIMEOUT)

if download_thread.is_alive():
    print('The download was not finished on time...')
else:
    print(return_value[0].headers['content-length'])
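
The same abandon-on-timeout pattern can also be written with the standard library's concurrent.futures, which hands back the result directly instead of going through a shared list. This is a sketch under assumptions: run_with_time_limit is a hypothetical helper name, and as with the thread above, the worker is abandoned rather than stopped.

```python
import concurrent.futures


def run_with_time_limit(func, limit_seconds, *args):
    """Run func(*args) in a worker thread and return (finished, result).

    Hypothetical helper: on timeout it returns (False, None) and the worker
    keeps running in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return True, future.result(timeout=limit_seconds)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # wait=False: do not block here until the abandoned worker finishes.
        executor.shutdown(wait=False)
```

It could be called as run_with_time_limit(requests.get, 0.5, URL). Unlike a daemon thread, an executor worker is still waited for when the interpreter exits, so a very long download can delay shutdown.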