How to perform a time-limited response download with Python requests?
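When downloading a large file with Python, I want to put a time limit not only on the connection phase but also on the download itself.
I tried the following Python code: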
```python
import requests

# stream=True defers downloading the body (older Requests releases called this prefetch=False).
r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                 timeout=0.5, stream=True)

print(r.headers['content-length'])
print(len(r.raw.read()))
```
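As the documentation at https://requests.readthedocs.org/en/latest/user/quickstart/timeouts correctly states, this does not work: the timeout does not limit the total download time.
It would be great if something like this were possible: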
```python
r.raw.read(timeout=10)
```
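The question is: how can I put a time limit on the download?
The answer is: don't use requests, because it is blocking. Use non-blocking network I/O instead, for example eventlet: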
```python
import eventlet
# Green (non-blocking) urlopen; on Python 2 this lived in eventlet.green.urllib2.
from eventlet.green.urllib.request import urlopen
from eventlet.timeout import Timeout

url5 = 'http://ipv4.download.thinkbroadband.com/5MB.zip'
url10 = 'http://ipv4.download.thinkbroadband.com/10MB.zip'

urls = [url5, url5, url10, url10, url10, url5, url5]

def fetch(url):
    response = bytearray()
    # Timeout(60, False) silently leaves the block after 60 seconds instead of raising.
    with Timeout(60, False):
        response = urlopen(url).read()
    return url, len(response)

pool = eventlet.GreenPool()
for url, length in pool.imap(fetch, urls):
    if not length:
        print("%s: timeout!" % url)
    else:
        print("%s: %s" % (url, length))
```
This produces the expected result:
```
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
```
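The same "give up after a limit" pattern can also be sketched with the standard library's asyncio instead of eventlet. This is only an illustration under stated assumptions: `fake_download` and `fetch_with_limit` are hypothetical names, and the download is simulated with `asyncio.sleep`, since a real non-blocking HTTP transfer would need an async HTTP client that is not part of this thread.

```python
import asyncio


async def fake_download(url, seconds, size):
    """Hypothetical stand-in for a real non-blocking download."""
    await asyncio.sleep(seconds)  # pretend the transfer takes this long
    return b"x" * size


async def fetch_with_limit(url, seconds, size, limit):
    """Return (url, byte count), or (url, None) if the time limit is exceeded."""
    try:
        body = await asyncio.wait_for(fake_download(url, seconds, size), timeout=limit)
    except asyncio.TimeoutError:
        return url, None
    return url, len(body)


async def main():
    jobs = [
        fetch_with_limit("small", 0.01, 5, limit=0.2),
        fetch_with_limit("large", 1.0, 10, limit=0.2),
    ]
    # Both simulated downloads run concurrently; results keep the order of jobs.
    return await asyncio.gather(*jobs)


results = asyncio.run(main())
for url, length in results:
    print("%s: %s" % (url, "timeout!" if length is None else length))
```

As with eventlet's `Timeout(60, False)`, the caller sees a sentinel value rather than an exception when the limit is hit.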
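When using Requests, what you need to do is tell it not to preload the entire response, and keep your own clock of how much time you have spent reading so far, while fetching small chunks at a time. For example: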
```python
import requests
import time

CHUNK_SIZE = 2**12  # Bytes
TIME_EXPIRE = time.time() + 5  # Seconds

# stream=True (prefetch=False in older Requests releases) defers reading the body.
r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', stream=True)

data = b''
buffer = r.raw.read(CHUNK_SIZE)
while buffer:
    data += buffer
    buffer = r.raw.read(CHUNK_SIZE)

    if TIME_EXPIRE < time.time():
        # Quit after 5 seconds, keeping the chunk that was just read.
        data += buffer
        break

r.raw.release_conn()

print("Read %s bytes out of %s expected." % (len(data), r.headers['content-length']))
```
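Note that this can sometimes take longer than the allotted 5 seconds, because the deadline is only checked between reads and the final read can block for an arbitrary amount of time.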
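The chunked loop above can be factored into a small helper that works on any iterable of byte chunks. This is a sketch, not part of the original answer: `read_until_deadline` and `download_with_limit` are hypothetical names, and the `(3.05, 10)` connect/read timeout values are arbitrary assumptions. The per-read timeout bounds each socket operation, while the helper bounds the transfer as a whole.

```python
import time


def read_until_deadline(chunks, max_seconds, clock=time.monotonic):
    """Collect chunks until the iterable ends or the time budget runs out.

    Returns (data, finished). finished is True only if the iterable was
    exhausted without the deadline check firing.
    """
    deadline = clock() + max_seconds
    data = bytearray()
    for chunk in chunks:
        data.extend(chunk)
        if clock() >= deadline:
            return bytes(data), False
    return bytes(data), True


def download_with_limit(url, max_seconds, chunk_size=4096):
    """Usage sketch with Requests (third-party, imported lazily)."""
    import requests

    # timeout=(connect, read) limits each socket operation, not the whole transfer.
    with requests.get(url, stream=True, timeout=(3.05, 10)) as r:
        r.raise_for_status()
        return read_until_deadline(r.iter_content(chunk_size), max_seconds)


# Offline demonstration with in-memory chunks instead of a network response.
print(read_until_deadline([b"ab", b"cd"], max_seconds=5))
```

Passing the clock as a parameter keeps the helper easy to test, and `time.monotonic` avoids surprises if the system clock is adjusted during the download.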
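Run the download in a thread, which you can then abandon if it does not finish on time. Note that `join(TIMEOUT)` only stops waiting; the thread itself keeps running in the background.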
```python
import requests
import threading

URL = 'http://ipv4.download.thinkbroadband.com/1GB.zip'
TIMEOUT = 0.5

def download(return_value):
    return_value.append(requests.get(URL))

return_value = []
# daemon=True lets the program exit even if the download is still running.
download_thread = threading.Thread(target=download, args=(return_value,), daemon=True)
download_thread.start()
download_thread.join(TIMEOUT)

if download_thread.is_alive():
    print('The download was not finished on time...')
else:
    print(return_value[0].headers['content-length'])
```
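A similar thread-based limit can be sketched with `concurrent.futures`, which hands back the return value directly instead of through a shared list. The names `run_with_timeout` and `slow_task` are hypothetical, and `slow_task` merely simulates a blocking call such as `requests.get(URL)` so the example runs without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(func, timeout, *args):
    """Run func(*args) in a worker thread; return its result, or None on timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return None
    finally:
        # Do not wait for the worker: it cannot be killed and finishes on its own.
        executor.shutdown(wait=False)


def slow_task(seconds):
    """Hypothetical stand-in for a blocking download."""
    time.sleep(seconds)
    return "done"


print(run_with_timeout(slow_task, 1.0, 0.01))  # finishes within the limit
print(run_with_timeout(slow_task, 0.05, 0.5))  # limit expires first
```

As with the plain thread version, the timed-out work is only abandoned, not cancelled, so the underlying transfer continues until it completes or the process exits.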