Python equivalent of a given wget command
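I'm trying to create a Python function that does exactly what this wget command does:

```
wget -c --read-timeout=5 --tries=0 "$URL"
```

Used together, these three options make the download effectively unable to fail: -c resumes a partial file, --read-timeout=5 gives up on a read that stalls for 5 seconds, and --tries=0 retries indefinitely.
I want to replicate this behavior in my Python script, but I don't know where to start...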
There is also a nice Python module named wget.
This demonstrates the simplicity of the design:

```python
>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'
```

Enjoy.
However, if
Edit: you can also use the out parameter to specify an output directory:

```python
>>> output_directory = <directory_name>
>>> filename = wget.download(url, out=output_directory)
>>> filename
'razorback.mp3'
```
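urllib.request should work.
Just set it up in a while (not done) loop, check whether a local file already exists, and if it does, send a GET with a Range header specifying how much of the file has already been downloaded.
Be sure to use read() to append to the local file until an error occurs.
This is also potentially a duplicate of "Python urllib2 resume download doesn't work when network reconnects".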
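The loop described above can be sketched roughly as follows, using only the standard library. This is a minimal sketch under stated assumptions, not a full wget replacement: the function name `download_with_resume` and its parameters are hypothetical, the `opener` argument exists only so the function can be exercised without a network, a 416 response is assumed to mean the local file is already complete, and the server is assumed to support Range requests (if it does not, the file is simply restarted from the beginning).

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def download_with_resume(url, path, read_timeout=5, chunk_size=64 * 1024,
                         retry_delay=1.0, opener=urllib.request.urlopen):
    """Append to `path` until the whole body of `url` is on disk."""
    while True:
        # Like wget -c: continue from whatever is already on disk.
        offset = os.path.getsize(path) if os.path.exists(path) else 0
        request = urllib.request.Request(url)
        if offset:
            request.add_header("Range", "bytes={}-".format(offset))
        try:
            # Like --read-timeout=5: the timeout applies to each blocking
            # socket operation, so a stalled read raises an exception.
            with opener(request, timeout=read_timeout) as response:
                # 206 means the Range header was honoured; anything else
                # means the server is sending the file from the start.
                resumed = offset > 0 and response.status == 206
                expected = response.headers.get("Content-Length")
                received = 0
                with open(path, "ab" if resumed else "wb") as f:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break
                        f.write(chunk)
                        received += len(chunk)
            # Only stop if the body was not cut short.
            if expected is None or received >= int(expected):
                return path
        except urllib.error.HTTPError as e:
            if e.code == 416:
                return path  # assumption: nothing left to fetch
            if e.code < 500:
                raise        # e.g. 404: retrying will not help
        except (OSError, http.client.HTTPException):
            pass             # timeout, reset, DNS failure: try again
        # Like --tries=0: loop forever, pausing briefly between attempts.
        time.sleep(retry_delay)
```

Calling `download_with_resume(url, "razorback.mp3")` would then keep retrying until the file is complete. Because there is no upper bound on attempts, a caller that wants to give up eventually would need to add its own limit.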
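```python
# Python 3: urllib2 was split into urllib.request and urllib.error.
import urllib.error
import urllib.request

attempts = 0

# Try the download up to three times before giving up.
while attempts < 3:
    try:
        response = urllib.request.urlopen("http://example.com", timeout=5)
        content = response.read()
        # The body is bytes, so write in binary mode.
        # The local/ directory must already exist.
        with open("local/index.html", "wb") as f:
            f.write(content)
        break
    except urllib.error.URLError as e:
        attempts += 1
        print(type(e))
```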
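I had to do something like this on a version of Linux that didn't have the right options compiled into wget. This example downloads the memory analysis tool 'guppy'. I'm not sure whether it matters, but I kept the target file's name the same as the URL target name...
Here's what I came up with:

```
python -c "import requests; r = requests.get('https://pypi.python.org/packages/source/g/guppy/guppy-0.1.10.tar.gz'); open('guppy-0.1.10.tar.gz', 'wb').write(r.content)"
```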
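That's the one-liner; here it is in a more readable form:

```python
import requests

fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

# The whole response body is held in memory before being written out.
r = requests.get(url)
with open(fname, 'wb') as f:
    f.write(r.content)
```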
This worked for downloading a tarball. I was able to extract the package once the download had finished.
EDIT:
To address a question, here is an implementation with a progress bar printed to STDOUT. Without
```python
#!/usr/bin/env python
from clint.textui import progress
import requests

fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

# stream=True defers the body so it can be read chunk by chunk.
r = requests.get(url, stream=True)
with open(fname, 'wb') as f:
    # Assumes the server sends a Content-Length header.
    total_length = int(r.headers.get('content-length'))
    # One progress-bar tick per 1 KiB chunk.
    for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length // 1024) + 1):
        if chunk:
            f.write(chunk)
            f.flush()
```
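If a third-party progress-bar package is not available, something similar can probably be done with the standard library alone. The sketch below relies on the `reporthook` callback of `urllib.request.urlretrieve`, which receives the block count, the block size and the total size. The helper names `render_bar`, `report` and `download` are hypothetical, and `urlretrieve` is documented as a legacy interface, so treat this as an illustration rather than a recommendation.

```python
import sys
import urllib.request


def render_bar(done, total, width=40):
    """Return a text bar for done/total bytes, or a byte count if total is unknown."""
    if total <= 0:
        return "{} bytes".format(done)
    fraction = min(done / total, 1.0)
    filled = int(width * fraction)
    return "[{}{}] {:3.0f}%".format("#" * filled, "." * (width - filled), fraction * 100)


def report(block_num, block_size, total_size):
    # urlretrieve calls this after each block; total_size may be -1 if unknown.
    sys.stdout.write("\r" + render_bar(block_num * block_size, total_size))
    sys.stdout.flush()


def download(url, fname):
    # Redraw the bar on one line while the file is fetched, then end the line.
    urllib.request.urlretrieve(url, fname, reporthook=report)
    sys.stdout.write("\n")
```

With this in place, `download(url, fname)` would take the same `url` and `fname` values as the example above.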
I often find that a simpler and more robust solution is to just run the terminal command from Python. In your case:

```python
import os

url = 'https://www.someurl.com'
os.system(f'wget -c --read-timeout=5 --tries=0 "{url}"')
```
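One caveat with building a shell string is that characters such as `$`, backticks or quotes inside the URL are interpreted by the shell. A possible alternative, sketched below, passes the arguments as a list to `subprocess.run` so that no shell is involved. The helper names `wget_command` and `run_wget` are hypothetical, and the sketch assumes wget is installed and on the PATH.

```python
import subprocess


def wget_command(url, read_timeout=5):
    """Build the argument list for the wget call from the question."""
    return ["wget", "-c", "--read-timeout={}".format(read_timeout), "--tries=0", url]


def run_wget(url):
    # A list of arguments is executed directly, without a shell,
    # so the URL is passed to wget exactly as given.
    completed = subprocess.run(wget_command(url))
    # wget exits with status 0 on success.
    return completed.returncode == 0
```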
As easy as py:

```python
import subprocess


class Downloder():
    def download_manager(self, url, destination='Files/DownloderApp/', try_number="10", time_out="60"):
        # To download in the background, run _wget_dl in a thread instead:
        # threading.Thread(target=self._wget_dl, args=(url, destination, try_number, time_out)).start()
        # wget exits with status 0 on a successful download.
        return self._wget_dl(url, destination, try_number, time_out) == 0

    def _wget_dl(self, url, destination, try_number, time_out):
        # -c resumes a partial file, -P sets the target directory,
        # -t is the number of tries and -T the timeout in seconds.
        command = ["wget", "-c", "-P", destination, "-t", try_number, "-T", time_out, url]
        try:
            download_state = subprocess.call(command)
        except Exception as e:
            print(e)
            download_state = 1  # wget could not be started: report failure
        return download_state
```
Let me improve on an example by using threads, in case you want to download many files.

```python
import random
import threading

import requests
from clint.textui import progress

# You must define a proxy list.
# I suggest https://free-proxy-list.net/
proxies = {
    0: {'http': 'http://34.208.47.183:80'},
    1: {'http': 'http://40.69.191.149:3128'},
    2: {'http': 'http://104.154.205.214:1080'},
    3: {'http': 'http://52.11.190.64:3128'}
}

# You must define the list of files you want to download.
videos = [
    "https://i.stack.imgur.com/g2BHi.jpg",
    "https://i.stack.imgur.com/NURaP.jpg"
]

downloaderses = list()


def downloaders(video, selected_proxy):
    print("Downloading file named {} by proxy {}...".format(video, selected_proxy))
    r = requests.get(video, stream=True, proxies=selected_proxy)
    # Use the last path segment of the URL as the local file name.
    nombre_video = video.split("/")[-1]
    with open(nombre_video, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length // 1024) + 1):
            if chunk:
                f.write(chunk)
                f.flush()


# Create one thread per file, each with a randomly chosen proxy.
for video in videos:
    selected_proxy = random.choice(list(proxies.values()))
    t = threading.Thread(target=downloaders, args=(video, selected_proxy))
    downloaderses.append(t)

# Start all downloads.
for _downloaders in downloaderses:
    _downloaders.start()
```
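The threads above are started but never joined, and an exception inside one of them is only printed. One way to collect results and errors, sketched here with the standard library's `concurrent.futures`, is shown below. The names `filename_from_url`, `fetch` and `download_all` are hypothetical, proxies and progress bars are left out for brevity, and `fetch_one` is a parameter only so the pool logic can be tried without a network.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlsplit


def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlsplit(url).path) or "index.html"


def fetch(url):
    # Download one file into the current directory and return its name.
    fname = filename_from_url(url)
    urllib.request.urlretrieve(url, fname)
    return fname


def download_all(urls, fetch_one=fetch, max_workers=4):
    """Download every URL in a thread pool; return (successes, failures)."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        # Handle each download as soon as it finishes, in any order.
        for future in as_completed(futures):
            url = futures[future]
            try:
                done[url] = future.result()
            except Exception as exc:
                failed[url] = exc
    return done, failed
```

Calling `download_all(videos)` with the list defined above would return one dictionary mapping each URL to its saved file name and another mapping each failed URL to the exception it raised.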