How to speed up / parallelize downloads of git submodules using git clone --recursive?
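Cloning a Git repository that has a lot of submodules takes a really long time. The following example has ~100 submodules:
    git clone --recursive https://github.com/Whonix/Whonix
Git clones them one by one, which takes much longer than necessary. Let's assume that both the client and the server have sufficient resources to answer multiple (parallel) requests at the same time.
How can the submodule downloads be sped up or parallelized when using git clone --recursive?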
With Git 2.8 (Q1 2016), you will be able to start fetching submodules... in parallel!
See the change by Jonathan Nieder (…):
Add a framework to spawn a group of processes in parallel, and use
it to run "git fetch --recurse-submodules" in parallel.
For that, git fetch has the option:
    -j, --jobs=<n>
Number of parallel children to be used for fetching submodules.
Each will fetch from different submodules, such that fetching many submodules will be faster.
By default submodules will be fetched one at a time.
Example:
    git fetch --recurse-submodules -j2
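If the fetch is driven from a script rather than typed by hand, the job count can be passed in the same way. This is only a minimal sketch: the helper names build_fetch_command and fetch_submodules and the default of 4 jobs are made up for illustration; only the --recurse-submodules and --jobs=<n> options come from the text above.

```python
import subprocess


def build_fetch_command(jobs=4):
    """Return the argv for a parallel submodule fetch (hypothetical helper)."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["git", "fetch", "--recurse-submodules", "--jobs={0}".format(jobs)]


def fetch_submodules(repo_dir, jobs=4):
    """Run the fetch inside repo_dir and return git's exit code."""
    return subprocess.call(build_fetch_command(jobs), cwd=repo_dir)


if __name__ == "__main__":
    # Show the command line without touching any repository.
    print(" ".join(build_fetch_command(2)))
```

Keeping the argv construction separate from the call makes it easy to log or test the exact command before running it against a real repository.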
The main part of this new feature is in the change by Stefan Beller (…):
run-command: add an asynchronous parallel child processor
This allows to run external commands in parallel with ordered output
on stderr.
If we run external commands in parallel we cannot pipe the output directly
to the our stdout/err as it would mix up. So each process's output will
flow through a pipe, which we buffer. One subprocess can be directly
piped to out stdout/err for a low latency feedback to the user.
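The buffering idea in that description can be illustrated outside of Git. The following is not Git's implementation, just a rough Python sketch of the same principle: every child writes into its own pipe, the output is held in memory, and it is printed per child so lines from different children never interleave. The function names run_buffered and run_all are hypothetical, and unlike the description above, this sketch does not stream one child live for low-latency feedback.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_buffered(argv):
    """Run one command, capturing its output in memory instead of the terminal."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def run_all(commands, jobs=2):
    """Run commands with at most `jobs` at a time; results keep input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_buffered, commands))


if __name__ == "__main__":
    # Three harmless child processes stand in for the per-submodule fetches.
    demo = [[sys.executable, "-c", "print('child %d')" % i] for i in range(3)]
    for code, output in run_all(demo):
        sys.stdout.write(output)
```

Because the results come back in input order, the printed output is stable from run to run even though the children finish in an arbitrary order.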
When I run your command, it takes 338 seconds to download the 68 MB.
With the following Python program, which relies on GNU parallel being installed,
    #! /usr/bin/env python
    # coding: utf-8

    from __future__ import print_function

    import os
    import subprocess

    jobs = 16

    modules_file = '.gitmodules'

    packages = []

    # clone only the top-level repository (no --recursive) if it is missing
    if not os.path.exists('Whonix/' + modules_file):
        subprocess.call(['git', 'clone', 'https://github.com/Whonix/Whonix'])

    os.chdir('Whonix')

    # get list of packages from .gitmodules file
    with open(modules_file) as ifp:
        for line in ifp:
            if not line.startswith('[submodule '):
                continue
            package = line.split('"', 1)[1].split('"', 1)[0]
            packages.append(package)


    def doit():
        # GNU parallel runs one "git submodule update --init <package>" for
        # each argument after ":::", with at most `jobs` running at a time
        p = subprocess.Popen(['parallel', '-N1', '-j{0}'.format(jobs),
                              'git', 'submodule', 'update', '--init',
                              ':::'] + packages,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                             universal_newlines=True)
        res = p.communicate()
        print(res[0])
        if res[1]:
            print("error", res[1])
        print('git exit value', p.returncode)
        return p.returncode

    # sometimes one of the updates interferes with the others and generates
    # lock errors, so we retry
    for x in range(10):
        if doit() == 0:
            print('zero exit from git after {0} times'.format(x+1))
            break
    else:
        print('could not get a zero exit from git after {0} times'.format(
            x+1))
this time is reduced to 45 seconds (on the same system; I did not do multiple runs to average out fluctuations).
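One thing to keep in mind: the script takes the quoted name from each [submodule "..."] header, while git submodule update expects paths. The two are usually identical, but they can differ. A sketch of reading the path = entries instead, assuming the usual .gitmodules layout; the function name submodule_paths and the sample URLs are made up for illustration:

```python
import re

SECTION = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]\s*$')
PATH = re.compile(r'^\s*path\s*=\s*(?P<path>.+?)\s*$')


def submodule_paths(text):
    """Return the path of every submodule listed in .gitmodules content."""
    paths = []
    in_submodule = False
    for line in text.splitlines():
        if line.lstrip().startswith('['):
            # a new section starts; remember whether it is a submodule section
            in_submodule = SECTION.match(line.strip()) is not None
            continue
        match = PATH.match(line)
        if in_submodule and match:
            paths.append(match.group('path'))
    return paths


if __name__ == "__main__":
    sample = ('[submodule "a"]\n'
              '\tpath = packages/a\n'
              '\turl = https://example.invalid/a\n')
    print(submodule_paths(sample))
```

The returned list could be used in place of packages in the script above.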
To check that things were OK, I "compared" the checked-out files with:
    find Whonix -name ".git" -prune -o -type f -print0 | xargs -0 md5sum > /tmp/md5.sum
in the one directory and
    md5sum -c /tmp/md5.sum
in the other directory, and vice versa.
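The same check can be done without md5sum, for example on a system where it is not installed. This is a sketch under the assumption that both checkouts are available locally; the function names tree_checksums and trees_match are hypothetical. Like the find command, it skips anything named .git and only looks at regular files.

```python
import hashlib
import os


def tree_checksums(root):
    """Map each file path (relative to root) to its MD5 digest, skipping .git."""
    sums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # prune .git directories, like "-name .git -prune" in the find command
        dirnames[:] = [d for d in dirnames if d != '.git']
        for name in filenames:
            full = os.path.join(dirpath, name)
            # submodules may contain a .git *file*; symlinks are not "-type f"
            if name == '.git' or os.path.islink(full):
                continue
            digest = hashlib.md5()
            with open(full, 'rb') as fh:
                for chunk in iter(lambda: fh.read(65536), b''):
                    digest.update(chunk)
            sums[os.path.relpath(full, root)] = digest.hexdigest()
    return sums


def trees_match(root_a, root_b):
    """True if both trees hold the same files with the same contents."""
    return tree_checksums(root_a) == tree_checksums(root_b)
```

Comparing the two dictionaries covers both directions at once: a file that is missing on either side or differs in content makes the comparison fail.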