How to use nbconvert as a git textconv driver to enable effective version control of Jupyter Notebooks

What I'm trying to do and how it differs from similar questions

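I want to version-control Jupyter notebooks with Git. Unfortunately, Git and Jupyter notebooks don't play nicely together by default. A .ipynb file is a JSON file that contains not only the Python code itself but also a lot of metadata (for example, cell execution counts) and the cell outputs.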

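Most existing solutions (for example, "Using IPython notebooks under version control") rely on stripping the outputs and metadata from the notebook. This (i) still leaves the JSON file structure in the diff, which is hard to read, and (ii) means that features such as output rendering on GitHub can't be used, because the outputs are removed before committing.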
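To make point (i) concrete, here is a minimal sketch of what output stripping amounts to. The `strip_outputs` helper and the tiny hand-written notebook are illustrative assumptions, not the code of any particular stripping tool; the notebook follows the nbformat 4 layout.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny hand-written notebook in nbformat 4 layout (illustrative only).
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
            "source": ["print(1 + 1)"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

# The stripped notebook is still JSON, so a diff of it still shows JSON structure.
stripped_text = json.dumps(strip_outputs(notebook), indent=1)
print(stripped_text)
```

The printed result no longer contains the output or the execution count, but every code line is still wrapped in JSON keys, brackets, and quoting, which is what makes such diffs hard to read.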

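My idea is the following: whenever I run git diff, Git automatically converts my *.ipynb source files to plain *.py Python files using jupyter nbconvert --to python filename.ipynb. It should then detect only the changes that affect the code itself (not execution counts or outputs, since nbconvert strips those) without actually removing them from the notebook, and it should make my diffs far more readable than diffs of the unconverted .ipynb files. I don't want the .py version of the file to be stored permanently; it should only be used for git diff. My understanding is that specifying nbconvert as a textconv driver should be enough to achieve this, but I can't get it to work.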

Steps I've taken so far

I created a file named ipynb2py in /usr/local/bin:

#!/bin/bash
jupyter nbconvert --to python $1

I added the following to my .gitconfig file:

[diff "ipynb"]
    textconv = ipynb2py

and the following to my .gitattributes file:

*.ipynb diff=ipynb

This assigns the ipynb textconv driver to all .ipynb files.

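Now I expect git diff to perform the conversion automatically each time I run it (I know this will slow things down, but it is worth it to have a viable option for version-controlling notebooks) and then show a nice, readable diff based only on the differences between the converted states of the notebook.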

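When I run git diff, it first says [NbConvertApp] Converting notebook, which tells me that Git is triggering the conversion as expected. However, the conversion then fails with a long Python traceback that ends in fatal: unable to read files to diff.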

Just before the fatal error message, I get the following:

nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '
# coding: utf-8

# In[ ]:

import...

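Naturally, I suspected a problem with the way my ipynb2py script calls nbconvert, but running ipynb2py notebook.ipynb in my repo works perfectly well, so that doesn't seem to be the cause.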
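Running the script by hand shows whether it exits cleanly, but not where its output ends up. One way to see what Git would receive is to capture the script's standard output separately from everything else. The sketch below is an assumption-laden illustration: `textconv_stdout` is a hypothetical helper, and the two stand-in "drivers" are small Python one-liners so that the example runs without Jupyter installed.

```python
import subprocess
import sys


def textconv_stdout(command, path):
    """Run `command path` with a single file argument and return its stdout as text."""
    completed = subprocess.run(
        command + [path], stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    return completed.stdout.decode("utf-8", errors="replace")


# Stand-in drivers (hypothetical): one prints its result, one only logs to stderr.
prints_to_stdout = [sys.executable, "-c", "import sys; print('converted', sys.argv[1])"]
logs_to_stderr_only = [sys.executable, "-c", "import sys; sys.stderr.write('Converting...\\n')"]

print(repr(textconv_stdout(prints_to_stdout, "notebook.ipynb")))
print(repr(textconv_stdout(logs_to_stderr_only, "notebook.ipynb")))
```

Calling `textconv_stdout(["ipynb2py"], "notebook.ipynb")` in the same way would show how much converted text the real script actually writes to standard output.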

What could be causing this error? What are the requirements for a valid textconv driver, beyond returning a text file?

Full traceback

git diff
[NbConvertApp] Converting notebook /var/folders/9t/p55_4b9971j4wwp14_45wy900000gn/T//lR5q08_notebook.ipynb to python
Traceback (most recent call last):
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 14, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/anaconda/bin/jupyter-nbconvert", line 11, in <module>
    load_entry_point('nbconvert==5.1.1', 'console_scripts', 'jupyter-nbconvert')()
  File "/Users/user/anaconda/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 305, in start
    self.convert_notebooks()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 473, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 444, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 373, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 171, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 189, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 141, in read
    return reads(fp.read(), as_version, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 74, in reads
    nb = reader.reads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 58, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 17, in parse_json
    raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...")
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '
# coding: utf-8

# In[ ]:

import...
fatal: unable to read files to diff


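If you read the gitattributes documentation carefully (it is where the textconv config option is described), you'll notice that the converter program must send its output to standard output: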

...

Performing text diffs of binary files

Sometimes it is desirable to see the diff of a text-converted version
of some binary files. For example, a word processor document can be
converted to an ASCII text representation, and the diff of the text
shown. Even though this conversion loses some information, the
resulting diff is useful for human viewing (but cannot be applied
directly).

The textconv config option is used to define a program for
performing such a conversion. The program should take a single
argument, the name of a file to convert, and produce the resulting
text on stdout.

...

Therefore, you must add the --stdout option to the conversion command:

ipynb2py

#!/bin/bash
jupyter nbconvert --to python --stdout "$1"
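The contract quoted from the documentation (one file-name argument in, converted text on stdout) can also be met by a small stand-alone script. The following is a hypothetical, dependency-free sketch rather than a replacement for nbconvert: the `notebook_to_text` and `main` names are illustrative, it assumes the nbformat 4 layout with a top-level "cells" list, and it deliberately drops markdown cells, outputs, and execution counts.

```python
#!/usr/bin/env python3
"""Hypothetical stand-alone textconv driver: print a notebook's code cells."""
import json
import sys


def notebook_to_text(notebook):
    """Join the source of all code cells of an nbformat 4 notebook dict."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def main(argv):
    # Git passes exactly one argument: the path of the file to convert.
    with open(argv[1], encoding="utf-8") as handle:
        notebook = json.load(handle)
    # The converted text has to go to standard output for Git to use it.
    sys.stdout.write(notebook_to_text(notebook))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Saved as an executable file on the PATH, a script like this could be named in the textconv setting in place of ipynb2py. The trade-off is that it knows nothing about nbconvert's templates or older notebook formats.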

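Have you tried committing the notebook directly? I saw similar posts when I was looking into version-controlling Jupyter notebooks, but when I tried it, it seemed to work fine.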

Example notebook on GitHub

https://github.com/loegare/Test-Post-Please-Ignore/blob/master/Untitled%20Folder/Data%20Due%20Dilligence.ipynb