Error in converting multiple FASTA files to Nexus using Biopython

I want to use the Bio.SeqIO module to convert multiple FASTA files (DNA sequences) to Nexus format, but I get the following error:

Traceback (most recent call last):
  File "fasta2nexus.py", line 28, in <module>
    print(process(fullpath))
  File "fasta2nexus.py", line 23, in process
    alphabet=IUPAC.ambiguous_dna)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 1003, in convert
    with as_handle(in_file, in_mode) as in_handle:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Library/Python/2.7/site-packages/Bio/File.py", line 88, in as_handle
    with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: 'c'

What am I missing?

Here is my code:

#!/usr/bin/env python

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC

test = "/Users/teton/Desktop/test"

files = os.listdir(os.curdir)

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta",
                         base + ".nex", "nexus",
                         alphabet=IUPAC.ambiguous_dna)

for files in os.listdir(test):
    for file in files:
        fullpath = os.path.join(file)
        print(process(fullpath))

Related discussion

This code should fix most of the problems I can see.

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC

test = "/Users/teton/Desktop"

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta",
                         base + ".nex", "nexus",
                         alphabet=IUPAC.ambiguous_dna)

for root, dirs, files in os.walk(test):
    for file in files:
        fullpath = os.path.join(root, file)
        print(process(fullpath))

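I changed a few things. First, I reordered your imports (a personal preference) and made sure to import IUPAC from Bio.Alphabet, so you can actually assign the proper alphabet to your sequences. Next, in your process() function, I added a line that splits the extension off the filename. The full filename is then used as the first argument, and only the base (without the extension) is used to name the Nexus output file. Speaking of which, I assume you will be using the Nexus module later in your code? If not, you should remove it from the imports.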
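The output-naming step described above can be tried on its own. Note also that the Bio.Alphabet module was removed in Biopython 1.78; on newer releases SeqIO.convert() takes a molecule_type argument instead of alphabet. The following is only a minimal sketch under that assumption (Biopython 1.78 or later); the helper names nexus_path and fasta_to_nexus and the file name sample.fasta are made up for illustration.

```python
import os


def nexus_path(fasta_path):
    """Return the output path: same location and base name, ".nex" extension."""
    # os.path.splitext returns (base, extension); [0] picks the base.
    base = os.path.splitext(fasta_path)[0]
    return base + ".nex"


def fasta_to_nexus(fasta_path):
    """Convert one FASTA alignment to Nexus; returns the number of records."""
    # Imported here so nexus_path() can be tried without Biopython installed.
    from Bio import SeqIO

    # Assumption: Biopython >= 1.78, where molecule_type replaces alphabet.
    return SeqIO.convert(fasta_path, "fasta",
                         nexus_path(fasta_path), "nexus",
                         molecule_type="DNA")


# Hypothetical input file name, used only to show the naming rule.
print(nexus_path("/Users/teton/Desktop/test/sample.fasta"))
# prints /Users/teton/Desktop/test/sample.nex
```

On an older installation that still ships Bio.Alphabet, the alphabet=IUPAC.ambiguous_dna call from the answer applies instead.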

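I am not sure what the point of the last snippet was, so I did not include it. In it, though, you appear to be walking the file tree and calling process() on each file a second time, then referencing an undefined variable called count. Instead, run process() just once per file, and do whatever count is meant to do inside that same loop.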

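You may want to add some logic to the for loop to check that the file returned by os.path.join() is actually a FASTA file. Otherwise, if one of the directories you search contains any other file type and you call process() on it, all sorts of strange things could happen.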
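One simple way to do that check is to filter on the file extension. This is a rough sketch, not a full validation: the extension list and the helper name find_fasta_files are assumptions, and a file with the right extension can still contain something other than FASTA.

```python
import os

# Assumption: these extensions mark FASTA files in your data; adjust as needed.
FASTA_EXTENSIONS = (".fasta", ".fa", ".fas", ".fna")


def find_fasta_files(top):
    """Yield full paths of files under `top` whose extension looks like FASTA."""
    for root, dirs, files in os.walk(top):
        for name in sorted(files):
            # str.endswith accepts a tuple of suffixes; lower() makes the
            # comparison case-insensitive (so "seqs.FA" also matches).
            if name.lower().endswith(FASTA_EXTENSIONS):
                yield os.path.join(root, name)
```

With the process() function from the answer, the loop would then read: for path in find_fasta_files(test): print(process(path)).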

EDIT

OK, based on your new code, I have a few suggestions. First, the line

files = os.listdir(os.curdir)

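is completely unnecessary: further down, after the definition of process(), you rebind the files variable in the for loop, so this listing is never used. (os.curdir is a string constant, ".", not a function, so the call itself is valid; it just does nothing useful here.)

The code at the bottom should be: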

for file in os.listdir(test):
    # join with the directory so the path resolves from any working directory
    print(process(os.path.join(test, file)))

The inner for file in files loop is redundant, and calling os.path.join() with a single argument does nothing useful.
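The inner loop is also where the 'c' in the traceback comes from: os.listdir() returns file names as strings, and iterating over a string yields its characters one at a time. A small illustration, using a made-up file name (cytb.fasta is an assumption; any name starting with "c" would behave the same way):

```python
# Hypothetical file name, standing in for one entry of os.listdir(test).
name = "cytb.fasta"

# Looping over a string visits its characters, not whole file names.
chars = [ch for ch in name]
print(chars[:4])  # ['c', 'y', 't', 'b']

# So the first "file" handed to process() is the single character 'c',
# which open() then fails to find on disk.
first = chars[0]
print(repr(first))  # 'c'
```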