Python chained interval comparison
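I am trying to do a chained comparison between two files and print/write out the results that fall within a specified interval.
This is what I have so far.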
test1 file:
```
A0AUZ9,7,17 #just this one line
```
test2 file:
```
A0AUZ8, DOC_PP1_RVXF_1, 8, 16, PF00149, O24930
A0AUZ9, LIG_BRCT_BRCA1_2, 127, 134, PF00533, O25336
A0AUZ9, LIG_BRCT_BRCA1_1, 127, 132, PF00533, O25336
A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685
A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25155
```
The script itself:
```python
results = []

with open('test1', 'r') as disorder:
    for lines in disorder:
        cells = lines.strip().split(',')
        with open('test2', 'r') as helpy:
            for lines in helpy:
                blocks = lines.strip().split(',')
                if blocks[0] != cells[0]:
                    continue
                elif cells[1] <= blocks[2] and blocks[3] <= cells[2]:
                    results.append(blocks)

with open('test3','wt') as outfile:
    for i in results:
        outfile.write("%s\n" % i)
```
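My preferred output in test3 would be only the lines that:
have a matching ID in the first column, and
have both values in columns 3 and 4 lying between the values given in the test1 file.
I get no output and I don't know where it goes wrong.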
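One reason it does not work as expected is that you are comparing strings rather than numbers. `split(',')` returns strings, so `<=` compares them character by character instead of by numeric value.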
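To make the string-versus-number problem concrete, here is a small sketch using one line from each sample file. It assumes the test2 fields keep the space that follows each comma, as `split(',')` would leave them; the helper name `in_interval` is made up for illustration.

```python
# Fields as produced by split(',') on the sample lines: still strings,
# and the test2 fields keep the space that follows each comma.
cells = "A0AUZ9,7,17".split(",")
blocks = "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685".split(",")

# String comparison is lexicographic, character by character.
string_result = cells[1] <= blocks[2] and blocks[3] <= cells[2]


def in_interval(low, high, start, end):
    """Hypothetical helper: numeric test that [start, end] lies in [low, high]."""
    # int() ignores surrounding whitespace, so " 8" converts cleanly.
    return int(low) <= int(start) and int(end) <= int(high)


numeric_result = in_interval(cells[1], cells[2], blocks[2], blocks[3])

print(string_result)   # False: "7" sorts after " 8" because a space sorts before digits
print(numeric_result)  # True: 7 <= 8 and 16 <= 17
print("127" <= "17")   # True: even without spaces, strings do not compare numerically
```

Converting with `int()` before comparing is what makes the interval test behave as intended.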
However, there may be a better way to do what you want. Assuming the first file is small enough to fit in memory:
```python
import csv
from collections import defaultdict

# Map each ID in the first file to its list of (start, end) intervals.
lookup_table = defaultdict(list)

with open('test1.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        lookup_table[row[0]].append((int(row[1]), int(row[2])))

with open('test2.txt') as a, open('results.txt', 'w', newline='') as b:
    reader = csv.reader(a)
    writer = csv.writer(b)

    for row in reader:
        # int() tolerates the leading space left after each comma.
        start, end = int(row[2]), int(row[3])
        # Keep the row if it lies inside any interval recorded for its ID.
        if any(low <= start and end <= high
               for low, high in lookup_table.get(row[0], ())):
            writer.writerow(row)
```
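The lookup-table approach can be checked against the sample data from the question without touching the filesystem. The following is only a sketch under a few assumptions: the function name `filter_rows` is hypothetical, the files are replaced by in-memory `io.StringIO` objects, and the `#just this one line` note is treated as an annotation rather than part of the test1 file.

```python
import csv
import io
from collections import defaultdict


def filter_rows(intervals_file, records_file):
    """Hypothetical helper: return the rows of records_file whose
    columns 3 and 4 fall inside an interval listed for the same ID."""
    lookup = defaultdict(list)
    for row in csv.reader(intervals_file):
        lookup[row[0]].append((int(row[1]), int(row[2])))

    kept = []
    for row in csv.reader(records_file):
        start, end = int(row[2]), int(row[3])
        # Chained comparison: low <= start and start <= end and end <= high.
        if any(low <= start <= end <= high
               for low, high in lookup.get(row[0], ())):
            kept.append(row)
    return kept


test1 = io.StringIO("A0AUZ9,7,17\n")
test2 = io.StringIO(
    "A0AUZ8, DOC_PP1_RVXF_1, 8, 16, PF00149, O24930\n"
    "A0AUZ9, LIG_BRCT_BRCA1_2, 127, 134, PF00533, O25336\n"
    "A0AUZ9, LIG_BRCT_BRCA1_1, 127, 132, PF00533, O25336\n"
    "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685\n"
    "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25155\n"
)

matches = filter_rows(test1, test2)
for row in matches:
    print(",".join(row))
```

With this sample data only the two `A0AUZ9` rows spanning 8 to 16 are kept. Note that the chained form also requires `start <= end`, which holds for every sample row.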