Copy a section of text from a file in Python
I need to extract values from the text file below:
    fdsjhgjhg
    fdshkjhk
    Start
    Good Morning
    Hello World
    End
    dashjkhjk
    dsfjkhk
The values I need to extract run from Start to End.
    with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
        copy = False
        for line in infile:
            if line.strip() == "Start":
                copy = True
            elif line.strip() == "End":
                copy = False
            elif copy:
                outfile.write(line)
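The code above, which I am using, comes from this question:
Extract values between two strings in a text file using python
This code does not include the strings "Start" and "End", only what lies between them. How would you include the enclosing strings as well?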
@en_Knight has it almost right. Here is a fix that meets the OP's request that the delimiters be included in the output:
    with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
        copy = False
        for line in infile:
            if line.strip() == "Start":
                copy = True
            if copy:
                outfile.write(line)
            # move this AFTER the "if copy"
            if line.strip() == "End":
                copy = False
Or simply add the write() call in the branches where it applies:
    with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
        copy = False
        for line in infile:
            if line.strip() == "Start":
                outfile.write(line)  # add this
                copy = True
            elif line.strip() == "End":
                outfile.write(line)  # add this
                copy = False
            elif copy:
                outfile.write(line)
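Update: to answer the question in the comments, "use only the first occurrence of 'End' after 'Start'", change the elif branch that checks for "End" to: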
            elif line.strip() == "End" and copy:
                outfile.write(line)  # add this
                copy = False
This covers the case where there is only one "Start" line but several "End" lines... which sounds odd, but it is what the asker requested.
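For reference, the updated loop can be sketched as a small function that works on any iterable of lines, which makes it easy to try without touching files. The function name extract_block, its parameters, and the sample lines are hypothetical and used only for illustration.

```python
def extract_block(lines, start="Start", end="End"):
    """Collect lines from each `start` line through the next `end` line, inclusive.

    `end` lines seen while not copying are skipped.
    """
    result = []
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            result.append(line)  # keep the opening delimiter
            copy = True
        elif stripped == end and copy:
            result.append(line)  # keep only the first closing delimiter
            copy = False
        elif copy:
            result.append(line)
    return result


# Hypothetical input with one "Start" line and two "End" lines.
lines = ["junk\n", "Start\n", "Good Morning\n", "End\n", "End\n", "tail\n"]
block = extract_block(lines)
print("".join(block), end="")
```

Because a file object is itself an iterable of lines, the same function could be called with an open file in place of the list.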
RegExp approach:
    import re

    with open('input.txt') as f:
        data = f.read()

    # re.M lets ^ and $ match at line boundaries, so "Start" and "End" must be
    # whole lines; re.S lets .*? span the newlines in between.
    match = re.search(r'^(Start$.*?^End)$', data, re.M | re.S)

    if match:
        with open('output.txt', 'w') as f:
            f.write(match.group(1))
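A line-anchored variant of the regular-expression approach can be tried on an in-memory string built from the sample file in the question. This is only a sketch: the names sample, pattern, and block are illustrative, and anchoring with ^ and $ is one possible way to require that "Start" and "End" sit on lines of their own.

```python
import re

# The sample file from the question, as a single string.
sample = (
    "fdsjhgjhg\n"
    "fdshkjhk\n"
    "Start\n"
    "Good Morning\n"
    "Hello World\n"
    "End\n"
    "dashjkhjk\n"
    "dsfjkhk\n"
)

# re.M makes ^ and $ match at the start and end of each line;
# re.S lets the non-greedy .*? cross newlines up to the first "End" line.
pattern = re.compile(r"^Start$.*?^End$", re.M | re.S)

match = pattern.search(sample)
block = match.group(0) if match else None
print(block)
```

Note that the matched text stops at "End" and does not include the newline that follows it, so a trailing newline has to be written separately if the output file should end with one.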
"
    with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
        copy = False
        for line in infile:
            if line.strip() == "Start":
                copy = True
            if copy:  # flipped to include end, as Dan H pointed out
                outfile.write(line)
            if line.strip() == "End":
                copy = False