Python: read line with .encode with utf8


I read lines from a file, for example:

The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)

Die virtuelle Katastrophe: So führen Sie Teams über Distanz zur
Spitzenleistung (German Edition) (Thomas, Gary)

I read/encode them with the following code:

    title = line.encode('utf8')

But the output is:

b'Die virtuelle Katastrophe: So f\xc3\xbchren Sie Teams \xc3\xbcber
Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)'

b'The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben)
(German Edition) (Peters, Tom)'

Why is the "b" always added? How do I read the file correctly so that the umlauts are preserved?

Here is the complete relevant code snippet:

    # Parse the clippings.txt file
    lines = [line.strip() for line in codecs.open(config['CLIPPINGS_FILE'], 'r', 'utf-8-sig')]
    for line in lines:
        line_count = line_count + 1
        if (line_count == 1 or is_title == 1):
            # ASSERT: this is a title line
            #title = line.encode('ascii', 'ignore')
            title = line.encode('utf8')
            prev_title = 1
            is_title = 0
            note_type_result = note_type = l = l_result = location = ""
            continue

Thanks

Related discussion

The method str.encode converts a Unicode string into a bytes object:

str.encode(encoding="utf-8", errors="strict")
Return an encoded version of the string as a bytes object. Default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.

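So you are getting exactly what is expected: the b prefix is simply how Python 3 displays a bytes object. Because codecs.open(..., 'utf-8-sig') already decodes the file, line is already a str with the umlauts intact, and there is no need to call .encode on it.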
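To make the str/bytes distinction concrete, here is a minimal sketch of the round trip. The sample title is taken from the question; the variable names are only illustrative.

```python
# Illustrative sketch: str.encode returns bytes, bytes.decode returns str.
title = "Die virtuelle Katastrophe: So führen Sie Teams über Distanz"

encoded = title.encode("utf8")    # bytes object; its repr starts with b'
decoded = encoded.decode("utf8")  # back to the original str

print(type(encoded).__name__)  # bytes
print(type(decoded).__name__)  # str
print(decoded == title)        # True
```

In other words, the text only needs to be encoded when bytes are actually required, for example when writing to a binary stream.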

On most machines you can simply open the file and read from it. If the file's encoding is not the system default, you can pass it as a keyword argument:

    with open(filename, encoding='utf8') as f:
        line = f.readline()
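As a rough sketch of how this could look for the clippings file in the question: the helper name read_lines and the temporary demo file are assumptions made for illustration, and 'utf-8-sig' is used because the question's code uses it (it strips a leading byte order mark if one is present).

```python
import os
import tempfile


def read_lines(path, encoding="utf-8-sig"):
    """Return the stripped lines of a text file as str objects (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f]


# Demo with a throwaway file standing in for the clippings file.
sample = "Die virtuelle Katastrophe: So führen Sie Teams über Distanz\n"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8-sig", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

titles = read_lines(tmp_path)
os.remove(tmp_path)

print(isinstance(titles[0], str))   # the title is a str, not bytes
print(titles[0] == sample.strip())  # umlauts survive, no BOM, no b prefix
```

The lines come back as ordinary str objects, so a title can be used directly without calling .encode.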