Python: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 386: character maps to <undefined>

I am trying to read a PDF file with the Slate library, but the following code raises an error:

import slate

pdf = 'tabla9.pdf'

with open(pdf,encoding="utf-8") as f:

    doc = slate.PDF(f)

for page in doc[:2]:
    print(page)

Full traceback:

  File "C:\Users\user\libro5.py", line 7, in <module>
    doc = slate.PDF(f)
  File "C:\Python3\lib\slate\classes.py", line 52, in __init__
    self.parser = PDFParser(file)
  File "C:\Python3\lib\site-packages\pdfminer\pdfparser.py", line 646, in __init__
    PSStackParser.__init__(self, fp)
  File "C:\Python3\lib\site-packages\pdfminer\psparser.py", line 189, in __init__
    PSBaseParser.__init__(self, fp)
  File "C:\Python3\lib\site-packages\pdfminer\psparser.py", line 134, in __init__
    data = fp.read()
  File "C:\Python3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte

classes.py, line 52:

class PDF(list):
    def __init__(self, file, password='', just_text=1, check_extractable=True, char_margin=1.0, line_margin=0.1, word_margin=0.1):
        self.parser = PDFParser(file)

pdfparser.py, line 646:

def __init__(self, fp):
    PSStackParser.__init__(self, fp)

psparser.py, line 189:

class PSStackParser(PSBaseParser):

    def __init__(self, fp):
        PSBaseParser.__init__(self, fp)

psparser.py, line 134:

class PSBaseParser:

    """Most basic PostScript parser that performs only tokenization.
    """
    def __init__(self, fp):
        data = fp.read()

C:\Python3\lib\codecs.py, line 322, in decode, which is where the UnicodeDecodeError ('utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte) is raised:

def decode(self, input, final=False):
    # decode input (taking the buffer into account)
    data = self.buffer + input
    (result, consumed) = self._buffer_decode(data, self.errors, final)

I am using Python 3.7 on Windows 10.

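A PDF file is binary, so it should not be opened in text mode with an encoding. In text mode Python tries to decode every byte as UTF-8 (or, with no encoding given on Windows, as the system code page, which is what produces the 'charmap' error), and a PDF contains bytes that are not valid text.

Open the file in binary mode instead:

with open(pdf, "rb") as f:
    doc = slate.PDF(f)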
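To see the difference without needing Slate or a real PDF, here is a minimal sketch. It assumes a file that starts the way many PDFs do: a version line followed by a comment line containing non-ASCII bytes. The names `SAMPLE_HEADER`, `read_both_ways` and `demo` are made up for this illustration, and the sample bytes are only a stand-in, not the contents of `tabla9.pdf`.

```python
import os
import tempfile

# Hypothetical stand-in for the first bytes of a PDF: a version line followed
# by a comment made of non-ASCII bytes. Byte 0xe2 sits at index 10.
SAMPLE_HEADER = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def read_both_ways(path):
    """Return (error raised in text mode or None, bytes read in binary mode)."""
    text_error = None
    try:
        # Text mode: Python decodes the bytes as UTF-8 while reading.
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_error = exc

    # Binary mode: the bytes are returned untouched, no decoding happens.
    with open(path, "rb") as f:
        raw = f.read()
    return text_error, raw


def demo():
    """Write the sample header to a temporary file and read it both ways."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(SAMPLE_HEADER)
        return read_both_ways(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    error, raw = demo()
    print("text mode error:", error)
    print("binary mode data:", raw)
```

In text mode the read fails with a `UnicodeDecodeError`, while binary mode hands back the raw bytes, which is the form a parser that calls `fp.read()` on a PDF needs.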