Python: How to correct TypeError: Unicode-objects must be encoded before hashing?

I get this error:

Traceback (most recent call last):
  File "python_md5_cracker.py", line 27, in <module>
    m.update(line)
TypeError: Unicode-objects must be encoded before hashing

when I try to execute this code in Python 3.2.2:

import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides? ")
wordlist = input("What is your wordlist? (Enter the file name) ")
try:
    hashdocument = open(hash_file, "r")
except IOError:
    print("Invalid file.")
    raw_input()
    sys.exit()
else:
    hash = hashdocument.readline()
    hash = hash.replace("\n", "")

try:
    wordlistfile = open(wordlist, "r")
except IOError:
    print("Invalid file.")
    raw_input()
    sys.exit()
else:
    pass
for line in wordlistfile:
    # Flush the buffer (this caused a massive problem when placed
    # at the beginning of the script, because the buffer kept getting
    # overwritten, thus comparing incorrect hashes)
    m = hashlib.md5()
    line = line.replace("\n", "")
    m.update(line)
    word_hash = m.hexdigest()
    if word_hash == hash:
        print("Collision! The word corresponding to the given hash is", line)
        input()
        sys.exit()

print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()

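Related discussion

The error already tells you what to do. MD5 operates on bytes, so you have to encode the Unicode string into bytes, e.g. with line.encode('utf-8').

Please take a look at that answer first.

Now, the error message is clear: you can only use bytes, not Python strings (what used to be unicode in Python < 3), so you have to encode the strings with your preferred encoding: utf-32, utf-16, utf-8, or even one of the restricted 8-bit encodings (what some might call codepages).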
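To make the point about encodings concrete, here is a minimal sketch. The sample word is a hypothetical value chosen only because it contains a non-ASCII character; it is not taken from the question. It shows that passing a str raises TypeError, and that the same text hashes differently depending on the encoding used:

```python
import hashlib

word = "café"  # hypothetical sample word with one non-ASCII character

# Passing a str directly raises TypeError, as in the question.
error_message = None
try:
    hashlib.md5(word)
except TypeError as exc:
    error_message = str(exc)

# Encoding first works, but each encoding produces different bytes
# and therefore a different digest.
digest_utf8 = hashlib.md5(word.encode("utf-8")).hexdigest()
digest_utf16 = hashlib.md5(word.encode("utf-16")).hexdigest()
digest_latin1 = hashlib.md5(word.encode("latin-1")).hexdigest()

print(error_message)
print(len({digest_utf8, digest_utf16, digest_latin1}))  # number of distinct digests
```

The exact wording of the error message varies between Python versions, so the sketch prints it rather than assuming it.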

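When you read from the file, Python 3 automatically decodes the bytes of your wordlist file into Unicode. I suggest you do:

m.update(line.encode(wordlistfile.encoding))

so that the encoded data pushed to the MD5 algorithm are encoded exactly like the underlying file.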
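The following sketch illustrates why re-encoding with the file's own encoding matters. It assumes a hypothetical one-word list stored as Latin-1 in a temporary file; the file, its contents, and the variable names are made up for the demonstration:

```python
import hashlib
import os
import tempfile

# Hypothetical word list stored as Latin-1, so "café" is on disk as b"caf\xe9".
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("café\n".encode("latin-1"))
    wordlist_path = tmp.name

try:
    with open(wordlist_path, "r", encoding="latin-1") as wordlistfile:
        for line in wordlistfile:
            word = line.rstrip("\n")
            # Re-encoding with the file's encoding recovers the bytes on disk.
            same_as_file = hashlib.md5(word.encode(wordlistfile.encoding)).hexdigest()
            # A different encoding hashes different bytes.
            as_utf8 = hashlib.md5(word.encode("utf-8")).hexdigest()
finally:
    os.remove(wordlist_path)

raw_digest = hashlib.md5(b"caf\xe9").hexdigest()
print(same_as_file == raw_digest)
print(as_utf8 == raw_digest)
```

If the hashes being cracked were computed from UTF-8 text while the word list is stored in another encoding, encoding with 'utf-8' would be the right choice instead; which one applies depends on how the target hashes were produced.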

You can open the file in binary mode:

import hashlib

with open(hash_file) as file:
    control_hash = file.readline().rstrip("\n")

wordlistfile = open(wordlist, "rb")
# ...
for line in wordlistfile:
    # In binary mode each line is already bytes, so no encoding is needed.
    if hashlib.md5(line.rstrip(b'\n\r')).hexdigest() == control_hash:
        pass  # collision: handle the match here
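A self-contained sketch of the binary-mode approach could look like this. The helper name find_word and the three-word temporary list are hypothetical and exist only for the demonstration:

```python
import hashlib
import os
import tempfile


def find_word(wordlist_path, control_hash):
    """Return the first word (as bytes) whose MD5 hex digest equals control_hash."""
    with open(wordlist_path, "rb") as wordlistfile:
        for line in wordlistfile:
            word = line.rstrip(b"\r\n")  # strip line endings only
            if hashlib.md5(word).hexdigest() == control_hash:
                return word
    return None


# Hypothetical word list with mixed line endings, used only for this demo.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"apple\nbanana\r\ncherry\n")
    demo_path = tmp.name

try:
    target = hashlib.md5(b"banana").hexdigest()
    found = find_word(demo_path, target)
    missing = find_word(demo_path, "0" * 32)
finally:
    os.remove(demo_path)

print(found, missing)
```

Because the words never pass through a decode step, the bytes that are hashed are exactly the bytes stored in the file. The trade-off is that a matching word comes back as bytes and has to be decoded before it can be displayed as text.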


import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())
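The same pattern can be wrapped in a small helper that accepts either text or bytes. This is only a sketch; the function name sha256_hex and its encoding parameter are assumptions, not part of hashlib:

```python
import hashlib


def sha256_hex(value, encoding="utf-8"):
    """Hash bytes as-is; convert anything else to str and encode it first."""
    if isinstance(value, (bytes, bytearray)):
        data = bytes(value)
    else:
        data = str(value).encode(encoding)
    return hashlib.sha256(data).hexdigest()


# The text '123', the bytes b'123', and the integer 123 all hash identically,
# because each ends up as the same three bytes.
print(sha256_hex("123") == sha256_hex(b"123") == sha256_hex(123))
```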

Encoding this line fixed it for me:

m.update(line.encode('utf-8'))

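This program is a bug-free, enhanced version of the MD5 cracker above. It reads a file containing a list of hashed passwords and checks each one against the hashes of the words in an English dictionary word list. Hope it helps.

I downloaded the English dictionary from the following link: https://github.com/dwyl/english-words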

# md5cracker.py
# English Dictionary https://github.com/dwyl/english-words

import hashlib, sys

# Raw strings keep the backslashes in the Windows paths literal.
hash_file = r'exercise\hashed.txt'
wordlist = r'data_sets\english_dictionary\words.txt'

try:
    hashdocument = open(hash_file, 'r')
except IOError:
    print('Invalid file.')
    sys.exit()
else:
    count = 0
    for hash in hashdocument:
        hash = hash.rstrip('\n')
        print(hash)
        i = 0
        # Re-read the word list from the start for every hash.
        with open(wordlist, 'r') as wordlistfile:
            for word in wordlistfile:
                m = hashlib.md5()
                word = word.rstrip('\n')
                # Encode the str to bytes before hashing.
                m.update(word.encode('utf-8'))
                word_hash = m.hexdigest()
                if word_hash == hash:
                    print('The word, hash combination is ' + word + ',' + hash)
                    count += 1
                    break
                i += 1
        print('Iteration is ' + str(i))
    if count == 0:
        print('The hash given does not correspond to any supplied word in the wordlist.')
    else:
        print('Total passwords identified is: ' + str(count))
sys.exit()
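The program above re-reads the word list once per hash. When there are many hashes, one possible variation (a different technique from the one in the answer, sketched here with hypothetical function names and an in-memory word list) is to hash every word once and look each target up in a dictionary:

```python
import hashlib


def build_lookup(words):
    """Map each word's MD5 hex digest to the word itself."""
    return {hashlib.md5(w.encode("utf-8")).hexdigest(): w for w in words}


def crack(hashes, lookup):
    """Return {hash: word} for every hash that appears in the lookup table."""
    return {h: lookup[h] for h in hashes if h in lookup}


# Hypothetical data standing in for the word list and the hash file.
words = ["apple", "banana", "cherry"]
hashes = [hashlib.md5(b"banana").hexdigest(), "0" * 32]

lookup = build_lookup(words)
matches = crack(hashes, lookup)
print(matches)
print("Total passwords identified is:", len(matches))
```

This trades memory for speed: the whole table of digests is held in memory, which may or may not be acceptable depending on the size of the word list.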