Best way to convert string to bytes in Python 3?
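There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface".
Which of these methods is better or more Pythonic? Or is it just a matter of personal preference?

    b = bytes(mystring, 'utf-8')

    b = mystring.encode('utf-8')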
If you look at the docs for bytes, they point you to bytearray:
bytearray([source[, encoding[, errors]]])
Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.
The optional source parameter can be used to initialize the array in a few different ways:
If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().
If it is an integer, the array will have that size and will be initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
Without an argument, an array of size 0 is created.
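The ways of initialising listed above can be sketched as follows. This is only an illustration: it uses bytes(), which accepts the same arguments as bytearray() but returns an immutable object, and the variable names and sample values are made up for the example.

```python
# Each way of initialising from the list above, shown with bytes().
from_text = bytes("caf\u00e9", "utf-8")   # string: an encoding is required
zeros = bytes(3)                          # integer: that many null bytes
from_buffer = bytes(bytearray(b"abc"))    # buffer-protocol object: contents copied
from_ints = bytes([104, 105])             # iterable of ints in range(256)
empty = bytes()                           # no argument: zero length

print(from_text, zeros, from_buffer, from_ints, empty)

# A string without an encoding is rejected rather than guessed at.
try:
    bytes("caf\u00e9")
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

Only the first form has anything to do with text; the others never involve an encoding.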
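So bytes() can do much more than just encode a string: the constructor accepts any type of source argument that makes sense.
For encoding a string, I think mystring.encode('utf-8') is more Pythonic than the constructor because it is the most self-documenting. "Take this string and encode it with this encoding" reads more clearly than bytes(mystring, 'utf-8'), which has no explicit verb.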
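Edit: I checked the Python source. If you pass a unicode string to bytes() using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode(), so you are just skipping a level of indirection if you call encode() yourself.
Also, see Serdalis' comment: mystring.encode(encoding) is also more Pythonic because its inverse is some_bytes.decode(encoding), and the symmetry is nice.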
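A minimal sketch of the point being made, assuming a made-up sample string: both spellings from the question produce identical bytes, and decode() reverses encode().

```python
text = "na\u00efve caf\u00e9"  # sample text with non-ASCII characters

via_constructor = bytes(text, "utf-8")   # constructor spelling
via_method = text.encode("utf-8")        # method spelling

same_result = via_constructor == via_method
round_trip = via_method.decode("utf-8")  # the inverse operation

print(same_result, round_trip == text)
```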
It's easier than one might think:

    my_str = "hello world"
    my_str_as_bytes = str.encode(my_str)
    print(type(my_str_as_bytes))  # <class 'bytes'>: it is a byte representation
    my_decoded_str = my_str_as_bytes.decode()
    print(type(my_decoded_str))   # <class 'str'>: it is a string representation
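For comparison, a short sketch showing that calling str.encode on the class is the same as calling the method on the string itself, and that both default to UTF-8 in Python 3. The variable names are illustrative only.

```python
my_str = "hello world"

unbound_call = str.encode(my_str)        # spelling used in the snippet above
method_call = my_str.encode()            # the more common spelling
explicit_call = my_str.encode("utf-8")   # default encoding written out

print(type(unbound_call).__name__)                      # bytes
print(unbound_call == method_call == explicit_call)     # True
```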
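The absolutely best way is neither of the 2, but a 3rd. Since Python 3.0, the first parameter to encode() has defaulted to 'utf-8', so the best way is:

    b = mystring.encode()

This will also be faster, because the default argument results not in the string "utf-8" in the C code, but in NULL, which is much faster to check.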
Here are some timings:

    In [1]: %timeit -r 10 'abc'.encode('utf-8')
    The slowest run took 38.07 times longer than the fastest. This could mean that
    an intermediate result is being cached.
    10000000 loops, best of 10: 183 ns per loop

    In [2]: %timeit -r 10 'abc'.encode()
    The slowest run took 27.34 times longer than the fastest. This could mean that
    an intermediate result is being cached.
    10000000 loops, best of 10: 137 ns per loop

Despite the warning, the times were very stable after repeated runs: the deviation was only about 2 percent.
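To repeat the measurement outside IPython, something like the following sketch with the standard timeit module should work. The loop and repeat counts are arbitrary choices, and the numbers will differ by machine and Python version, so the gap reported above may not reproduce.

```python
import timeit

number = 200_000  # calls per timing run (arbitrary)

# Take the best of several runs to reduce noise from other processes.
explicit = min(timeit.repeat("'abc'.encode('utf-8')", number=number, repeat=5))
default = min(timeit.repeat("'abc'.encode()", number=number, repeat=5))

print(f"explicit 'utf-8': {explicit / number * 1e9:.0f} ns per call")
print(f"default argument: {default / number * 1e9:.0f} ns per call")
```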
Using encode() without an argument is not Python 2 compatible, because in Python 2 the default character encoding is ASCII:

    >>> 'äöä'.encode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
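The error is a decode error because in Python 2 a plain string literal is already a byte string, so encode() first decodes it implicitly with ASCII. The Python 3 sketch below reproduces that hidden step explicitly. It assumes the literal was typed in a UTF-8 terminal, and the variable names are illustrative.

```python
# The bytes a Python 2 literal would hold, assuming UTF-8 input.
raw = "\u00e4\u00f6\u00e4".encode("utf-8")

message = None
try:
    raw.decode("ascii")   # the implicit step Python 2 performs before encoding
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Portable spelling: a unicode literal plus an explicit encoding.
portable = u"\u00e4\u00f6\u00e4".encode("utf-8")
```

Writing the encoding out, as in the last line, behaves the same on Python 2 and on Python 3.3 or later.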
You can simply convert a string to bytes using a_string.encode().
You can simply convert bytes to a string using some_bytes.decode().
The following functions (taken from Effective Python) might be useful for converting str to bytes and bytes to str:

    def to_bytes(bytes_or_str):
        if isinstance(bytes_or_str, str):
            value = bytes_or_str.encode()  # uses 'utf-8' for encoding
        else:
            value = bytes_or_str
        return value  # Instance of bytes

    def to_str(bytes_or_str):
        if isinstance(bytes_or_str, bytes):
            value = bytes_or_str.decode()  # uses 'utf-8' for decoding
        else:
            value = bytes_or_str
        return value  # Instance of str
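A usage sketch for helpers of this shape. They are restated compactly here so the snippet runs on its own, and the sample values are made up.

```python
def to_bytes(bytes_or_str):
    """Return bytes, encoding str input as UTF-8."""
    return bytes_or_str.encode() if isinstance(bytes_or_str, str) else bytes_or_str


def to_str(bytes_or_str):
    """Return str, decoding bytes input as UTF-8."""
    return bytes_or_str.decode() if isinstance(bytes_or_str, bytes) else bytes_or_str


# Either input type normalises to the same result.
as_bytes = [to_bytes("stackoverflow"), to_bytes(b"stackoverflow")]
as_text = [to_str(b"caf\xc3\xa9"), to_str("caf\u00e9")]
print(as_bytes, as_text)

# Anything that is neither str nor bytes passes through unchanged,
# including bytearray, which is not a subclass of bytes.
passthrough = to_str(bytearray(b"abc"))
```

The pass-through branch means a caller that may receive bytearray or other types still has to check the result itself.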
    so_string = 'stackoverflow'
    so_bytes = so_string.encode()