Strange behaviour with floats and string conversion
I typed this into the Python shell:

    >>> 0.1*0.1
    0.010000000000000002

I expected 0.1*0.1 not to be exactly 0.01, because I know that 0.1 in base 10 is periodic in base 2.

    >>> len(str(0.1*0.1))
    4

I expected to get 20, since I saw 20 characters above. Why do I get 4?

    >>> str(0.1*0.1)
    '0.01'
OK, this explains why len gives me 4.

    >>> repr(0.1*0.1)
    '0.010000000000000002'

Why does str round the value when repr does not?

    >>> str(0.01) == str(0.0100000000001)
    False
    >>> str(0.01) == str(0.01000000000001)
    True
So this seems to be a matter of float precision. I thought Python would use IEEE 754 single-precision floats, so I checked it like this:

    #include <stdint.h>
    #include <stdio.h> // printf

    union myUnion {
        uint32_t i; // unsigned integer 32-bit type (on every machine)
        float f;    // a type you want to play with
    };

    int main() {
        union myUnion testVar;
        testVar.f = 0.01000000000001f;
        printf("%f\n", testVar.f);

        testVar.f = 0.01000000000000002f;
        printf("%f\n", testVar.f);

        testVar.f = 0.01f*0.01f;
        printf("%f\n", testVar.f);
    }
I got:

    0.010000
    0.010000
    0.000100

Python gave me:

    >>> 0.01000000000001
    0.010000000000009999
    >>> 0.01000000000000002
    0.010000000000000019
    >>> 0.01*0.01
    0.0001

Why does Python give me these results?
(I'm using Python 2.6.5. If you know of differences between Python versions, I would be interested in those as well.)
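The crucial requirement on repr is that it round-trips: eval(repr(f)) == f should be True for every float f.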
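In Python 2.x (before 2.7), repr formats the float with 17 significant digits (a "%.17g" format), which is enough to identify any 64-bit IEEE 754 value uniquely.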
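Even in Python 2.7, repr(0.1 * 0.1) still shows the extra digits, because 0.1 * 0.1 and 0.01 really are different floats, as their hex representations show: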
    >>> 0.1.hex()
    '0x1.999999999999ap-4'
    >>> (0.1 * 0.1).hex()
    '0x1.47ae147ae147cp-7'
    >>> 0.01.hex()
    '0x1.47ae147ae147bp-7'
                     ^ 1 ulp difference
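The same 1 ulp gap can be checked without reading hex by eye. This is a minimal sketch, assuming the platform stores Python floats as IEEE 754 doubles; the helper name bits is made up for illustration.

```python
import struct

def bits(x):
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

product = 0.1 * 0.1
literal = 0.01

# The two values are distinct doubles ...
print(product == literal)             # False
# ... and their bit patterns are adjacent: exactly one step (1 ulp) apart.
print(bits(product) - bits(literal))  # 1
```

For positive doubles with the same sign, consecutive bit patterns are consecutive representable values, so a difference of 1 means there is no double between the two.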
I can confirm your behaviour:

    ActivePython 2.6.4.10 (ActiveState Software Inc.) based on
    Python 2.6.4 (r264:75706, Jan 22 2010, 17:24:21) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> repr(0.1)
    '0.10000000000000001'
    >>> repr(0.01)
    '0.01'
Now, the docs claim that in Python < 2.7
"the value of repr(1.1) was computed as format(1.1, '.17g')"
This is a slight simplification.
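To see what that rule produces, the old behaviour can be imitated in a current Python. This is only a sketch: old_repr follows the '.17g' rule quoted from the docs, while old_str assumes that str in those versions used 12 significant digits, which is not stated in the quote. Both helper names are made up, and the ".0" suffix that the real code appends to integer-looking results is left out.

```python
def old_repr(x):
    """Imitate pre-2.7 repr: 17 significant digits (per the quoted docs)."""
    return format(x, ".17g")

def old_str(x):
    """Imitate pre-2.7 str, ASSUMING it used 12 significant digits."""
    return format(x, ".12g")

print(old_repr(0.1))        # 0.10000000000000001
print(old_repr(0.01))       # 0.01
print(old_repr(0.1 * 0.1))  # 0.010000000000000002
print(old_str(0.1 * 0.1))   # 0.01

# The comparisons from the question fall out of the 12-digit cutoff:
print(old_str(0.01) == old_str(0.0100000000001))   # False
print(old_str(0.01) == old_str(0.01000000000001))  # True
```

Under that assumption, 0.0100000000001 still fits in 12 significant digits and stays distinct, while 0.01000000000001 needs 13 and is rounded back to 0.01.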
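Note that this is all to do with the string formatting code. In memory, all Python floats are stored as C doubles, so the stored values themselves never differ.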
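The question assumed single precision, so a quick check may help here. This sketch assumes a platform where the C double is an IEEE 754 binary64; to_single is a made-up helper that rounds a value through a 32-bit float using the struct module.

```python
import struct
import sys

def to_single(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# A double carries a 53-bit significand (a single would report 24).
print(sys.float_info.mant_dig)

# As doubles the two literals from the question are different values ...
print(0.01 == 0.01000000000001)                        # False
# ... but rounded to single precision they collapse to the same value.
print(to_single(0.01) == to_single(0.01000000000001))  # True
```

That is why the C program using float cannot tell these inputs apart while the Python session can.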
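Also, it is a bit unpleasant to work with the full-length string for a float when you know there is a better one. Indeed, in modern Pythons a new algorithm is used for float formatting, which picks the shortest representation in a smart way.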
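The idea of "shortest representation" can be illustrated with a brute-force search. To be clear, this is not the algorithm CPython uses; it is a slow stand-in that tries increasing precisions until the string parses back to the same float. The name shortest_roundtrip is made up for this sketch.

```python
def shortest_roundtrip(x):
    """Return the shortest '%g'-style string that parses back to exactly x."""
    for precision in range(1, 18):
        candidate = format(x, ".%dg" % precision)
        if float(candidate) == x:
            return candidate
    # 17 significant digits always round-trip for an IEEE 754 double.
    return format(x, ".17g")

print(shortest_roundtrip(0.1))        # 0.1
print(shortest_roundtrip(0.1 * 0.1))  # 0.010000000000000002
```

So 0.1 needs only one digit to be identified, while 0.1 * 0.1 needs all 17, because the 16-digit candidate would parse back to the neighbouring float 0.01.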
I spent a while looking this up in the source code, so I'll include the details here in case you're interested. You can skip this section.
In the CPython source we find:

    static PyObject *
    float_repr(PyFloatObject *v)
    {
        char buf[100];
        format_float(buf, sizeof(buf), v, PREC_REPR);

        return PyString_FromString(buf);
    }

This leads us to format_float:

    format_float(char *buf, size_t buflen, PyFloatObject *v, int precision)
    {
        register char *cp;
        char format[32];
        int i;

        /* Subroutine for float_repr and float_print.
           We want float numbers to be recognizable as such,
           i.e., they should contain a decimal point or an exponent.
           However, %g may print the number as an integer;
           in such cases, we append ".0" to the string. */

        assert(PyFloat_Check(v));
        PyOS_snprintf(format, 32, "%%.%ig", precision);
        PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
        cp = buf;
        if (*cp == '-')
            cp++;
        for (; *cp != '\0'; cp++) {
            /* Any non-digit means it's not an integer;
               this takes care of NAN and INF as well. */
            if (!isdigit(Py_CHARMASK(*cp)))
                break;
        }
        if (*cp == '\0') {
            *cp++ = '.';
            *cp++ = '0';
            *cp++ = '\0';
            return;
        }

        <some NaN/inf stuff>
    }
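We can see what happens here.
It first initialises some variables and checks that v is a float. It then prepares a format string:

    PyOS_snprintf(format, 32, "%%.%ig", precision);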
Now, PREC_REPR is defined as 17, so this produces the format string "%.17g". Next we call:

    PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
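With the end of the tunnel in sight, we see that PyOS_ascii_formatd hands the value to a printf-style formatter with that format string.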
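The steps of the C routine can be mirrored in a few lines of Python. This is a rough sketch, not a translation of the real function: format_float_sketch is a made-up name, Python's % operator stands in for the C formatting call, and the elided NaN/inf handling is not reproduced.

```python
def format_float_sketch(value, precision):
    """Mimic the visible steps of format_float for a Python float."""
    # Step 1: build the format string, e.g. "%.17g" for precision 17.
    fmt = "%%.%ig" % precision
    # Step 2: format the value with it.
    text = fmt % value
    # Step 3: if only digits remain after an optional sign, %g printed
    # the float like an integer, so append ".0" to keep it recognisable.
    digits = text[1:] if text.startswith("-") else text
    if digits.isdigit():
        text += ".0"
    return text

print(format_float_sketch(0.1, 17))   # 0.10000000000000001
print(format_float_sketch(0.01, 17))  # 0.01
print(format_float_sketch(1.0, 17))   # 1.0
print(format_float_sketch(-3.0, 17))  # -3.0
```

The ".0" step is the part that a plain '.17g' format does not do on its own.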
From the Python tutorial:
"In versions prior to Python 2.7 and Python 3.1, Python rounded this value to 17 significant digits, giving '0.10000000000000001'. In current versions, Python displays a value based on the shortest decimal fraction that rounds correctly back to the true binary value, resulting simply in '0.1'."