Pandas: ValueError: cannot convert float NaN to integer

I get ValueError: cannot convert float NaN to integer for the following:

import pandas

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

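"x" is obviously a column in the csv file, but I cannot spot any float NaN in the file, and I don't understand what the message means.
When I read the column as string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
When I read the column as float, it loads. It then shows values as -1.0, 0.0 etc., but there are still no NaNs.
I tried error_bad_lines = False and the dtype parameter in read_csv, to no avail. Loading is simply aborted with the same exception.
The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So something in the file causes it, but I cannot detect what.
Logically the csv should not have missing values, but even if there is some garbage I would be fine with skipping those rows. Or at least identifying them, but I do not see a way to scan through the file and report conversion errors.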

Update: using the hints in the comments/answers, I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]

# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
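
For anyone who wants to identify the offending rows rather than just drop them, here is a minimal sketch of one way to do it. It is not the code I ran on the real file: the inline CSV text is a made-up stand-in, and pd.to_numeric with errors="coerce" is used to mark every cell that does not parse as a number.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; the empty cell and "abc" are made up.
csv_text = "x,y\n-1,10\n0,abc\n,30\n2000,40\n"

# Read everything as strings so garbage values stay visible as text.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# errors="coerce" turns anything that is not a number into NaN instead of raising.
x_num = pd.to_numeric(df["x"], errors="coerce")
y_num = pd.to_numeric(df["y"], errors="coerce")

# Rows where either column was missing or failed to parse.
bad_rows = df[x_num.isna() | y_num.isna()]
print(bad_rows)

# Keep only the rows that parsed cleanly, then convert to int.
clean = pd.DataFrame({"x": x_num, "y": y_num}).dropna().astype(int)
```

Printing bad_rows shows the original text of each problem cell together with its row index, which is usually enough to find the line in the file.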

Related discussion

ValueError: cannot convert float NaN to integer

From v0.24, you actually can. Pandas introduces Nullable Integer data types, which allow integers to coexist with NaNs.

Given a series of whole float numbers with missing data,

import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0])
s

0 1.0
1 2.0
2 NaN
3 4.0
dtype: float64

s.dtype
# dtype('float64')

You can convert it to a nullable int type (choose one of Int16, Int32, or Int64) with,

s2 = s.astype('Int32') # note the 'I' is uppercase
s2

0 1
1 2
2 NaN
3 4
dtype: Int32

s2.dtype
# Int32Dtype()

Your column needs to contain whole numbers for the cast to happen. Anything else will raise a TypeError:

s = pd.Series([1.1, 2.0, np.nan, 4.0])

s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
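
If you are not sure whether a column is all whole numbers, you can check before casting. A minimal sketch, assuming that rounding is acceptable for your data (that part is an assumption, not something the question states):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 2.0, np.nan, 4.0])

# True where the value is missing or has no fractional part.
is_whole = s.isna() | (s % 1 == 0)

# The entries that would block the cast to a nullable integer type.
print(s[~is_whole])

# One option, only if rounding is acceptable: round first, then cast.
s_int = s.round().astype("Int64")
```

Inspecting s[~is_whole] first tells you whether the fractional values are real data or noise before you decide to round them away.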

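I know this has been answered, but I wanted to provide an alternative solution for anyone in the future:

You can use .loc to subset the dataframe to only the rows where 'x' is notnull(), and then select only the 'x' column. Take that same vector and apply(int) to it.

If column x is float: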

df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
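
One thing worth checking when you try this: the rows with NaN are still in the column afterwards, so I would expect the column as a whole to keep a float dtype. A small sketch with made-up data that prints the resulting dtype, and also shows selecting only the non-null rows when a plain integer result is what you need:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0]})  # made-up data

mask = df["x"].notnull()
df.loc[mask, "x"] = df.loc[mask, "x"].apply(int)

# The NaN row is still present, so inspect the dtype you actually end up with.
print(df["x"].dtype)

# Selecting only the non-null rows gives a plain integer Series.
x_int = df.loc[mask, "x"].astype(int)
```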