How to select rows in a DataFrame between two values, in Python Pandas?
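I am trying to modify a DataFrame so that it keeps only the rows where closing_price is between 99 and 101, using the code below:

    df = df[(99 <= df['closing_price'] <= 101)]

However, I get the error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I would like to know whether there is a way to do this without using loops.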
Also consider Series.between:

    df = df[df['closing_price'].between(99, 101)]
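As a quick check of what between() does, the sketch below compares it with the explicit two-sided comparison. The sample prices are made up for illustration and are not from the question; by default between() includes both endpoints.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"closing_price": [98, 99, 100, 101, 102]})

# between() includes both endpoints by default.
mask_between = df["closing_price"].between(99, 101)

# The same mask written out with explicit comparisons.
mask_manual = (df["closing_price"] >= 99) & (df["closing_price"] <= 101)

filtered = df[mask_between]
print(filtered)
```

Both masks select the rows with prices 99, 100 and 101.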
You should use parentheses to group each comparison and combine them with &, which removes the ambiguity:

    df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
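To see why the chained form fails, here is a minimal sketch with made-up prices. Python expands `99 <= s <= 101` into `(99 <= s) and (s <= 101)`, and `and` needs a single True/False value, which a boolean Series cannot provide.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"closing_price": [98, 99, 100, 101, 102]})

try:
    # Chained comparison: "and" is applied to a whole boolean Series.
    df[99 <= df["closing_price"] <= 101]
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Element-wise version: each comparison is parenthesized and joined with &.
mask = (df["closing_price"] >= 99) & (df["closing_price"] <= 101)
result = df[mask]
print(result)
```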
There is a nicer alternative: use the query() method:

    In [58]: df = pd.DataFrame({'closing_price': np.random.randint(95, 105, 10)})

    In [59]: df
    Out[59]:
       closing_price
    0            104
    1             99
    2             98
    3             95
    4            103
    5            101
    6            101
    7             99
    8             95
    9             96

    In [60]: df.query('99 <= closing_price <= 101')
    Out[60]:
       closing_price
    1             99
    5            101
    6            101
    7             99

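If the bounds live in Python variables rather than in the query string, query() can reference them with the @ prefix. A small sketch, where the variable names low and high and the sample data are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"closing_price": [104, 99, 98, 95, 103, 101]})

# Bounds kept in local variables (names chosen for this example).
low, high = 99, 101

# @name refers to a local Python variable inside the query string.
result = df.query("@low <= closing_price <= @high")
print(result)
```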
UPDATE: answering the comment:
"I like the syntax here but fell down when trying to combine with expression; df.query('(mean + 2 *sd) <= closing_price <=(mean + 2 *sd)')"

    In [161]: qry = "(closing_price.mean() - 2*closing_price.std())" +\
         ...:       " <= closing_price <=" + \
         ...:       "(closing_price.mean() + 2*closing_price.std())"
         ...:

    In [162]: df.query(qry)
    Out[162]:
       closing_price
    0             97
    1            101
    2             97
    3             95
    4            100
    5             99
    6            100
    7            101
    8             99
    9             95


    newdf = df.query('closing_price.mean() <= closing_price <= closing_price.std()')

or

    # closing_price is a column of df, so compute the statistics from df
    mean = df['closing_price'].mean()
    std = df['closing_price'].std()

    newdf = df.query('@mean <= closing_price <= @std')

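For the band the commenter seems to be after (within two standard deviations of the mean), one option is to compute the bounds first and pass them in with @. This is only a sketch: the data, including the 150 outlier, and the names lower and upper are invented for the example.

```python
import pandas as pd

# Hypothetical prices with one obvious outlier (150).
df = pd.DataFrame({"closing_price": [100, 101, 99, 100, 102, 98, 100, 150]})

mean = df["closing_price"].mean()
std = df["closing_price"].std()  # sample standard deviation (ddof=1)

# Band of two standard deviations around the mean.
lower = mean - 2 * std
upper = mean + 2 * std

inside = df.query("@lower <= closing_price <= @upper")
print(inside)
```

With this data the outlier row falls outside the band and the other seven rows are kept.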
You can also use the between() method:

    import pandas as pd

    emp = pd.read_csv("C:\\py\\programs\\pandas_2\\pandas\\employees.csv")

    emp[emp["Salary"].between(60000, 61000)]

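If you are dealing with multiple values and multiple inputs, you can also set up an apply function like this. In this case, it filters a DataFrame down to the GPS locations that fall within specific ranges.

    def filter_values(lat, lon):
        # Keep points within 0.01 degrees of either reference location
        if abs(lat - 33.77) < .01 and abs(lon - -118.16) < .01:
            return True
        elif abs(lat - 37.79) < .01 and abs(lon - -122.39) < .01:
            return True
        else:
            return False

    df = df[df.apply(lambda x: filter_values(x['lat'], x['lon']), axis=1)]
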
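The same filter can also be written without apply, using boolean masks combined with & and |, which avoids calling a Python function once per row. A sketch under the assumption that the columns are named lat and lon as above; the sample coordinates are invented:

```python
import pandas as pd

# Hypothetical coordinates for illustration.
df = pd.DataFrame({
    "lat": [33.771, 37.795, 40.0, 33.77],
    "lon": [-118.165, -122.385, -74.0, -120.0],
})

# One mask per reference location, each requiring both lat and lon to be close.
near_first = ((df["lat"] - 33.77).abs() < 0.01) & ((df["lon"] - -118.16).abs() < 0.01)
near_second = ((df["lat"] - 37.79).abs() < 0.01) & ((df["lon"] - -122.39).abs() < 0.01)

# A row is kept if it is close to either location.
vectorized = df[near_first | near_second]
print(vectorized)
```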
Instead of this

    df = df[(99 <= df['closing_price'] <= 101)]

you should use this

    df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]

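We have to use NumPy's bitwise logic operators |, &, ~, ^ for compound queries.
Also, the parentheses are important because of operator precedence.
For more details, see the link: Comparisons, Masks, and Boolean Logic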
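A short sketch of why the parentheses matter, again with made-up prices. In Python, & binds more tightly than >= and <=, so leaving the parentheses out turns the expression back into a chained comparison, which fails in the same way as the original code.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"closing_price": [98, 99, 100, 101, 102]})
price = df["closing_price"]

try:
    # Parsed as: price >= (99 & price) <= 101, a chained comparison.
    df[price >= 99 & price <= 101]
    raised = False
except ValueError:
    raised = True

# Parenthesized comparisons combined element-wise.
inside = df[(price >= 99) & (price <= 101)]     # AND
outside = df[~((price >= 99) & (price <= 101))] # NOT
ends = df[(price == 98) | (price == 102)]       # OR
```

Here outside and ends select the same two rows (98 and 102), while inside selects the three rows in between.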