Python: Select two sets of columns by column names in Pandas

Take the DataFrame from "loc vs. iloc vs. ix vs. at vs. iat?" as an example:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia']
)

Now I want all columns except 'food' and 'height'.

I thought something like df.loc[:, ['age':'color', 'score':'state']] would work, but Python returns SyntaxError: invalid syntax.

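I know there is one way to work around this: df.drop(columns=['food', 'height']). However, in my real case I have hundreds of columns to drop, and typing out all the column names is inefficient.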

I am hoping for something similar to dplyr::select(df, -(food:height)) or dplyr::select(df, age:color, score:state) in R.
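
For comparison, the dplyr-style `age:color, score:state` selection can be approximated by slicing each label range with `.loc` and concatenating the pieces. This is only a sketch: `select_ranges` is a hypothetical helper, not a pandas function, and the two-row frame is a stand-in with the same column layout as the question's `df`.

```python
import pandas as pd

# Small stand-in frame with the same column layout as the question's df.
df = pd.DataFrame(
    {'age': [30, 2], 'color': ['blue', 'green'], 'food': ['Steak', 'Lamb'],
     'height': [165, 70], 'score': [4.6, 8.3], 'state': ['NY', 'TX']},
    index=['Jane', 'Nick'],
)


def select_ranges(frame, *ranges):
    """Hypothetical helper: concatenate several inclusive label ranges of columns."""
    # .loc label slices include both endpoints, like dplyr's a:b.
    parts = [frame.loc[:, start:stop] for start, stop in ranges]
    return pd.concat(parts, axis=1)


kept = select_ranges(df, ('age', 'color'), ('score', 'state'))
print(kept.columns.tolist())
```

Each range is written as a (start, stop) pair because a bare `'age':'color'` is only valid syntax directly inside square brackets, which is why the list form in the question raises SyntaxError.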

I have also read "Selecting/excluding sets of columns in Pandas".

Related discussion

First, find all columns lying between food and height (inclusive).

c = df.iloc[-1:0].loc[:, 'food':'height'].columns
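
The `df.iloc[-1:0]` step appears to be there so the label slice runs on an empty (zero-row) frame rather than on the data itself. The sketch below checks that on a small stand-in frame, and also shows one possible alternative that works on the column index directly via `Index.slice_indexer`; the alternative is my assumption of an equivalent, not part of the original answer.

```python
import pandas as pd

# Small stand-in frame with the same column layout as the question's df.
df = pd.DataFrame(
    {'age': [30, 2], 'color': ['blue', 'green'], 'food': ['Steak', 'Lamb'],
     'height': [165, 70], 'score': [4.6, 8.3], 'state': ['NY', 'TX']},
    index=['Jane', 'Nick'],
)

# iloc[-1:0] selects no rows, so the .loc slice only has to carry column labels.
empty = df.iloc[-1:0]
c = empty.loc[:, 'food':'height'].columns

# Alternative sketch: turn the label range into a positional slice of the Index.
c_alt = df.columns[df.columns.slice_indexer('food', 'height')]

print(len(empty), list(c), list(c_alt))
```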

Next, filter with difference/isin/setdiff1d.

df[df.columns.difference(c)]

Or,

df.loc[:, ~df.columns.isin(c)]

Or,

df[np.setdiff1d(df.columns, c)]

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX
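
All three filters give the same result here because the example's columns happen to be in alphabetical order. As far as I know, `Index.difference` and `np.setdiff1d` return sorted labels, while the `isin` mask keeps the original column order. A small sketch with deliberately unsorted columns (a made-up frame, not the question's data) illustrates the difference:

```python
import numpy as np
import pandas as pd

# Made-up frame whose columns are NOT in alphabetical order.
df = pd.DataFrame(
    [['NY', 30, 'Steak', 165, 'blue']],
    columns=['state', 'age', 'food', 'height', 'color'],
)

# Columns to exclude: everything from 'food' through 'height' (inclusive).
c = df.iloc[-1:0].loc[:, 'food':'height'].columns

by_diff = df[df.columns.difference(c)].columns.tolist()    # sorted labels
by_mask = df.loc[:, ~df.columns.isin(c)].columns.tolist()  # original order
by_np = df[np.setdiff1d(df.columns, c)].columns.tolist()   # sorted labels

print(by_diff, by_mask, by_np)
```

If the original column order matters, the `isin` variant is probably the safer choice.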