python: subsetting dataframe using multiple level index

I'm trying to subset a DataFrame using a multi-level index. For example:

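import numpy as np
import pandas as pd

# Twelve rows: four states repeated three times, office ids 1-6 repeated twice
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})

# Sum sales per (state, office_id); the group keys become a two-level index
df2 = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})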

                  sales
state office_id
AZ    2          839507
      4          373917
      6          347225
CA    1          798585
      3          890850
      5          454423
CO    1          819975
      3          202969
      5          614011
WA    2          163942
      4          369858
      6          959285

As you can see, df2 has a multi-level index made up of state and office_id. I'd like to subset df2 using the MultiIndex to get the following:

1) only state = AZ

2) only office_id < 4

3) state = CA and office_id = 5

Historically, I would move the index back into the DataFrame as columns and subset on those columns, but that is not efficient.

Can someone point me in the right direction? Thanks!
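For context, the column-based workaround mentioned above might look roughly like this. It is only a sketch: the fixed sales values are an assumption made so the result is reproducible, and the names flat and az_only are illustrative.

```python
import pandas as pd

# Fixed sales figures (an assumption, used instead of random numbers)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [100 * (i + 1) for i in range(12)]})
df2 = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

# Move both index levels back into ordinary columns
flat = df2.reset_index()

# Subset on the column, then rebuild the two-level index
az_only = flat[flat['state'] == 'AZ'].set_index(['state', 'office_id'])
```

This works, but it rebuilds the index on every lookup, which is the overhead the question is trying to avoid.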

Use boolean indexing based on the index's .get_level_values, for example:

df2.loc[(df2.index.get_level_values(0) == 'AZ')]
# Also you can specify the name i.e df2.loc[(df2.index.get_level_values('state')=='AZ')]

                  sales
state office_id
AZ    2          469728
      4          398925
      6          704669

df2.loc[(df2.index.get_level_values(0) == 'CA') & (df2.index.get_level_values(1) < 4)]

                  sales
state office_id
CA    1          105244
      3          116514
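The same pattern appears to cover all three subsets asked about in the question. A minimal sketch, assuming fixed sales values instead of random ones so the output is reproducible (the variable names states, offices, only_az, small_ids and ca_5 are illustrative):

```python
import pandas as pd

# Fixed sales figures (an assumption, used instead of random numbers)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [100 * (i + 1) for i in range(12)]})
df2 = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

# Pull each index level out once, by name
states = df2.index.get_level_values('state')
offices = df2.index.get_level_values('office_id')

# 1) only state = AZ
only_az = df2.loc[states == 'AZ']

# 2) only office_id < 4
small_ids = df2.loc[offices < 4]

# 3) state = CA and office_id = 5 (parentheses are needed around each mask)
ca_5 = df2.loc[(states == 'CA') & (offices == 5)]
```

Each comparison yields a boolean mask with one entry per row of df2, so the masks can be combined with & and | before being passed to .loc.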

You can also use the query method:

Because of the random numbers, my df2 is a little different:

df2

                  sales
state office_id
AZ    2          399569
      4          784863
      6          161690
CA    1          324148
      3          631289
      5          917734
CO    1          380714
      3          289783
      5          682802
WA    2          941091
      4          804442
      6          379563

Offices in Arizona:

df2.query('state == "AZ"')

                  sales
state office_id
AZ    2          399569
      4          784863
      6          161690

Only office IDs less than 4:

df2.query('office_id < 4')

                  sales
state office_id
AZ    2          399569
CA    1          324148
      3          631289
CO    1          380714
      3          289783
WA    2          941091

California and office id = 5:

df2.query('state == "CA" & office_id == 5')

                  sales
state office_id
CA    5          917734
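As a rough check, the query call should select the same rows as the equivalent .get_level_values mask. The sketch below assumes fixed sales values so the comparison is reproducible, and the names by_query, by_mask and same are illustrative:

```python
import pandas as pd

# Fixed sales figures (an assumption, used instead of random numbers)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [100 * (i + 1) for i in range(12)]})
df2 = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

# query can refer to the index levels by name
by_query = df2.query('state == "CA" and office_id == 5')

# The same selection written as an explicit boolean mask
by_mask = df2.loc[(df2.index.get_level_values('state') == 'CA')
                  & (df2.index.get_level_values('office_id') == 5)]

# True when both frames hold the same rows, index and values
same = by_query.equals(by_mask)
```

Which form to prefer is mostly a readability choice: query keeps the condition short, while the mask form avoids putting the condition in a string.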