What is an efficient way to slice a Pandas Series with a MultiIndex along multiple dimensions?

I'm lost in a sea of ix, xs, MultiIndex, get_level_values and other pandas machinery.

I have a Series with a 3-level MultiIndex. What is an efficient way to slice my Series based on the values at different levels?

My Series looks like this:

days  id                      start_date
0     S0036-4665(00)04200108  2013-05-18    1
3     S0036-4665(00)04200108  2013-05-18    1
5     S0036-4665(00)04200108  2013-05-18    3
13    S0036-4665(00)04200108  2013-05-18    1
19    S0036-4665(00)04200108  2013-05-18    1
39    S0036-4665(00)04200108  2013-05-18    1
...

Obviously, the values of id and start_date vary further down the Series.

I'd like to be able to slice based on:
- days within a numeric range
- id within a specific set
- start_date within a specific date range

So far, I have found this solution, which suggests using df[df.index.get_level_values('a').isin([5, 7, 10, 13])], and I have figured out that I can do:

s.select(lambda x: x[0] < 20 and (x[1] in set(['some id', 'other id'])))

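Is either of these the best solution? I feel I should be able to do something with xs or ix, but the former seems to only let you filter by one specific value, and the latter seems to only index by position in the Series?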
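For reference, the get_level_values approach mentioned above can be extended to all three criteria by combining one boolean mask per level. This is only a minimal sketch on made-up data: the level names match the Series shown earlier, but the ids ('id_a', 'id_b', 'id_c'), the dates and the values are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data with the same three level names as the Series above.
index = pd.MultiIndex.from_product(
    [[0, 3, 5, 13, 19, 39],
     ['id_a', 'id_b', 'id_c'],
     pd.date_range('2013-05-18', periods=3)],
    names=['days', 'id', 'start_date'],
)
s = pd.Series(np.arange(len(index)), index=index)

days = s.index.get_level_values('days')
ids = s.index.get_level_values('id')
dates = s.index.get_level_values('start_date')

# One boolean mask per criterion, combined with &.
mask = (
    (days >= 0) & (days < 20)
    & ids.isin(['id_a', 'id_c'])
    & (dates >= pd.Timestamp('2013-05-18'))
    & (dates <= pd.Timestamp('2013-05-19'))
)
selected = s[mask]
print(selected)
```

This does not depend on the index being sorted, at the cost of building a full-length mask for every level.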

Here is an example. This requires current master and will be in 0.14.
The docs are here: http://pandas-docs.github.io/pandas-docs-travis/indexing.html#multiindexing-using-slicers

Create a MultiIndex (this happens to be the Cartesian product of the inputs, but that is not necessary)

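import numpy as np
import pandas as pd
from pandas import Series, MultiIndex, date_range

In [28]: s = Series(np.arange(27),
   ....:            index=MultiIndex.from_product(
   ....:                [[1, 2, 3],
   ....:                 ['foo', 'bar', 'bah'],
   ....:                 date_range('20130101', periods=3)])
   ....:            ).sort_index()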

Always make sure that the index is fully sorted
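One way to check and repair the sort order in more recent pandas versions is sketched below. The tiny two-level index is made up purely for illustration, and is_monotonic_increasing is used here as the sortedness check instead of the lexsort_depth attribute shown in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical index whose level-0 values are deliberately out of order.
index = pd.MultiIndex.from_tuples(
    [(2, 'foo'), (1, 'bar'), (2, 'bar'), (1, 'foo')]
)
unsorted = pd.Series(np.arange(4), index=index)

# A lexicographically sorted MultiIndex is monotonic increasing.
print(unsorted.index.is_monotonic_increasing)

# sort_index() sorts on all levels, left to right.
sorted_s = unsorted.sort_index()
print(sorted_s.index.is_monotonic_increasing)
```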

In [29]: s.index.lexsort_depth
Out[29]: 3

In [30]: s
Out[30]:
1  bah  2013-01-01     6
        2013-01-02     7
        2013-01-03     8
   bar  2013-01-01     3
        2013-01-02     4
        2013-01-03     5
   foo  2013-01-01     0
        2013-01-02     1
        2013-01-03     2
2  bah  2013-01-01    15
        2013-01-02    16
        2013-01-03    17
   bar  2013-01-01    12
        2013-01-02    13
        2013-01-03    14
   foo  2013-01-01     9
        2013-01-02    10
        2013-01-03    11
3  bah  2013-01-01    24
        2013-01-02    25
        2013-01-03    26
   bar  2013-01-01    21
        2013-01-02    22
        2013-01-03    23
   foo  2013-01-01    18
        2013-01-02    19
        2013-01-03    20
dtype: int64

It helps to define this alias to reduce verbosity (it groups the level indexers for a single axis)

In [33]: idx = pd.IndexSlice
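To make it clearer what idx produces, the short sketch below prints the keys it builds. pd.IndexSlice simply returns whatever is placed between the brackets, so the result is an ordinary tuple with one entry per level; the variable names are only for illustration.

```python
import pandas as pd

idx = pd.IndexSlice

# IndexSlice returns the bracketed expression unchanged: a plain tuple.
key = idx[[2], ['bar', 'foo']]
print(key)

# A bare colon becomes slice(None), which selects an entire level.
key_with_colon = idx[:, ['bar', 'foo']]
print(key_with_colon)
```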

Select the rows where level 0 is 2 and level 1 is bar or foo

In [31]: s.loc[idx[[2],['bar','foo']]]
Out[31]:
2  bar  2013-01-01    12
        2013-01-02    13
        2013-01-03    14
   foo  2013-01-01     9
        2013-01-02    10
        2013-01-03    11
dtype: int64

Same as above, but with level 2 equal to 20130102

In [32]: s.loc[idx[[2,3],['bar','foo'],'20130102']]
Out[32]:
2  bar  2013-01-02    13
   foo  2013-01-02    10
3  bar  2013-01-02    22
   foo  2013-01-02    19
dtype: int64
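For the single-value case that the question associates with xs, a sketch like the following selects one date on the last level. It rebuilds the example Series so that it runs on its own. Unlike the slicer result above, xs drops the selected level by default.

```python
import numpy as np
import pandas as pd

# Same example Series as in this answer, rebuilt so the snippet is standalone.
s = pd.Series(
    np.arange(27),
    index=pd.MultiIndex.from_product(
        [[1, 2, 3],
         ['foo', 'bar', 'bah'],
         pd.date_range('20130101', periods=3)]
    ),
).sort_index()

# Cross-section: every row whose level-2 value is 2013-01-02.
one_day = s.xs(pd.Timestamp('2013-01-02'), level=2)
print(one_day)
```

Passing drop_level=False to xs keeps all three levels in the result.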

Here is an example using a boolean indexer instead of a level indexer.

In [43]: s.loc[idx[[2,3],['bar','foo'],s<20]]
Out[43]:
2  bar  2013-01-01    12
        2013-01-02    13
        2013-01-03    14
   foo  2013-01-01     9
        2013-01-02    10
        2013-01-03    11
3  foo  2013-01-01    18
        2013-01-02    19
dtype: int64
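If mixing a boolean indexer into the slicer feels opaque, the same rows can be reached in two explicit steps: select by level values first, then filter the result by its own values. This is an alternative sketch, not the form used in this answer, and it rebuilds the example Series so that it runs on its own.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.arange(27),
    index=pd.MultiIndex.from_product(
        [[1, 2, 3],
         ['foo', 'bar', 'bah'],
         pd.date_range('20130101', periods=3)]
    ),
).sort_index()
idx = pd.IndexSlice

# Step 1: select on the index levels only.
by_level = s.loc[idx[[2, 3], ['bar', 'foo']]]

# Step 2: filter the selection by its own values.
result = by_level[by_level < 20]
print(result)
```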

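Here is an example that leaves some levels unrestricted (note that idx is not used here, since the two forms are essentially equivalent for a Series; idx is more useful when indexing a DataFrame)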

In [47]: s.loc[:,['bar','foo'],'20130102']
Out[47]:
1  bar  2013-01-02     4
   foo  2013-01-02     1
2  bar  2013-01-02    13
   foo  2013-01-02    10
3  bar  2013-01-02    22
   foo  2013-01-02    19
dtype: int64
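Applied to the three criteria in the question (a range of days, a set of ids, a range of start dates), the same slicer syntax could look roughly like this. The data is made up: the level names follow the question, while the ids, dates and values are placeholders. Label slices include both endpoints.

```python
import numpy as np
import pandas as pd

# Made-up data shaped like the Series in the question.
index = pd.MultiIndex.from_product(
    [[0, 3, 5, 13, 19, 39],
     ['id_a', 'id_b', 'id_c'],
     pd.date_range('2013-05-18', periods=3)],
    names=['days', 'id', 'start_date'],
)
s = pd.Series(np.arange(len(index)), index=index).sort_index()
idx = pd.IndexSlice

first_day = pd.Timestamp('2013-05-18')
last_day = pd.Timestamp('2013-05-19')

# days from 0 to 19 inclusive, ids in a given set, dates in an inclusive range.
sliced = s.loc[idx[0:19, ['id_a', 'id_c'], first_day:last_day]]
print(sliced)
```

Each position in idx[...] corresponds to one index level, so a level that should not be restricted can be written as a bare colon.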