data.table: Simple moving average on an unbalanced panel in R

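I am working with an unbalanced, irregularly spaced cross-sectional time series. My goal is to obtain a lagged moving-average vector of the "Quantity" vector, split by "Subject".

In other words, suppose the following Quantity values have been observed for Subject_1:
[1,2,3,4,5]. I first need to lag the vector by 1, giving [NA,1,2,3,4].

Then I need to take a moving average of order 3, giving [NA,NA,NA,(3+2+1)/3,(4+3+2)/3].

This needs to be done for every Subject.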
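
For a single subject, the two steps above can be sketched in base R as follows. This is only a minimal illustration: it assumes the lag is taken by row position (the previous observation, whatever the gap in days), and the names x, lagged and ma3 are made up for the example.

```r
# Quantities observed for one subject, in time order
x <- c(1, 2, 3, 4, 5)

# Step 1: lag by one observation (row-based; gaps between days are ignored)
lagged <- c(NA, head(x, -1))

# Step 2: right-aligned moving average of order 3.
# stats::filter() returns NA for any window that is incomplete or contains an NA.
ma3 <- as.numeric(stats::filter(lagged, rep(1 / 3, 3), sides = 1))

print(lagged)
print(ma3)
```

The first three entries of ma3 are NA because each of those windows is either incomplete or includes the NA introduced by the lag.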

# Construct example balanced panel DF
panel <- data.frame(
  as.factor(sort(rep(1:6,5))),
  rep(1:5,6),
  rnorm(30)                
)
colnames(panel)<- c("Subject","Day","Quantity")

#Make panel DF unbalanced
panelUNB <- subset(panel,as.numeric(Subject)!= Day)
panelUNB <- panelUNB[-c(15,16),]

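If the panel were balanced, I would first lag the "Quantity" variable using the function lag from package plm.
Then I would take the moving average of the lagged "Quantity" using the function rollmean from package zoo, like this: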

library(zoo)
panel$QuantityMA <- ave(panel$Quantity, panel$Subject,
                        FUN = function(x) rollmean(x, 3, align = "right", fill = NA, na.rm = TRUE))

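This produces the correct result when applied to the balanced "panel" DF.

The problem is that plm's lag relies on an evenly spaced series to generate its index variable, while rollapply requires the number of observations (the window size) to be the same for all subjects.

There is a data.table solution on StackExchange that hints at a solution to my problem: Producing a rolling average of an unbalanced panel data set

Perhaps that solution can be modified to produce a fixed-length moving average instead of a "rolling cumulative average".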
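
One way a fixed-length version might look in data.table is sketched below. It is not the linked answer's code: it assumes a data.table version that provides frollmean() (1.12.0 or later), the example data are made up, and the lag is by row within each subject rather than by calendar day.

```r
library(data.table)

# Made-up unbalanced panel: the subjects have 5, 4 and 2 observations
DT <- data.table(
  Subject  = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3)),
  Day      = c(1, 2, 3, 4, 5, 1, 3, 4, 5, 2, 5),
  Quantity = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 7, 8)
)

# Make sure rows are in time order within each subject
setorder(DT, Subject, Day)

# Lag by one row, then take a right-aligned window of 3, separately per subject.
# Windows that are incomplete or contain the NA from the lag come out as NA.
DT[, QuantityMA := frollmean(shift(Quantity, n = 1L), n = 3L), by = Subject]

print(DT)
```

Because both shift() and frollmean() run inside by = Subject, no value leaks from one subject into the next, and subjects with fewer than four observations simply get NA throughout.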


Does this give you the result you want?

library(reshape2)
library(zoo)

# create time series where each subject has an observation at each time step
d1 <- data.frame(subject = rep(letters[1:4], each = 5),
                 day = rep(1:5, 4),
                 quantity = sample(x = 1:4, size = 20, replace = TRUE))
d1

# select some random observations
d2 <- d1[sample(x = seq_len(nrow(d1)), size = 15), ]
d2

# reshape to wide format with dcast
# -> 'automatic' extension from irregular to regular series for each subject,
# _given_ that all time steps are represented.
# Alternative method below more explicit

# fill for structural missings defaults to NA
d3 <- dcast(d2, day ~ subject, value.var ="quantity")
d3

# convert to zoo time series
z1 <- zoo(x = d3[ , -1], order.by = d3$day)

################################
# alternative method to extend time series
# time steps to include are given explicitly

# create a zero-dimensional zoo series
z0 <- zoo(, min(d1$day):max(d1$day))

# extend z1 to contain the same time indices as z0
z1 <- merge(z1, z0)
################################

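# lag by one time step; for zoo objects a negative k takes the previous value
# (the default k = 1 shifts the series the other way, i.e. a lead)
z2 <- lag(x = z1, k = -1, na.pad = TRUE)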
z2

# calculate rolling mean with window width 3
rollmeanr(x = z2, k = 3)

# Handling of NAs:
# from ?rollmean:
#"The default method of rollmean does not handle inputs that contain NAs.
# In such cases, use rollapply instead.":
rollapplyr(data = z2, width = 3, FUN = mean, na.rm = TRUE)


So, to answer my own question, one way is via split-lapply(rolling average)-unlist:

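library(zoo)
# split Quantity by Subject, take a right-aligned rolling mean within each piece,
# then flatten the pieces back into one vector (ordered by Subject level)
Temp <- with(panelUNB, split(Quantity, Subject))
Temp <- lapply(Temp, FUN = function(x) rollapplyr(
  x, 2, align = "right", fill = NA, na.rm = TRUE, FUN = mean))
QuantityMA <- unlist(Temp)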

The "QuantityMA" vector then has to be added back to the main "panelUNB" frame. It seems to be working. The lag on the unbalanced panel can be done with ddply.
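
One caveat when attaching the result: unlist() returns the values grouped by Subject level, so they line up with "panelUNB" only if that frame is sorted by Subject. A possible way around this, sketched here in base R, is ave(), which writes each group's result back into the original row positions. The helper lagged_ma and the example data are hypothetical, and the lag is again row-based, assuming rows are in time order within each subject.

```r
# Hypothetical helper: lag x by one row, then right-aligned moving average of width k
lagged_ma <- function(x, k = 3) {
  n <- length(x)
  lagged <- c(NA, x)[seq_len(n)]            # row-based lag of 1
  out <- rep(NA_real_, n)
  if (n >= k) {
    for (i in k:n) {
      out[i] <- mean(lagged[(i - k + 1):i]) # NA if the window contains an NA
    }
  }
  out
}

# Made-up unbalanced panel, deliberately NOT sorted by Subject
# (rows are in time order within each subject)
panelUNB <- data.frame(
  Subject  = factor(c(2, 1, 1, 2, 1, 3, 1, 2, 1, 2, 3)),
  Day      = c(1, 1, 2, 3, 3, 2, 4, 4, 5, 5, 5),
  Quantity = c(10, 1, 2, 20, 3, 7, 4, 30, 5, 40, 8)
)

# ave() applies the helper per Subject and keeps the original row order
panelUNB$QuantityMA <- ave(panelUNB$Quantity, panelUNB$Subject, FUN = lagged_ma)

print(panelUNB)
```

Doing the lag and the average in one per-subject function also removes the need for a separate ddply step, at the cost of an explicit loop.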

If anyone has other, possibly more elegant, solutions, feel free to share.