Simple moving average on an unbalanced panel in R
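I am working with an unbalanced, irregularly spaced cross-sectional time series. My goal is to obtain a lagged moving-average vector of the "Quantity" vector, split by "Subject".
In other words, suppose the following quantities have been observed for Subject_1:
[1,2,3,4,5]. I first need to lag it by 1, which gives [NA,1,2,3,4].
Then I need to take a moving average of order 3, which gives [NA,NA,NA,(3+2+1)/3,(4+3+2)/3].
This has to be done for every subject.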
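The two steps for a single subject can be checked in base R. This is a minimal sketch that swaps in `stats::filter` (a trailing convolution filter) for a rolling-mean function, so it needs no extra packages:

```r
# Worked example from the question: one subject's quantities
x <- c(1, 2, 3, 4, 5)

# Lag by one observation: drop the last value and pad the front with NA
lagged <- c(NA, head(x, -1))          # NA 1 2 3 4

# Trailing (right-aligned) moving average of order 3.
# stats::filter with sides = 1 combines the current and the two previous values;
# a window that contains an NA, or runs past the start, yields NA.
ma <- as.numeric(stats::filter(lagged, rep(1/3, 3), sides = 1))
print(ma)                              # NA NA NA 2 3
```

The last two values are (3+2+1)/3 = 2 and (4+3+2)/3 = 3, matching the vector above.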
```r
# Construct example balanced panel DF
panel <- data.frame(
  as.factor(sort(rep(1:6, 5))),
  rep(1:5, 6),
  rnorm(30)
)
colnames(panel) <- c("Subject", "Day", "Quantity")

# Make panel DF unbalanced
panelUNB <- subset(panel, as.numeric(Subject) != Day)
panelUNB <- panelUNB[-c(15, 16), ]
```
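If the panel were balanced, I would first lag Quantity using the package … and then apply the `rollmean` function from the `zoo` package within each subject, like this: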
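```r
library(zoo)

# Right-aligned moving average of width 3, computed separately for each Subject
panel$QuantityMA <- ave(panel$Quantity, panel$Subject,
                        FUN = function(x) rollmean(x, 3, align = "right", fill = NA, na.rm = TRUE))
```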
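This produces the correct result when applied to the balanced "panel" DF.
The problem is that …
There is a solution on StackExchange using data.table that hints at a solution to my problem: Producing a rolling average of an unbalanced panel data set
Perhaps that solution could be modified to produce a fixed-length moving average instead of a "rolling cumulative average".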
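The lag and the moving average can also be folded into a single `ave()` call, which works the same way on the unbalanced panel. This is a sketch under two assumptions: the lag is by position within each subject (a skipped day is treated as if the neighbouring rows were consecutive), and `stats::filter` stands in for `rollmean`. The helper `lag_ma` is hypothetical, and the seed is added only for reproducibility.

```r
set.seed(1)  # assumption: seed added only to make the example reproducible

# Rebuild the question's example panel, then make it unbalanced as in the question
panel <- data.frame(Subject  = as.factor(sort(rep(1:6, 5))),
                    Day      = rep(1:5, 6),
                    Quantity = rnorm(30))
panelUNB <- subset(panel, as.numeric(Subject) != Day)
panelUNB <- panelUNB[-c(15, 16), ]

# Hypothetical helper: lag by one observation, then take a trailing mean of width k.
# stats::filter stops with an error if the filter is longer than the series,
# so subjects with fewer than k rows simply get all NA.
lag_ma <- function(x, k = 3) {
  lagged <- c(NA, head(x, -1))
  if (length(lagged) < k) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(lagged, rep(1 / k, k), sides = 1))
}

# ave() applies lag_ma within each Subject and returns values in the original row order
panelUNB$QuantityMA <- ave(panelUNB$Quantity, panelUNB$Subject, FUN = lag_ma)
head(panelUNB, 8)
```

Because `ave()` writes results back in the original row order, no separate re-attachment step is needed.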
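If "lag 1" should mean one calendar day rather than one observation, one option is to expand each subject to the full range of days first, so that gaps become explicit NA rows, and then apply the fixed-width window. This is a base-R sketch assuming integer Day values; with this convention a window that touches a missing day is NA.

```r
set.seed(1)  # assumption: seed added only to make the example reproducible
panel <- data.frame(Subject  = as.factor(sort(rep(1:6, 5))),
                    Day      = rep(1:5, 6),
                    Quantity = rnorm(30))
panelUNB <- subset(panel, as.numeric(Subject) != Day)
panelUNB <- panelUNB[-c(15, 16), ]

# Every Subject x Day combination; days missing from panelUNB get Quantity = NA
grid <- expand.grid(Subject = levels(panelUNB$Subject),
                    Day     = seq(min(panelUNB$Day), max(panelUNB$Day)))
full <- merge(grid, panelUNB, by = c("Subject", "Day"), all.x = TRUE)
full <- full[order(full$Subject, full$Day), ]

# Lag by one day within Subject, then a trailing 3-day mean (NA if any day is missing)
full$QuantityMA <- ave(full$Quantity, full$Subject, FUN = function(x) {
  as.numeric(stats::filter(c(NA, head(x, -1)), rep(1/3, 3), sides = 1))
})

# Keep only the rows that were actually observed
result <- merge(panelUNB, full[, c("Subject", "Day", "QuantityMA")],
                by = c("Subject", "Day"))
head(result)
```

With data.table, the per-subject step on the completed grid could be written along the lines of `DT[, QuantityMA := frollmean(shift(Quantity), 3), by = Subject]` (`shift` lags by default, and `frollmean` is right-aligned by default); that variant is not run here.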
Would this give you the result you want?
```r
library(reshape2)
library(zoo)

# create time series where each subject has an observation at each time step
d1 <- data.frame(subject = rep(letters[1:4], each = 5),
                 day = rep(1:5, 4),
                 quantity = sample(x = 1:4, size = 20, replace = TRUE))
d1

# select some random observations
d2 <- d1[sample(x = seq_len(nrow(d1)), size = 15), ]
d2

# reshape to wide format with dcast
# -> 'automatic' extension from irregular to regular series for each subject,
# _given_ that all time steps are represented.
# Alternative method below more explicit
# fill for structural missings defaults to NA
d3 <- dcast(d2, day ~ subject, value.var = "quantity")
d3

# convert to zoo time series
z1 <- zoo(x = d3[ , -1], order.by = d3$day)

################################
# alternative method to extend time series
# time steps to include are given explicitly

# create a zero-dimensional zoo series
z0 <- zoo(, min(d1$day):max(d1$day))

# extend z1 to contain the same time indices as z0
z1 <- merge(z1, z0)
################################

# lag by one time step: zoo follows the sign convention of stats::lag,
# so k = -1 is a lag (k = 1, the default, would be a lead);
# na.pad = TRUE keeps the full time index instead of dropping the first step
z2 <- lag(z1, k = -1, na.pad = TRUE)
z2

# calculate rolling mean with window width 3
rollmeanr(x = z2, k = 3)

# Handling of NAs:
# from ?rollmean:
# "The default method of rollmean does not handle inputs that contain NAs.
# In such cases, use rollapply instead.":
rollapplyr(data = z2, width = 3, FUN = mean, na.rm = TRUE)
```
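The direction of `lag()` is easy to get backwards. zoo's `lag` method documents that the sign of `k` behaves as for `stats::lag`, so the convention can be checked on a plain `ts` object without loading zoo:

```r
x <- ts(1:5)                      # observed at times 1..5

lead1 <- stats::lag(x, k = 1)     # times 0..4: value at time t is x[t + 1]
lag1  <- stats::lag(x, k = -1)    # times 2..6: value at time t is x[t - 1]

# Align both series on the union of their time indices to see the shift
aligned <- cbind(x = x, lag1 = lag1)
print(aligned)
```

On times 1 to 5 the `lag1` column reads NA, 1, 2, 3, 4, which is the lagged vector the question asks for.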
So, to answer my own question, one approach is split, then lapply (rolling average), then unlist:
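```r
library(zoo)

# Split Quantity by Subject (within each group, rows keep their original order)
Temp <- with(panelUNB, split(Quantity, Subject))

# Right-aligned rolling mean of width 2 within each Subject (the question asks for 3)
Temp <- lapply(Temp, FUN = function(x) rollapplyr(x, 2, fill = NA, na.rm = TRUE, FUN = mean))

# Flatten back to one vector, grouped by Subject; this lines up with panelUNB
# only if panelUNB is already sorted by Subject (and by Day within Subject)
QuantityMA <- unlist(Temp)
```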
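The "QuantityMA" vector then has to be added back to the main "panelUNB" data frame. This seems to work. The lag can be applied to the unbalanced panel with ddply.
If anyone has another, perhaps more elegant, solution, please share it.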
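For adding the result back, base R's `unsplit()` undoes `split()` and restores the original row order, so it does not rely on the data frame being sorted by Subject the way `unlist()` does. This is a sketch with two substitutions: the hypothetical helper `trail_mean` uses `stats::filter` in place of `rollapplyr` (width 2, as in the self-answer), and the rows are deliberately sorted by Day to interleave subjects.

```r
set.seed(1)  # assumption: seed added only to make the example reproducible
panel <- data.frame(Subject  = as.factor(sort(rep(1:6, 5))),
                    Day      = rep(1:5, 6),
                    Quantity = rnorm(30))
panelUNB <- subset(panel, as.numeric(Subject) != Day)
panelUNB <- panelUNB[-c(15, 16), ]

# Interleave subjects; within each Subject the rows remain in Day order
panelUNB <- panelUNB[order(panelUNB$Day, panelUNB$Subject), ]

# Hypothetical helper: trailing mean of width k (NA until a full window is available)
trail_mean <- function(x, k = 2) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

pieces <- lapply(split(panelUNB$Quantity, panelUNB$Subject), trail_mean)

# unsplit() puts each value back on the row it came from;
# unlist(pieces) would instead be ordered by Subject and misalign here
panelUNB$QuantityMA <- unsplit(pieces, panelUNB$Subject)
head(panelUNB)
```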