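Algorithm: peak signal detection in real-time time series data

Update: The best-performing algorithm so far is this one.

This question explores robust algorithms for detecting sudden peaks in real-time time series data.

Consider the following dataset: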

p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1 1 1 1.1 0.9 1 1.1 1 1 0.9 1, ...
     1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1 1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1 1 3, ...
     2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];

(Matlab format, but the question is about the algorithm, not the language.)

Plot of data

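You can clearly see that there are three large peaks and some small peaks. This dataset is a specific example of the class of time series datasets that the question is about. This class of datasets has two general features:

  • There is basic noise with a general mean.
  • There are large "peaks" or "higher data points" that deviate significantly from the noise.

Let's also assume the following:

  • The width of the peaks cannot be determined beforehand.
  • The height of the peaks deviates significantly from the other values.
  • The algorithm must update in real time (so it updates with each new datapoint).

For such a situation, a boundary value that triggers signals needs to be constructed. However, the boundary value cannot be static and must be determined in real time by an algorithm.

My question: what is a good algorithm to calculate such thresholds in real time? Are there specific algorithms for such situations? What are the most well-known algorithms?

Robust algorithms or useful insights are all highly appreciated. (Answers can be in any language: it's about the algorithm.)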


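Smoothed z-score algorithm (peak detection with robust threshold)

I constructed an algorithm that works very well for these types of datasets. It is based on the principle of dispersion: if a new datapoint is a given number x of standard deviations away from some moving mean, the algorithm signals (this distance is also called a z-score). The algorithm is very robust because it constructs a separate moving mean and deviation, so that signals do not corrupt the threshold. Future signals are therefore identified with approximately the same accuracy, regardless of the number of previous signals.

The algorithm takes 3 inputs:

  • lag = the lag of the moving window
  • threshold = the z-score at which the algorithm signals
  • influence = the influence (between 0 and 1) of new signals on the mean and standard deviation

For example, a lag of 5 will use the last 5 observations to smooth the data. A threshold of 3.5 will signal if a datapoint is 3.5 standard deviations away from the moving mean. An influence of 0.5 gives signals half of the influence that normal datapoints have. Likewise, an influence of 0 ignores signals completely when recalculating the new threshold. An influence of 0 is therefore the most robust option (but it assumes stationarity); an influence of 1 is the least robust. For non-stationary data, the influence should therefore be set somewhere between 0 and 1.

It works as follows:

Pseudocode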

    # Let y be a vector of timeseries data of at least length lag+2
    # Let mean() be a function that calculates the mean
    # Let std() be a function that calculates the standard deviation
    # Let absolute() be the absolute value function

    # Settings (the ones below are examples: choose what is best for your data)
    set lag to 5;          # lag 5 for the smoothing functions
    set threshold to 3.5;  # 3.5 standard deviations for signal
    set influence to 0.5;  # between 0 and 1, where 1 is normal influence, 0.5 is half

    # Initialise variables
    set signals to vector 0,...,0 of length of y;   # Initialize signal results
    set filteredY to y(1),...,y(lag)                # Initialize filtered series
    set avgFilter to null;                          # Initialize average filter
    set stdFilter to null;                          # Initialize std. filter
    set avgFilter(lag) to mean(y(1),...,y(lag));    # Initialize first value
    set stdFilter(lag) to std(y(1),...,y(lag));     # Initialize first value

    for i=lag+1,...,t do
      if absolute(y(i) - avgFilter(i-1)) > threshold*stdFilter(i-1) then
        if y(i) > avgFilter(i-1) then
          set signals(i) to +1;                     # Positive signal
        else
          set signals(i) to -1;                     # Negative signal
        end
        # Make influence lower
        set filteredY(i) to influence*y(i) + (1-influence)*filteredY(i-1);
      else
        set signals(i) to 0;                        # No signal
        set filteredY(i) to y(i);
      end
      # Adjust the filters
      set avgFilter(i) to mean(filteredY(i-lag+1),...,filteredY(i));
      set stdFilter(i) to std(filteredY(i-lag+1),...,filteredY(i));
    end

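Rules of thumb for selecting good parameters for your data can be found below.

Demo

Demonstration of robust thresholding algorithm

The Matlab code for this demo can be found here. To use the demo, simply run it and create a time series yourself by clicking on the upper chart. The algorithm starts working after you have drawn lag observations.

Result

For the original question, this algorithm gives the following output when using the settings lag = 30, threshold = 5, influence = 0:

Thresholding algorithm example

Implementations in different programming languages: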

  • Matlab (me)
  • R (me)
  • Golang (Xeoncross)
  • Python (R Kiselev)
  • Swift (me)
  • Groovy (JoshuacWebDeveloper)
  • C++ (BRAD)
  • C++ (Animesh Pandey)
  • Rust (swizard)
  • Scala (Mike Roberts)
  • Kotlin (Leorderprofi)
  • Ruby (Kimmo Lehto)
  • Fortran [for resonance detection] (THO)
  • Julia (Matt Camp)
  • C# (Ocean Airdrop)
  • C (DaviDC)

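Rules of thumb for configuring the algorithm

lag: the lag parameter determines how much your data will be smoothed and how adaptive the algorithm is to changes in the long-term average of the data. The more stationary your data is, the more lags you should include (this should improve the robustness of the algorithm). If your data contains time-varying trends, you should consider how quickly you want the algorithm to adapt to these trends. That is, if you set lag to 10, it takes 10 "periods" before the algorithm's threshold is adjusted to any systematic change in the long-term average. So choose the lag parameter based on the trending behavior of your data and on how adaptive you want the algorithm to be.

influence: this parameter determines the influence of signals on the algorithm's detection threshold. If set to 0, signals have no influence on the threshold, so that future signals are detected based on a threshold that is calculated with a mean and standard deviation that are not influenced by past signals. Another way to think about this is that if you set the influence to 0, you implicitly assume stationarity (that is, no matter how many signals there are, the time series always returns to the same average over the long term). If this is not the case, you should set the influence parameter somewhere between 0 and 1, depending on the extent to which signals can systematically influence the time-varying trend of the data. For example, if signals lead to a structural break in the long-term average of the time series, the influence parameter should be set high (close to 1) so that the threshold can adjust to these changes quickly.

threshold: the threshold parameter is the number of standard deviations from the moving mean above which the algorithm classifies a new datapoint as a signal. For example, if a new datapoint is 4.0 standard deviations above the moving mean and the threshold parameter is set to 3.5, the algorithm will identify the datapoint as a signal. This parameter should be set based on how many signals you expect. For example, if your data is normally distributed, a threshold (or: z-score) of 3.5 corresponds to a signalling probability of 0.00047 (from this table), which implies that you expect a signal once every 2128 datapoints (1/0.00047). The threshold therefore directly influences how sensitive the algorithm is, and thereby also how often it signals. Examine your own data and determine a sensible threshold that makes the algorithm signal when you want it to (some trial and error may be needed to arrive at a good threshold for your purpose).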
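The signalling probability quoted for a threshold of 3.5 can be checked with a few lines of Python. This is a minimal sketch that assumes normally distributed data and counts deviations in both directions; the function name `two_sided_tail_probability` is made up for this illustration.

```python
import math


def two_sided_tail_probability(z):
    """P(|Z| > z) for a standard normal Z, via the complementary error function."""
    return math.erfc(z / math.sqrt(2.0))


p = two_sided_tail_probability(3.5)
expected_gap = 1.0 / p  # average number of datapoints between two signals
print(f"probability = {p:.6f}, one signal every {expected_gap:.0f} datapoints")
```

The unrounded probability is slightly below the tabulated 0.00047, so the expected gap comes out a little above 2128 datapoints. The order of magnitude is what matters when choosing a threshold.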

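WARNING: The code above always loops over all datapoints every time it runs. When implementing this code, make sure to split the calculation of the signal into a separate function (without the loop). Then, when a new datapoint arrives, update filteredY, avgFilter and stdFilter once. Do not recalculate the signals for all data every time there is a new datapoint (as in the example above); that would be extremely inefficient and slow!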
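As a sketch of what this warning asks for, the per-datapoint update could be wrapped in a small class. The class name `RealTimePeakDetector` and its `update` method are hypothetical and not part of the original answer. The sketch assumes the population standard deviation (as `np.std` computes by default) and a window of exactly `lag` filtered values.

```python
from collections import deque
import statistics


class RealTimePeakDetector:
    """Hypothetical streaming wrapper around the smoothed z-score rule."""

    def __init__(self, initial_values, lag, threshold, influence):
        if lag < 1 or len(initial_values) < lag:
            raise ValueError("need lag >= 1 and at least `lag` initial values")
        self.threshold = threshold
        self.influence = influence
        # Rolling window holding the last `lag` filtered values.
        self.window = deque(initial_values[-lag:], maxlen=lag)
        self._refresh()

    def _refresh(self):
        # Assumption: population standard deviation, as in the numpy version.
        self.avg = statistics.fmean(self.window)
        self.std = statistics.pstdev(self.window)

    def update(self, value):
        """Consume one new datapoint and return +1, -1 or 0."""
        signal = 0
        filtered = value
        if abs(value - self.avg) > self.threshold * self.std:
            signal = 1 if value > self.avg else -1
            # Dampen the signal before it enters the window.
            filtered = (self.influence * value
                        + (1 - self.influence) * self.window[-1])
        self.window.append(filtered)  # the oldest value drops out automatically
        self._refresh()
        return signal


# Dataset from the question; the first 30 points only initialise the window.
y = [1, 1, 1.1, 1, 0.9, 1, 1, 1.1, 1, 0.9, 1, 1.1, 1, 1, 0.9, 1, 1, 1.1, 1, 1,
     1, 1, 1.1, 0.9, 1, 1.1, 1, 1, 0.9, 1, 1.1, 1, 1, 1.1, 1, 0.8, 0.9, 1, 1.2,
     0.9, 1, 1, 1.1, 1.2, 1, 1.5, 1, 3, 2, 5, 3, 2, 1, 1, 1, 0.9, 1, 1, 3, 2.6,
     4, 3, 3.2, 2, 1, 1, 0.8, 4, 4, 2, 2.5, 1, 1, 1]
detector = RealTimePeakDetector(y[:30], lag=30, threshold=5, influence=0)
signals = [detector.update(value) for value in y[30:]]
```

Each call to `update` touches only the current window, so the cost per datapoint depends on `lag` and not on the length of the history. Running sums could reduce this further, at the price of more bookkeeping.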

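Other ways to modify the algorithm (for potential improvements) are:

  • Use the median instead of the mean
  • Use a robust measure of scale, such as the MAD, instead of the standard deviation
  • Use a signalling margin, so the signal doesn't switch too often
  • Change the way the influence parameter works
  • Treat up and down signals differently (asymmetric treatment)
  • Create a separate influence parameter for the mean and the std (as done in this Swift translation)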
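The first two modifications in this list can be illustrated with a small sketch. The helper names `robust_center_and_scale` and `is_signal` are hypothetical. The factor 1.4826 is the usual constant that makes the MAD comparable to a standard deviation for normally distributed data.

```python
import statistics


def robust_center_and_scale(window):
    """Return (median, scaled MAD) of a window of numbers."""
    center = statistics.median(window)
    mad = statistics.median(abs(v - center) for v in window)
    # 1.4826 rescales the MAD to the standard deviation of a normal distribution.
    return center, 1.4826 * mad


def is_signal(value, window, threshold):
    """Robust z-score test: median and MAD replace mean and standard deviation."""
    center, scale = robust_center_and_scale(window)
    return abs(value - center) > threshold * scale


# A window that already contains one large outlier (5.0).
window = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 5.0]
center, scale = robust_center_and_scale(window)
print(center, scale)
print(is_signal(3.0, window, threshold=3.5))
```

In this window the outlier 5.0 barely moves the median and the MAD, so a new value of 3.0 is still flagged. With the mean and standard deviation of the same window, the inflated spread would hide it. Note that the MAD is zero when more than half of the window is identical, which needs separate handling in practice.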
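(Known) academic citations of this algorithm:

  • Baskozos, G., Dawes, J. M., Austin, J. S., Antunes-Martins, A., McDermott, L., Clark, A. J., ... & Orengo, C. (2019). Comprehensive analysis of long noncoding RNA expression in dorsal root ganglion reveals cell-type specificity and dysregulation after nerve injury. Pain, 160(2), 463.

  • Perkins, P., Heber, S. (2018). Identification of Ribosome Pause Sites Using a Z-Score Based Peak Detection Algorithm. IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), ISBN: 978-1-5386-8520-4.

  • Moore, J., Goffin, P., Meyer, M., Lundrigan, P., Patwari, N., Sward, K., & Wiese, J. (2018). Managing In-home Environments through Sensing, Annotating, and Visualizing Air Quality Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3), 128.

  • Lo, O., Buchanan, W. J., Griffiths, P., and Macfarlane, R. (2018). Distance Measurement Methods for Improved Insider Threat Detection. Security and Communication Networks, Vol. 2018, Article ID 5906368.

  • Scirea, M. (2017). Affective Music Generation and its effect on player experience. Doctoral dissertation, IT University of Copenhagen, Digital Design.

  • Scirea, M., Eklund, P., Togelius, J., & Risi, S. (2017). Primal-improv: Towards co-evolutionary musical improvisation. Computer Science and Electronic Engineering (CEEC), 2017 (pp. 172-177). IEEE.

  • Willems, P. (2017). Mood controlled affective ambiences for the elderly. Master thesis, University of Twente.

  • Catalbas, M. C., Cegovnik, T., Sodnik, J., and Gulten, A. (2017). Driver fatigue detection based on saccadic eye movements. 10th International Conference on Electrical and Electronics Engineering (ELECO), pp. 913-917.

  • Ciocirdel, G. D. and Varga, M. (2016). Election Prediction Based on Wikipedia Pageviews. Project Paper, Vrije Universiteit Amsterdam.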

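If you use this function somewhere, please credit me or this answer. If you have any questions regarding this algorithm, post them in the comments below or reach out to me on LinkedIn.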


Here is the Python / numpy implementation of the smoothed z-score algorithm (see the answer above). You can find the gist here.

    #!/usr/bin/env python
    # Implementation of algorithm from https://stackoverflow.com/a/22640362/6029703
    import numpy as np
    import pylab

    def thresholding_algo(y, lag, threshold, influence):
        signals = np.zeros(len(y))
        filteredY = np.array(y)
        avgFilter = [0]*len(y)
        stdFilter = [0]*len(y)
        avgFilter[lag - 1] = np.mean(y[0:lag])
        stdFilter[lag - 1] = np.std(y[0:lag])
        for i in range(lag, len(y)):
            if abs(y[i] - avgFilter[i-1]) > threshold * stdFilter [i-1]:
                if y[i] > avgFilter[i-1]:
                    signals[i] = 1
                else:
                    signals[i] = -1

                filteredY[i] = influence * y[i] + (1 - influence) * filteredY[i-1]
                avgFilter[i] = np.mean(filteredY[(i-lag+1):i+1])
                stdFilter[i] = np.std(filteredY[(i-lag+1):i+1])
            else:
                signals[i] = 0
                filteredY[i] = y[i]
                avgFilter[i] = np.mean(filteredY[(i-lag+1):i+1])
                stdFilter[i] = np.std(filteredY[(i-lag+1):i+1])

        return dict(signals = np.asarray(signals),
                    avgFilter = np.asarray(avgFilter),
                    stdFilter = np.asarray(stdFilter))

Here is the test on the same dataset, which yields the same plot as in the original answer for R/Matlab.

    # Data
    y = np.array([1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
           1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
           2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1])

    # Settings: lag = 30, threshold = 5, influence = 0
    lag = 30
    threshold = 5
    influence = 0

    # Run algo with settings from above
    result = thresholding_algo(y, lag=lag, threshold=threshold, influence=influence)

    # Plot result
    pylab.subplot(211)
    pylab.plot(np.arange(1, len(y)+1), y)

    pylab.plot(np.arange(1, len(y)+1),
               result["avgFilter"], color="cyan", lw=2)

    pylab.plot(np.arange(1, len(y)+1),
               result["avgFilter"] + threshold * result["stdFilter"], color="green", lw=2)

    pylab.plot(np.arange(1, len(y)+1),
               result["avgFilter"] - threshold * result["stdFilter"], color="green", lw=2)

    pylab.subplot(212)
    pylab.step(np.arange(1, len(y)+1), result["signals"], color="red", lw=2)
    pylab.ylim(-1.5, 1.5)


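One approach is to detect peaks based on the following observation:

  • Time t is a peak if (y(t) > y(t-1)) && (y(t) > y(t+1)).

It avoids false positives by waiting until the uptrend is over. It is not exactly "real-time", in the sense that it detects the peak one dt late. Sensitivity can be controlled by requiring a margin for the comparison. There is a trade-off between noisy detection and the time delay of detection. You can enrich the model by adding more parameters:

  • Peak if (y(t) - y(t-dt) > m) && (y(t) - y(t+dt) > m)

where dt and m are parameters that control sensitivity versus time delay.

This is what you get with the algorithm above: [figure]

Here is the code to reproduce the plot in Python: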

    import numpy as np
    import matplotlib.pyplot as plt
    input = np.array([ 1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1.1,  1. ,  0.8,  0.9,
        1. ,  1.2,  0.9,  1. ,  1. ,  1.1,  1.2,  1. ,  1.5,  1. ,  3. ,
        2. ,  5. ,  3. ,  2. ,  1. ,  1. ,  1. ,  0.9,  1. ,  1. ,  3. ,
        2.6,  4. ,  3. ,  3.2,  2. ,  1. ,  1. ,  1. ,  1. ,  1. ])
    signal = (input > np.roll(input,1)) & (input > np.roll(input,-1))
    plt.plot(input)
    plt.plot(signal.nonzero()[0], input[signal], 'ro')
    plt.show()

Setting m = 0.5 gives a cleaner signal with only one false positive: [figure]
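The margin rule can also be written directly as a loop. This is a minimal sketch; the function name `local_peaks` is hypothetical. Unlike a version based on `np.roll`, it does not wrap around at the ends of the series, and it needs `dt` future samples before it can decide.

```python
def local_peaks(y, dt=1, m=0.0):
    """Indices t where y[t] exceeds both y[t-dt] and y[t+dt] by more than m."""
    peaks = []
    for t in range(dt, len(y) - dt):
        if (y[t] - y[t - dt] > m) and (y[t] - y[t + dt] > m):
            peaks.append(t)
    return peaks


# Same series as in the plotting example of this answer.
series = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.1, 1.0, 0.8, 0.9,
          1.0, 1.2, 0.9, 1.0, 1.0, 1.1, 1.2, 1.0, 1.5, 1.0, 3.0,
          2.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 3.0,
          2.6, 4.0, 3.0, 3.2, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(local_peaks(series, dt=1, m=0.5))  # [21, 23, 34]
```

With m = 0.5 the small wiggles in the noise no longer qualify, and only three indices remain.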


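In signal processing, peak detection is often done via the wavelet transform. You basically perform a discrete wavelet transform on your time series data. Zero-crossings in the returned detail coefficients correspond to peaks in the time series signal. Different peak amplitudes are detected at different detail coefficient levels, which gives you multi-level resolution.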


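We attempted to use the smoothed z-score algorithm on our dataset, which resulted in either oversensitivity or undersensitivity (depending on how the parameters are tuned), with little middle ground. In our site's traffic signal we observed a low-frequency baseline that represents the daily cycle, and even with the best possible parameters (shown below) the detection still trailed off, especially on the fourth day, because most of the datapoints were recognized as anomalies.

Building on top of the original z-score algorithm, we came up with a way to solve this problem by reverse filtering. The details of the modified algorithm and its application to TV commercial traffic attribution are posted on our team blog.

[figure]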


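In computational topology, the idea of persistent homology leads to an efficient solution (as fast as sorting numbers). It not only detects peaks, it also quantifies the "significance" of the peaks in a natural way that allows you to select the peaks that are significant for you.

Algorithm summary. In a one-dimensional setting (time series, real-valued signal), the algorithm can be described easily with the following figure:

Most persistent peaks

Think of the function graph (or its sub-level set) as a landscape and consider a decreasing water level, starting at level infinity (or 1.8 in this picture). As the level decreases, islands pop up at local maxima. At local minima these islands merge together. One detail of this idea is that the island that appeared later is merged into the island that is older. The "persistence" of an island is its birth time minus its death time. The lengths of the blue bars depict the persistence, which is the "significance" of a peak mentioned above.

Efficiency. It is not too hard to find an implementation that runs in linear time after the function values have been sorted; in fact, it is a single, simple loop. So this implementation should be fast in practice and is also easy to implement.

References. A write-up of the entire story and references to the motivation from persistent homology (a field in computational algebraic topology) can be found here: https://www.sthu.org/blog/13-perspotology-peakdetection/index.html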
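The water-level description can be turned into a short loop over the samples sorted by height. The following is a sketch written for this summary, not the code from the linked write-up; the function name `persistent_peaks` and the island bookkeeping are assumptions. Heights play the role of birth and death "times", and the highest peak never dies, so its persistence is infinite.

```python
import math


def persistent_peaks(y):
    """Return (peak_index, persistence) pairs, most persistent first."""
    n = len(y)
    island_of = [None] * n  # island id per sample; kept current at island ends
    islands = []            # one dict per island: peak, left, right, death
    # Visit samples from highest to lowest, like a falling water level.
    for i in sorted(range(n), key=lambda j: y[j], reverse=True):
        left = island_of[i - 1] if i > 0 else None
        right = island_of[i + 1] if i < n - 1 else None
        if left is None and right is None:
            # A new island is born at a local maximum.
            islands.append({"peak": i, "left": i, "right": i, "death": None})
            island_of[i] = len(islands) - 1
        elif right is None:
            islands[left]["right"] = i   # the island on the left grows
            island_of[i] = left
        elif left is None:
            islands[right]["left"] = i   # the island on the right grows
            island_of[i] = right
        else:
            # Two islands meet: the one with the lower peak dies here.
            if y[islands[left]["peak"]] >= y[islands[right]["peak"]]:
                older, younger = left, right
            else:
                older, younger = right, left
            islands[younger]["death"] = i
            islands[older]["left"] = islands[left]["left"]
            islands[older]["right"] = islands[right]["right"]
            island_of[islands[older]["left"]] = older
            island_of[islands[older]["right"]] = older
            island_of[i] = older
    result = []
    for island in islands:
        if island["death"] is None:
            persistence = math.inf       # the global maximum never dies
        else:
            persistence = y[island["peak"]] - y[island["death"]]
        result.append((island["peak"], persistence))
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result


print(persistent_peaks([0, 2, 1, 5, 2, 4, 0]))
```

Selecting peaks then amounts to keeping the pairs whose persistence exceeds a value that is meaningful for your data.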


Another algorithm can be found in G. H. Palshikar's Simple Algorithms for Peak Detection in Time-Series.

The algorithm goes like this:

    algorithm peak1 // one peak detection algorithms that uses peak function S1

    input T = x1, x2, …, xN, N // input time-series of N points
    input k // window size around the peak
    input h // typically 1 <= h <= 3
    output O // set of peaks detected in T

    begin
    O = empty set // initially empty

        for (i = 1; i <= N; i++) do
            // compute peak function value for each of the N points in T
            a[i] = S1(k,i,xi,T);
        end for

        Compute the mean m' and standard deviation s' of all positive values in array a;

        for (i = 1; i <= N; i++) do // remove local peaks which are "small" in global context
            if (a[i] > 0 && (a[i] - m') > (h * s')) then O = O + {xi};
            end if
        end for

        Order peaks in O in terms of increasing index in T

        // retain only one peak out of any set of peaks within distance k of each other

        for every adjacent pair of peaks xi and xj in O do
            if |j - i| <= k then remove the smaller value of {xi, xj} from O
            end if
        end for
    end

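Advantages

  • The paper provides 5 different algorithms for peak detection.
  • The algorithms work on the raw time series data (no smoothing required).

Disadvantages

  • It is difficult to determine k and h beforehand.
  • Peaks cannot be flat (like the third peak in my test data).

Example:

[figure]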
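The pseudocode for peak1 leaves the peak function S1 undefined. The sketch below assumes that S1 is the average of the largest signed difference to the k left neighbours and the largest signed difference to the k right neighbours. It further assumes that the first and last sample are never peaks and that the population standard deviation is meant. The function names `s1` and `peak1` follow the pseudocode; everything else is illustrative.

```python
import statistics


def s1(k, i, x):
    """Assumed peak function: mean of the best left and best right rise at i."""
    left = [x[i] - x[j] for j in range(max(0, i - k), i)]
    right = [x[i] - x[j] for j in range(i + 1, min(len(x), i + k + 1))]
    if not left or not right:
        return 0.0  # assumption: series end points are never peaks
    return (max(left) + max(right)) / 2.0


def peak1(x, k, h):
    """Return indices of peaks in x, following the peak1 pseudocode."""
    a = [s1(k, i, x) for i in range(len(x))]
    positive = [v for v in a if v > 0]
    if not positive:
        return []
    m = statistics.fmean(positive)
    s = statistics.pstdev(positive)  # assumption: population standard deviation
    # Keep only peaks that are large in the global context.
    candidates = [i for i, v in enumerate(a) if v > 0 and (v - m) > h * s]
    # Of any two candidates within distance k, keep the larger one.
    peaks = []
    for i in candidates:
        if peaks and i - peaks[-1] <= k:
            if x[i] > x[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


x = [1, 1.2, 1, 1.1, 1, 1.2, 1, 5, 1, 1.1, 1, 1.2, 1, 1.1, 1]
print(peak1(x, k=1, h=1))  # [7]
```

Because the mean and standard deviation are taken over the whole series, this form is a batch method; the values of k and h still have to be chosen by hand.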


Here is an implementation of the smoothed z-score algorithm (above) in Golang. It assumes a slice of []int16 (PCM 16-bit samples). You can find a gist here.

    /*
    Settings (the ones below are examples: choose what is best for your data)
    set lag to 5;          # lag 5 for the smoothing functions
    set threshold to 3.5;  # 3.5 standard deviations for signal
    set influence to 0.5;  # between 0 and 1, where 1 is normal influence, 0.5 is half
    */

    // ZScore on 16bit WAV samples
    func ZScore(samples []int16, lag int, threshold float64, influence float64) (signals []int16) {
        //lag := 20
        //threshold := 3.5
        //influence := 0.5

        signals = make([]int16, len(samples))
        filteredY := make([]int16, len(samples))
        for i, sample := range samples[0:lag] {
            filteredY[i] = sample
        }
        avgFilter := make([]int16, len(samples))
        stdFilter := make([]int16, len(samples))

        avgFilter[lag] = Average(samples[0:lag])
        stdFilter[lag] = Std(samples[0:lag])

        for i := lag + 1; i < len(samples); i++ {

            f := float64(samples[i])

            if float64(Abs(samples[i]-avgFilter[i-1])) > threshold*float64(stdFilter[i-1]) {
                if samples[i] > avgFilter[i-1] {
                    signals[i] = 1
                } else {
                    signals[i] = -1
                }
                filteredY[i] = int16(influence*f + (1-influence)*float64(filteredY[i-1]))
                avgFilter[i] = Average(filteredY[(i - lag):i])
                stdFilter[i] = Std(filteredY[(i - lag):i])
            } else {
                signals[i] = 0
                filteredY[i] = samples[i]
                avgFilter[i] = Average(filteredY[(i - lag):i])
                stdFilter[i] = Std(filteredY[(i - lag):i])
            }
        }

        return
    }

    // Average a chunk of values
    func Average(chunk []int16) (avg int16) {
        var sum int64
        for _, sample := range chunk {
            if sample < 0 {
                sample *= -1
            }
            sum += int64(sample)
        }
        return int16(sum / int64(len(chunk)))
    }


Here is a C++ implementation of the smoothed z-score algorithm from this answer.

    std::vector<int> smoothedZScore(std::vector<float> input)
    {  
        //lag 5 for the smoothing functions
        int lag = 5;
        //3.5 standard deviations for signal
        float threshold = 3.5;
        //between 0 and 1, where 1 is normal influence, 0.5 is half
        float influence = .5;

        if (input.size() <= lag + 2)
        {
            std::vector<int> emptyVec;
            return emptyVec;
        }

        //Initialise variables
        std::vector<int> signals(input.size(), 0.0);
        // Start from the raw samples so the first windows hold real data, not zeros
        std::vector<float> filteredY(input.begin(), input.end());
        std::vector<float> avgFilter(input.size(), 0.0);
        std::vector<float> stdFilter(input.size(), 0.0);
        std::vector<float> subVecStart(input.begin(), input.begin() + lag);
        avgFilter[lag] = mean(subVecStart);
        stdFilter[lag] = stdDev(subVecStart);

        for (size_t i = lag + 1; i < input.size(); i++)
        {
            if (std::abs(input[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1])
            {
                if (input[i] > avgFilter[i - 1])
                {
                    signals[i] = 1; //# Positive signal
                }
                else
                {
                    signals[i] = -1; //# Negative signal
                }
                //Make influence lower
                filteredY[i] = influence* input[i] + (1 - influence) * filteredY[i - 1];
            }
            else
            {
                signals[i] = 0; //# No signal
                filteredY[i] = input[i];
            }
            //Adjust the filters
            std::vector<float> subVec(filteredY.begin() + i - lag, filteredY.begin() + i);
            avgFilter[i] = mean(subVec);
            stdFilter[i] = stdDev(subVec);
        }
        return signals;
    }


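This problem looks similar to one I encountered in a hybrid/embedded systems course, although that one was about detecting faults when the input from a sensor is noisy. We used a Kalman filter to estimate/predict the hidden state of the system, and then used statistical analysis to determine the likelihood that a fault had occurred. We were working with linear systems, but nonlinear variants exist. I remember the approach being surprisingly adaptive, but it requires a model of the dynamics of the system.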
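To make the idea concrete, here is a minimal sketch under strong assumptions: the hidden state is a scalar that follows a random walk, and a sample counts as a peak or fault when its innovation (the prediction error) is more than `threshold` standard deviations away from zero. The function name `kalman_outliers` and the two variance parameters are hypothetical tuning choices, not something the answer specifies.

```python
def kalman_outliers(y, process_var, measurement_var, threshold):
    """Flag samples whose innovation exceeds `threshold` standard deviations.

    Assumes a scalar random-walk state observed with additive noise.
    """
    x = y[0]             # state estimate, initialised with the first sample
    p = measurement_var  # variance of the state estimate
    flags = [0]
    for value in y[1:]:
        p_pred = p + process_var          # predict: random walk keeps x as is
        innovation = value - x
        s = p_pred + measurement_var      # variance of the innovation
        if innovation * innovation > threshold * threshold * s:
            flags.append(1 if innovation > 0 else -1)
            p = p_pred                    # outlier: skip the measurement update
        else:
            gain = p_pred / s
            x = x + gain * innovation     # regular Kalman measurement update
            p = (1 - gain) * p_pred
            flags.append(0)
    return flags


data = [1.0] * 20 + [5.0] + [1.0] * 5
flags = kalman_outliers(data, process_var=1e-4, measurement_var=0.01, threshold=4)
print(flags.index(1))  # 20
```

Skipping the update for flagged samples plays a role similar to an influence of 0 in the smoothed z-score algorithm. Because the predicted variance keeps growing while samples are skipped, a lasting level shift is eventually accepted.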


C++ implementation

    #include <iostream>
    #include <vector>
    #include <string>
    #include <algorithm>
    #include <unordered_map>
    #include <cmath>
    #include <iterator>
    #include <numeric>

    using namespace std;

    typedef long double ld;
    typedef unsigned int uint;
    typedef std::vector<ld>::iterator vec_iter_ld;

    /**
     * Overriding the ostream operator for pretty printing vectors.
     */
    template<typename T>
    std::ostream &operator<<(std::ostream &os, std::vector<T> vec) {
        os << "[";
        if (vec.size() != 0) {
            std::copy(vec.begin(), vec.end() - 1, std::ostream_iterator<T>(os, " "));
            os << vec.back();
        }
        os << "]";
        return os;
    }

    /**
     * This class calculates mean and standard deviation of a subvector.
     * This is basically stats computation of a subvector of a window size equal to "lag".
     */
    class VectorStats {
    public:
        /**
         * Constructor for VectorStats class.
         *
         * @param start - This is the iterator position of the start of the window,
         * @param end   - This is the iterator position of the end of the window,
         */
        VectorStats(vec_iter_ld start, vec_iter_ld end) {
            this->start = start;
            this->end = end;
            this->compute();
        }

        /**
         * This method calculates the mean and standard deviation using STL function.
         * This is the Two-Pass implementation of the Mean & Variance calculation.
         */
        void compute() {
            ld sum = std::accumulate(start, end, 0.0);
            uint slice_size = std::distance(start, end);
            ld mean = sum / slice_size;
            std::vector<ld> diff(slice_size);
            std::transform(start, end, diff.begin(), [mean](ld x) { return x - mean; });
            ld sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
            ld std_dev = std::sqrt(sq_sum / slice_size);

            this->m1 = mean;
            this->m2 = std_dev;
        }

        ld mean() {
            return m1;
        }

        ld standard_deviation() {
            return m2;
        }

    private:
        vec_iter_ld start;
        vec_iter_ld end;
        ld m1;
        ld m2;
    };

    /**
     * This is the implementation of the Smoothed Z-Score Algorithm.
     * This is a direct translation of https://stackoverflow.com/a/22640362/1461896.
     *
     * @param input - input signal
     * @param lag - the lag of the moving window
     * @param threshold - the z-score at which the algorithm signals
     * @param influence - the influence (between 0 and 1) of new signals on the mean and standard deviation
     * @return a hashmap containing the filtered signal and corresponding mean and standard deviation.
     */
    unordered_map<string, vector<ld>> z_score_thresholding(vector<ld> input, int lag, ld threshold, ld influence) {
        unordered_map<string, vector<ld>> output;

        uint n = (uint) input.size();
        vector<ld> signals(input.size());
        vector<ld> filtered_input(input.begin(), input.end());
        vector<ld> filtered_mean(input.size());
        vector<ld> filtered_stddev(input.size());

        VectorStats lag_subvector_stats(input.begin(), input.begin() + lag);
        filtered_mean[lag - 1] = lag_subvector_stats.mean();
        filtered_stddev[lag - 1] = lag_subvector_stats.standard_deviation();

        for (int i = lag; i < n; i++) {
            if (abs(input[i] - filtered_mean[i - 1]) > threshold * filtered_stddev[i - 1]) {
                signals[i] = (input[i] > filtered_mean[i - 1]) ? 1.0 : -1.0;
                filtered_input[i] = influence * input[i] + (1 - influence) * filtered_input[i - 1];
            } else {
                signals[i] = 0.0;
                filtered_input[i] = input[i];
            }
            // Window of the last `lag` filtered values, including the current one
            VectorStats lag_subvector_stats(filtered_input.begin() + (i - lag + 1), filtered_input.begin() + i + 1);
            filtered_mean[i] = lag_subvector_stats.mean();
            filtered_stddev[i] = lag_subvector_stats.standard_deviation();
        }

        output["signals"] = signals;
        output["filtered_mean"] = filtered_mean;
        output["filtered_stddev"] = filtered_stddev;

        return output;
    };

    int main() {
        vector<ld> input = {1.0, 1.0, 1.1, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0,
                            1.0, 1.0, 1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 1.0, 1.1, 1.0, 1.0, 1.1, 1.0, 0.8, 0.9, 1.0,
                            1.2, 0.9, 1.0, 1.0, 1.1, 1.2, 1.0, 1.5, 1.0, 3.0, 2.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0, 0.9, 1.0,
                            1.0, 3.0, 2.6, 4.0, 3.0, 3.2, 2.0, 1.0, 1.0, 0.8, 4.0, 4.0, 2.0, 2.5, 1.0, 1.0, 1.0};

        int lag = 30;
        ld threshold = 5.0;
        ld influence = 0.0;
        unordered_map<string, vector<ld>> output = z_score_thresholding(input, lag, threshold, influence);
        cout << output["signals"] << endl;
    }

Appendix 1 to the original answer: Matlab and R translations

Matlab code

    function [signals,avgFilter,stdFilter] = ThresholdingAlgo(y,lag,threshold,influence)
    % Initialise signal results
    signals = zeros(length(y),1);
    % Initialise filtered series
    filteredY = y(1:lag+1);
    % Initialise filters
    avgFilter(lag+1,1) = mean(y(1:lag+1));
    stdFilter(lag+1,1) = std(y(1:lag+1));
    % Loop over all datapoints y(lag+2),...,y(t)
    for i=lag+2:length(y)
        % If new value is a specified number of deviations away
        if abs(y(i)-avgFilter(i-1)) > threshold*stdFilter(i-1)
            if y(i) > avgFilter(i-1)
                % Positive signal
                signals(i) = 1;
            else
                % Negative signal
                signals(i) = -1;
            end
            % Make influence lower
            filteredY(i) = influence*y(i)+(1-influence)*filteredY(i-1);
        else
            % No signal
            signals(i) = 0;
            filteredY(i) = y(i);
        end
        % Adjust the filters
        avgFilter(i) = mean(filteredY(i-lag:i));
        stdFilter(i) = std(filteredY(i-lag:i));
    end
    % Done, now return results
    end

Example:

    % Data
    y = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...
        1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...
        1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...
        1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];

    % Settings
    lag = 30;
    threshold = 5;
    influence = 0;

    % Get results
    [signals,avg,dev] = ThresholdingAlgo(y,lag,threshold,influence);

    figure; subplot(2,1,1); hold on;
    x = 1:length(y); ix = lag+1:length(y);
    area(x(ix),avg(ix)+threshold*dev(ix),'FaceColor',[0.9 0.9 0.9],'EdgeColor','none');
    area(x(ix),avg(ix)-threshold*dev(ix),'FaceColor',[1 1 1],'EdgeColor','none');
    plot(x(ix),avg(ix),'LineWidth',1,'Color','cyan','LineWidth',1.5);
    plot(x(ix),avg(ix)+threshold*dev(ix),'LineWidth',1,'Color','green','LineWidth',1.5);
    plot(x(ix),avg(ix)-threshold*dev(ix),'LineWidth',1,'Color','green','LineWidth',1.5);
    plot(1:length(y),y,'b');
    subplot(2,1,2);
    stairs(signals,'r','LineWidth',1.5); ylim([-1.5 1.5]);

R code

    ThresholdingAlgo <- function(y,lag,threshold,influence) {
      signals <- rep(0,length(y))
      filteredY <- y[0:lag]
      avgFilter <- NULL
      stdFilter <- NULL
      avgFilter[lag] <- mean(y[0:lag])
      stdFilter[lag] <- sd(y[0:lag])
      for (i in (lag+1):length(y)){
        if (abs(y[i]-avgFilter[i-1]) > threshold*stdFilter[i-1]) {
          if (y[i] > avgFilter[i-1]) {
            signals[i] <- 1;
          } else {
            signals[i] <- -1;
          }
          filteredY[i] <- influence*y[i]+(1-influence)*filteredY[i-1]
        } else {
          signals[i] <- 0
          filteredY[i] <- y[i]
        }
        avgFilter[i] <- mean(filteredY[(i-lag):i])
        stdFilter[i] <- sd(filteredY[(i-lag):i])
      }
      return(list("signals"=signals,"avgFilter"=avgFilter,"stdFilter"=stdFilter))
    }

Example:

    # Data
    y <- c(1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
           1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
           2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1)

    lag       <- 30
    threshold <- 5
    influence <- 0

    # Run algo with lag = 30, threshold = 5, influence = 0
    result <- ThresholdingAlgo(y,lag,threshold,influence)

    # Plot result
    par(mfrow = c(2,1),oma = c(2,2,0,0) + 0.1,mar = c(0,0,2,1) + 0.2)
    plot(1:length(y),y,type="l",ylab="",xlab="")
    lines(1:length(y),result$avgFilter,type="l",col="cyan",lwd=2)
    lines(1:length(y),result$avgFilter+threshold*result$stdFilter,type="l",col="green",lwd=2)
    lines(1:length(y),result$avgFilter-threshold*result$stdFilter,type="l",col="green",lwd=2)
    plot(result$signals,type="S",col="red",ylab="",xlab="",ylim=c(-1.5,1.5),lwd=2)

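This code (in both languages) yields the following result for the data of the original question:

Thresholding example from Matlab code

Appendix 2 to the original answer: Matlab demonstration code

(click to create data)

Matlab demo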

    function [] = RobustThresholdingDemo()

    %% SPECIFICATIONS
    lag         = 5;       % lag for the smoothing
    threshold   = 3.5;     % number of st.dev. away from the mean to signal
    influence   = 0.3;     % when signal: how much influence for new data? (between 0 and 1)
                           % 1 is normal influence, 0.5 is half      
    %% START DEMO
    DemoScreen(30,lag,threshold,influence);

    end

    function [signals,avgFilter,stdFilter] = ThresholdingAlgo(y,lag,threshold,influence)
    signals = zeros(length(y),1);
    filteredY = y(1:lag+1);
    avgFilter(lag+1,1) = mean(y(1:lag+1));
    stdFilter(lag+1,1) = std(y(1:lag+1));
    for i=lag+2:length(y)
        if abs(y(i)-avgFilter(i-1)) > threshold*stdFilter(i-1)
            if y(i) > avgFilter(i-1)
                signals(i) = 1;
            else
                signals(i) = -1;
            end
            filteredY(i) = influence*y(i)+(1-influence)*filteredY(i-1);
        else
            signals(i) = 0;
            filteredY(i) = y(i);
        end
        avgFilter(i) = mean(filteredY(i-lag:i));
        stdFilter(i) = std(filteredY(i-lag:i));
    end
    end

    % Demo screen function
    function [] = DemoScreen(n,lag,threshold,influence)
    figure('Position',[200 100,1000,500]);
    subplot(2,1,1);
    title(sprintf(['Draw data points (%.0f max)      [settings: lag = %.0f, '...
        'threshold = %.2f, influence = %.2f]'],n,lag,threshold,influence));
    ylim([0 5]); xlim([0 50]);
    H = gca; subplot(2,1,1);
    set(H, 'YLimMode', 'manual'); set(H, 'XLimMode', 'manual');
    set(H, 'YLim', get(H,'YLim')); set(H, 'XLim', get(H,'XLim'));
    xg = []; yg = [];
    for i=1:n
        try
            [xi,yi] = ginput(1);
        catch
            return;
        end
        xg = [xg xi]; yg = [yg yi];
        if i == 1
            subplot(2,1,1); hold on;
            plot(H, xg(i),yg(i),'r.');
            text(xg(i),yg(i),num2str(i),'FontSize',7);
        end
        if length(xg) > lag
            [signals,avg,dev] = ...
                ThresholdingAlgo(yg,lag,threshold,influence);
            area(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),...
                'FaceColor',[0.9 0.9 0.9],'EdgeColor','none');
            area(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),...
                'FaceColor',[1 1 1],'EdgeColor','none');
            plot(xg(lag+1:end),avg(lag+1:end),'LineWidth',1,'Color','cyan');
            plot(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),...
                'LineWidth',1,'Color','green');
            plot(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),...
                'LineWidth',1,'Color','green');
            subplot(2,1,2); hold on; title('Signal output');
            stairs(xg(lag+1:end),signals(lag+1:end),'LineWidth',2,'Color','blue');
            ylim([-2 2]); xlim([0 50]); hold off;
        end
        subplot(2,1,1); hold on;
        for j=2:i
            plot(xg([j-1:j]),yg([j-1:j]),'r'); plot(H,xg(j),yg(j),'r.');
            text(xg(j),yg(j),num2str(j),'FontSize',7);
        end
    end
    end

I needed something similar for my Android project. I thought I would give back with a Kotlin implementation.

    /**
    * Smoothed z-score algorithm, shamelessly copied from https://stackoverflow.com/a/22640362/6029703
    * Uses a rolling mean and a rolling deviation (separate) to identify peaks in a vector
    *
    * @param y - The input vector to analyze
    * @param lag - The lag of the moving window (i.e. how big the window is)
    * @param threshold - The z-score at which the algorithm signals (i.e. how many standard deviations away from the moving mean a peak (or signal) is)
    * @param influence - The influence (between 0 and 1) of new signals on the mean and standard deviation (how much a peak (or signal) should affect other values near it)
    * @return - The calculated averages (avgFilter) and deviations (stdFilter), and the signals (signals)
    */
    fun smoothedZScore(y: List<Double>, lag: Int, threshold: Double, influence: Double): Triple<List<Int>, List<Double>, List<Double>> {
        val stats = SummaryStatistics()
        // the results (peaks, 1 or -1) of our algorithm
        val signals = MutableList<Int>(y.size, { 0 })
        // filter out the signals (peaks) from our original list (using influence arg)
        val filteredY = ArrayList<Double>(y)
        // the current average of the rolling window
        val avgFilter = MutableList<Double>(y.size, { 0.0 })
        // the current standard deviation of the rolling window
        val stdFilter = MutableList<Double>(y.size, { 0.0 })
        // init avgFilter and stdFilter
        y.take(lag).forEach { s -> stats.addValue(s) }
        avgFilter[lag - 1] = stats.mean
        stdFilter[lag - 1] = Math.sqrt(stats.populationVariance) // getStandardDeviation() uses sample variance (not what we want)
        stats.clear()
        //loop input starting at end of rolling window
        (lag..y.size - 1).forEach { i ->
            //if the distance between the current value and average is enough standard deviations (threshold) away
            if (Math.abs(y[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1]) {
                //this is a signal (i.e. peak), determine if it is a positive or negative signal
                signals[i] = if (y[i] > avgFilter[i - 1]) 1 else -1
                //filter this signal out using influence
                filteredY[i] = (influence * y[i]) + ((1 - influence) * filteredY[i - 1])
            } else {
                //ensure this signal remains a zero
                signals[i] = 0
                //ensure this value is not filtered
                filteredY[i] = y[i]
            }
            //update rolling average and deviation
            (i - lag..i - 1).forEach { stats.addValue(filteredY[it]) }
            avgFilter[i] = stats.getMean()
            stdFilter[i] = Math.sqrt(stats.getPopulationVariance()) //getStandardDeviation() uses sample variance (not what we want)
            stats.clear()
        }
        return Triple(signals, avgFilter, stdFilter)
    }

    A sample project with verification graphs can be found on GitHub.



    Here is my attempt at a Ruby solution for the "smoothed z-score algorithm" from the accepted answer:

    module ThresholdingAlgoMixin
      def mean(array)
        array.reduce(&:+) / array.size.to_f
      end

      def stddev(array)
        array_mean = mean(array)
        Math.sqrt(array.reduce(0.0) { |a, b| a.to_f + ((b.to_f - array_mean) ** 2) } / array.size.to_f)
      end

      def thresholding_algo(lag: 5, threshold: 3.5, influence: 0.5)
        return nil if size < lag * 2
        Array.new(size, 0).tap do |signals|
          filtered = Array.new(self)

          initial_slice = take(lag)
          avg_filter = Array.new(lag - 1, 0.0) + [mean(initial_slice)]
          std_filter = Array.new(lag - 1, 0.0) + [stddev(initial_slice)]
          (lag..size-1).each do |idx|
            prev = idx - 1
            if (fetch(idx) - avg_filter[prev]).abs > threshold * std_filter[prev]
              signals[idx] = fetch(idx) > avg_filter[prev] ? 1 : -1
              filtered[idx] = (influence * fetch(idx)) + ((1-influence) * filtered[prev])
            end

            filtered_slice = filtered[idx-lag..prev]
            avg_filter[idx] = mean(filtered_slice)
            std_filter[idx] = stddev(filtered_slice)
          end
        end
      end
    end

    Example usage:

    test_data = [
      1, 1, 1.1, 1, 0.9, 1, 1, 1.1, 1, 0.9, 1, 1.1, 1, 1, 0.9, 1,
      1, 1.1, 1, 1, 1, 1, 1.1, 0.9, 1, 1.1, 1, 1, 0.9, 1, 1.1, 1,
      1, 1.1, 1, 0.8, 0.9, 1, 1.2, 0.9, 1, 1, 1.1, 1.2, 1, 1.5,
      1, 3, 2, 5, 3, 2, 1, 1, 1, 0.9, 1, 1, 3, 2.6, 4, 3, 3.2, 2,
      1, 1, 0.8, 4, 4, 2, 2.5, 1, 1, 1
    ].extend(ThresholdingAlgoMixin)

    puts test_data.thresholding_algo.inspect

    # Output: [
    #   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    #   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0,
    #   0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    #   1, 1, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0
    # ]


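    Here is an altered Fortran version of the z-score algorithm. It is altered specifically for peak (resonance) detection in transfer functions in frequency space (each change has a small comment in the code).

    The first modification warns the user if there is a resonance near the lower bound of the input vector, indicated by a standard deviation higher than a certain threshold (10% in this case). This simply means the signal is not flat enough for the detection to initialize the filters properly.

    The second modification is that only the highest value of a peak is added to the found peaks. This is achieved by comparing each found peak value with the magnitude of its (lag) predecessors and its (lag) successors.

    The third change accounts for the fact that resonance peaks usually show some form of symmetry around their resonance frequency. It is therefore natural to calculate the mean and std symmetrically around the current data point (rather than just over the predecessors). This results in better peak detection behavior.

    The modifications have the effect that the whole signal has to be known to the function beforehand, which is the usual case for resonance detection (something like Jean-Paul's MATLAB example, where the data points are generated on the fly, won't work).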

    function PeakDetect(y,lag,threshold, influence)
        implicit none
        ! Declaring part
        real, dimension(:), intent(in) :: y
        integer, dimension(size(y)) :: PeakDetect
        real, dimension(size(y)) :: filteredY, avgFilter, stdFilter
        integer :: lag, ii
        real :: threshold, influence

        ! Executing part
        PeakDetect = 0
        filteredY = 0.0
        filteredY(1:lag+1) = y(1:lag+1)
        avgFilter = 0.0
        avgFilter(lag+1) = mean(y(1:2*lag+1))
        stdFilter = 0.0
        stdFilter(lag+1) = std(y(1:2*lag+1))

        if (stdFilter(lag+1)/avgFilter(lag+1)>0.1) then ! If the coefficient of variation exceeds 10%, the signal is too uneven at the start, possibly because of a peak.
            write(unit=*,fmt=1001)
    1001        format(1X,'Warning: Peak detection might have failed, as there may be a peak at the edge of the frequency range.',/)
        end if
        do ii = lag+2, size(y)
            if (abs(y(ii) - avgFilter(ii-1)) > threshold * stdFilter(ii-1)) then
                ! Find only the largest outstanding value which is only the one greater than its predecessor and its successor
                if (y(ii) > avgFilter(ii-1) .AND. y(ii) > y(ii-1) .AND. y(ii) > y(ii+1)) then
                    PeakDetect(ii) = 1
                end if
                filteredY(ii) = influence * y(ii) + (1 - influence) * filteredY(ii-1)
            else
                filteredY(ii) = y(ii)
            end if
            ! Modified with respect to the original code. Mean and standard deviation are calculated symmetrically around the current point
            avgFilter(ii) = mean(filteredY(ii-lag:ii+lag))
            stdFilter(ii) = std(filteredY(ii-lag:ii+lag))
        end do
    end function PeakDetect

    real function mean(y)
        !> @brief Calculates the mean of vector y
        implicit none
        ! Declaring part
        real, dimension(:), intent(in) :: y
        integer :: N
        ! Executing part
        N = max(1,size(y))
        mean = sum(y)/N
    end function mean

    real function std(y)
        !> @brief Calculates the standard deviation of vector y
        implicit none
        ! Declaring part
        real, dimension(:), intent(in) :: y
        integer :: N
        ! Executing part
        N = max(1,size(y))
        std = sqrt((N*dot_product(y,y) - sum(y)**2) / (N*(N-1)))
    end function std

    For my application the algorithm works like a charm!


    Here is a Groovy (Java) implementation of the smoothed z-score algorithm (see the answer above).

    /**
     * "Smoothed z-score algorithm" shamelessly copied from https://stackoverflow.com/a/22640362/6029703
     *  Uses a rolling mean and a rolling deviation (separate) to identify peaks in a vector
     *
     * @param y - The input vector to analyze
     * @param lag - The lag of the moving window (i.e. how big the window is)
     * @param threshold - The z-score at which the algorithm signals (i.e. how many standard deviations away from the moving mean a peak (or signal) is)
     * @param influence - The influence (between 0 and 1) of new signals on the mean and standard deviation (how much a peak (or signal) should affect other values near it)
     * @return - The calculated averages (avgFilter) and deviations (stdFilter), and the signals (signals)
     */

    public HashMap<String, List<Object>> thresholdingAlgo(List<Double> y, Long lag, Double threshold, Double influence) {
        //init stats instance
        SummaryStatistics stats = new SummaryStatistics()

        //the results (peaks, 1 or -1) of our algorithm
        List<Integer> signals = new ArrayList<Integer>(Collections.nCopies(y.size(), 0))
        //filter out the signals (peaks) from our original list (using influence arg)
        List<Double> filteredY = new ArrayList<Double>(y)
        //the current average of the rolling window
        List<Double> avgFilter = new ArrayList<Double>(Collections.nCopies(y.size(), 0.0d))
        //the current standard deviation of the rolling window
        List<Double> stdFilter = new ArrayList<Double>(Collections.nCopies(y.size(), 0.0d))
        //init avgFilter and stdFilter
        (0..lag-1).each { stats.addValue(y[it as int]) }
        avgFilter[lag - 1 as int] = stats.getMean()
        stdFilter[lag - 1 as int] = Math.sqrt(stats.getPopulationVariance()) //getStandardDeviation() uses sample variance (not what we want)
        stats.clear()
        //loop input starting at end of rolling window
        (lag..y.size()-1).each { i ->
            //if the distance between the current value and average is enough standard deviations (threshold) away
            if (Math.abs((y[i as int] - avgFilter[i - 1 as int]) as Double) > threshold * stdFilter[i - 1 as int]) {
                //this is a signal (i.e. peak), determine if it is a positive or negative signal
                signals[i as int] = (y[i as int] > avgFilter[i - 1 as int]) ? 1 : -1
                //filter this signal out using influence
                filteredY[i as int] = (influence * y[i as int]) + ((1-influence) * filteredY[i - 1 as int])
            } else {
                //ensure this signal remains a zero
                signals[i as int] = 0
                //ensure this value is not filtered
                filteredY[i as int] = y[i as int]
            }
            //update rolling average and deviation
            (i - lag..i-1).each { stats.addValue(filteredY[it as int] as Double) }
            avgFilter[i as int] = stats.getMean()
            stdFilter[i as int] = Math.sqrt(stats.getPopulationVariance()) //getStandardDeviation() uses sample variance (not what we want)
            stats.clear()
        }

        return [
            signals  : signals,
            avgFilter: avgFilter,
            stdFilter: stdFilter
        ]
    }

    Below is a test on the same dataset that yields the same results as the Python/NumPy implementation above.

        // Data
        def y = [1d, 1d, 1.1d, 1d, 0.9d, 1d, 1d, 1.1d, 1d, 0.9d, 1d, 1.1d, 1d, 1d, 0.9d, 1d, 1d, 1.1d, 1d, 1d,
             1d, 1d, 1.1d, 0.9d, 1d, 1.1d, 1d, 1d, 0.9d, 1d, 1.1d, 1d, 1d, 1.1d, 1d, 0.8d, 0.9d, 1d, 1.2d, 0.9d, 1d,
             1d, 1.1d, 1.2d, 1d, 1.5d, 1d, 3d, 2d, 5d, 3d, 2d, 1d, 1d, 1d, 0.9d, 1d,
             1d, 3d, 2.6d, 4d, 3d, 3.2d, 2d, 1d, 1d, 0.8d, 4d, 4d, 2d, 2.5d, 1d, 1d, 1d]

        // Settings
        def lag = 30
        def threshold = 5
        def influence = 0


        def thresholdingResults = thresholdingAlgo((List<Double>) y, (Long) lag, (Double) threshold, (Double) influence)

        println y.size()
        println thresholdingResults.signals.size()
        println thresholdingResults.signals

        thresholdingResults.signals.eachWithIndex { x, idx ->
            if (x) {
                println y[idx]
            }
        }


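    Here is an iterative Python/NumPy version of the answer https://stackoverflow.com/a/22640362/6029703. For large data (100000+ points) this code is faster than recomputing the mean and standard deviation over every lag window.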

    def peak_detection_smoothed_zscore_v2(x, lag, threshold, influence):
        '''
        iterative smoothed z-score algorithm
        Implementation of algorithm from https://stackoverflow.com/a/22640362/6029703
        '''
        import numpy as np
        labels = np.zeros(len(x))
        filtered_y = np.array(x, dtype=float)  # float copy, so integer input is not truncated
        avg_filter = np.zeros(len(x))
        std_filter = np.zeros(len(x))
        var_filter = np.zeros(len(x))

        avg_filter[lag - 1] = np.mean(x[0:lag])
        std_filter[lag - 1] = np.std(x[0:lag])
        var_filter[lag - 1] = np.var(x[0:lag])
        for i in range(lag, len(x)):
            if abs(x[i] - avg_filter[i - 1]) > threshold * std_filter[i - 1]:
                if x[i] > avg_filter[i - 1]:
                    labels[i] = 1
                else:
                    labels[i] = -1
                filtered_y[i] = influence * x[i] + (1 - influence) * filtered_y[i - 1]
            else:
                labels[i] = 0
                filtered_y[i] = x[i]
            # update avg, var, std
            avg_filter[i] = avg_filter[i - 1] + 1. / lag * (filtered_y[i] - filtered_y[i - lag])
            var_filter[i] = var_filter[i - 1] + 1. / lag * ((filtered_y[i] - avg_filter[i - 1]) ** 2 - (
                filtered_y[i - lag] - avg_filter[i - 1]) ** 2 - (filtered_y[i] - filtered_y[i - lag]) ** 2 / lag)
            std_filter[i] = np.sqrt(var_filter[i])

        return dict(signals=labels,
                    avgFilter=avg_filter,
                    stdFilter=std_filter)
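
The O(1) update of the mean and variance used above can be checked in isolation. The following is a minimal sketch in plain Python (no NumPy), assuming a population variance over a window of exactly `lag` points; the function names `rolling_stats_incremental` and `rolling_stats_direct` are made up for this illustration and are not part of the answer above.

```python
def rolling_stats_incremental(values, lag):
    """Return (mean, population variance) for every full window of size lag,
    updating both in O(1) per step with the same recurrences as above."""
    window = values[:lag]
    mean = sum(window) / lag
    var = sum((v - mean) ** 2 for v in window) / lag
    out = [(mean, var)]
    for i in range(lag, len(values)):
        new, old = values[i], values[i - lag]
        prev_mean = mean
        # The mean moves by the difference between entering and leaving sample.
        mean = prev_mean + (new - old) / lag
        # Variance update expressed around the previous mean.
        var += ((new - prev_mean) ** 2 - (old - prev_mean) ** 2
                - (new - old) ** 2 / lag) / lag
        # Clamp tiny negative values caused by floating-point rounding.
        out.append((mean, max(var, 0.0)))
    return out


def rolling_stats_direct(values, lag):
    """Reference implementation: recompute each window from scratch."""
    out = []
    for i in range(lag - 1, len(values)):
        window = values[i - lag + 1:i + 1]
        mean = sum(window) / lag
        out.append((mean, sum((v - mean) ** 2 for v in window) / lag))
    return out
```

Comparing the two on the same input is a cheap way to catch index mistakes before trusting the fast version on large data. Clamping the variance at zero is an addition of this sketch: without it, rounding can push the running variance slightly below zero and the square root then yields NaN.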


    Thought I would provide my Julia implementation of the algorithm. The gist can be found here.

    using Statistics
    using Plots
    function SmoothedZscoreAlgo(y, lag, threshold, influence)
        # Julia implementation of http://stackoverflow.com/a/22640362/6029703
        n = length(y)
        signals = zeros(n) # init signal results
        filteredY = copy(y) # init filtered series
        avgFilter = zeros(n) # init average filter
        stdFilter = zeros(n) # init std filter
        avgFilter[lag] = mean(y[1:lag]) # init first value (Julia arrays are 1-based)
        stdFilter[lag] = std(y[1:lag]) # init first value

        for i in lag+1:n
            if abs(y[i] - avgFilter[i-1]) > threshold*stdFilter[i-1]
                if y[i] > avgFilter[i-1]
                    signals[i] += 1 # positive signal
                else
                    signals[i] += -1 # negative signal
                end
                # Make influence lower
                filteredY[i] = influence*y[i] + (1-influence)*filteredY[i-1]
            else
                signals[i] = 0
                filteredY[i] = y[i]
            end
            avgFilter[i] = mean(filteredY[i-lag+1:i])
            stdFilter[i] = std(filteredY[i-lag+1:i])
        end
        return (signals = signals, avgFilter = avgFilter, stdFilter = stdFilter)
    end


    # Data
    y = [1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
           1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
           2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1]

    # Settings: lag = 30, threshold = 5, influence = 0
    lag = 30
    threshold = 5
    influence = 0

    results = SmoothedZscoreAlgo(y, lag, threshold, influence)
    upper_bound = results[:avgFilter] + threshold * results[:stdFilter]
    lower_bound = results[:avgFilter] - threshold * results[:stdFilter]
    x = 1:length(y)

    yplot = plot(x,y,color="blue", label="Y",legend=:topleft)
    yplot = plot!(x,upper_bound, color="green", label="Upper Bound",legend=:topleft)
    yplot = plot!(x,results[:avgFilter], color="cyan", label="Average Filter",legend=:topleft)
    yplot = plot!(x,lower_bound, color="green", label="Lower Bound",legend=:topleft)
    signalplot = plot(x,results[:signals],color="red",label="Signals",legend=:topleft)
    plot(yplot,signalplot,layout=(2,1),legend=:topleft)

    Results


    Following on from the solution proposed by Jean-Paul, I have implemented his algorithm in C#.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class ZScoreOutput
    {
        public List<double> input;
        public List<int> signals;
        public List<double> avgFilter;
        public List<double> filtered_stddev;
    }

    public static class ZScore
    {
        public static ZScoreOutput StartAlgo(List<double> input, int lag, double threshold, double influence)
        {
            // init variables!
            int[] signals = new int[input.Count];
            double[] filteredY = new List<double>(input).ToArray();
            double[] avgFilter = new double[input.Count];
            double[] stdFilter = new double[input.Count];

            var initialWindow = new List<double>(filteredY).Skip(0).Take(lag).ToList();

            avgFilter[lag - 1] = Mean(initialWindow);
            stdFilter[lag - 1] = StdDev(initialWindow);

            for (int i = lag; i < input.Count; i++)
            {
                if (Math.Abs(input[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1])
                {
                    signals[i] = (input[i] > avgFilter[i - 1]) ? 1 : -1;
                    filteredY[i] = influence * input[i] + (1 - influence) * filteredY[i - 1];
                }
                else
                {
                    signals[i] = 0;
                    filteredY[i] = input[i];
                }

                // Update rolling average and deviation
                var slidingWindow = new List<double>(filteredY).Skip(i - lag).Take(lag+1).ToList();

                avgFilter[i] = Mean(slidingWindow);
                stdFilter[i] = StdDev(slidingWindow);
            }

            // Copy to convenience class
            var result = new ZScoreOutput();
            result.input = input;
            result.avgFilter       = new List<double>(avgFilter);
            result.signals         = new List<int>(signals);
            result.filtered_stddev = new List<double>(stdFilter);

            return result;
        }

        private static double Mean(List<double> list)
        {
            // Simple helper function!
            return list.Average();
        }

        private static double StdDev(List<double> values)
        {
            double ret = 0;
            if (values.Count() > 0)
            {
                double avg = values.Average();
                double sum = values.Sum(d => Math.Pow(d - avg, 2));
                ret = Math.Sqrt((sum) / (values.Count() - 1));
            }
            return ret;
        }
    }

    Example usage:

    var input = new List<double> {1.0, 1.0, 1.1, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0, 0.9, 1.0,
        1.1, 1.0, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0, 1.0, 1.0, 1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9,
        1.0, 1.1, 1.0, 1.0, 1.1, 1.0, 0.8, 0.9, 1.0, 1.2, 0.9, 1.0, 1.0, 1.1, 1.2, 1.0, 1.5, 1.0,
        3.0, 2.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 3.0, 2.6, 4.0, 3.0, 3.2, 2.0, 1.0,
        1.0, 0.8, 4.0, 4.0, 2.0, 2.5, 1.0, 1.0, 1.0};

    int lag = 30;
    double threshold = 5.0;
    double influence = 0.0;

    var output = ZScore.StartAlgo(input, lag, threshold, influence);


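    Here is a C implementation of Jean-Paul's smoothed z-score for an Arduino microcontroller, used to take accelerometer readings and decide whether an impact came from the left or from the right. It performs really well, since this device returns a bounced signal. Here is the input to the peak detection algorithm from the device, showing an impact from the right followed by an impact from the left. You can see the initial spike and then the oscillation of the sensor.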


    #include <stdio.h>
    #include <math.h>
    #include <string.h>


    #define SAMPLE_LENGTH 1000

    float stddev(float data[], int len);
    float mean(float data[], int len);
    void thresholding(float y[], int signals[], int lag, float threshold, float influence);


    void thresholding(float y[], int signals[], int lag, float threshold, float influence) {
        memset(signals, 0, sizeof(int) * SAMPLE_LENGTH);
        float filteredY[SAMPLE_LENGTH];
        memcpy(filteredY, y, sizeof(float) * SAMPLE_LENGTH);
        float avgFilter[SAMPLE_LENGTH];
        float stdFilter[SAMPLE_LENGTH];

        avgFilter[lag - 1] = mean(y, lag);
        stdFilter[lag - 1] = stddev(y, lag);

        for (int i = lag; i < SAMPLE_LENGTH; i++) {
            if (fabsf(y[i] - avgFilter[i-1]) > threshold * stdFilter[i-1]) {
                if (y[i] > avgFilter[i-1]) {
                    signals[i] = 1;
                } else {
                    signals[i] = -1;
                }
                filteredY[i] = influence * y[i] + (1 - influence) * filteredY[i-1];
            } else {
                signals[i] = 0;
            }
            avgFilter[i] = mean(filteredY + i-lag, lag);
            stdFilter[i] = stddev(filteredY + i-lag, lag);
        }
    }

    float mean(float data[], int len) {
        float sum = 0.0, mean = 0.0;

        int i;
        for(i=0; i<len; ++i) {
            sum += data[i];
        }

        mean = sum/len;
        return mean;


    }

    float stddev(float data[], int len) {
        float the_mean = mean(data, len);
        float standardDeviation = 0.0;

        int i;
        for(i=0; i<len; ++i) {
            standardDeviation += pow(data[i] - the_mean, 2);
        }

        return sqrt(standardDeviation/len);
    }

    int main() {
        printf("Hello, World!\n");
        int lag = 100;
        float threshold = 5;
        float influence = 0;
        float y[]=  {1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
      ....
    1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,       2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
           2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1};

        int signal[SAMPLE_LENGTH];

        thresholding(y, signal,  lag, threshold, influence);

        return 0;
    }

    Here's the result with influence = 0:

    Great, but here it is with influence = 1:

    This is very good.


    Here is a (non-idiomatic) Scala version of the smoothed z-score algorithm:

    /**
      * Smoothed z-score algorithm shamelessly copied from https://stackoverflow.com/a/22640362/6029703
      * Uses a rolling mean and a rolling deviation (separate) to identify peaks in a vector
      *
      * @param y - The input vector to analyze
      * @param lag - The lag of the moving window (i.e. how big the window is)
      * @param threshold - The z-score at which the algorithm signals (i.e. how many standard deviations away from the moving mean a peak (or signal) is)
      * @param influence - The influence (between 0 and 1) of new signals on the mean and standard deviation (how much a peak (or signal) should affect other values near it)
      * @return - The signals (1, 0 or -1), one per input value
      */
    private def smoothedZScore(y: Seq[Double], lag: Int, threshold: Double, influence: Double): Seq[Int] = {
      val stats = new SummaryStatistics()

      // the results (peaks, 1 or -1) of our algorithm
      val signals = mutable.ArrayBuffer.fill(y.length)(0)

      // filter out the signals (peaks) from our original list (using influence arg)
      val filteredY = y.to[mutable.ArrayBuffer]

      // the current average of the rolling window
      val avgFilter = mutable.ArrayBuffer.fill(y.length)(0d)

      // the current standard deviation of the rolling window
      val stdFilter = mutable.ArrayBuffer.fill(y.length)(0d)

      // init avgFilter and stdFilter
      y.take(lag).foreach(s => stats.addValue(s))

      avgFilter(lag - 1) = stats.getMean
      stdFilter(lag - 1) = Math.sqrt(stats.getPopulationVariance) // getStandardDeviation() uses sample variance (not what we want)

      // loop input starting at end of rolling window
      y.zipWithIndex.slice(lag, y.length).foreach { // slice's end index is exclusive
        case (s: Double, i: Int) =>
          // if the distance between the current value and average is enough standard deviations (threshold) away
          if (Math.abs(s - avgFilter(i - 1)) > threshold * stdFilter(i - 1)) {
            // this is a signal (i.e. peak), determine if it is a positive or negative signal
            signals(i) = if (s > avgFilter(i - 1)) 1 else -1
            // filter this signal out using influence
            filteredY(i) = (influence * s) + ((1 - influence) * filteredY(i - 1))
          } else {
            // ensure this signal remains a zero
            signals(i) = 0
            // ensure this value is not filtered
            filteredY(i) = s
          }

          // update rolling average and deviation
          stats.clear()
          filteredY.slice(i - lag, i).foreach(s => stats.addValue(s))
          avgFilter(i) = stats.getMean
          stdFilter(i) = Math.sqrt(stats.getPopulationVariance) // getStandardDeviation() uses sample variance (not what we want)
      }

      println(y.length)
      println(signals.length)
      println(signals)

      signals.zipWithIndex.foreach {
        case(x: Int, idx: Int) =>
          if (x == 1) {
            println(idx + " " + y(idx))
          }
      }

      val data =
        y.zipWithIndex.map { case (s: Double, i: Int) => Map("x" -> i,"y" -> s,"name" ->"y","row" ->"data") } ++
        avgFilter.zipWithIndex.map { case (s: Double, i: Int) => Map("x" -> i,"y" -> s,"name" ->"avgFilter","row" ->"data") } ++
        avgFilter.zipWithIndex.map { case (s: Double, i: Int) => Map("x" -> i,"y" -> (s - threshold * stdFilter(i)),"name" ->"lower","row" ->"data") } ++
        avgFilter.zipWithIndex.map { case (s: Double, i: Int) => Map("x" -> i,"y" -> (s + threshold * stdFilter(i)),"name" ->"upper","row" ->"data") } ++
        signals.zipWithIndex.map { case (s: Int, i: Int) => Map("x" -> i,"y" -> s,"name" ->"signal","row" ->"signal") }

      Vegas("Smoothed Z")
        .withData(data)
        .mark(Line)
        .encodeX("x", Quant)
        .encodeY("y", Quant)
        .encodeColor(
          field="name",
          dataType=Nominal
        )
        .encodeRow("row", Ordinal)
        .show

      return signals
    }

    Here is a test that returns the same results as the Python and Groovy versions:

    val y = List(1d, 1d, 1.1d, 1d, 0.9d, 1d, 1d, 1.1d, 1d, 0.9d, 1d, 1.1d, 1d, 1d, 0.9d, 1d, 1d, 1.1d, 1d, 1d,
      1d, 1d, 1.1d, 0.9d, 1d, 1.1d, 1d, 1d, 0.9d, 1d, 1.1d, 1d, 1d, 1.1d, 1d, 0.8d, 0.9d, 1d, 1.2d, 0.9d, 1d,
      1d, 1.1d, 1.2d, 1d, 1.5d, 1d, 3d, 2d, 5d, 3d, 2d, 1d, 1d, 1d, 0.9d, 1d,
      1d, 3d, 2.6d, 4d, 3d, 3.2d, 2d, 1d, 1d, 0.8d, 4d, 4d, 2d, 2.5d, 1d, 1d, 1d)

    val lag = 30
    val threshold = 5d
    val influence = 0d

    smoothedZScore(y, lag, threshold, influence)

    vegas chart of result

    Gist here


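    If the boundary value or other criteria depend on future values, then the only solution (without a time machine or other knowledge of future values) is to delay any decision until enough future values are available. If you want a level above a mean that spans, say, 20 points, then you have to wait until you have at least 19 points ahead of any peak decision, or else the next new point could completely throw off your threshold from 19 points ago.

    Your current plot does not have any peaks... unless you somehow know in advance that the very next point isn't 1e99, in which case, after rescaling your plot's Y dimension, the plot would be flat up until that point.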
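
The idea of delaying a decision until enough future values have arrived can be sketched as follows. This is only an illustration under stated assumptions: the class name `DelayedPeakDetector`, the `half` parameter (how many later samples to wait for) and the use of a mean/standard-deviation rule on the neighbouring points are all choices made for this sketch, not something prescribed by the answer above.

```python
from collections import deque
from statistics import mean, pstdev


class DelayedPeakDetector:
    """Decide about a sample only after `half` later samples have arrived.

    Hypothetical helper for illustration; `half` and `threshold` are assumed names.
    """

    def __init__(self, half, threshold):
        self.half = half
        self.threshold = threshold
        self.buffer = deque(maxlen=2 * half + 1)  # centered window
        self.count = 0  # number of samples seen so far

    def push(self, value):
        """Add one sample. Returns (index, signal) for the sample `half`
        steps back, or None while the centered window is not yet full."""
        self.buffer.append(value)
        self.count += 1
        if len(self.buffer) < self.buffer.maxlen:
            return None
        window = list(self.buffer)
        center = window[self.half]
        # Baseline statistics come from the neighbours on both sides only.
        neighbours = window[:self.half] + window[self.half + 1:]
        mu, sigma = mean(neighbours), pstdev(neighbours)
        index = self.count - 1 - self.half
        if abs(center - mu) > self.threshold * sigma:
            return index, (1 if center > mu else -1)
        return index, 0
```

Each call to `push` reports on a sample that is `half` steps old, which is exactly the latency the answer describes: the price for using future values is that every decision arrives late by that many points.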


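    Instead of comparing a maximum to the mean, one can also compare the maximum to the adjacent minima, where the minima are only defined above a noise threshold. If the local maximum is more than 3 times (or some other confidence factor) either adjacent minimum, then that maximum is a peak. The peak determination is more accurate with wider moving windows. The above uses a calculation centered on the middle of the window, by the way, rather than a calculation at the end of the window (== lag).

    Note that a maximum has to be seen as an increase in the signal before it and a decrease after it.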
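
A minimal sketch of this maximum-versus-adjacent-minima test is given below. Several details are assumptions of the sketch rather than part of the answer: the function name `peaks_vs_adjacent_minima`, the default `noise_floor`, clamping minima to the noise floor as the reading of "only defined above a noise threshold", and requiring the maximum to exceed `factor` times both neighbouring minima.

```python
def peaks_vs_adjacent_minima(y, factor=3.0, noise_floor=0.5):
    """Return indices of local maxima that exceed `factor` times both
    adjacent local minima (minima are clamped to `noise_floor`)."""
    peaks = []
    n = len(y)
    for i in range(1, n - 1):
        # A maximum needs a rise before it and a fall after it.
        if not (y[i - 1] < y[i] and y[i] > y[i + 1]):
            continue
        # Walk downhill on each side to the nearest local minimum.
        left = i
        while left > 0 and y[left - 1] <= y[left]:
            left -= 1
        right = i
        while right < n - 1 and y[right + 1] <= y[right]:
            right += 1
        # Assumed interpretation: minima below the noise floor count as the floor.
        left_min = max(y[left], noise_floor)
        right_min = max(y[right], noise_floor)
        if y[i] > factor * max(left_min, right_min):
            peaks.append(i)
    return peaks
```

Because the test requires a strict rise and fall, flat-topped peaks (two equal neighbouring samples at the top) are skipped by this sketch; handling plateaus would need an extra rule.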


    If your data sits in a database table, here is a SQL version of a simple z-score algorithm:

    with data_with_zscore as (
        select
            date_time,
            value,
            value / (avg(value) over ()) as pct_of_mean,
            (value - avg(value) over ()) / (stdev(value) over ()) as z_score
        from {{tablename}}  where date_time > '2018-11-26' and date_time < '2018-12-03'
    )


    -- select all
    select * from data_with_zscore

    -- or, instead of the query above, select only points beyond a certain threshold
    select * from data_with_zscore where abs(z_score) > 2
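
For readers without a database at hand, the same global z-score filter can be sketched in Python. This assumes that `stdev` in the query denotes the sample standard deviation (as it does in SQL Server); the function name `zscore_outliers` is made up for this illustration.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=2.0):
    """Return (value, pct_of_mean, z_score) rows whose absolute z-score
    exceeds `threshold`, using one global mean and sample std."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    rows = [(v, v / mu, (v - mu) / sigma) for v in values]
    return [row for row in rows if abs(row[2]) > threshold]
```

Note that this variant, like the query, uses a single mean and standard deviation over the whole selected range, so it is a batch filter rather than a real-time one: the peaks themselves inflate the statistics they are compared against.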


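    The function scipy.signal.find_peaks, as its name suggests, is useful for this. But it is important to understand its parameters width, threshold, distance and, above all, prominence well in order to get a good peak extraction.

    According to my tests and the documentation, the concept of prominence is "the useful concept" for keeping the good peaks and discarding the noisy ones.

    What is (topographic) prominence? It is "the minimum height necessary to descend to get from the summit to any higher terrain".

    The idea is:

    The higher the prominence, the more "important" the peak is.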
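
To make the definition concrete, here is a hand-rolled sketch of prominence in plain Python. It is not the SciPy implementation; the function names `prominence` and `prominent_peaks` are hypothetical, and the sketch only handles strict local maxima. In practice one would pass `prominence=...` to `scipy.signal.find_peaks` instead.

```python
def prominence(y, i):
    """Prominence of the sample at index i: its height above the higher of
    the two lowest points passed before reaching higher terrain (or the
    edge of the data) on either side."""
    peak = y[i]
    # Walk left until the signal rises above the peak or the data ends,
    # tracking the lowest value passed on the way.
    left_min = peak
    j = i - 1
    while j >= 0 and y[j] <= peak:
        left_min = min(left_min, y[j])
        j -= 1
    # Same walk to the right.
    right_min = peak
    j = i + 1
    while j < len(y) and y[j] <= peak:
        right_min = min(right_min, y[j])
        j += 1
    return peak - max(left_min, right_min)


def prominent_peaks(y, min_prominence):
    """Indices of strict local maxima whose prominence is at least min_prominence."""
    return [i for i in range(1, len(y) - 1)
            if y[i - 1] < y[i] > y[i + 1] and prominence(y, i) >= min_prominence]
```

A small bump riding on the shoulder of a large peak has a low prominence even if its absolute height is large, which is why filtering on prominence tends to discard noisy peaks that a plain height threshold would keep.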