How to replace NaN values in columns with calculated CAGR values
I have a DataFrame with NaN values. I want to replace the NaN values with values calculated from the CAGR.
       val1  val2  val3  val4  val5
    0   100   100   100   100   100
    1    90   110    80   110    50
    2    70   150    70   NaN   NaN
    3   NaN   NaN   NaN   NaN   NaN
CAGR (compound annual growth rate)
= (final value / first value) ** (1 / number of years)
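For example, the CAGR of val1 is -23%, so the last value of val1 would be 53.9.
The CAGR of the val4 column is 10%,
so the NaN in row 2 would become 121 and the NaN in row 3 would be replaced with 133.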
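The val4 arithmetic above can be checked with a few lines of Python. This is only a sketch of that one column: the variable names are made up for illustration, and it assumes one row per year. Note that the formula as written yields the annual growth factor (1.1 here), and the percentage rate is that factor minus 1.

```python
# Worked check of the val4 column (values 100, 110, NaN, NaN).
# All names here are illustrative, not taken from any library.
first_value = 100.0
last_value = 110.0  # last non-NaN value, one year after the first
n_years = 1

# The formula gives the annual growth factor; subtracting 1 gives the rate.
growth_factor = (last_value / first_value) ** (1 / n_years)
cagr = growth_factor - 1

# Each missing row is the previous row times the growth factor.
row2 = last_value * growth_factor
row3 = row2 * growth_factor

print(round(cagr, 4), round(row2, 2), round(row3, 2))
```

The unrounded value for row 3 is 133.1, which the question rounds down to 133.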
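How can I replace the NaN values automatically?
My questions are:
1) How do I calculate the CAGR for each column?
I used isnull(), so I can find which rows are empty, but I don't know how to exclude those rows when calculating the CAGR.
2) How do I replace the NaN values with the calculated values?
Thanks.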
    import numpy as np

    # whitespace-delimited data; the first row holds the starting values
    a = '''100 100 100 100 100
    90 110 80 110 50
    70 150 70 NaN NaN
    NaN NaN NaN NaN NaN'''

    # parse into a 2-D float array (float('NaN') gives nan)
    data = np.array([[float(x) for x in line.split()] for line in a.splitlines()])

    # assumes every column has at least one NaN and that NaNs only appear at the end
    for col in range(data.shape[1]):
        # row index of the last non-NaN value, which is also the number of elapsed years
        Nyears = np.isnan(data[:, col]).argmax() - 1
        endvalue = data[Nyears, col]
        # annual growth factor (CAGR + 1), measured from the first row
        cagr = (endvalue / data[0, col]) ** (1 / Nyears)
        print(Nyears, endvalue, cagr)
        # fill NaNs from top to bottom so each one builds on the row above it
        for year in np.flatnonzero(np.isnan(data[:, col])):
            data[year, col] = data[year - 1, col] * cagr

    print(data)
I get:
    [[ 100.          100.          100.          100.          100.        ]
     [  90.          110.           80.          110.           50.        ]
     [  70.          150.           70.          121.           25.        ]
     [  58.56620186  183.71173071   58.56620186  133.1          12.5       ]]
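Since the question starts from a DataFrame, the same idea can be applied to it directly. The following is a minimal sketch, not a definitive implementation: `fill_with_cagr` is a hypothetical helper name, and it assumes one row per year, unique column names, non-zero starting values, and NaNs only at the end of each column. Columns with fewer than two valid values are left unchanged.

```python
import numpy as np
import pandas as pd


def fill_with_cagr(df):
    """Return a float copy of df with trailing NaNs extrapolated per column.

    Hypothetical helper: assumes one row per year and NaNs only at the
    end of each column.
    """
    out = df.astype(float)
    for col in out.columns:
        values = out[col].to_numpy(copy=True)
        valid = np.flatnonzero(~np.isnan(values))  # positions of real data
        if len(valid) < 2:
            continue  # a growth factor needs at least two data points
        first, last = valid[0], valid[-1]
        # annual growth factor (CAGR + 1) between first and last valid rows
        factor = (values[last] / values[first]) ** (1 / (last - first))
        # compound the last valid value forward, one step per missing row
        missing = np.arange(last + 1, len(values))
        values[missing] = values[last] * factor ** (missing - last)
        out[col] = values
    return out


# The table from the question
df = pd.DataFrame({
    "val1": [100, 90, 70, np.nan],
    "val2": [100, 110, 150, np.nan],
    "val3": [100, 80, 70, np.nan],
    "val4": [100, 110, np.nan, np.nan],
    "val5": [100, 50, np.nan, np.nan],
})

filled = fill_with_cagr(df)
print(filled.round(2))
```

Unlike the loop above, this version does not require every column to contain a NaN, and it computes each missing value directly from the last valid one instead of chaining row by row.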