Python (Pandas) - Working with numeric data but adding non-numeric data back
I have a CSV file that looks like this:

    Build,Avg,Min,Max
    BuildA,56.190,39.123,60.1039
    BuildX,57.11,40.102,60.200
    BuildZER,55.1134,35.129404123,60.20121

I want to get the average, min, and max of each column, and have each of those appear as a new row. I exclude the non-numeric column (the Build column) and then run the statistics. This is how I do it:

    import pandas as pd

    df = pd.read_csv('fakedata.csv')

    columns = []
    builds = []

    for column in df.columns:
        if df[column].dtype == 'float64':
            columns.append(column)
        else:
            builds.append(column)

    save = df[builds]
    df = df[columns]

    print(df)

    df.loc['Min'] = df.min()
    df.loc['Average'] = df.mean()
    df.loc['Max'] = df.max()

If I wrote that data to a CSV at this point, it would be:

    ,Avg,Min,Max
    0,56.19,39.123,60.1039
    1,57.11,40.102,60.2
    2,55.1134,35.129404123,60.20121
    Min,55.1134,35.129404123,60.1039
    Average,55.8817,37.3709520615,60.1522525
    Max,57.11,40.102,60.20121

That is close to what I want, but I'd like the Build column to be column 1 again, with the build names sitting above Min, Average, and Max. Basically like this:

    Builds,Avg,Min,Max
    BuildA,56.19,39.123,60.1039
    BuildX,57.11,40.102,60.2
    BuildZER,55.1134,35.129404123,60.20121
    Min,55.1134,35.129404123,60.1039
    Average,55.8817,37.3709520615,60.1522525
    Max,57.11,40.102,60.20121

I tried to achieve this with:

    df.insert(0, 'builds', save)
    with open('fakedata.csv', 'w') as f:
        df.to_csv(f)

But that gives me this CSV:

    ,builds,Avg,Min,Max
    0,Build1,56.19,39.123,60.1039
    1,Build2,57.11,40.102,60.2
    2,Build3,55.1134,35.129404123,60.20121
    Min,,55.1134,35.129404123,60.1039
    Average,,55.8817,37.3709520615,60.1522525
    Max,,57.11,40.102,60.20121

How can I fix this?
IIUC:

    df_out = pd.concat([df.set_index('Build'),df.set_index('Build').agg(['max','min','mean'])]).rename(index={'max':'Max','min':'Min','mean':'Average'}).reset_index()

Output:

          index      Avg        Min       Max
    0    BuildA  56.1900  39.123000  60.10390
    1    BuildX  57.1100  40.102000  60.20000
    2  BuildZER  55.1134  35.129404  60.20121
    3       Max  57.1100  40.102000  60.20121
    4       Min  55.1134  35.129404  60.10390
    5   Average  56.1378  38.118135  60.16837

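For reference, here is a step-by-step sketch of the same idea that also writes the CSV in the layout the question asks for (a `Builds` header, rows in Min, Average, Max order, no numeric row labels). Two things seem worth noting. First, the empty `builds` cells in the question most likely come from index alignment: `df.insert` matches `save` to `df` by index label, and the `Min`, `Average`, and `Max` rows have no matching label in `save`. Second, the question's code appends each summary row before computing the next one, so its `Average` row includes the `Min` row; computing all statistics in one `agg` call avoids that. The inline `csv_text` (standing in for `fakedata.csv`) and the names `indexed`, `stats`, `result`, and `csv_out` are my own assumptions for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; it stands in for fakedata.csv.
csv_text = """Build,Avg,Min,Max
BuildA,56.190,39.123,60.1039
BuildX,57.11,40.102,60.200
BuildZER,55.1134,35.129404123,60.20121
"""

df = pd.read_csv(io.StringIO(csv_text))

# Move the build names into the index so only numeric columns remain as data.
indexed = df.set_index("Build")

# Compute all statistics from the original rows in a single call, so no
# statistic is affected by a previously appended summary row.
stats = indexed.agg(["min", "mean", "max"]).rename(
    index={"min": "Min", "mean": "Average", "max": "Max"}
)

# Stack the summary rows under the per-build rows, then turn the index
# back into a regular first column named "Builds".
result = pd.concat([indexed, stats]).rename_axis("Builds").reset_index()

# index=False leaves the 0, 1, 2, ... row labels out of the CSV text.
csv_out = result.to_csv(index=False)
print(csv_out)
```

To write a file instead of a string, passing a path such as `result.to_csv('out.csv', index=False)` should work the same way; `out.csv` is just a placeholder name.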