Windows event log message conversion to dictionary, then to pandas columns
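I have a pandas column containing the message field of Windows event logs, as shown below. How can I go through it and remove everything that is not a key:value style pair?
The message column contains similar data throughout, but possibly with more key:value types, since this sample covers only one event ID.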
    message
    ['subject':'none','security id':'s-1-5-12','account name':'myaccountname','account domain':'domain', 'logon id':'0x3e6', ' process information':'none', 'new process id':'0x1a53', 'new process name':'c:\windows\system32\ipconfig.exe', 'token elevation type':'%%1932','creator process id':'0x1b33', 'process command line':'none', ' token elevation type indicates the type of token that was assigned to the new process in accordance with user account control policy.',' type 1 is a full token with no privileges removed or groups disabled. a full token is only used if user account control is disabled or if the user is the built-in administrator account or a service account.', ' type 2 is an elevated token with no privileges removed or groups disabled. an elevated token is used when user account control is enabled and the user chooses to start the program using run as administrator. an elevated token is also used when an application is configured to always require administrative privilege or to always require maximum privilege', ' and the user is a member of the administrators group.',' type 3 is a limited token with administrative privileges removed and administrative groups disabled. the limited token is used when user account control is enabled', ' the application does not require administrative privilege', ' and the user does not choose to start the program using run as administrator.']
    ['subject':'none','security id':'s-1-5-13','account name':'myaccountname','account domain':'domain', 'logon id':'0x3e6', ' process information':'none', 'new process id':'0x1a53', 'new process name':'c:\windows\system32\net.exe', 'token elevation type':'%%1932','creator process id':'0x1b33', 'process command line':'none', ' token elevation type indicates the type of token that was assigned to the new process in accordance with user account control policy.',' type 1 is a full token with no privileges removed or groups disabled. a full token is only used if user account control is disabled or if the user is the built-in administrator account or a service account.', ' type 2 is an elevated token with no privileges removed or groups disabled. an elevated token is used when user account control is enabled and the user chooses to start the program using run as administrator. an elevated token is also used when an application is configured to always require administrative privilege or to always require maximum privilege', ' and the user is a member of the administrators group.',' type 3 is a limited token with administrative privileges removed and administrative groups disabled. the limited token is used when user account control is enabled', ' the application does not require administrative privilege', ' and the user does not choose to start the program using run as administrator.']
Expected output:

    subject | security id | account name  | logon id | process information | new process id | new process name                 | token elevation type | creator process id | process command line
    none    | s-1-5-12    | myaccountname | 0x3e6    | none                | 0x1a53         | c:\windows\system32\ipconfig.exe | %%1932               | 0x1b33             | none
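If I can get the non key:value items out of my data, I know I can use the method from this question:
"Pandas list of dictionary to separate columns"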
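You can parse each message with yaml and keep only the entries that have a value:

    print(df)
                                          message
    0  {'a':'none','b':'2', ' token.', ' type 1'}

    import yaml

    # Entries without a value are parsed as keys mapped to None.
    # safe_load is used because yaml.load requires an explicit Loader in current PyYAML.
    print(df.message.apply(yaml.safe_load))
    0    {' token.': None, ' type 1': None, 'b': '2', ...
    Name: message, dtype: object

    # Keep only the pairs whose value is not empty.
    df.message = df.message.apply(lambda x: {k: v for k, v in yaml.safe_load(x).items() if v})
    print(df)
                       message
    0  {'b': '2', 'a': 'none'}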
With your data:

    import pandas as pd

    # Raw string, so the backslashes in the Windows path are kept literally.
    df = pd.DataFrame({'message': [r"{'subject':'none', 'security id':'s-1-5-12', 'account name':'myaccountname','account domain':'domain', 'logon id':'0x3e6', ' process information':'none', 'new process id':'0x1a53', 'new process name':'c:\windows\system32\ipconfig.exe', 'token elevation type':'%%1932', 'creator process id':'0x1b33','process command line':'none', ' token elevation type indicates the type of token that was assigned to the new process in accordance with user account control policy.', ' type 1 is a full token with no privileges removed or groups disabled. a full token is only used if user account control is disabled or if the user is the built-in administrator account or a service account.', ' type 2 is an elevated token with no privileges removed or groups disabled. an elevated token is used when user account control is enabled and the user chooses to start the program using run as administrator. an elevated token is also used when an application is configured to always require administrative privilege or to always require maximum privilege', ' and the user is a member of the administrators group.',' type 3 is a limited token with administrative privileges removed and administrative groups disabled. the limited token is used when user account control is enabled', ' the application does not require administrative privilege', ' and the user does not choose to start the program using run as administrator.'}"]})
    import yaml

    # Parse each message and drop the entries that have no value.
    df.message = df.message.apply(lambda x: {k: v for k, v in yaml.safe_load(x).items() if v})
    # Expand the dictionaries into one column per key.
    df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
    print(df1)
       process information account domain   account name creator process id  \
    0                 none         domain  myaccountname             0x1b33

      logon id new process id                  new process name  \
    0    0x3e6         0x1a53  c:\windows\system32\ipconfig.exe

      process command line security id subject token elevation type
    0                 none    s-1-5-12    none               %%1932
Edit:

    import yaml

    # Assumes each cell is a one-element list whose string lacks the outer braces,
    # so take the first element and wrap it in {} before parsing.
    df.message = df.message.str[0].apply(lambda x: {k: v for k, v in yaml.safe_load('{' + x + '}').items() if v})
    df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
    print(df1)
       process information account domain   account name creator process id  \
    0                 none         domain  myaccountname             0x1b33

      logon id new process id                  new process name  \
    0    0x3e6         0x1a53  c:\windows\system32\ipconfig.exe

      process command line security id subject token elevation type
    0                 none    s-1-5-12    none               %%1932
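If you would rather not depend on yaml, a regular expression can pull out just the 'key':'value' pairs and skip the free-text items. This is only a rough sketch, not part of the answer above: `PAIR_RE` and `parse_message` are hypothetical names, and it assumes that every key and value is wrapped in single quotes and contains no single quote itself. Unlike the yaml version, it also strips the stray leading space from keys such as ' process information'.

```python
import re

# Matches 'key':'value' where neither side contains a single quote (assumption).
PAIR_RE = re.compile(r"'([^']*)'\s*:\s*'([^']*)'")

def parse_message(text):
    """Return only the 'key':'value' pairs found in text, as a dict."""
    # Items with no colon after the closing quote never match, so they are skipped.
    return {key.strip(): value for key, value in PAIR_RE.findall(text)}

sample = "['a':'none','b':'2', ' token.', ' type 1']"
print(parse_message(sample))  # {'a': 'none', 'b': '2'}
```

Applied to a column of strings, something like `df.message.apply(parse_message)` should give dictionaries that can be expanded with `pd.DataFrame(...)` in the same way as before.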