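One-hot encode: a list of column names to encode
I have a list of column names, and I want to one-hot encode the columns in that list so that the categorical variables in my dataset are encoded. I have tried several approaches, but each one gives me an error.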
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# edited_training_set is the path where my .csv file is stored
edited_training_set = 'edited_dataset/test_set.csv'
trainig_set_ed = pd.read_csv(edited_training_set)

column_header = ['cat_var_1', 'cat_var_2', 'cat_var_3', 'cat_var_4', 'cat_var_5', 'cat_var_6',
                 'cat_var_7', 'cat_var_8', 'cat_var_9', 'cat_var_10', 'cat_var_11', 'cat_var_12',
                 'cat_var_13', 'cat_var_14', 'cat_var_15', 'cat_var_16', 'cat_var_17', 'cat_var_18']

# One LabelEncoder per column, so each column keeps its own mapping
clfs = {c: LabelEncoder() for c in column_header}
for col, clf in clfs.items():
    trainig_set_ed[col] = clfs[col].fit_transform(trainig_set_ed[col])

trainig_set_ed.to_csv('edited_dataset/train_set_encode.csv', sep='\t', encoding='utf-8')
```
It throws the following error:
```
Traceback (most recent call last):
  File "preprocessing.py", line 83, in <module>
    trainig_set_ed[col] = clfs[col].fit_transform(trainig_set_ed[col])
  File "/root/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 3838, in get
    loc = self.items.get_loc(item)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2524, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cat_var_6'
```
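A `KeyError` raised from `DataFrame.__getitem__` means the DataFrame has no column labelled exactly `cat_var_6`. A quick way to narrow this down is to compare the wanted names against `df.columns` before encoding. The sketch below uses a hypothetical helper `missing_columns` and made-up data; the wrong-separator scenario it demonstrates is only one possible cause (an assumption, not confirmed by the question), chosen because the script above writes its output with `sep='\t'` but reads input with the default comma separator.

```python
import io

import pandas as pd


def missing_columns(df, columns):
    """Hypothetical helper: return the names in `columns` that are not headers of `df`."""
    present = set(df.columns)
    return [c for c in columns if c not in present]


# Illustrative data (not from the question): a tab-separated file.
raw = "cat_var_1\tcat_var_6\nx\ty\n"

# Read with the default comma separator, both headers collapse into one column.
wrong = pd.read_csv(io.StringIO(raw))
# Read with the matching separator, each header becomes its own column.
right = pd.read_csv(io.StringIO(raw), sep="\t")

wanted = ["cat_var_1", "cat_var_6"]
print(missing_columns(wrong, wanted))  # ['cat_var_1', 'cat_var_6']
print(missing_columns(right, wanted))  # []
```

If the names look right but are still reported missing, stray whitespace in the header row is another common culprit; `df.columns = df.columns.str.strip()` removes it.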
Thanks!
Demo:
Source DF:
```
In [93]: df
Out[93]:
     a    b    c
0  aaa  xxx  ddd
1  bbb  zzz  bbb
2  ccc  aaa  aaa
```
Solution:
```
In [94]: from sklearn.preprocessing import LabelEncoder
    ...:
    ...: cols = ['a','b','c']
    ...: clfs = {c:LabelEncoder() for c in cols}
    ...:

In [95]: for col, clf in clfs.items():
    ...:     df[col] = clfs[col].fit_transform(df[col])
    ...:

In [96]: df
Out[96]:
   a  b  c
0  0  1  2
1  1  2  1
2  2  0  0
```
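If you would rather not keep one `LabelEncoder` object per column, a pandas-only alternative (a swapped-in technique, not what the answer above uses) is to convert each column to a `category` dtype. For string columns pandas sorts the inferred categories, so the integer codes should match what `LabelEncoder` produces on this data. The sketch stores each column's categories in a dictionary (`categories`, a name chosen here for illustration) so the codes can be decoded later. One difference to be aware of: missing values get the code `-1` instead of being encoded.

```python
import pandas as pd

df = pd.DataFrame({"a": ["aaa", "bbb", "ccc"],
                   "b": ["xxx", "zzz", "aaa"],
                   "c": ["ddd", "bbb", "aaa"]})
cols = ["a", "b", "c"]

# Keep each column's sorted category Index so the codes can be decoded later.
categories = {}
for col in cols:
    cat = df[col].astype("category")   # categories inferred from the data, sorted
    categories[col] = cat.cat.categories
    df[col] = cat.cat.codes            # integer position of each value in its categories

print(df)

# Decoding: look the codes up in the stored categories.
decoded = {col: list(categories[col].take(df[col].to_numpy())) for col in cols}
print(decoded)
```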
Inverse transform:
```
In [97]: clfs['a'].inverse_transform(df['a'])
Out[97]: array(['aaa', 'bbb', 'ccc'], dtype=object)

In [98]: clfs['b'].inverse_transform(df['b'])
Out[98]: array(['xxx', 'zzz', 'aaa'], dtype=object)

In [99]: clfs['c'].inverse_transform(df['c'])
Out[99]: array(['ddd', 'bbb', 'aaa'], dtype=object)
```
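`LabelEncoder` maps each category to a single integer, which is label encoding rather than the one-hot encoding mentioned in the question title. If true one-hot columns are what you need, a short sketch using `pandas.get_dummies` with the same list of column names looks like this (it reuses the demo data above; this is an alternative approach, not part of the original answer).

```python
import pandas as pd

df = pd.DataFrame({"a": ["aaa", "bbb", "ccc"],
                   "b": ["xxx", "zzz", "aaa"],
                   "c": ["ddd", "bbb", "aaa"]})
cols = ["a", "b", "c"]

# Each listed column is replaced by one indicator column per category,
# named "<column>_<category>"; any columns not listed are kept unchanged.
onehot = pd.get_dummies(df, columns=cols)
print(onehot.columns.tolist())
print(onehot.astype(int))
```

Depending on the pandas version, the indicator columns are `bool` or `uint8`, which is why the sketch casts to `int` for display.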