scikit-learn fit() leads to an error after normalising the data

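I have been trying to:

  • create the X features and the dependent y from a dataset
  • split the dataset
  • normalise the data
  • train using SVR from scikit-learn

Below is the code, using a pandas DataFrame filled with random values: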

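    # floor was not imported in the original snippet; math.floor is assumed here
    from math import floor
    import pandas as pd
    import numpy as np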
    df = pd.DataFrame(np.random.rand(20,5), columns=["A","B","C","D","E"])
    a = list(df.columns.values)
    a.remove("A")

    X = df[a]
    y = df["A"]

    X_train = X.iloc[0: floor(2 * len(X) /3)]
    X_test = X.iloc[floor(2 * len(X) /3):]
    y_train = y.iloc[0: floor(2 * len(y) /3)]
    y_test = y.iloc[floor(2 * len(y) /3):]

    # normalise

    from sklearn import preprocessing

    X_trainS = preprocessing.scale(X_train)
    X_trainN = pd.DataFrame(X_trainS, columns=a)

    X_testS = preprocessing.scale(X_test)
    X_testN = pd.DataFrame(X_testS, columns=a)

    y_trainS = preprocessing.scale(y_train)
    y_trainN = pd.DataFrame(y_trainS)

    y_testS = preprocessing.scale(y_test)
    y_testN = pd.DataFrame(y_testS)

    import sklearn
    from sklearn.svm import SVR

    clf = SVR(kernel='rbf', C=1e3, gamma=0.1)

    pred = clf.fit(X_trainN,y_trainN).predict(X_testN)

This raises the following error:

    C:\Anaconda3\lib\site-packages\pandas\core\index.py:542: FutureWarning: slice indexers when using iloc should be integers and not floating point
      "and not floating point",FutureWarning)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    in ()
         34 clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
         35
    ---> 36 pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
         37

    C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in fit(self, X, y, sample_weight)
        174
        175 seed = rnd.randint(np.iinfo('i').max)
    --> 176 fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
        177 # see comment on the other call to np.iinfo in this file
        178

    C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in _dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
        229 cache_size=self.cache_size, coef0=self.coef0,
        230 gamma=self._gamma, epsilon=self.epsilon,
    --> 231 max_iter=self.max_iter, random_seed=random_seed)
        232
        233 self._warn_from_fit_status()

    C:\Anaconda3\lib\site-packages\sklearn\svm\libsvm.pyd in sklearn.svm.libsvm.fit (sklearn\svm\libsvm.c:1864)()

    ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

I don't know why. Can anyone explain? I think it has something to do with converting back to a DataFrame after preprocessing.


The error here is in the df you are passing as the labels: y_trainN.

Compare the y used in the documentation's example with the values in your y_trainN:

    In [40]:

    n_samples, n_features = 10, 5
    np.random.seed(0)
    y = np.random.randn(n_samples)
    print(y)
    y_trainN.values
    [ 1.76405235  0.40015721  0.97873798  2.2408932   1.86755799 -0.97727788
      0.95008842 -0.15135721 -0.10321885  0.4105985 ]
    Out[40]:
    array([[-0.06680594],
           [ 0.23535043],
           [-1.49265082],
           [ 1.22537862],
           [-0.46499134],
           [-0.23744759],
           [ 1.40520679],
           [ 0.95882677],
           [ 1.66996413],
           [-0.37515955],
           [-0.75826444],
           [-1.45945337],
           [-0.63995369]])
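
The difference is the number of dimensions: the documentation's y is 1-D, while the values of a one-column DataFrame are 2-D, which is what the "expected 1, got 2" message refers to. A minimal sketch to check this with pandas and NumPy only; the names `scaled`, `y_frame` and `y_series` are illustrative stand-ins and do not come from the question:

```python
import numpy as np
import pandas as pd

# Stand-in for the scaled labels: 13 values, matching the question's training split.
scaled = np.random.rand(13)

y_frame = pd.DataFrame(scaled)   # same construction as y_trainN in the question
y_series = y_frame.squeeze()     # one-column DataFrame -> Series

print(y_frame.values.shape)   # (13, 1): two dimensions
print(y_series.values.shape)  # (13,): one dimension
```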

So you can either call squeeze to produce a Series, or select the only column in the df, and then it does not raise:

    pred = clf.fit(X_trainN,y_trainN[0]).predict(X_testN)

    pred = clf.fit(X_trainN,y_trainN.squeeze()).predict(X_testN)

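One could argue that a df with only a single column should return something that can be coerced into a numpy array, or that numpy is not calling the array attribute correctly, but really you should pass a Series, or select the column from the df, as the argument.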
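
Putting this together, here is one possible end-to-end version of the question's script with 1-D labels. It is a sketch under a few assumptions rather than the only fix: the names `features` and `split` are introduced here, the split point uses integer division so that `iloc` never receives a float, and the labels are scaled as a one-column frame and then flattened with `ravel()` before being wrapped in a Series.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVR

np.random.seed(0)
df = pd.DataFrame(np.random.rand(20, 5), columns=["A", "B", "C", "D", "E"])
features = [c for c in df.columns if c != "A"]

# Integer split point (two thirds of the rows), so iloc gets an int.
split = 2 * len(df) // 3

X_train, X_test = df[features].iloc[:split], df[features].iloc[split:]
y_train = df["A"].iloc[:split]

# Scale the features and keep them as DataFrames, as in the question.
X_trainN = pd.DataFrame(preprocessing.scale(X_train), columns=features)
X_testN = pd.DataFrame(preprocessing.scale(X_test), columns=features)

# Scale the labels as a one-column frame, then flatten to 1-D
# and wrap in a Series (not a DataFrame).
y_trainN = pd.Series(preprocessing.scale(y_train.to_frame()).ravel())

clf = SVR(kernel="rbf", C=1e3, gamma=0.1)
pred = clf.fit(X_trainN, y_trainN).predict(X_testN)

print(y_trainN.ndim)  # 1
print(pred.shape)     # one prediction per test row
```

The scaling steps mirror the question, including scaling the test set on its own; only the shape of the labels and the split index differ.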