Python 3: Why does this code raise "list index out of range"?

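I'm practicing regression on Kaggle with the IMDB 5000+ movie metadata dataset. I use the pandas library to read the CSV file and convert the data into a nested list named movie_data.

I want to delete every row movie_data[n] where movie_data[n][0] != 'Color'. I tried to do this in a for loop, but the code fails at i == 4827 with: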

IndexError: list index out of range

Here is my code:

import tensorflow as tf
import numpy as np
import pandas as pd

tf.set_random_seed(777)

read = pd.read_csv('movie_metadata.csv', sep=',')
movie_data = read.values.tolist()
gross_data = []
for i in range(len(movie_data)):
    gross_data.append(movie_data[i][8])

# remove the gross column from every row
for row in movie_data:
    del row[8]

# remove movies that are not in color (e.g. black and white)
for i in range(len(movie_data)):
    print(i)
    if movie_data[i][0] != 'Color':
        del movie_data[i]

training_movie_data = movie_data[0:3500]
training_gross_data = gross_data[0:3500]

#print(training_movie_data)

The error is raised on line 20 of the script: if movie_data[i][0] != 'Color'

How can I fix this?


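You shouldn't delete elements from a list while you are looping over its indices. range(len(A)) is evaluated once, before the loop starts, so the indices keep counting up to the original length while the list keeps getting shorter: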

In [11]: A = [1, 2, 3]

In [12]: for i in range(len(A)):
    ...:     del A[i]
    ...:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-12-1ffb9090e54f> in <module>()
      1 for i in range(len(A)):
----> 2     del A[i]
      3

IndexError: list assignment index out of range

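The same thing happens when each element is printed before it is deleted. Note that 2 is never printed, because every deletion shifts the remaining elements down by one position: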

In [21]: A = [1, 2, 3]

In [22]: for i in range(len(A)):
    ...:     print(A[i])
    ...:     del A[i]
    ...:
1
3
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-af7e1866dc89> in <module>()
      1 for i in range(len(A)):
----> 2     print(A[i]);del A[i]
      3
      4

IndexError: list index out of range

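This is what del movie_data[i] does in your loop: every deletion shortens movie_data, so i eventually runs past the end of the list (at i == 4827 in your case).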
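
One way around this is to build new lists instead of deleting in place. The following is only a minimal sketch: the small movie_data and gross_data lists are made-up stand-ins for the real data, and the names kept, color_movies and color_gross are hypothetical. It filters both lists together, so each movie stays paired with its gross value.

```python
# Hypothetical stand-in for the nested list built from movie_metadata.csv;
# column 0 is the color field, as in the question.
movie_data = [
    ['Color', 'A'],
    ['Black and White', 'B'],
    ['Color', 'C'],
    [float('nan'), 'D'],  # a missing color value (NaN after read_csv)
    ['Color', 'E'],
]
gross_data = [100, 200, 300, 400, 500]

# Build new lists rather than deleting from the list being indexed.
# zip keeps every row paired with its gross value.
kept = [(row, gross) for row, gross in zip(movie_data, gross_data)
        if row[0] == 'Color']
color_movies = [row for row, _ in kept]
color_gross = [gross for _, gross in kept]

print(color_movies)  # [['Color', 'A'], ['Color', 'C'], ['Color', 'E']]
print(color_gross)   # [100, 300, 500]
```

If the list really has to be modified in place, looping over the indices in reverse (for i in reversed(range(len(movie_data)))) also avoids the error, because a deletion only shifts elements at indices that have already been visited.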


If you only want to look at the movies that are not in color, you can do the filtering in pandas:

Code:

bw = read[read.color != 'Color']

Test code:

read = pd.read_csv('movie_metadata.csv', sep=',')
bw = read[read.color != 'Color']
print(bw.head())

Results:

                color    director_name  num_critic_for_reviews  duration
4                 NaN      Doug Walker                     NaN       NaN  
111   Black and White      Michael Bay                   191.0     184.0  
149   Black and White     Lee Tamahori                   264.0     133.0  
257   Black and White  Martin Scorsese                   267.0     170.0  
272   Black and White     Michael Mann                   174.0     165.0  
....
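
The question asks for the opposite selection (keeping only the color movies), which should work the same way with the comparison flipped. The sketch below uses a small hand-made DataFrame in place of the real CSV. The color column name comes from the code above; the gross column name and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('movie_metadata.csv').
read = pd.DataFrame({
    'color': ['Color', 'Black and White', None, 'Color'],
    'gross': [100.0, 200.0, 300.0, 400.0],
    'duration': [120.0, 95.0, 110.0, 130.0],
})

# Keep only the color movies. Rows with a missing color are dropped too,
# because a missing value never compares equal to 'Color'.
color = read[read['color'] == 'Color']

# Split off the target column, then convert to plain lists if needed.
gross_data = color['gross'].tolist()
movie_data = color.drop(columns='gross').values.tolist()

print(movie_data)  # [['Color', 120.0], ['Color', 130.0]]
print(gross_data)  # [100.0, 400.0]
```

Filtering before converting to lists keeps the features and the target aligned, since both come from the same filtered rows.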