Python, memory error, csv file too large

I have a problem with a Python module that cannot handle importing a large data file (the file targets.csv is almost 1 GB).

The error occurs on this line:

targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

Traceback:

Traceback (most recent call last):
  File "C:\Users\gary\Documents\EPSON STUDIES\colors_text_D65.py", line 41, in <module>
    for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
MemoryError

Is there a way to read the file targets.csv line by line? And would doing so slow the process down?

This module is already quite slow...

Thanks!

import geometry  # the asker's own module (tetgen_of_hull, containing_tet)
import csv
import numpy as np
import random
import cv2

S = 0

img = cv2.imread("MAP.tif", -1)
height, width = img.shape

pixx = height * width
iterr = float(pixx / 1000)  # report progress every 1/1000 of the pixels (0.1 %)
accomplished = 0
temp = 0

ppm = file("epson gamut.ppm", 'w')

# PPM file header: magic number, "width height", max colour value
ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")

all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('XYZcolorlist_D65.csv'))]

# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if len(support_i) > 0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

tg, hull_i = geometry.tetgen_of_hull([(X, Y, Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out:"
       + ",".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

# Builds the full list of target rows in memory before the loop starts
targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

for target in targets:

    name, X, Y, Z, BG = target

    target_point = support + (np.array([X, Y, Z]) - support) / (1 - BG)

    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    if tet_i is None:
        # not in gamut: write a white pixel
        #print str("out")
        ppm.write(str("255 255 255") + "\n")
        print "out"

        temp += 1

        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0

        continue

    else:
        # pick one vertex of the containing tetrahedron, with probability
        # equal to its barycentric coordinate
        A = bcoords[0]
        B = bcoords[1]
        C = bcoords[2]
        D = bcoords[3]

        R = random.uniform(0, 1)

        names = [colors[i][0] for i in tg.tets[tet_i]]

        if R <= A:
            S = names[0]
        elif R <= A + B:
            S = names[1]
        elif R <= A + B + C:
            S = names[2]
        else:
            S = names[3]

        ppm.write(str(S) + "\n")

        temp += 1

        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0

print "done"
ppm.close()
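
The if R <= A / elif R <= A+B / elif R <= A+B+C / else chain in the loop picks one of the four tetrahedron vertices with probability equal to its barycentric coordinate. A minimal sketch of that step as a standalone Python 3 function, for illustration only; pick_vertex and its rng parameter are hypothetical names, not part of the original module:

```python
import random


def pick_vertex(names, bcoords, rng=random):
    """Return one of `names`, chosen with probability equal to the matching
    barycentric coordinate (hypothetical helper mirroring the if/elif chain).

    Assumes the coordinates are non-negative and sum to 1, as they do for a
    point inside the tetrahedron.
    """
    r = rng.uniform(0, 1)
    cumulative = 0.0
    # Walk A, A+B, A+B+C; the first running total that reaches r wins.
    for name, weight in zip(names[:-1], bcoords[:-1]):
        cumulative += weight
        if r <= cumulative:
            return name
    # Corresponds to the final else branch (names[3]).
    return names[-1]
```

On Python 3.6 and later, random.choices(names, weights=bcoords)[0] should draw from the same distribution in a single call.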

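csv.reader() already reads one line at a time. However, you first collect all of the rows into a list. You should process one row at a time instead. One way to do that is to switch to a generator, for example: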

targets = ((name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv')))

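(Switching from square brackets to parentheses turns targets from a list comprehension into a generator expression, so the for target in targets: loop pulls one row at a time from the file instead of first building a list of every row.)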
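
The same idea can be sketched as a generator function. This is a minimal Python 3 sketch (the question's code is Python 2); iter_targets and the sample rows are hypothetical, and io.StringIO stands in for the real targets.csv so the snippet runs on its own. The last lines use sys.getsizeof to give a rough sense of why the list version runs out of memory.

```python
import csv
import io
import sys


def iter_targets(f):
    """Yield (name, X, Y, Z, BG) tuples one row at a time from an open CSV file.

    Only the current row is held in memory, unlike the list comprehension,
    which builds every tuple before the loop starts.
    """
    for name, X, Y, Z, BG in csv.reader(f):
        yield name, float(X), float(Y), float(Z), float(BG)


# Stand-in for open('targets.csv', newline=''); the rows are made up.
sample = io.StringIO("a,0.1,0.2,0.3,0.0\nb,0.4,0.5,0.6,0.5\n")

processed = []
for name, X, Y, Z, BG in iter_targets(sample):
    processed.append((name, BG))  # placeholder for the real per-pixel work

# Rough memory comparison. getsizeof counts only the list's own pointer
# array, not the tuples it refers to, so the real gap is even larger.
n = 100_000
as_list = [(str(i), 0.0, 0.0, 0.0, 0.0) for i in range(n)]
as_gen = ((str(i), 0.0, 0.0, 0.0, 0.0) for i in range(n))
list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # stays small regardless of n
```

Against the real file, keep the loop inside a with block so the file stays open while rows are consumed, e.g. with open('targets.csv', newline='') as f: followed by for name, X, Y, Z, BG in iter_targets(f): (on Python 2, open the file with 'rb' instead). The per-row parsing is the same work the list comprehension already did, so streaming should not by itself make the loop slower; it only avoids holding every row at once. A generator can be iterated only once, which is fine here because targets is looped over a single time.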