Opening a txt file with pandas read_csv
I'm trying to process a txt file with pandas.
However, read_csv fails with the following error:
```
CParserError                              Traceback (most recent call last)
in ()
     22 Col.append(elm)
     23
---> 24 revised=pd.read_csv(Path+file,skiprows=Header+1,header=None,delim_whitespace=True)
     25
     26 TimeSeries.append(revised)

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col,
    usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters,
    true_values, false_values, skipinitialspace, skiprows, skipfooter,
    nrows, na_values, keep_default_na, na_filter, verbose,
    skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col,
    date_parser, dayfirst, iterator, chunksize, compression, thousands,
    decimal, lineterminator, quotechar, quoting, escapechar, comment,
    encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines,
    skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints,
    use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    560 skip_blank_lines=skip_blank_lines)
    561
--> 562 return _read(filepath_or_buffer, kwds)
    563
    564 parser_f.__name__ = name

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    323 return parser
    324
--> 325 return parser.read()
    326
    327 _parser_defaults = {

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
    813 raise ValueError('skip_footer not supported for iteration')
    814
--> 815 ret = self._engine.read(nrows)
    816
    817 if self.options.get('as_recarray'):

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1312 def read(self, nrows=None):
   1313 try:
-> 1314 data = self._reader.read(nrows)
   1315 except StopIteration:
   1316 if self._first_chunk:

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:8748)()
pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)()
pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)()
pandas\parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)()
pandas\parser.pyx in pandas.parser.raise_parser_error (pandas\parser.c:23325)()

CParserError: Error tokenizing data. C error: Expected 4 fields in line 6, saw 8
```
Does anyone know how to fix this?
The Python script and a sample of the txt files I want to process are below.
```python
import os

import pandas as pd

Path='data/NanFung/OCTA_Tower/test/'
files=os.listdir(Path)
TimeSeries=[]
Cols=[]
for file in files:
    new=open(Path+file)
    Supplement=[]
    Col=[]
    data=[]
    Header=0
    #calculate how many rows should be skipped
    for line in new:
        if line.startswith('Timestamp'):
            new1=line.split(" ")
            new1[-1]=str(file)[:-4]
            break
        else:
            Header += 1
    new.close()
    #clean col name (drop the empty strings left by split(" "))
    for elm in new1:
        if len(elm)>0:
            Col.append(elm)

    revised=pd.read_csv(Path+file,skiprows=Header+1,header=None,delim_whitespace=True)
    TimeSeries.append(revised)
    Cols.append(Col)
```
Sample txt file:
```
history:/NIKL6215_ENC_1/CH$2d19$2d1$20$20CHW$20OUTLET$20TEMP
20-Oct-12 8:00 PM CT to ?

Timestamp Trend Flags Status Value (oC)
------------------------- ----------- ------ ----------
20-Oct-12 8:00:00 PM HKT {start} {ok} 15.310 oC
21-Oct-12 12:00:00 AM HKT { } {ok} 15.130 oC
```
It fails because the part of the file you are reading looks like this:
```
Timestamp Trend Flags Status Value (oC)
------------------------- ----------- ------ ----------
20-Oct-12 8:00:00 PM HKT {start} {ok} 15.310 oC
21-Oct-12 12:00:00 AM HKT { } {ok} 15.130 oC
```
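But there is no consistent delimiter here: the timestamp itself contains spaces (and so does the `{ }` flag), so splitting on whitespace gives a different number of fields on different lines. With `skiprows=Header+1`, the dashed line is the first row pandas reads, so it expects 4 fields; the first data row (line 6 of the file) has 8.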
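To see the mismatch concretely, here is a small stdlib-only sketch that counts the fields a whitespace-delimited parser would find on the dashed rule and on the two data rows from the sample. The lines are copied with the spacing shown in the post; extra padding in the real file would not change the counts, because `str.split()` with no argument collapses runs of whitespace. `whitespace_field_count` is a hypothetical helper, not part of pandas.

```python
# Lines copied from the sample file in the question.
rule = "------------------------- ----------- ------ ----------"
row1 = "20-Oct-12 8:00:00 PM HKT {start} {ok} 15.310 oC"
row2 = "21-Oct-12 12:00:00 AM HKT { } {ok} 15.130 oC"


def whitespace_field_count(line: str) -> int:
    """Number of fields a whitespace-delimited parser would see on this line."""
    return len(line.split())


for line in (rule, row1, row2):
    # Print the count next to the tokens so the split points are visible.
    print(whitespace_field_count(line), line.split())
```

The rule line gives 4 fields, which is what the C parser then expects; the first data row gives 8, matching the error message. The `{ }` row gives 9, so skipping one more line would not make whitespace splitting work either.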
Add this line before the `read_csv` call, and change the call as shown:
```python
file_name = Path+file
# change this line:
# revised=pd.read_csv(Path+file,skiprows=Header+1,header=None,delim_whitespace=True)
# to:
revised=pd.read_csv(file_name,skiprows=Header+1,header=None,sep="")
```
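Another option, if these exports are actually fixed-width (the dashed rule under the header suggests each column has a set width, e.g. `------` sits under `Status`), is to let `pd.read_fwf` cut the columns at the positions of the dash runs, which avoids choosing a delimiter at all. This is a minimal sketch under two assumptions: the data rows line up under the dashes, and the rule line immediately follows the `Timestamp` header. `rule_colspecs`, `read_trend_export`, `WIDTHS` and the padded sample are hypothetical, made up for the demo because the spacing in the post was collapsed.

```python
import io
import re

import pandas as pd


def rule_colspecs(rule: str) -> list[tuple[int, int]]:
    """Half-open (start, end) column spans, one per run of dashes in the rule line."""
    return [m.span() for m in re.finditer(r"-+", rule)]


def read_trend_export(text: str) -> pd.DataFrame:
    """Parse one export, assuming the columns line up under the dashed rule."""
    lines = text.splitlines()
    # Assumption: the header starts with 'Timestamp' and the rule comes right after it.
    header_idx = next(i for i, line in enumerate(lines) if line.startswith("Timestamp"))
    rule_idx = header_idx + 1
    colspecs = rule_colspecs(lines[rule_idx])
    # Slicing the header with the same spans keeps multi-word names like "Trend Flags".
    names = [lines[header_idx][start:end].strip() for start, end in colspecs]
    # Hand read_fwf only the data rows, so the preamble never reaches the parser.
    body = "\n".join(lines[rule_idx + 1:])
    return pd.read_fwf(io.StringIO(body), colspecs=colspecs, names=names, header=None)


# Hypothetical column widths for an aligned version of the sample.
WIDTHS = (25, 11, 6, 10)


def aligned(*cells: str) -> str:
    """Pad each cell to its column width, separated by single spaces."""
    return " ".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()


sample = "\n".join([
    "history:/NIKL6215_ENC_1/CH$2d19$2d1$20$20CHW$20OUTLET$20TEMP",
    "20-Oct-12 8:00 PM CT to ?",
    "",
    aligned("Timestamp", "Trend Flags", "Status", "Value (oC)"),
    " ".join("-" * w for w in WIDTHS),
    aligned("20-Oct-12 8:00:00 PM HKT", "{start}", "{ok}", "15.310 oC"),
    aligned("21-Oct-12 12:00:00 AM HKT", "{ }", "{ok}", "15.130 oC"),
])

df = read_trend_export(sample)
print(df)
```

For the files in the question you would pass each file's text, e.g. `read_trend_export(open(Path+file).read())`. The `Value (oC)` column still carries the unit (`15.310 oC`), so strip it before converting to float if you need numbers.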