Use gawk to convert from human readable time in a file to unix time?
I'm new here. Based on this thread, I have written a gawk function that converts a date and time (such as "07,jun,2015,06,pm") to Unix time in milliseconds:
```
$ cat tst.awk
function cvttime(t,    a) {
    split(t,a,/[,: ]+/)
    # fa0,07,DEC,2014,10,AM,862.209018
    #  =>
    #    a[2] = "07"   date
    #    a[3] = "DEC"  month
    #    a[4] = "2014" year
    #    a[5] = "06"   time
    #    a[6] = "AM"   AM/PM

    if ( (a[6] == "PM") && (a[5] < 12) ) {
        a[5] += 12
    }

    match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",a[3])
    a[3] = (RSTART+2)/3

    return( a[1]","mktime(a[4]" "a[3]" "a[2]" "a[5]" 00 0")"000,"a[7])
}

BEGIN {
    mdt = "fa0,07,DEC,2014,10,AM,862.209018"
    ms = cvttime(mdt)
    print ms
}
```
In the terminal, the following command gives the correct Unix time:

```
$ TZ=UTC gawk -f tst.awk
```

It returns:

```
fa0,1417946400000,862.209018
```
Now I have a file "input.csv" that contains:

```
aa1,07,DEC,2014,06,AM,282.485988
ac3,07,DEC,2014,07,AM,97.6757181
ef3,07,DEC,2014,08,AM,112.816554
ag3,07,DEC,2014,09,AM,101.479961
fa0,07,DEC,2014,10,AM,862.209018
```

How should I modify the gawk function and the shell command so that they read "input.csv" and produce "output.csv" containing:

```
aa1,1417932000000,282.485988
ac3,1417935600000,97.6757181
ef3,1417939200000,112.816554
ag3,1417942800000,101.479961
fa0,1417946400000,862.209018
```

Thanks in advance!
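You can use something like this (it relies on the -d option of GNU date):

```
awk -F, '{cmd = "date -d \"" $3 " " $2 " " $5 " " $6 " " $4 "\" +%s"; cmd | getline d; close(cmd); print $1 "," d "000," $NF}' input.csv > output.csv
```

or

```
awk -F, '{cmd = "date -u -d \"" $3 " " $2 " " $5 " " $6 " " $4 "\" +%s"; cmd | getline d; close(cmd); print $1 "," d "000," $NF}' input.csv > output.csv
```

if you want to set the -u (UTC) flag, so that the input is read as UTC rather than as local time. The close(cmd) call releases each date pipe after it is read, so a long input file does not exhaust the open file limit.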
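This is confusing because the input time does not match the output time, but I think this is what you want:

```
BEGIN {
    FS = OFS = ","
}
{
    # fix year: expand a two-digit year (14 -> 2014), leave 2014 alone
    if ($4 < 100)
        $4 += 2000
    # fix month: position of the abbreviation in the string gives 1..12
    match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", $3)
    $3 = (RSTART + 2) / 3
    # fix hour: shift PM hours into 24-hour time
    if ($6 == "PM" && $5 < 12)
        $5 += 12
    print $1, mktime($4 " " $3 " " $2 " " $5 " 0 0") * 1000, $NF
}
```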