Updating Each Row with Minutes Since First Row
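I have a file containing a million tweets. The first tweet occurred ...
I found help with datetime here and with converting time here, but when I put the two together I don't get the right time. It may be every ...
The error it throws is: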
```
tweets['minsSince'] = tweets.apply(timesince,axis=1)
...
TypeError: ('string indices must be integers, not str', u'occurred at index 0')
```
Thanks for your help.
```python
#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('BostonTWEETS.csv')
tweets.head()

#The first tweet's published_at time
starttime = datetime (2013, 04, 15, 20, 17, 18)

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    minsSince = int()
    tweetTime = row['published_at']
    ts = time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(tweetTime['published_at'], '%Y-%m-%d %H:%M:%S %UTC'))
    timediff = (tweetTime - starttime)
    minsSince.append("timediff")
    return",".join(minsSince)

tweets['minsSince'] = tweets.apply(timesince,axis=1)
df = DataFrame(tweets)
print(df)
```
A sample CSV file with the first 5 rows.
```python
#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('sample.csv')
tweets.head()

#The first tweet's published_at time
starttime = tweets.published_at.values[0]
starttime = datetime.strptime(starttime, '%Y-%m-%d %H:%M:%S UTC')

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    ts = datetime.strptime(row, '%Y-%m-%d %H:%M:%S UTC')
    timediff = (ts- starttime)
    timediff = divmod(timediff.days * 86400 + timediff.seconds, 60)
    return timediff[0]

tweets['minSince'] = 0
tweets.minSince = tweets.published_at.map(timesince)

df = DataFrame(tweets)
print(df)
```
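As for the original TypeError, my reading is that `row['published_at']` is already a plain string, so indexing it a second time with `'published_at'` is what fails. A minimal sketch of that, where the `row` dict is only a hypothetical stand-in for one DataFrame row:

```python
# Hypothetical stand-in for a single row of the tweets DataFrame.
row = {'published_at': '2013-04-15 20:17:18 UTC'}

# The first lookup already returns the timestamp as a string.
tweet_time = row['published_at']

try:
    # Indexing that string with another string is what raises the TypeError.
    tweet_time['published_at']
except TypeError as exc:
    message = str(exc)
    print(message)
```

Mapping over the `published_at` column instead of applying over whole rows avoids this, because the function then receives each timestamp string directly.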
I hope this is what you're looking for.
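With a million rows, a vectorized version may be noticeably faster than calling `strptime` once per row. The following is only a sketch under assumptions: the small inline DataFrame is made-up sample data standing in for the CSV, and it assumes every `published_at` value has the form `YYYY-MM-DD HH:MM:SS UTC` and that the first row is the earliest tweet.

```python
import pandas as pd

# Made-up sample data in the same timestamp format as the CSV (assumption).
tweets = pd.DataFrame({
    'published_at': [
        '2013-04-15 20:17:18 UTC',
        '2013-04-15 20:18:30 UTC',
        '2013-04-15 21:17:18 UTC',
    ]
})

# Parse the whole column in one call; the trailing " UTC" is matched literally.
published = pd.to_datetime(tweets['published_at'],
                           format='%Y-%m-%d %H:%M:%S UTC')

# Subtract the first timestamp from every row, then keep whole minutes.
elapsed = published - published.iloc[0]
tweets['minSince'] = (elapsed.dt.total_seconds() // 60).astype(int)

print(tweets)
```

Floor division keeps whole elapsed minutes, which matches the `divmod(...)[0]` result in the code above.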