Python: Updating Each Row with Minutes Since First Row

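I have a file containing one million tweets. The first tweet was published at 2013-04-15 20:17:18 UTC. I want to add a minsSince value to every tweet row: the number of minutes elapsed since that first tweet.

I found help on datetime here and on converting times here, but when I put the two together I don't get the right times. The cause may be the UTC string at the end of each published_at value.

The error it throws is: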

tweets['minsSince'] = tweets.apply(timesince,axis=1)
...
TypeError: ('string indices must be integers, not str', u'occurred at index 0')

Thanks for your help.

#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('BostonTWEETS.csv')
tweets.head()

#The first tweet's published_at time
starttime = datetime (2013, 04, 15, 20, 17, 18)

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    minsSince = int()
    tweetTime = row['published_at']
    ts = time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(tweetTime['published_at'], '%Y-%m-%d %H:%M:%S %UTC'))
    timediff = (tweetTime - starttime)
    minsSince.append("timediff")
    return ",".join(minsSince)

tweets['minsSince'] = tweets.apply(timesince,axis=1)

df = DataFrame(tweets)

print(df)

A sample CSV file with the first 5 rows.

Related discussion

#Import stuff
from datetime import datetime
import pandas as pd

#Read the csv file
tweets = pd.read_csv('sample.csv')

#The first tweet's published_at time; "UTC" in the format is matched as literal text
starttime = datetime.strptime(tweets.published_at.values[0], '%Y-%m-%d %H:%M:%S UTC')

#Calculate the whole minutes elapsed since the first tweet
def timesince(value):
    ts = datetime.strptime(value, '%Y-%m-%d %H:%M:%S UTC')
    timediff = ts - starttime
    #divmod returns (minutes, leftover seconds); keep only the minutes
    minutes, _ = divmod(timediff.days * 86400 + timediff.seconds, 60)
    return minutes

#Map over the published_at column only, so each call receives a single string
tweets['minSince'] = tweets.published_at.map(timesince)

print(tweets)
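
For context, the TypeError in the question most likely comes from `tweetTime['published_at']`: `row['published_at']` is already a string, so indexing it a second time with a string key fails. The sketch below reproduces that with a made-up timestamp (the value of `tweet_time` is an assumption, not taken from the real file) and shows the format string with `UTC` as literal text.

```python
from datetime import datetime

# Hypothetical sample value in the same shape as the published_at column.
tweet_time = "2013-04-15 20:19:48 UTC"

# Indexing a string with a string key raises TypeError, as in the question.
try:
    tweet_time["published_at"]
    raised = False
except TypeError:
    raised = True

# "UTC" is matched as literal text here. Writing "%UTC" instead would be read
# as the %U directive (week of the year) followed by the literal "TC".
parsed = datetime.strptime(tweet_time, "%Y-%m-%d %H:%M:%S UTC")

# Start time taken from the question: 2013-04-15 20:17:18 UTC.
start = datetime(2013, 4, 15, 20, 17, 18)
minutes = int((parsed - start).total_seconds() // 60)

print(raised, minutes)
```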

I hope this is what you were looking for.
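
With a million rows, parsing each string in a Python function may be slow. One possible alternative, sketched here under the assumption that every published_at value ends in a literal `UTC` and that the first row is the earliest tweet, is to let pandas parse the whole column at once and subtract the first timestamp. The small DataFrame is a hypothetical stand-in for the CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the published_at column matters here.
tweets = pd.DataFrame({
    "published_at": [
        "2013-04-15 20:17:18 UTC",
        "2013-04-15 20:18:30 UTC",
        "2013-04-15 21:17:18 UTC",
        "2013-04-16 20:17:18 UTC",
    ]
})

# Parse the whole column in one call; "UTC" in the format is matched literally.
published = pd.to_datetime(tweets["published_at"], format="%Y-%m-%d %H:%M:%S UTC")

# Subtract the first timestamp and convert the timedeltas to whole minutes.
elapsed = published - published.iloc[0]
tweets["minsSince"] = (elapsed.dt.total_seconds() // 60).astype(int)

print(tweets)
```

If the first row is not guaranteed to be the earliest tweet, `published.min()` could be used in place of `published.iloc[0]`.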