Retrieving a date from a complex string in Python

I'm trying to use datetime.strptime to get a single datetime out of two strings.

The time is easy (e.g. 8:53PM), so I can do something like:

theTime = datetime.strptime(givenTime,"%I:%M%p")

However, the other string isn't just a date; it's a link in a format like http://site.com/?year=2011&month=10&day=5&hour=11. I know I could do something like:

theDate = datetime.strptime(givenURL,"http://site.com/?year=%Y&month=%m&day=%d&hour=%H")

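But I don't want to take the hour from the link, since it is being retrieved elsewhere. Is there a way to put in a dummy symbol (like %x or something) to act as a flexible placeholder for that last variable?

In the end, I envision having a single line similar to:

theDateTime = datetime.strptime(givenURL + givenTime,"http://site.com/?year=%Y&month=%m&day=%d&hour=%x%I:%M%p")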

(Though obviously the %x wouldn't be used.) Any ideas?


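If you simply want to skip the hour in the URL, you can use split, for example:

from datetime import datetime

givenURL = 'http://site.com/?year=2011&month=10&day=5&hour=11'
pattern = "http://site.com/?year=%Y&month=%m&day=%d"
# Drop everything from '&hour=' onward before parsing
theDate = datetime.strptime(givenURL.split('&hour=')[0], pattern)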

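I'm not sure I understood you correctly, but:

from datetime import datetime

givenURL = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53PM'
datePattern = "http://site.com/?year=%Y&month=%m&day=%d"
timePattern = "&time=%I:%M%p"

# Replace the '&hour=...' tail with '&time=<givenTime>' and parse both parts in one call
theDateTime = datetime.strptime(givenURL.split('&hour=')[0] + '&time=' + givenTime, datePattern + timePattern)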


import datetime
import re

givenURL  = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53PM'

print ' givenURL == ' + givenURL
print 'givenTime == ' + givenTime

regx = re.compile('year=(\d\d\d\d)&month=(\d\d?)&day=(\d\d?)&hour=\d\d?')
print '\nmap(int,regx.search(givenURL).groups()) ==',map(int,regx.search(givenURL).groups())

theDate = datetime.date(*map(int,regx.search(givenURL).groups()))
theTime = datetime.datetime.strptime(givenTime,"%I:%M%p")

print '\ntheDate ==',theDate,type(theDate)
print '\ntheTime ==',theTime,type(theTime)


theDateTime = theTime.replace(theDate.year,theDate.month,theDate.day)
print '\ntheDateTime ==',theDateTime,type(theDateTime)

Result

 givenURL == http://site.com/?year=2011&month=10&day=5&hour=11
givenTime == 08:53PM

map(int,regx.search(givenURL).groups()) == [2011, 10, 5]

theDate == 2011-10-05 <type 'datetime.date'>

theTime == 1900-01-01 20:53:00 <type 'datetime.datetime'>

theDateTime == 2011-10-05 20:53:00 <type 'datetime.datetime'>
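
For anyone on Python 3, here is a minimal sketch of the same idea (take the date from the URL, take the time from the other string, then join them). It is not the code from this answer: it swaps the regex for the standard library's urllib.parse to read the query parameters and uses datetime.combine instead of replace. The snake_case variable names are mine, and it assumes the URL always carries year, month and day parameters.

```python
from datetime import date, datetime
from urllib.parse import parse_qs, urlsplit

given_url = "http://site.com/?year=2011&month=10&day=5&hour=11"
given_time = "08:53PM"

# parse_qs maps each query parameter to a list of string values;
# the hour parameter is simply never read.
query = parse_qs(urlsplit(given_url).query)
the_date = date(int(query["year"][0]), int(query["month"][0]), int(query["day"][0]))

# strptime fills in 1900-01-01 for the missing date, so keep only the time part.
the_time = datetime.strptime(given_time, "%I:%M%p").time()

the_date_time = datetime.combine(the_date, the_time)
print(the_date_time)  # 2011-10-05 20:53:00
```

Reading the parameters by name also means the order of the query parameters no longer matters, which the format-string approaches depend on.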

EDIT 1

Since strptime() is slow, I improved my code to eliminate it:

from datetime import datetime
import re
from time import clock


n = 10000

givenURL  = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53AM'

# eyquem
regx = re.compile('year=(\d\d\d\d)&month=(\d\d?)&day=(\d\d?)&hour=\d\d? (\d\d?):(\d\d?)(PM|pm)?')
t0 = clock()
for i in xrange(n):
    given = givenURL + ' ' + givenTime
    mat = regx.search(given)
    grps = map(int,mat.group(1,2,3,4,5))
    if mat.group(6):
        grps[3] += 12 # when it is PM/pm, the hour must be augmented with 12
    theDateTime1 = datetime(*grps)
print clock()-t0,"seconds   eyquem's code"
print theDateTime1


print

# Artsiom Rudzenka
dateandtimePattern ="http://site.com/?year=%Y&month=%m&day=%d&time=%I:%M%p"
t0 = clock()
for i in xrange(n):
    theDateTime2 = datetime.strptime(givenURL.split('&hour=')[0] + '&time=' + givenTime, dateandtimePattern)
print clock()-t0,"seconds   Artsiom's code"
print theDateTime2

print
print theDateTime1 == theDateTime2

Result

0.460598763251 seconds   eyquem's code
2011-10-05 08:53:00

2.10386180366 seconds   Artsiom's code
2011-10-05 08:53:00

True

My code is about 4.5 times faster. That can matter if there are many of these conversions to perform.
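
One caveat with the regex version: adding 12 whenever PM is present breaks at 12 o'clock (12:30PM becomes hour 24, which datetime rejects, and 12:30AM stays at hour 12 instead of 0). Below is a rough Python 3 sketch of the same strptime-free idea with that case handled. The function name parse_without_strptime is made up for illustration, and unlike the code above it assumes the AM/PM marker is always present.

```python
import re
from datetime import datetime

# Same shape as the pattern above, but the AM/PM marker is required here.
PATTERN = re.compile(
    r"year=(\d{4})&month=(\d\d?)&day=(\d\d?)&hour=\d\d? (\d\d?):(\d\d)\s*([AaPp][Mm])"
)

def parse_without_strptime(given_url, given_time):
    """Build a datetime from the URL's date and a 12-hour clock time string."""
    match = PATTERN.search(given_url + " " + given_time)
    if match is None:
        raise ValueError("input does not match the expected format")
    year, month, day, hour, minute = map(int, match.group(1, 2, 3, 4, 5))
    is_pm = match.group(6).lower() == "pm"
    # 12AM -> 0, 12PM -> 12, 1PM..11PM -> 13..23
    hour = hour % 12 + (12 if is_pm else 0)
    return datetime(year, month, day, hour, minute)

url = "http://site.com/?year=2011&month=10&day=5&hour=11"
print(parse_without_strptime(url, "08:53PM"))  # 2011-10-05 20:53:00
print(parse_without_strptime(url, "12:05AM"))  # 2011-10-05 00:05:00
```

I haven't timed this variant, so the figures above apply only to the original code.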


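It is not possible to do this with a format string. However, if the hour in the URL doesn't matter, you can parse it from the URL as in your first example and then call theDateTime.replace(hour=hour_from_a_different_source)

That way you don't need to do any extra parsing.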
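
A minimal Python 3 sketch of that suggestion, assuming the "different source" is the givenTime string from the question (the snake_case names are mine):

```python
from datetime import datetime

given_url = "http://site.com/?year=2011&month=10&day=5&hour=11"
given_time = "08:53PM"

# Parse the whole URL, including the hour that will be thrown away.
url_pattern = "http://site.com/?year=%Y&month=%m&day=%d&hour=%H"
the_date_time = datetime.strptime(given_url, url_pattern)

# Parse the real time separately, then overwrite the hour and minute.
the_time = datetime.strptime(given_time, "%I:%M%p")
the_date_time = the_date_time.replace(hour=the_time.hour, minute=the_time.minute)

print(the_date_time)  # 2011-10-05 20:53:00
```

replace returns a new datetime rather than modifying the old one, so the result has to be assigned back.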