Efficient User-Agent Regex to find Safari in Python
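To determine whether a user agent belongs to Safari, you have to check that "Safari" is present and that "Chrome" is not. I'm also assuming the check needs to be case-insensitive.
I'm trying to do this in Python with a regex, without having to loop over the groups afterwards to match the strings.
One way to solve it is: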
```python
import re

r1 = re.compile("Safari", re.I)
r2 = re.compile("Chrome", re.I)
# Safari must occur at least once and Chrome must not occur at all
if len(r1.findall(userAgentString)) > 0 and len(r2.findall(userAgentString)) <= 0:
    print("Found Safari")
```
I also tried using
```python
r = re.compile("(?P<s>Safari)|(?P<c>Chrome)", re.I)
m = r.search(userAgentString)
if m.group('s') and not m.group('c'):
    print("Found Safari")
```
This doesn't work, because the search stops after the first instance of "chrome" or "safari" it finds (probably obvious to regex gurus).
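To make the failure concrete, here is a minimal sketch of what `search()` reports for two inputs. The helper name `first_match_groups` and the second test string (where "Safari" comes before "Chrome") are made up for illustration and are not from the original post.

```python
import re

r = re.compile("(?P<s>Safari)|(?P<c>Chrome)", re.I)

def first_match_groups(user_agent):
    """Return the (s, c) groups of the first match only, as search() sees them."""
    m = r.search(user_agent)
    if m is None:
        return (None, None)
    return (m.group('s'), m.group('c'))

# Chrome user agent from the question: "Chrome" happens to come first,
# so only the 'c' group is filled in.
chrome_ua = ("Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36")
print(first_match_groups(chrome_ua))

# Hypothetical string with "Safari" before "Chrome": search() stops at
# "Safari", so the later "Chrome" is never seen and the check passes wrongly.
reordered = "Example/1.0 Safari/537.36 Chrome/32.0"
print(first_match_groups(reordered))
```

Since an alternation fills in at most one named group per match, a single `search()` can never report both words at once.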
I can get it to work, somewhat more efficiently, with re.finditer(), like this:
```python
r = re.compile("(?P<s>Safari)|(?P<c>Chrome)", re.I)
safari = chrome = False
# Walk over every match and record which of the two words occurred
for i in r.finditer(userAgentString):
    if i.group('s'):
        safari = True
    if i.group('c'):
        chrome = True
if safari and not chrome:
    print("Found Safari")
```
Is there a more efficient way to do this? (Note that I'm looking for efficiency, not convenience.) Thanks.
Sample user agents:
Safari: "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"
Chrome: "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36"
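For what it's worth, I timed it, and Jwodder's approach with lower() and in is the more efficient one. It turned out to be roughly 10 times faster than the precompiled regex. Unless I made a mistake in the setup or the timing…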
```python
import timeit

setup = '''
import re
r = re.compile('(?P<m>MSIE)|(?P<c>Chrome)|(?P<s>Safari)', re.I)

def strictBrowser(userAgentString):
    c = s = m = False
    for f in r.finditer(userAgentString):
        if f.group('m'): m = True
        if f.group('c'): c = True
        if f.group('s'): s = True
    # MSIE or (Safari but not Chrome)
    # all Chrome user agents will have Safari in them
    return m or (s and not c)
'''

print(timeit.timeit(
    'strictBrowser("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.2")',
    setup=setup,
    number=100000
))

setup = '''
def strictBrowser(userAgentString):
    userAgentString = userAgentString.lower()
    if (
        'msie' in userAgentString or
        ('safari' in userAgentString and 'chrome' not in userAgentString)
    ):
        return True
    return False
'''

print(timeit.timeit(
    'strictBrowser("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.2")',
    setup=setup,
    number=100000
))
```

Output:

```
0.0778814506637
0.00664118263765
```
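Since you're testing whether specific fixed strings occur in a given string, dropping the regex entirely is probably the simplest and most efficient option: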
```python
ua = userAgentString.lower()  # lowercase once, then test both substrings
if 'safari' in ua and 'chrome' not in ua:
    print("Found Safari")
```
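If a single regex is still wanted, one option is a negative lookahead. The sketch below is an alternative technique, not something proposed in the thread, and no timing is claimed for it; the measurements above suggest the plain substring test is the faster route. The names `SAFARI_ONLY`, `is_safari_regex` and `is_safari_plain` are hypothetical.

```python
import re

# Assumption: anchored at the start via match(); the lookahead rejects any
# string that contains "chrome" anywhere, then ".*safari" requires "safari".
SAFARI_ONLY = re.compile(r"(?!.*chrome).*safari", re.I | re.S)

def is_safari_regex(user_agent):
    """Single-regex check: 'safari' present and 'chrome' absent."""
    return SAFARI_ONLY.match(user_agent) is not None

def is_safari_plain(user_agent):
    """Substring check from the answer above, wrapped in a function."""
    ua = user_agent.lower()
    return 'safari' in ua and 'chrome' not in ua

# The two sample user agents from the question
safari_ua = ("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 "
             "(KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25")
chrome_ua = ("Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36")

for ua in (safari_ua, chrome_ua):
    print(is_safari_regex(ua), is_safari_plain(ua))
```

Both functions should agree on the two sample strings, so either can be dropped into the timing harness above for a like-for-like comparison.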