Check if any (all) character of a string is in a given range
I have strings containing Unicode symbols (Cyrillic):

```python
myString1 = 'Австрия'
myString2 = 'AustriЯ'
```
I want to check whether all characters in a string are English (ASCII). Right now I use a loop:

```python
for char in myString1:
    if ord(char) not in range(65, 91):
        break
```
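So as soon as I find the first non-English character, I break out of the loop. But as the example shows, a string can contain many English characters followed by a Unicode one at the very end, in which case I end up checking the whole string. Likewise, if the entire string is English, I still check every character.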
Is there a more efficient way? I am thinking of something like:

```python
# pseudocode, not valid Python
if any(myString[:]) is not in range(65,91)
```
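For reference, here is a minimal sketch of the loop from the question wrapped in a function that returns as soon as a character falls outside the allowed ranges. Note that `range(65, 91)` covers only `A` to `Z`, so the lowercase range `range(97, 123)` is added here as an assumption about what "English" should mean. The name `all_in_ranges` and its `ranges` parameter are made up for illustration.

```python
def all_in_ranges(text, ranges=(range(65, 91), range(97, 123))):
    """Return True if every character's code point lies in one of `ranges`."""
    for char in text:
        code = ord(char)
        if not any(code in r for r in ranges):
            return False  # stop at the first character outside all ranges
    return True  # also reached for the empty string


print(all_in_ranges('Austria'))   # True
print(all_in_ranges('AustriЯ'))   # False
```

Membership tests on a `range` object are constant time for integers, so the cost is dominated by the walk over the string.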
You can use `all`:

```python
import string

ascii = set(string.ascii_uppercase)
ascii_all = set(string.ascii_uppercase + string.ascii_lowercase)

if all(x in ascii for x in my_string1):
    # my_string1 is all ascii
    pass
```
Or, of course, `any`:

```python
if not any(x not in ascii for x in my_string1):
    # my_string1 is all ascii
    pass
```
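Both `all` and `any` stop consuming their argument as soon as the result is known, which is the early-exit behaviour the question asks for. The sketch below makes that visible by counting how many characters are actually examined. The helper `check_and_count` and the sample strings are hypothetical and exist only for this illustration.

```python
import string

ASCII_LETTERS = frozenset(string.ascii_letters)


def check_and_count(text):
    """Return (all characters are ASCII letters, number of characters examined)."""
    examined = 0

    def chars():
        nonlocal examined
        for ch in text:
            examined += 1
            yield ch

    result = all(ch in ASCII_LETTERS for ch in chars())
    return result, examined


print(check_and_count('Яabc'))  # (False, 1): stops at the first character
print(check_and_count('abcЯ'))  # (False, 4): has to reach the last character
print(check_and_count('abc'))   # (True, 3): a positive answer needs every character
```

A positive result always requires looking at every character, so only negative cases can finish early.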
Update:

As Artyer pointed out, there is also a nice set-based approach that does not require a full iteration:
```python
if ascii.issuperset(my_string1):
    # my_string1 is all ascii
    pass
```
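As a small sketch of the same idea: the method form `issuperset` accepts any iterable, so the string can be passed directly without building a set from it first. The names `LETTERS` and `only_ascii_letters` are assumptions for this example, and upper and lower case are both allowed here.

```python
import string

LETTERS = frozenset(string.ascii_letters)


def only_ascii_letters(text):
    """Return True if every character of `text` is an ASCII letter."""
    return LETTERS.issuperset(text)


print(only_ascii_letters('tralala'))  # True
print(only_ascii_letters('AustriЯ'))  # False
print(only_ascii_letters(''))         # True, the empty string passes vacuously
```

If an empty string should not count as "all English", it needs an explicit extra check such as `bool(text) and LETTERS.issuperset(text)`.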
Another approach, as suggested by @schwobaseggl, but using the set methods throughout:

```python
import string

ascii = string.ascii_uppercase + string.ascii_lowercase

if set(my_string).issubset(ascii):
    # my_string is ascii
    pass
```
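One thing worth separating out: the checks above test for ASCII *letters*, which is narrower than "ASCII". If digits, spaces and punctuation should also count, and assuming Python 3.7 or later, the built-in `str.isascii()` answers that broader question. The sample strings below are made up for illustration.

```python
import string

samples = ['Austria', 'Austria 2024!', 'AustriЯ']

for s in samples:
    letters_only = set(s).issubset(string.ascii_letters)  # ASCII letters only
    any_ascii = s.isascii()                               # any code point below 128
    print(repr(s), letters_only, any_ascii)

# 'Austria' True True
# 'Austria 2024!' False True
# 'AustriЯ' False False
```

Which of the two is right depends on what "English" is supposed to mean for the data at hand.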
```python
import re

# to check whether any outside ranges (->MatchObject) / all in ranges (->None)
nonletter = re.compile('[^a-zA-Z]').search
# to check whether any in ranges (->MatchObject) / all outside ranges (->None)
letter = re.compile('[a-zA-Z]').search

bool(nonletter(myString1))       # True
bool(nonletter(myString2))       # True
bool(nonletter(myString2[:-1]))  # False
```
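The regex check can also be phrased positively. This is only a sketch of an alternative formulation, not part of the answer above: `fullmatch` succeeds only if the entire string consists of characters from the class. The name `only_letters` is hypothetical.

```python
import re

# Matches only if the whole string is made of ASCII letters (or is empty).
only_letters = re.compile(r'[a-zA-Z]*').fullmatch

print(bool(only_letters('tralala')))  # True
print(bool(only_letters('AustriЯ')))  # False
print(bool(only_letters('Австрия')))  # False
```

Using `+` instead of `*` in the pattern would make the empty string fail the check.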
Benchmarks for the OP's two examples and one positive example (`set` is @schwobaseggl's approach, `setset` is @danielsanchez's):
```
Австрия
re               0.48832818 ± 0.09022105 μs
set              0.58745548 ± 0.01759877 μs
setset           0.81759223 ± 0.03595184 μs
AustriЯ
re               0.51960442 ± 0.01881561 μs
set              1.03043942 ± 0.02453405 μs
setset           0.54060076 ± 0.01505265 μs
tralala
re               0.27832978 ± 0.01462306 μs
set              0.88285526 ± 0.03792728 μs
setset           0.43238688 ± 0.01847240 μs
```
Benchmark code:
```python
import types
from timeit import timeit
import re
import string
import numpy as np

def mnsd(trials):
    return '{:1.8f} \u00b1 {:10.8f} \u00b5s'.format(np.mean(trials), np.std(trials))

nonletter = re.compile('[^a-zA-Z]').search
letterset = set(string.ascii_letters)

def f_re(stri):
    return not nonletter(stri)

def f_set(stri):
    return all(x in letterset for x in stri)

def f_setset(stri):
    return set(stri).issubset(letterset)

for stri in ('Австрия', 'AustriЯ', 'tralala'):
    ref = f_re(stri)
    print(stri)
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert ref == func(stri)
            print("{:16s}".format(name[2:]), mnsd([timeit(
                'f(stri)', globals={'f':func, 'stri':stri}, number=1000) * 1000
                for i in range(1000)]))
        except:
            print("{:16s} apparently failed".format(name[2:]))
```
There is no way to avoid the iteration. However, by doing