String comparison technique used by Python
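I'm wondering how Python does string comparison, more specifically how it determines the outcome when a less-than (<) or greater-than (>) operator is used.
For example, if I put
From the docs: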
The comparison uses lexicographical
ordering: first the first two items
are compared, and if they differ this
determines the outcome of the
comparison; if they are equal, the
next two items are compared, and so
on, until either sequence is
exhausted.
Also:
Lexicographical ordering for strings uses the Unicode code point number to order individual characters.
Or on Python 2:
Lexicographical ordering for strings uses the ASCII ordering for individual characters.
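As an example:
```python
>>> 'abc' > 'bac'
False
>>> ord('a'), ord('b')
(97, 98)
```
The result False is returned as soon as 'a' is found to be less than 'b'. The remaining characters are not compared.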
Be aware of lower and uppercase:
```python
>>> import string
>>> abc = string.ascii_lowercase
>>> [(x, ord(x)) for x in abc]
[('a', 97), ('b', 98), ('c', 99), ('d', 100), ('e', 101), ('f', 102), ('g', 103), ('h', 104), ('i', 105), ('j', 106), ('k', 107), ('l', 108), ('m', 109), ('n', 110), ('o', 111), ('p', 112), ('q', 113), ('r', 114), ('s', 115), ('t', 116), ('u', 117), ('v', 118), ('w', 119), ('x', 120), ('y', 121), ('z', 122)]
>>> [(x, ord(x)) for x in abc.upper()]
[('A', 65), ('B', 66), ('C', 67), ('D', 68), ('E', 69), ('F', 70), ('G', 71), ('H', 72), ('I', 73), ('J', 74), ('K', 75), ('L', 76), ('M', 77), ('N', 78), ('O', 79), ('P', 80), ('Q', 81), ('R', 82), ('S', 83), ('T', 84), ('U', 85), ('V', 86), ('W', 87), ('X', 88), ('Y', 89), ('Z', 90)]
```
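Because every uppercase ASCII letter has a smaller code point than every lowercase one, a plain comparison can look surprising. The following is a minimal sketch, assuming a case-insensitive order is what you want; the word list is made up for illustration, and `str.casefold` is only one possible choice of sort key.

```python
# Uppercase ASCII letters (65-90) sort before lowercase ones (97-122),
# so a plain comparison puts "Zebra" before "apple".
words = ["apple", "Zebra", "banana", "Apple"]  # illustrative data

# Default ordering: compares code points, so capitals come first.
plain_order = sorted(words)

# Assumption: case should be ignored, so each word is casefolded for comparison.
folded_order = sorted(words, key=str.casefold)

print("Zebra" < "apple")  # True, because ord("Z") is 90 and ord("a") is 97
print(plain_order)
print(folded_order)
```

Note that `sorted` is stable, so "apple" and "Apple" keep their original relative order when the key treats them as equal.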
Python string comparison is lexicographic:
From the Python docs: http://docs.python.org/reference/expressions.html
Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior.
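So, in your example,
Python and just about every other computer language use the same principles that (I hope) you would use when looking up a word in a printed dictionary:
(1) Depending on the human language involved, you have a notion of character ordering: "A" < "B" < "C" etc.
(2) The first character carries more weight than the second character: "az" < "za" (whether the language is written left-to-right, right-to-left, or boustrophedon is irrelevant)
(3) If you run out of characters to test, the shorter string is less than the longer string: "foo" < "food"
Typically, in a computer language the "notion of character ordering" is rather primitive: each character has a number that is independent of any human language (in Python, ord(character)), and characters are compared using that number.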
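The three rules, and the fact that the per-character order is just a number, can be checked directly. This is a small sketch; the `checks` dictionary is only an illustrative container, not part of any API.

```python
# Each entry pairs a comparison (as text) with its actual result in Python 3.
checks = {
    '"A" < "B"': "A" < "B",             # rule 1: characters have an order
    '"az" < "za"': "az" < "za",         # rule 2: the first character decides
    '"foo" < "food"': "foo" < "food",   # rule 3: a proper prefix is smaller
    '"Z" < "a"': "Z" < "a",             # code points: 90 < 97
    '"z" < "\\u00e9"': "z" < "\u00e9",  # code points: 122 < 233 (e with acute)
}

for expression, result in checks.items():
    print(expression, "->", result)
```

The last two entries show why code point order is often not what a human reader expects: "Z" sorts before "a", and an accented "e" sorts after "z".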
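See also "How do I sort unicode strings alphabetically in Python?", where the discussion is about the sorting rules given by the Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/).
Reply to the comment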
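For language-aware ordering, one option in the standard library is the `locale` module. The sketch below assumes the process is allowed to switch to the user's default locale; the result of `locale.strxfrm` depends on the platform and the active locale, and it relies on the C library's collation rather than necessarily the full Unicode Collation Algorithm. The word list is illustrative.

```python
import locale

words = ["zebra", "Apple", "apple", "Zulu"]  # illustrative data

# Assumption: use whatever collation the user's environment provides.
# If that locale is unavailable, stay with the default "C" locale.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass

# Plain comparison: ordered by code point, capitals first.
code_point_order = sorted(words)

# Locale-aware comparison: strxfrm transforms each string into a key
# whose ordinary comparison follows the locale's collation rules.
locale_order = sorted(words, key=locale.strxfrm)

print(code_point_order)
print(locale_order)  # depends on the active locale
```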
What? How else can ordering be defined other than left-to-right?
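To S.Lott: there is a famous counterexample when sorting French. It involves accents: one could say that, in French, letters are sorted left-to-right and accents right-to-left. Here is the counterexample: we have e < é and o < ô, so you would expect the words cote, coté, côte, côté to be sorted as cote < coté < côte < côté. With the traditional French rule, however, the order is cote < côte < coté < côté, because the accents are compared starting from the end of the word.

A last remark: you shouldn't talk about left-to-right and right-to-left sorting, but rather about forward and backward sorting.

Indeed, some languages are written right-to-left, and if you think Arabic and Hebrew are sorted right-to-left you may be right from a graphical point of view, but you are wrong on the logical level!

In fact, Unicode considers character strings as encoded in logical order, and writing direction is a phenomenon that occurs at the glyph level. In other words, even if in the word שלום the letter shin appears to the right of the lamed, logically it comes before it. To sort this word, one first considers the shin, then the lamed, then the vav, then the mem, and this is forward ordering (although Hebrew is written right-to-left), whereas French accents are sorted backward (although French is written left-to-right).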
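The difference between forward and backward accent ordering can be illustrated with a toy two-level sort key. This is only a sketch, not the Unicode Collation Algorithm: the helper names `split_accents`, `forward_key`, and `backward_key` are hypothetical, and accents are separated from base letters with `unicodedata.normalize("NFD", ...)`.

```python
import unicodedata

def split_accents(word):
    """Return (base letters, list of accent marks per base letter)."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # A combining mark belongs to the preceding base letter.
            accents[-1] += ch
        else:
            bases.append(ch)
            accents.append("")
    return "".join(bases), accents

def forward_key(word):
    """Base letters first, then accents compared from the start of the word."""
    base, accents = split_accents(word)
    return (base, accents)

def backward_key(word):
    """Base letters first, then accents compared from the end of the word."""
    base, accents = split_accents(word)
    return (base, accents[::-1])

words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=forward_key))
print(sorted(words, key=backward_key))
```

With the forward key the accent on the "o" decides first; with the backward key the accent on the final "e" decides first, which reproduces the traditional French order described above.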
This is lexicographical ordering. It simply puts things in dictionary order.
A pure Python equivalent of string comparison would be:
```python
def less(string1, string2):
    # Compare character by character
    for idx in range(min(len(string1), len(string2))):
        # Get the "value" of the character
        ordinal1, ordinal2 = ord(string1[idx]), ord(string2[idx])
        # If the "value" is identical check the next characters
        if ordinal1 == ordinal2:
            continue
        # If it's smaller we're finished and can return True
        elif ordinal1 < ordinal2:
            return True
        # If it's bigger we're finished and return False
        else:
            return False
    # We're out of characters and all were equal, so the result depends on the length
    # of the strings.
    return len(string1) < len(string2)
```
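This function does the equivalent of the real method (Python 3.6 and Python 2.7), only much more slowly. Also note that this implementation isn't exactly "pythonic" and only works for < comparisons.
A more general variant would be: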
```python
from operator import lt, gt

def compare(string1, string2, less=True):
    # Pick the operator: "<" when less is True, ">" otherwise
    op = lt if less else gt
    for char1, char2 in zip(string1, string2):
        ordinal1, ordinal2 = ord(char1), ord(char2)
        if ordinal1 == ordinal2:
            continue
        elif op(ordinal1, ordinal2):
            return True
        else:
            return False
    # All shared positions were equal, so the lengths decide
    return op(len(string1), len(string2))
```
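Another way to see the same idea is that comparing two strings gives the same answer as comparing their lists of code points, since Python compares lists lexicographically as well. The sketch below checks this on randomly generated strings; `code_points` and `same_as_builtin` are hypothetical helper names, and the small alphabet is an arbitrary choice.

```python
import random

def code_points(text):
    """Return the list of Unicode code points of a string."""
    return [ord(ch) for ch in text]

def same_as_builtin(a, b):
    """Check that comparing code point lists matches comparing the strings."""
    return (
        (code_points(a) < code_points(b)) == (a < b)
        and (code_points(a) > code_points(b)) == (a > b)
        and (code_points(a) == code_points(b)) == (a == b)
    )

# Build short random strings from a small mixed-case alphabet (fixed seed).
rng = random.Random(0)
alphabet = "abAB\u00e9 "
samples = [
    "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 4)))
    for _ in range(200)
]

all_match = all(same_as_builtin(a, b) for a in samples for b in samples)
print(all_match)
```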
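Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior.
Here is a sample program that compares two strings lexicographically.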
```python
a = str(input())
b = str(input())
if 1 <= len(a) <= 100 and 1 <= len(b) <= 100:
    a = a.lower()
    b = b.lower()
    if a > b:
        print('1')
    elif a < b:
        print('-1')
    elif a == b:
        print('0')
```
For different inputs, the outputs are:
```
1- abcdefg
   abcdeff
   1

2- abc
   Abc
   0

3- abs
   AbZ
   -1
```
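The same three-way, case-insensitive comparison can be wrapped in a function so it can be reused and tested without reading from standard input. This is a sketch; `compare_ignore_case` is a hypothetical name, and the length check from the program above is left out on purpose.

```python
def compare_ignore_case(a, b):
    """Return 1 if a > b, -1 if a < b, 0 if equal, ignoring case."""
    a, b = a.lower(), b.lower()
    # Booleans are integers in Python, so this yields 1, 0 or -1.
    return (a > b) - (a < b)

# The three input pairs listed above.
for first, second in [("abcdefg", "abcdeff"), ("abc", "Abc"), ("abs", "AbZ")]:
    print(first, second, compare_ignore_case(first, second))
```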