Simple way to remove multiple spaces in a string?
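Suppose this is the string:

    The   fox jumped   over    the log.

It should turn into:

    The fox jumped over the log.

What is the simplest one- or two-liner that can do this, without splitting the string and going into lists?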
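Here foo is your string:

    " ".join(foo.split())

Be warned, though, that this removes "all whitespace characters (space, tab, newline, return, formfeed)" (thanks to hhsaffar, see comments). In other words, tabs and newlines between words are effectively collapsed to a single space as well.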
    >>> import re
    >>> re.sub(' +', ' ', 'The     quick brown    fox')
    'The quick brown fox'
    import re
    s = "The   fox jumped   over    the log."
    re.sub(r"\s\s+" , " ", s)

or

    re.sub(r"\s\s+", " ", s)

since a space before a comma is listed as a pet peeve in PEP 8, as moose pointed out in the comments above.
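Using regexes with "\s" and doing simple string.split() calls will also remove other whitespace, such as newlines, carriage returns, and tabs. Unless that is what you want, and you only need to collapse multiple spaces, I present these examples.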
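To make the difference concrete, here is a minimal sketch. The sample string is made up for illustration and is not part of the benchmark below. A space-only pattern leaves the newline alone, while split/join swallows it:

```python
import re

# Hypothetical sample: a run of two spaces, then a newline inside the text.
text = "one  two\nthree"

# Collapses runs of the space character only; the newline survives.
space_only = re.sub(r" {2,}", " ", text)

# split() with no argument splits on any whitespace, so the newline is lost.
any_whitespace = " ".join(text.split())

print(repr(space_only))
print(repr(any_whitespace))
```

Which behavior is right depends on whether line breaks in the input carry meaning.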
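EDIT: As I'm wont to do, I slept on this, and besides correcting a typo in the last results (v3.3.3 is 64-bit, not 32-bit), the obvious hit me: the test string was rather trivial.

So I took 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get more realistic timings. I then added extra spaces of random length throughout: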
    original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))
I also corrected the "proper join":
    # setup = '''

    import re

    def while_replace(string):
        while '  ' in string:
            string = string.replace('  ', ' ')

        return string

    def re_replace(string):
        return re.sub(r' {2,}', ' ', string)

    def proper_join(string):
        split_string = string.split(' ')

        # To account for leading/trailing spaces that would simply be removed
        beg = ' ' if not split_string[0] else ''
        end = ' ' if not split_string[-1] else ''

        # versus simply ' '.join(item for item in string.split(' ') if item)
        return beg + ' '.join(item for item in split_string if item) + end

    original_string = """Lorem ipsum ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

    assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

    #'''
    # while_replace_test
    new_string = original_string[:]

    new_string = while_replace(new_string)

    assert new_string != original_string

    # re_replace_test
    new_string = original_string[:]

    new_string = re_replace(new_string)

    assert new_string != original_string

    # proper_join_test
    new_string = original_string[:]

    new_string = proper_join(new_string)

    assert new_string != original_string
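NOTE: every test works on a copy of original_string and asserts that something actually changed on each iteration. If the loop rebinds original_string itself, as in the variant below, the assert breaks on the second iteration because there is nothing left to collapse:

    # while_replace_test
    new_string = original_string[:]

    new_string = while_replace(new_string)

    assert new_string != original_string # will break the 2nd iteration

    while '  ' in original_string:
        original_string = original_string.replace('  ', ' ')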
    Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

    timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)

    test_string = 'The   fox   jumped   over \t   the log.' # trivial

    Python 2.7.3, 32-bit, Windows
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.001066  |  0.001260  |  0.001128  |  0.001092
         re_replace_test |  0.003074  |  0.003941  |  0.003357  |  0.003349
        proper_join_test |  0.002783  |  0.004829  |  0.003554  |  0.003035

    Python 2.7.3, 64-bit, Windows
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.001025  |  0.001079  |  0.001052  |  0.001051
         re_replace_test |  0.003213  |  0.004512  |  0.003656  |  0.003504
        proper_join_test |  0.002760  |  0.006361  |  0.004626  |  0.004600

    Python 3.2.3, 32-bit, Windows
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.001350  |  0.002302  |  0.001639  |  0.001357
         re_replace_test |  0.006797  |  0.008107  |  0.007319  |  0.007440
        proper_join_test |  0.002863  |  0.003356  |  0.003026  |  0.002975

    Python 3.3.3, 64-bit, Windows
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.001444  |  0.001490  |  0.001460  |  0.001459
         re_replace_test |  0.011771  |  0.012598  |  0.012082  |  0.011910
        proper_join_test |  0.003741  |  0.005933  |  0.004341  |  0.004009
    test_string = lorem_ipsum
    # Thanks to http://www.lipsum.com/
    # "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

    Python 2.7.3, 32-bit
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.342602  |  0.387803  |  0.359319  |  0.356284
         re_replace_test |  0.337571  |  0.359821  |  0.348876  |  0.348006
        proper_join_test |  0.381654  |  0.395349  |  0.388304  |  0.388193

    Python 2.7.3, 64-bit
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.227471  |  0.268340  |  0.240884  |  0.236776
         re_replace_test |  0.301516  |  0.325730  |  0.308626  |  0.307852
        proper_join_test |  0.358766  |  0.383736  |  0.370958  |  0.371866

    Python 3.2.3, 32-bit
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.438480  |  0.463380  |  0.447953  |  0.446646
         re_replace_test |  0.463729  |  0.490947  |  0.472496  |  0.468778
        proper_join_test |  0.397022  |  0.427817  |  0.406612  |  0.402053

    Python 3.3.3, 64-bit
                    test |  minimum   |  maximum   |  average   |  median
    ---------------------+------------+------------+------------+-----------
      while_replace_test |  0.284495  |  0.294025  |  0.288735  |  0.289153
         re_replace_test |  0.501351  |  0.525673  |  0.511347  |  0.508467
        proper_join_test |  0.422011  |  0.448736  |  0.436196  |  0.440318
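For the trivial string, it would seem that a while loop is the fastest, followed by the Pythonic string split/join, with regex pulling up the rear.

For non-trivial strings, there seems to be a bit more to consider. 32-bit 2.7? It's regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. On 32-bit 3.2 the "proper" join comes out ahead, and on 64-bit 3.3 the while loop wins again.

In the end, one can improve performance if/where/when needed, but it is always best to remember the mantra: make it work, make it right, make it fast.

IANAL, YMMV, caveat emptor!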
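For anyone who wants to repeat the comparison on their own machine, here is a minimal sketch of such a harness. It is not the setup used for the tables above: the word list, the seed, and the `split_join` helper are assumptions made for illustration, and a small synthetic string stands in for the Lorem Ipsum text.

```python
import random
import re
import timeit

def while_replace(string):
    while "  " in string:
        string = string.replace("  ", " ")
    return string

def re_replace(string):
    return re.sub(r" {2,}", " ", string)

def split_join(string):
    # Unlike the two functions above, this also strips leading/trailing whitespace.
    return " ".join(string.split())

# Assumption: synthetic input with 1 to 10 spaces after every word.
random.seed(0)
words = ["lorem", "ipsum", "dolor", "sit", "amet"] * 200
sample = "".join(word + " " * random.randint(1, 10) for word in words).strip()

results = {}
for func in (while_replace, re_replace, split_join):
    # Best of 3 repeats, each repeat calling the function 100 times.
    timings = timeit.repeat(lambda: func(sample), number=100, repeat=3)
    results[func.__name__] = min(timings)

for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:15s} {seconds:.6f}")
```

The ranking it prints will vary with the Python version, the platform, and the shape of the input, which is the point of the tables above.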
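I have to agree with Paul McGuire's comment above. To me,

    ' '.join(the_string.split())

is vastly preferable to whipping out a regex.

My measurements (Linux, Python 2.5) show the split-then-join to be almost five times faster than doing the "re.sub(...)", and still three times faster if you precompile the regex once and do the operation multiple times. And it is by any measure easier to understand, and much more Pythonic.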
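For reference, "precompiling the regex once" could look like the sketch below. The names `MULTI_WHITESPACE` and `collapse` are made up for this illustration, and the sample lines are not from the measurements.

```python
import re

# Compile the pattern once at import time and reuse it for every call.
MULTI_WHITESPACE = re.compile(r"\s+")

def collapse(text):
    # Replace each run of whitespace with a single space.
    return MULTI_WHITESPACE.sub(" ", text)

lines = ["The   fox jumped", "over    the log."]
print([collapse(line) for line in lines])
```

Even with the pattern compiled up front, the split/join version stays shorter and easier to read.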
Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:

    >>> import re
    >>> s = "The   fox jumped   over    the log."
    >>> re.sub(r'\s{2,}', ' ', s)
    'The fox jumped over the log.'
A simple solution:

    >>> import re
    >>> s = "The   fox jumped   over    the log."
    >>> print(re.sub(r'\s+', ' ', s))
    The fox jumped over the log.
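You can also use the string-splitting technique in a Pandas DataFrame without needing .apply(..), which is useful when you have to perform the operation quickly on a large number of strings. Here it is in one line:

    df['message'] = (df['message'].str.split()).str.join(' ')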
    import re

    string = re.sub(r'[ \t\n]+', ' ', 'The     quick brown    \n\n             \t        fox')

This replaces every run of tabs, newlines, and multiple spaces with a single space.
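One line of code to remove all extra spaces before, after, and within a sentence:

    sentence = "  The   fox jumped   over    the log.  "
    sentence = ' '.join(filter(None, sentence.split(' ')))

Explanation: split the entire string into a list on single spaces, filter the empty elements out of the list, then rejoin the remaining elements* with a single space.

*The remaining elements should be words, or words with punctuation and so on. I did not test this extensively, but it should be a good starting point. All the best!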
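The same one-liner, unrolled so that each intermediate value can be inspected. The variable names `pieces` and `words` are introduced here for illustration only:

```python
sentence = "  The   fox jumped   over    the log.  "

# Step 1: splitting on a single space yields an empty string for every extra space.
pieces = sentence.split(" ")

# Step 2: filter(None, ...) drops the falsy items, i.e. the empty strings.
words = list(filter(None, pieces))

# Step 3: rejoin what is left with exactly one space.
result = " ".join(words)

print(pieces)
print(words)
print(repr(result))
```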
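In some cases it is desirable to replace consecutive occurrences of each whitespace character with a single instance of that same character. You can use a regular expression with backreferences to do that: `(\s)\1{1,}` matches any whitespace character followed by one or more repetitions of that same character, and the first group (`\1`) is used as the replacement.

Wrapping this in a function:

    import re

    def normalize_whitespace(string):
        return re.sub(r'(\s)\1{1,}', r'\1', string)

    >>> normalize_whitespace('The   fox jumped   over    the log.')
    'The fox jumped over the log.'
    >>> normalize_whitespace('First    line\t\t\t Second    line')
    'First line\t Second line'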
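A short sketch of how this differs from the plain `\s+` approach used in other answers. The sample string is made up: two tabs followed by two spaces.

```python
import re

def normalize_whitespace(string):
    return re.sub(r"(\s)\1{1,}", r"\1", string)

# Hypothetical input: "a", two tabs, two spaces, "b".
sample = "a\t\t  b"

# Each run of an identical character shrinks to one of that character.
per_character = normalize_whitespace(sample)

# Any mixed run of whitespace becomes one plain space.
any_run = re.sub(r"\s+", " ", sample)

print(repr(per_character))
print(repr(any_run))
```

The backreference version keeps one tab and one space, so the kind of whitespace is preserved and only the repetition is removed.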
Another alternative:

    >>> import re
    >>> s = 'this is a string with    multiple spaces and    tabs'
    >>> s = re.sub(r'[ \t]+', ' ', s)
    >>> print(s)
    this is a string with multiple spaces and tabs
This also seems to work:

    while "  " in s:
        s = s.replace("  ", " ")

where the variable s represents your string.
    def unPretty(S):
        # Given a dictionary, JSON, list, float, int, or even a string,
        # return a string stripped of CR, with LF replaced by a space
        # and multiple spaces reduced to one.
        return ' '.join(str(S).replace('\n', ' ').replace('\r', '').split())
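The fastest you can get for user-generated strings is:

    if '  ' in text:
        while '  ' in text:
            text = text.replace('  ', ' ')

The short-circuiting makes it slightly faster than the comprehensive benchmark answer above. Go for this if you are after efficiency and are strictly looking to weed out extra whitespace of the single-space variety.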
I have tried the following method, and it even works with an extreme case like:

    str1 = '          i   live    on    earth           '

    ' '.join(str1.split())

But if you prefer a regular expression, it can be done as:

    re.sub(r'\s+', ' ', str1)

although some preprocessing has to be done in order to remove the leading and trailing space.
If it is whitespace you are dealing with, splitting on None (the default separator) will not include empty strings in the returned value.

https://docs.python.org/2/library/stdtypes.html#str.split
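A quick way to see the difference, using a made-up string that contains a tab:

```python
s = " a \t b "

# Explicit separator: every single space is a boundary, so empty strings
# appear at the edges and the tab survives as its own item.
with_separator = s.split(" ")

# No separator (None): runs of any whitespace count as one boundary and
# leading/trailing whitespace is ignored.
without_separator = s.split()

print(with_separator)
print(without_separator)
```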
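I have a simple method that I used in college.

    line = "I     have            a       nice    day."

    end = 1000
    while end != 0:
        # str.replace returns a new string, so the result must be assigned back
        line = line.replace("  ", " ")
        end -= 1

This replaces every double space with a single space, and does so 1000 times. That means you can have 2000 extra spaces and it will still work. :)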
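As a side note, each pass of `replace` roughly halves every run of spaces, so far fewer than 1000 passes are needed in practice. The sketch below counts the passes for a run of 2000 spaces; the helper name `passes_needed` and the test line are assumptions made for this illustration.

```python
def passes_needed(text):
    """Count the replace passes required until no double space remains."""
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")
        passes += 1
    return passes

# Hypothetical worst case: 2000 consecutive spaces between two words.
line = "I" + " " * 2000 + "have"
print(passes_needed(line))
```

Looping on the condition `"  " in text` instead of a fixed counter stops as soon as the work is done and has no upper limit on the input.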
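To remove whitespace, taking into account leading spaces, trailing spaces, and extra spaces between words, use:

    (?<=\s) +|^ +(?=\s)| (?= +[\n\0])

The first alternative deals with leading whitespace, the second deals with leading whitespace at the start of the string, and the last one deals with trailing whitespace.

For proof of use, this link will provide you with a test:

https://regex101.com/r/mebyli/4

Let me know if you find an input that breaks this regex.

Also, this is meant to be used with the re.split function.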
I haven't read a lot into the other examples, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and while it is relatively long in terms of script length, it is not a complex implementation.

    def spaceMatcher(command):
        """
        Consolidate runs of consecutive space characters in a string
        into a single space.
        """
        new_command = ""
        # flag that records whether the previous character was a space
        previous_was_space = False
        for char in command:
            if char == " ":
                # keep only the first space of each run
                if not previous_was_space:
                    new_command += char
                previous_was_space = True
            else:
                new_command += char
                previous_was_space = False
        return new_command

    command = str(input("Please enter a command ->"))
    print(spaceMatcher(command))
    print(list(spaceMatcher(command)))
    string = 'This is a   string full  of spaces          and taps'
    string = string.split(' ')
    while '' in string:
        string.remove('')
    string = ' '.join(string)
    print(string)

Result:

    This is a string full of spaces and taps