Python: Simple way to remove multiple spaces in a string?

Suppose this is the string:

The   fox jumped   over    the log.

It should become:

The fox jumped over the log.

What is the simplest one- or two-liner that can do this, without splitting and going into lists?

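foo is your string:

" ".join(foo.split())

Be warned, though, that this removes "all whitespace characters (space, tab, newline, return, formfeed)" (thanks to hhsaffar, see comments). That is, "this is \t a test\n" will effectively end up as "this is a test".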

>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'

import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)

or

re.sub(r"\s\s+", " ", s)

since the space before the comma is listed as a pet peeve in PEP 8, as mentioned by moose in the comments.

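Using regexes with "\s" and doing simple string.split()'s will also remove other whitespace, such as newlines, carriage returns, and tabs. Unless that is what you want, here are examples that collapse only runs of multiple spaces.

EDIT: As I'm wont to do, I slept on this, and besides correcting a typo in the last results (v3.3.3 is 64-bit, not 32-bit), the obvious hit me: the test string was rather trivial.

So I took 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get more realistic timings, and then added extra spaces of random length throughout:

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))

I also corrected the "proper join": a plain one-liner strips any leading/trailing spaces, whereas this corrected version preserves a leading/trailing space (but only ONE ;-). (I found this because the randomly spaced lorem_ipsum got extra spaces at the end and thus failed the assert.)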

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}' , ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem    ipsum        ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''
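
To make the difference between the one-liner join and the corrected "proper join" concrete, here is a minimal sketch. The name naive_join and the sample input are assumptions added for illustration; proper_join follows the setup code above.

```python
def naive_join(string):
    # One-liner variant: empty items are dropped, so any leading or
    # trailing spaces disappear entirely.
    return ' '.join(item for item in string.split(' ') if item)

def proper_join(string):
    split_string = string.split(' ')
    # An empty first/last item means the string started/ended with a space.
    beg = ' ' if not split_string[0] else ''
    end = ' ' if not split_string[-1] else ''
    return beg + ' '.join(item for item in split_string if item) + end

sample = '  The   fox  jumped '  # hypothetical input with padding on both ends

print(repr(naive_join(sample)))   # 'The fox jumped'
print(repr(proper_join(sample)))  # ' The fox jumped '
```

With padded input, only the second variant agrees with what the while loop and the ' {2,}' regex produce, which is presumably why the assert in the setup needs it.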

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string

# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string

# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string

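NOTE: The "while version" made a copy of original_string, as I believe that once it was modified on the first run, successive runs would be faster (if only by a bit). As this adds time, I added this string copy to the other two tests so that the timings show the difference only in logic. Keep in mind that the setup of a timeit instance is executed only once per run, while the main stmt is executed many times; the way I originally did this, the while loop worked on the same label, original_string, so on the second run there would be nothing left to do. The way it's set up now, calling a function and using two different labels, that isn't a problem. I've added assert statements to all the workers to verify that we change something on every iteration (for those who may be dubious). E.g., change to this and it breaks: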

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')

Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)

test_string = 'The   fox   jumped   over\n\t    the log.' # trivial

Python 2.7.3, 32-bit, Windows
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.001066   | 0.001260   | 0.001128   | 0.001092
re_replace_test      | 0.003074   | 0.003941   | 0.003357   | 0.003349
proper_join_test     | 0.002783   | 0.004829   | 0.003554   | 0.003035

Python 2.7.3, 64-bit, Windows
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.001025   | 0.001079   | 0.001052   | 0.001051
re_replace_test      | 0.003213   | 0.004512   | 0.003656   | 0.003504
proper_join_test     | 0.002760   | 0.006361   | 0.004626   | 0.004600

Python 3.2.3, 32-bit, Windows
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.001350   | 0.002302   | 0.001639   | 0.001357
re_replace_test      | 0.006797   | 0.008107   | 0.007319   | 0.007440
proper_join_test     | 0.002863   | 0.003356   | 0.003026   | 0.002975

Python 3.3.3, 64-bit, Windows
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.001444   | 0.001490   | 0.001460   | 0.001459
re_replace_test      | 0.011771   | 0.012598   | 0.012082   | 0.011910
proper_join_test     | 0.003741   | 0.005933   | 0.004341   | 0.004009

test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
#"Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.342602   | 0.387803   | 0.359319   | 0.356284
re_replace_test      | 0.337571   | 0.359821   | 0.348876   | 0.348006
proper_join_test     | 0.381654   | 0.395349   | 0.388304   | 0.388193

Python 2.7.3, 64-bit
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.227471   | 0.268340   | 0.240884   | 0.236776
re_replace_test      | 0.301516   | 0.325730   | 0.308626   | 0.307852
proper_join_test     | 0.358766   | 0.383736   | 0.370958   | 0.371866

Python 3.2.3, 32-bit
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.438480   | 0.463380   | 0.447953   | 0.446646
re_replace_test      | 0.463729   | 0.490947   | 0.472496   | 0.468778
proper_join_test     | 0.397022   | 0.427817   | 0.406612   | 0.402053

Python 3.3.3, 64-bit
test                 | minimum    | maximum    | average    | median
---------------------+------------+------------+------------+-----------
while_replace_test   | 0.284495   | 0.294025   | 0.288735   | 0.289153
re_replace_test      | 0.501351   | 0.525673   | 0.511347   | 0.508467
proper_join_test     | 0.422011   | 0.448736   | 0.436196   | 0.440318

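For the trivial string, it would seem that a while loop is the fastest, followed by the Pythonic string split/join, with the regex bringing up the rear.

For non-trivial strings, there seems to be a bit more to consider. 32-bit 2.7? It's regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. 32-bit 3.2, go with the "proper" join. 64-bit 3.3, go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it's always best to remember the mantra:

Make It Work

Make It Right

Make It Fast

IANAL, YMMV, Caveat Emptor!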
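
For anyone who wants to repeat this kind of measurement, the following is a rough sketch of how the minimum/maximum/average/median columns could be collected with timeit. The summarize helper, the short test string, and the use of the statistics module are assumptions for illustration, not the harness actually used for the tables above.

```python
import statistics
import timeit

# Setup code runs once per timing run; it defines the worker and the input.
setup = '''
def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')
    return string

original_string = 'The   fox   jumped   over    the log.'
'''

# The statement is what gets timed; it works on a separate label so the
# original stays untouched between iterations.
test = '''
new_string = while_replace(original_string)
assert new_string != original_string
'''

def summarize(stmt, setup, repeat=7, number=1000):
    """Return summary statistics (in seconds per `number` executions)."""
    times = timeit.Timer(stmt=stmt, setup=setup).repeat(repeat, number)
    return {
        'minimum': min(times),
        'maximum': max(times),
        'average': statistics.mean(times),
        'median': statistics.median(times),
    }

stats = summarize(test, setup)
print(' | '.join('%s %.6f' % (name, value) for name, value in stats.items()))
```

The absolute numbers will differ by machine and Python version, so only comparisons made on the same setup are meaningful.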

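I have to agree with Paul McGuire's comment above. To me,

' '.join(the_string.split())

is vastly preferable to whipping out a regex.

My measurements (Linux, Python 2.5) show the split-then-join to be almost five times faster than doing the "re.sub(...)", and still three times faster if you precompile the regex once and do the operation multiple times. It is also, by any measure, easier to understand and much more Pythonic.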
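
As a small illustration of the two approaches being compared, here is a sketch that applies both to the same input. The sample string and the name whitespace_run are assumptions; note that the two results are not identical when the input has leading or trailing whitespace.

```python
import re

the_string = 'The   fox \t jumped\n over    the log.  '  # hypothetical input

# Split on any run of whitespace, then rejoin with single spaces.
# Leading and trailing whitespace is dropped as a side effect.
joined = ' '.join(the_string.split())

# Precompile the pattern once if it will be applied many times.
whitespace_run = re.compile(r'\s+')
substituted = whitespace_run.sub(' ', the_string)

print(repr(joined))       # 'The fox jumped over the log.'
print(repr(substituted))  # 'The fox jumped over the log. '
```

The regex version keeps a single trailing space here, so a strip() would be needed to get the same result as the split/join.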

Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'

A simple solution

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.

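You can also use the string-splitting technique on a pandas DataFrame without needing .apply(..), which is useful when you need to perform the operation quickly on a large number of strings. Here it is in one line:

df['message'] = (df['message'].str.split()).str.join(' ')

import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n             \t        fox')

This will replace every run of tabs, newlines, and spaces with a single space.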

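One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None, sentence.split(' ')))

Explanation:

Split the entire string into a list.

Filter the empty elements out of the list.

Rejoin the remaining elements* with a single space.

*The remaining elements should be words, or words with punctuation, etc. I did not test this extensively, but it should be a good starting point. All the best!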
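
The three steps above can be sketched one at a time to show the intermediate values. The sample sentence and the names parts, words, and cleaned are assumptions for illustration.

```python
sentence = '  The   fox jumped   over    the log.  '  # hypothetical input

# Step 1: split on single spaces; each extra space yields an empty string.
parts = sentence.split(' ')

# Step 2: filter(None, ...) keeps only truthy items, dropping the empty strings.
words = list(filter(None, parts))

# Step 3: rejoin the remaining words with a single space.
cleaned = ' '.join(words)

print(words)          # ['The', 'fox', 'jumped', 'over', 'the', 'log.']
print(repr(cleaned))  # 'The fox jumped over the log.'
```

Because the split is on ' ' only, tabs and newlines stay attached to the neighbouring words instead of being treated as separators.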

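In some cases it's desirable to replace each run of consecutive occurrences of a whitespace character with a single instance of that character. You can use a regular expression with backreferences to do that.

(\s)\1{1,} matches any whitespace character followed by one or more occurrences of that same character. Now all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)

>>> normalize_whitespace('The   fox jumped    over   the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t\n\nSecond    line')
'First line\t\nSecond line'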

Another alternative

>>> import re
>>> text = 'this is a string with    multiple spaces and \t tabs'
>>> text = re.sub('[ \t]+', ' ', text)
>>> print(text)
this is a string with multiple spaces and tabs

This also seems to work:

while "  " in s:
    s = s.replace("  ", " ")

where the variable s represents your string.

def unPretty(S):
    # given a dictionary, json, list, float, int, or even a string..
    # return a string with CR removed, LF replaced by space, and multiple spaces reduced to one.
    return ' '.join(str(S).replace('\n', ' ').replace('\r', '').split())

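The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')

The short-circuiting makes it slightly faster than pythonlarry's comprehensive answer. Go for this if you're after efficiency and are strictly looking to weed out extra whitespace of the single-space variety.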

I have tried the following method, and it works even with an extreme case
like str1 = '          i   live    on    earth           '

' '.join(str1.split())

But if you prefer a regular expression, it can be done as:

re.sub(r'\s+', ' ', str1)

although some extra processing (for example, str.strip()) is needed to remove the leading and trailing space.

If it's whitespace you're dealing with, splitting on None will not include empty strings in the returned value.

http://docs.python.org/2/library/stdtypes.html#str.split
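
A short sketch of the difference between splitting on None (the default) and splitting on an explicit space. The sample text and variable names are assumptions for illustration.

```python
text = '  The   fox\tjumped \n over  '  # hypothetical input

# sep=None (the default): runs of any whitespace act as one separator,
# and no empty strings are produced at the ends or in between.
default_split = text.split()

# sep=' ': every single space is a separator, so consecutive spaces
# produce empty strings, and tabs/newlines are not treated as separators.
space_split = text.split(' ')

print(default_split)  # ['The', 'fox', 'jumped', 'over']
print(space_split)
```

This is why ' '.join(text.split()) needs no filtering step, while the split(' ') variants have to drop the empty strings first.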

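I have a simple method which I used in college.

line = "I     have            a       nice    day."

end = 1000
while end != 0:
    line = line.replace("  ", " ")
    end -= 1

This replaces every double space with a single space and does so 1000 times. It means you can have 2000 extra spaces and it will still work. :)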
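
As a side note on the fixed count of 1000 passes: each replace pass roughly halves every run of spaces, so the number of passes actually needed grows only logarithmically with the longest run. The helper passes_needed below is a hypothetical name used to illustrate this; it is not part of the method above.

```python
def passes_needed(line):
    """Count the replace passes required until no double space remains."""
    passes = 0
    while '  ' in line:
        line = line.replace('  ', ' ')
        passes += 1
    return passes, line

# A single run of 2000 spaces between two letters (hypothetical input).
passes, result = passes_needed('a' + ' ' * 2000 + 'b')
print(passes, repr(result))  # 11 'a b'
```

Looping on the condition rather than on a fixed counter stops as soon as the work is done and has no upper limit on input length.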

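To remove whitespace, taking into account leading whitespace, trailing whitespace, and extra whitespace between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])

The first alternative deals with leading whitespace, the second deals with leading whitespace at the start of the string, and the last one deals with trailing whitespace.

As a demonstration, this link provides a test:

https://regex101.com/r/mebyli/4

If you find an input that breaks this regex, please let me know.

Also, this is meant to be used with the re.split function.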

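I haven't read much into the other examples, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and while it is relatively long in terms of script length, it is not a complex implementation.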

def spaceMatcher(command):
    """
    function defined to consolidate multiple whitespace characters in
    strings to a single space
    """
    # flag whether the previous character was a space, so that only the
    # first space of each consecutive run is kept
    previous_was_space = False
    kept_chars = []
    for char in command:
        if char == " ":
            if not previous_was_space:
                kept_chars.append(char)
            previous_was_space = True
        else:
            kept_chars.append(char)
            previous_was_space = False
    return "".join(kept_chars)

command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

string = 'This is a   string full of spaces          and taps'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)

Result:

This is a string full of spaces and taps