Python: How to replace multiple substrings of a string?

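I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax.

What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.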


Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"}  # define desired replacements here

# use these three lines to do the replacement
# (Python 3 renamed dict.iteritems to dict.items; on Python 2 use rep.iteritems())
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

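Related discussion

I think it is neat, although I'd wrap it in a function.
Question: does this code do the replacements in a single pass? Or does it call sub() once per dictionary key/value pair?
The replacement happens in a single pass.
dkamins: it's not too clever, it's not even as clever as it should be (we should regex-escape the keys before joining them with "|"). Why isn't that over-engineered? Because this way we do it in one pass (= fast), and we do all the replacements at the same time, avoiding clashes like "spamham sha".replace("spam","eggs").replace("sha","md5") giving "eggmd5m md5" instead of "eggsham md5"
Great idea, but there is a bug in your code: you need to escape m.group(0) like this: lambda m: rep[re.escape(m.group(0))]. Also, your code does not work with strings containing multiple lines: you need to add re.M in re.compile.
@MiniQuark good point on escaping during the replacement, but this should work fine on multi-line strings. Even without the re.M option (which only changes the meaning of ^ and $), you can match literal newlines in a string.
Ah, you're right, my bad. As for the escaping, I think it should actually be removed everywhere and added on the line pattern = .... See my answer below.
I like it. Rather than overwriting the keys with escaped versions and having to re-escape the match results, why not: pattern = re.compile("|".join(re.escape(k) for k in rep)) and then text = pattern.sub(lambda m: rep[m.group(0)], text)
@AndrewClark I would greatly appreciate it if you could explain what is happening on the last line with the lambda.
@AndrewClark this answer should probably state whether it is for Python 2 or Python 3, since the way dictionaries are used has changed.
@minerals lambda is an anonymous function. There, it takes one value (m) and returns the result of the following expression. Alternatively, you could create a named function def replace_conditions(match): return rep[re.escape(match.group(0))], assign the text to a variable, text = "(condition1) and --condition2--", and pass the function itself (not the result of calling it) to sub together with the original text: pattern.sub(replace_conditions, text).
Hello, I've created a small gist with a clearer version of this snippet. It should also be slightly more efficient: gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729
Does this work for Python 3 too?
Thanks a million, this works nicely (concise!), but I have a problem with accented characters (utf-8 encoded source). Is there anything I can add to fix this? Thanks, best.
It does not work on Python 3
For Python 3, use items() instead of iteritems().
This answer is confusing and complicated. I still don't understand the third line. It should be explained better. Enrico Bianchi's answer is much simpler!
Great answer. It's a bit sad that this isn't batteries-included, though, as in mystring.replace({'condition1': '', 'condition2': 'text'})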
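
The clash mentioned in the discussion can be reproduced with a small sketch. The helper name replace_single_pass is hypothetical; it follows the suggestion above of escaping the keys only while building the pattern.

```python
import re

def replace_single_pass(text, rep):
    """Replace every key of `rep` found in `text` in one left-to-right pass."""
    # Escape the keys only when building the pattern, so the lookup in the
    # callback can use the matched text directly.
    pattern = re.compile("|".join(re.escape(k) for k in rep))
    return pattern.sub(lambda m: rep[m.group(0)], text)

rep = {"spam": "eggs", "sha": "md5"}

# Chained replace: the second call also rewrites the "sha" inside "eggsham",
# which only exists because of the first call.
chained = "spamham sha".replace("spam", "eggs").replace("sha", "md5")

# Single pass: each position of the original text is matched at most once.
single = replace_single_pass("spamham sha", rep)

print(chained)  # eggmd5m md5
print(single)   # eggsham md5
```

The single-pass version never re-scans text it has already produced, which is why the two results differ.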

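You could just make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.items():  # on Python 2, dic.iteritems() also works
        text = text.replace(i, j)
    return text

where text is the complete string and dic is a dictionary in which each key is a substring to find and each value is the string that replaces it.

Note: in Python 3, iteritems() has been replaced with items().

Careful: before Python 3.7, dictionaries had no reliable iteration order (from 3.7 on they preserve insertion order). This solution only solves your problem if:

the order of replacements is irrelevant
it is OK for a replacement to change the results of previous replacements

For instance:

d = {"cat": "dog", "dog": "pig"}
mySentence = "This is my cat and this is my dog."
mySentence = replace_all(mySentence, d)  # str.replace returns a new string, so keep the result
print(mySentence)

Possible output #1:

"This is my pig and this is my pig."

Possible output #2:

"This is my dog and this is my pig."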
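
Both outcomes can be reproduced deterministically by passing the pairs in an explicit order. This is a minimal sketch; the list-of-pairs signature is an assumption for illustration, not the function from the answer.

```python
def replace_all(text, pairs):
    """Apply (old, new) replacements one after another, in the given order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

sentence = "This is my cat and this is my dog."

# cat -> dog first, then dog -> pig: the freshly created "dog" is replaced again.
first = replace_all(sentence, [("cat", "dog"), ("dog", "pig")])

# dog -> pig first, then cat -> dog: the freshly created "dog" is left alone.
second = replace_all(sentence, [("dog", "pig"), ("cat", "dog")])

print(first)   # This is my pig and this is my pig.
print(second)  # This is my dog and this is my pig.
```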

One possible fix is to use an OrderedDict.

from collections import OrderedDict

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

od = OrderedDict([("cat", "dog"), ("dog", "pig")])
mySentence = "This is my cat and this is my dog."
mySentence = replace_all(mySentence, od)  # keep the returned string
print(mySentence)

Output:

"This is my pig and this is my pig."

Careful #2: this is inefficient if your text string is very large or there are many pairs in the dictionary, because every pair triggers another pass over the whole string.


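Here is a variant of the first solution using reduce, in case you like being functional. :)

from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2

repls = {'hello': 'goodbye', 'world': 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)

martineau's even better version:

from functools import reduce

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)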


Why not a solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

# output will be: The quick red fox jumps over the quick dog

This is just a more concise recap of F.J's and MiniQuark's great answers. All you need to achieve multiple simultaneous string replacements is the following function:

import re

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict, key=len, reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can make your own dedicated replacement functions starting from this simpler one.


I built this upon F.J's excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One-shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.

Note that since the replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".

If you need to do the same replacement many times, you can create a replacement function easily:

>>> my_escaper = multiple_replacer(('"', '\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

Improvements:

turned the code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement

Enjoy! :-)


I would like to propose the use of string templates. Just place the strings to be replaced in a dictionary and you are all set! Example from docs.python.org:

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
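
One thing worth keeping in mind is that Template only substitutes $-prefixed placeholders, so it fits best when you control the text being filled in. A minimal sketch, with made-up sample values:

```python
from string import Template

replacements = {"who": "tim", "what": "kung pao"}

# Only $-prefixed placeholders are substituted.
templated = Template("$who likes $what").safe_substitute(replacements)

# The same words without the $ prefix are ordinary text and stay untouched.
plain = Template("who likes what").safe_substitute(replacements)

print(templated)  # tim likes kung pao
print(plain)      # who likes what
```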


In my case, I needed a simple replacement of unique keys with names, so I came up with this:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x, y in b.items():
    a = a.replace(x, y)

>>> a
'ThIS IS a teSt StrIng.'


My $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins).

import re

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)

It is up in this gist; feel free to modify it if you have any proposal.
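
To see why the length sort matters, here is a small sketch that runs the 'hey abc' case from the comments with and without it. The function multireplace_unsorted is a hypothetical variant added only for comparison, and the result shown for it assumes the dictionary is iterated in insertion order (Python 3.7+).

```python
import re

def multireplace(string, replacements):
    # Longest keys first, so 'abc' is tried before its prefix 'ab'.
    substrs = sorted(replacements, key=len, reverse=True)
    regexp = re.compile("|".join(map(re.escape, substrs)))
    return regexp.sub(lambda match: replacements[match.group(0)], string)

def multireplace_unsorted(string, replacements):
    # Same idea, but the alternatives are tried in dictionary order.
    regexp = re.compile("|".join(map(re.escape, replacements)))
    return regexp.sub(lambda match: replacements[match.group(0)], string)

replacements = {"ab": "AB", "abc": "ABC"}

print(multireplace("hey abc", replacements))           # hey ABC
print(multireplace_unsorted("hey abc", replacements))  # hey ABc
```

A regex alternation takes the first alternative that matches at a position, not the longest one, which is why the ordering of the keys decides the outcome.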


Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), we can apply the replacements within a list comprehension:

text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'
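
For comparison, a sketch that wraps the assignment-expression version and an ordinary loop in two hypothetical helper functions (Python 3.8+ is assumed for the first one). Both apply the replacements in order and give the same result.

```python
def apply_with_walrus(text, replacements):
    # The comprehension is used only for its side effect on `text`;
    # the list of intermediate strings it builds is thrown away.
    [text := text.replace(a, b) for a, b in replacements]
    return text

def apply_with_loop(text, replacements):
    # Plain loop: same replacements, no throwaway list.
    for a, b in replacements:
        text = text.replace(a, b)
    return text

sentence = "The quick brown fox jumps over the lazy dog"
pairs = [("brown", "red"), ("lazy", "quick")]

print(apply_with_walrus(sentence, pairs))  # The quick red fox jumps over the quick dog
print(apply_with_loop(sentence, pairs))    # The quick red fox jumps over the quick dog
```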

I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:

import re

def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # a list, so the pairs can be indexed on Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello': 'goodbye', 'world': 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:

>>> s = "I don't want to change this name:\nPhilip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as normal strings, you can escape them before calling multiple_replace, using e.g. this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys"""
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\nPhilip II of Spain"

The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed."""
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())
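
As an illustration of what such a check catches, here is a sketch of a variant that returns the problems instead of printing them. The name find_invalid_patterns and the sample patterns are hypothetical; the exact error messages depend on the Python version, so none are shown.

```python
import re

def find_invalid_patterns(patterns):
    """Return (position, pattern, message) for each pattern that fails to compile."""
    problems = []
    for i, p in enumerate(patterns):
        try:
            re.compile(p)
        except (TypeError, re.error) as exc:
            problems.append((i, p, str(exc)))
    return problems

# Two well-formed patterns and two malformed ones (unbalanced "(" and "[").
candidates = [r"\bI\b", "(", "[a-z", r"[\n\t ]+"]

for position, pattern, message in find_invalid_patterns(candidates):
    print("Invalid regular expression at position {}: {!r} ({})".format(
        position, pattern, message))
```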

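Note that it does not chain the replacements, but performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'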


Here is a sample which is more efficient on long strings with many small replacements.

import re

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))  # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print(replace(source, replacements))

The point is to avoid a lot of concatenations of long strings. We chop the source string into fragments, replace some of the fragments as we build the list, and then join the whole thing back into a string.
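
The same fragment-and-join idea can be sketched with re.finditer, which yields the matches in order so the search loop needs no explicit break. The function name replace_with_finditer is hypothetical; this is a variant for illustration, not the answer's code.

```python
import re

def replace_with_finditer(source, replacements):
    # One compiled alternation that matches any key literally.
    finder = re.compile("|".join(re.escape(k) for k in replacements))
    pieces = []
    pos = 0
    for match in finder.finditer(source):
        pieces.append(source[pos:match.start()])     # untouched text before the match
        pieces.append(replacements[match.group(0)])  # replacement for the matched key
        pos = match.end()
    pieces.append(source[pos:])                      # remainder after the last match
    return "".join(pieces)

source = "Here is foo, it does moo!"
replacements = {"is": "was", "does": "did", "!": "?"}

print(replace_with_finditer(source, replacements))  # Here was foo, it did moo?
```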

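I don't know about speed, but this is my workaday quick fix:

from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2

reduce(lambda a, b: a.replace(*b),
       [('o', 'W'), ('t', 'X')],  # iterable of pairs: (oldval, newval)
       'tomato')                  # the string in which to replace values

... but I like the #1 regex answer above. Note: if one new value is a substring of another one, then the operation is not commutative.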

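You should really not do it this way, but I just find it way too cool:

>>> replacements = {'cond1': 'text1', 'cond2': 'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r keeps the quotes around each string
...
>>> exec(cmd)

Now answer is the result of all the replacements applied in turn.

Again, this is very hacky and is not something that you should be using regularly. But it is nice to know that you can do something like this if you ever need to.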


Or just for a fast hack:

for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", "")
    stripped_buffer2 = stripped_buffer1.replace("term2", "")
    write_to_file = to_write.write(stripped_buffer2)

Starting from Andrew's precious answer, I developed a script that loads the dictionary from a file and processes all the files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I'm a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {}  # creation of empty dictionary

with open(mapfile) as temprep:  # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask):  # iterate over all the files matching the prompted mask

    with open(filename, "r") as textfile:  # load each file in the variable text
        text = textfile.read()

    # start replacement
    # escape the keys only while building the pattern, so keys containing
    # re reserved characters match literally and the lookup uses the raw key
    pattern = re.compile("|".join(re.escape(k) for k in rep))
    text = pattern.sub(lambda m: rep[m.group(0)], text)

    # write the output files with the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:

import pandas as pd

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace = ['Billy', 'Rome', 'January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with = ['name', 'city', 'month', 'time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

The modified text is:

0    name is going to visit city in month
1    I was born in date
2    I will be there at time

You can find an example here. Note that the replacements are applied in the order in which they appear in the lists.

This is my solution to the problem. I use it in a chatbot to replace different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string += s
            old_string = old_string[len(sk):]
        else:
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog": "cat", "cat": "dog"}))

This will become The cat hunts the dog

Another example:
Input lists

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e, "") for e in error_list if e in w], w] for w in words]]
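
The nested comprehension is hard to follow, so here is a more explicit sketch of the same cleanup. The function name strip_markers is hypothetical. Unlike the one-liner, which keeps only the result for the first matching marker, this version removes every marker found in a word; for the sample data the result is the same.

```python
def strip_markers(words, markers):
    """Return a copy of `words` with every marker substring removed."""
    cleaned = []
    for word in words:
        for marker in markers:
            word = word.replace(marker, "")
        cleaned.append(word)
    return cleaned

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

print(strip_markers(words, error_list))
# ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
```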

I suggest the code should be, for example:

z = "My name is Ahmed, and I like coding"
print(z.replace(" Ahmed", " Dauda").replace(" like", " Love"))

It will print out all the changes as requested.


Here is another way of doing it with a dictionary:

listA = "The cat jumped over the house".split()
modify = {word: word for word in listA}  # start with every word mapped to itself
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))
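
A slightly shorter sketch of the same word-level idea uses dict.get with the word itself as the fallback, so only the words to change need to be listed. The function name replace_words is hypothetical. It splits on whitespace, so a word with punctuation attached (for example "cat,") would not match its key.

```python
def replace_words(sentence, mapping):
    # mapping.get(word, word) returns the word unchanged when no replacement is defined.
    return " ".join(mapping.get(word, word) for word in sentence.split())

result = replace_words("The cat jumped over the house",
                       {"cat": "dog", "jumped": "walked"})
print(result)  # The dog walked over the house
```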