Split a Python string on commas, ignoring commas nested in quotes or parentheses
I need to extract from the string

    i = "1,'Test','items (one, two, etc.)',1,'long, list'"

the following array of strings:

    ['1', "'Test'", "'items (one, two, etc.)'", '1', "'long, list'"]

Using the regular expression

    r = re.split(r',+(?=[^()]*(?:\(|$))', i)

I only get the following result:

    ['1', "'Test'", "'items (one, two, etc.)'", '1', "'long", " list'"]
UPD1
NULL values should be supported:

    i = "1,'Test',NULL,'items (one, two, etc.)',1,'long, list'"
    ['1', "'Test'", 'NULL', "'items (one, two, etc.)'", '1', "'long, list'"]
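In this case you don't need to split at all; you can use re.findall instead:

    >>> [k for j in re.findall(r"(\d)|'([^']*)'", i) for k in j if k]
    ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']

The regex above matches either a single digit, (\d), or anything between single quotes, '([^']*)'. Use \d+ instead if a number can have more than one digit.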
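The findall pattern above drops the surrounding quotes and does not match a bare NULL. As a minimal sketch, assuming every field is either single-quoted (with no escaped quotes inside) or a plain unquoted token, the following alternation keeps both the quotes and NULL, which is the output the question asks for. The pattern and the name FIELD are illustrative and not part of the original answer.

```python
import re

# Either a complete single-quoted field, or a run of characters up to the next comma.
# The quoted alternative comes first so that commas inside quotes are never split on.
FIELD = re.compile(r"'[^']*'|[^,]+")

i = "1,'Test',NULL,'items (one, two, etc.)',1,'long, list'"
fields = FIELD.findall(i)
print(fields)
```

Note that empty fields (two commas in a row) are skipped by this pattern rather than returned as empty strings.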
Or, as a more direct approach in this case, you can use ast.literal_eval, which parses the string as a Python tuple literal:

    >>> from ast import literal_eval
    >>> literal_eval(i)
    (1, 'Test', 'items (one, two, etc.)', 1, 'long, list')
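Two things are worth keeping in mind with literal_eval: it returns a tuple in which the numbers are already converted to int, and it only accepts Python literals, so a bare NULL makes it fail. The sketch below illustrates both points; the helper name parse_as_strings is hypothetical.

```python
from ast import literal_eval

def parse_as_strings(line):
    """Evaluate the line as a Python tuple literal and return its items as strings."""
    return [str(value) for value in literal_eval(line)]

i = "1,'Test','items (one, two, etc.)',1,'long, list'"
print(parse_as_strings(i))

# NULL is not a Python literal, so literal_eval rejects input like the UPD1 string.
try:
    literal_eval("1,'Test',NULL,'long, list'")
    null_supported = True
except ValueError:
    null_supported = False
print(null_supported)
```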
This is a job for the csv module:

    import csv
    from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

    line = "1,'Test','items (one, two, etc.)',1,'long, list'"
    reader = csv.reader(StringIO(line), quotechar="'")
    row = next(reader)
    # row == ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']

The key here is to create a csv reader with the single quote specified as the quote character.
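The same idea can be wrapped in a small function and checked against the NULL case from the question. This is a sketch for Python 3; the name split_row is made up for illustration. Since csv.reader accepts any iterable of lines, a one-element list can stand in for the StringIO object.

```python
import csv

def split_row(line, quotechar="'"):
    """Parse one comma-separated line, treating quotechar as the quoting character."""
    # csv.reader iterates over lines, so a one-element list is enough here.
    return next(csv.reader([line], quotechar=quotechar))

row = split_row("1,'Test',NULL,'items (one, two, etc.)',1,'long, list'")
print(row)
```

The reader strips the quotes and returns NULL as the plain string 'NULL'; mapping it to None, if needed, would be a separate step.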
You can split on the single quote:

    i = "1,'Test','items (one, two, etc.)',1,'long, list'"

    print([ele.strip(" ,") for ele in i.split("'") if ele.strip(",")])
    ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']

Or using map:

    print([ele for ele in map(lambda x: x.strip(","), i.split("'")) if ele])
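Splitting on the quote works for the sample because every unquoted value sits next to a quoted one; two adjacent unquoted values such as 1,2 would stay glued together as '1,2'. A possible variant, assuming quotes are always balanced and never escaped, splits the chunks outside quotes on commas and keeps the chunks inside quotes whole. The function name split_outside_quotes is hypothetical.

```python
def split_outside_quotes(line):
    """Split on commas, but only in the parts of the line that are outside single quotes."""
    fields = []
    for idx, chunk in enumerate(line.split("'")):
        if idx % 2:
            # Odd-numbered chunks were between a pair of quotes: keep them intact.
            fields.append(chunk)
        else:
            # Even-numbered chunks are outside quotes: split on commas, drop empty pieces.
            fields.extend(part for part in chunk.split(",") if part)
    return fields

result = split_outside_quotes("1,2,'a, b',NULL,3")
print(result)
```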
Using map with Python 3 is quite efficient:

    In [7]: i = "1,'Test','items (one, two, etc.)',1,'long, list'"

    In [8]: timeit [ele for ele in map(lambda x: x.strip(","), i.split("'")) if ele]
    1000000 loops, best of 3: 1.5 μs per loop

    In [9]: r = re.compile(r"(\d)|'([^']*)'")

    In [10]: timeit [k for j in r.findall(i) for k in j if k]
    100000 loops, best of 3: 3.92 μs per loop

It is even faster with Python 2 and itertools.imap:

    In [9]: from itertools import imap

    In [10]: timeit [ele for ele in imap(lambda x: x.strip(","), i.split("'")) if ele]
    1000000 loops, best of 3: 871 ns per loop

    In [11]: r = re.compile(r"(\d)|'([^']*)'")

    In [12]: timeit [k for j in r.findall(i) for k in j if k]
    100000 loops, best of 3: 4.27 μs per loop

    In [17]: from ast import literal_eval

    In [18]: timeit literal_eval(i)
    100000 loops, best of 3: 16.2 μs per loop
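The timings above come from IPython's timeit magic. Outside IPython, a comparable measurement can be sketched with the standard timeit module (Python 3 shown). The function names and the number of runs are arbitrary choices for illustration, and the figures will differ by machine and Python version.

```python
import re
import timeit

i = "1,'Test','items (one, two, etc.)',1,'long, list'"
r = re.compile(r"(\d)|'([^']*)'")

def with_split():
    return [ele for ele in map(lambda x: x.strip(","), i.split("'")) if ele]

def with_findall():
    return [k for j in r.findall(i) for k in j if k]

# Both approaches must agree before their timings are worth comparing.
assert with_split() == with_findall()

runs = 100000
for func in (with_split, with_findall):
    total = timeit.timeit(func, number=runs)
    print("%s: %.2f us per call" % (func.__name__, total / runs * 1e6))
```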
All of them return the same output, except literal_eval, which evaluates the digits to integers:

    In [19]: literal_eval(i)
    Out[19]: (1, 'Test', 'items (one, two, etc.)', 1, 'long, list')

    In [20]: [k for j in r.findall(i) for k in j if k]
    Out[20]: ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']

    In [21]: [ele for ele in imap(lambda x: x.strip(","), i.split("'")) if ele]
    Out[21]: ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']
A NULL value makes no difference:

    i = "1,'Test',NULL,'items (one, two, etc.)',1,'long, list'"

    print([ele for ele in map(lambda x: x.strip(","), i.split("'")) if ele])
    ['1', 'Test', 'NULL', 'items (one, two, etc.)', '1', 'long, list']