Python: How to split on two items in a string?

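I read a file with .read(). How can I split on two things at once? I am trying to split on both commas and "\n", but when I split on commas first, my string becomes a list, and a list cannot be split again.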

The string I am trying to split is the comma-separated, multi-line data that the first answer below reproduces as strs.


You can use a list comprehension:

>>> strs = '''States, Total Score, Critical Reading, Mathematics, Writing, Participation (%)
Washington,1564,524,532,508,41.2000
NewHampshire,1554,520,524,510,64.0000
Massachusetts,1547,512,526,509,72.1000
Oregon,1546,523,524,499,37.1000
Vermont,1546,519,512,506,64.0000
Arizona,1544,519,525,500,22.4000
Connecticut,1536,509,514,513,71.2000
Alaska,1524,518,515,491,32.7000
Virginia,1521,512,512,497,56.0000
California,1517,501,516,500,37.5000
NewJersey,1506,495,514,497,69.0000
Maryland,1502,501,506,495,56.7000
NorthCarolina,1485,497,511,477,45.5000
RhodeIsland,1477,494,495,488,60.8000
Indiana,1476,494,505,477,52.0000
Florida,1473,496,498,479,44.7000
Pennsylvania,1473,492,501,480,62.3000
Nevada,1470,496,501,473,25.9000
Delaware,1469,493,495,481,59.2000
Texas,1462,484,505,473,41.5000
NewYork,1461,484,499,478,59.6000
Hawaii,1458,483,505,470,47.1000
Georgia,1453,488,490,475,46.5000
SouthCarolina,1447,484,495,468,40.7000
Maine,1389,468,467,454,87.1000
Iowa,1798,603,613,582,2.7000
Minnesota,1781,594,607,580,6.0000
Wisconsin,1778,595,604,579,3.8000
Missouri,1768,593,595,580,3.6000
Michigan,1766,585,605,576,3.8000
SouthDakota,1766,592,603,571,2.0000
Illinois,1762,585,600,577,4.6700
Kansas,1752,590,595,567,4.7000
Nebraska,1746,585,593,568,3.9000
NorthDakota,1733,580,594,559,3.4000
Kentucky,1713,575,575,563,5.0000
Tennessee,1712,576,571,565,6.4000
Colorado,1695,568,572,555,14.1000
Arkansas,1684,566,566,552,3.5000
Oklahoma,1684,569,568,547,3.8000
Wyoming,1683,570,567,546,3.6000
Utah,1674,568,559,547,4.5000
Mississippi,1666,566,548,552,2.2000
Louisiana,1652,555,550,547,4.0000
Alabama,1650,556,550,544,5.4000
NewMexico,1636,553,549,534,7.1000
Ohio,1609,538,548,522,17.2000
Idaho,1601,543,541,517,14.6000
Montana,1593,538,538,517,20.0000
West Virginia,1522,515,507,500,13.2000
'''

>>> [ y for x in strs.splitlines() for y in x.split(",")]
['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)', 'Washington', '1564', '524', '532', '508', '41.2000', 'NewHampshire', '1554', '520', '524', '510', '64.0000', 'Massachusetts', '1547', '512', '526', '509', '72.1000', 'Oregon', '1546', '523', '524', '499', '37.1000', 'Vermont', '1546', '519', '512', '506', '64.0000', 'Arizona', '1544', '519', '525', '500', '22.4000', 'Connecticut', '1536', '509', '514', '513', '71.2000', 'Alaska', '1524', '518', '515', '491', '32.7000', 'Virginia', '1521', '512', '512', '497', '56.0000', 'California', '1517', '501', '516', '500', '37.5000', 'NewJersey', '1506', '495', '514', '497', '69.0000', 'Maryland', '1502', '501', '506', '495', '56.7000', 'NorthCarolina', '1485', '497', '511', '477', '45.5000', 'RhodeIsland', '1477', '494', '495', '488', '60.8000', 'Indiana', '1476', '494', '505', '477', '52.0000', 'Florida', '1473', '496', '498', '479', '44.7000', 'Pennsylvania', '1473', '492', '501', '480', '62.3000', 'Nevada', '1470', '496', '501', '473', '25.9000', 'Delaware', '1469', '493', '495', '481', '59.2000', 'Texas', '1462', '484', '505', '473', '41.5000', 'NewYork', '1461', '484', '499', '478', '59.6000', 'Hawaii', '1458', '483', '505', '470', '47.1000', 'Georgia', '1453', '488', '490', '475', '46.5000', 'SouthCarolina', '1447', '484', '495', '468', '40.7000', 'Maine', '1389', '468', '467', '454', '87.1000', 'Iowa', '1798', '603', '613', '582', '2.7000', 'Minnesota', '1781', '594', '607', '580', '6.0000', 'Wisconsin', '1778', '595', '604', '579', '3.8000', 'Missouri', '1768', '593', '595', '580', '3.6000', 'Michigan', '1766', '585', '605', '576', '3.8000', 'SouthDakota', '1766', '592', '603', '571', '2.0000', 'Illinois', '1762', '585', '600', '577', '4.6700', 'Kansas', '1752', '590', '595', '567', '4.7000', 'Nebraska', '1746', '585', '593', '568', '3.9000', 'NorthDakota', '1733', '580', '594', '559', '3.4000', 'Kentucky', '1713', '575', '575', '563', '5.0000', 'Tennessee', '1712', 
'576', '571', '565', '6.4000', 'Colorado', '1695', '568', '572', '555', '14.1000', 'Arkansas', '1684', '566', '566', '552', '3.5000', 'Oklahoma', '1684', '569', '568', '547', '3.8000', 'Wyoming', '1683', '570', '567', '546', '3.6000', 'Utah', '1674', '568', '559', '547', '4.5000', 'Mississippi', '1666', '566', '548', '552', '2.2000', 'Louisiana', '1652', '555', '550', '547', '4.0000', 'Alabama', '1650', '556', '550', '544', '5.4000', 'NewMexico', '1636', '553', '549', '534', '7.1000', 'Ohio', '1609', '538', '548', '522', '17.2000', 'Idaho', '1601', '543', '541', '517', '14.6000', 'Montana', '1593', '538', '538', '517', '20.0000', 'West Virginia', '1522', '515', '507', '500', '13.2000']

If you want a list of lists instead, with each line split on commas:

>>> [x.split(",") for x in strs.splitlines()]
[['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)'], ['Washington', '1564', '524', '532', '508', '41.2000'], ['NewHampshire', '1554', '520', '524', '510', '64.0000'], ['Massachusetts', '1547', '512', '526', '509', '72.1000'], ['Oregon', '1546', '523', '524', '499', '37.1000'], ['Vermont', '1546', '519', '512', '506', '64.0000'], ['Arizona', '1544', '519', '525', '500', '22.4000'], ['Connecticut', '1536', '509', '514', '513', '71.2000'], ['Alaska', '1524', '518', '515', '491', '32.7000'], ['Virginia', '1521', '512', '512', '497', '56.0000'], ['California', '1517', '501', '516', '500', '37.5000'], ['NewJersey', '1506', '495', '514', '497', '69.0000'], ['Maryland', '1502', '501', '506', '495', '56.7000'], ['NorthCarolina', '1485', '497', '511', '477', '45.5000'], ['RhodeIsland', '1477', '494', '495', '488', '60.8000'], ['Indiana', '1476', '494', '505', '477', '52.0000'], ['Florida', '1473', '496', '498', '479', '44.7000'], ['Pennsylvania', '1473', '492', '501', '480', '62.3000'], ['Nevada', '1470', '496', '501', '473', '25.9000'], ['Delaware', '1469', '493', '495', '481', '59.2000'], ['Texas', '1462', '484', '505', '473', '41.5000'], ['NewYork', '1461', '484', '499', '478', '59.6000'], ['Hawaii', '1458', '483', '505', '470', '47.1000'], ['Georgia', '1453', '488', '490', '475', '46.5000'], ['SouthCarolina', '1447', '484', '495', '468', '40.7000'], ['Maine', '1389', '468', '467', '454', '87.1000'], ['Iowa', '1798', '603', '613', '582', '2.7000'], ['Minnesota', '1781', '594', '607', '580', '6.0000'], ['Wisconsin', '1778', '595', '604', '579', '3.8000'], ['Missouri', '1768', '593', '595', '580', '3.6000'], ['Michigan', '1766', '585', '605', '576', '3.8000'], ['SouthDakota', '1766', '592', '603', '571', '2.0000'], ['Illinois', '1762', '585', '600', '577', '4.6700'], ['Kansas', '1752', '590', '595', '567', '4.7000'], ['Nebraska', '1746', '585', '593', '568', '3.9000'], ['NorthDakota', '1733', '580', '594', '559', '3.4000'], 
['Kentucky', '1713', '575', '575', '563', '5.0000'], ['Tennessee', '1712', '576', '571', '565', '6.4000'], ['Colorado', '1695', '568', '572', '555', '14.1000'], ['Arkansas', '1684', '566', '566', '552', '3.5000'], ['Oklahoma', '1684', '569', '568', '547', '3.8000'], ['Wyoming', '1683', '570', '567', '546', '3.6000'], ['Utah', '1674', '568', '559', '547', '4.5000'], ['Mississippi', '1666', '566', '548', '552', '2.2000'], ['Louisiana', '1652', '555', '550', '547', '4.0000'], ['Alabama', '1650', '556', '550', '544', '5.4000'], ['NewMexico', '1636', '553', '549', '534', '7.1000'], ['Ohio', '1609', '538', '548', '522', '17.2000'], ['Idaho', '1601', '543', '541', '517', '14.6000'], ['Montana', '1593', '538', '538', '517', '20.0000'], ['West Virginia', '1522', '515', '507', '500', '13.2000']]
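
As a small self-contained sketch of the two comprehensions, here they are on a shortened three-line stand-in for the data (the name sample and its contents are my own, not the original input). It also strips the stray space that follows each comma in the header line, which the plain split(",") leaves in place:

```python
# Shortened, hypothetical sample in the same shape as the original data.
sample = "States, Total Score, Writing\nWashington,1564,508\nOregon,1546,499\n"

# One flat list: split into lines, then split every line on commas.
flat = [field.strip() for line in sample.splitlines() for field in line.split(",")]

# One list per line instead.
rows = [[field.strip() for field in line.split(",")] for line in sample.splitlines()]

print(flat)
print(rows)
```

Using splitlines() rather than split("\n") avoids a trailing empty string when the text ends with a newline.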

Instead of building the whole list at once, you can use itertools.chain to get the elements lazily (or, if you are iterating one line at a time, prefer @Martijn Pieters's solution):

>>> from itertools import chain
>>> for elem in chain.from_iterable(x.split(",") for x in strs.splitlines()):
...     print(elem)
...
States
 Total Score
 Critical Reading
 Mathematics
 Writing
 Participation (%)
Washington
...
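
A minimal sketch of the lazy approach, assuming a short stand-in string (sample is hypothetical). chain.from_iterable splits one line at a time as fields are requested, and itertools.islice takes only the first few. Note that splitlines() itself still builds the full list of lines; only the per-line splitting is deferred:

```python
from itertools import chain, islice

# Hypothetical stand-in for the full data.
sample = "States, Total Score\nWashington,1564\nOregon,1546\n"

# Generator expression: each line is split only when its fields are needed.
fields = chain.from_iterable(line.split(",") for line in sample.splitlines())

# Take just the first three fields without splitting the remaining lines yet.
first_three = list(islice(fields, 3))
print(first_three)

# The iterator continues where it left off.
rest = list(fields)
print(rest)
```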


Don't read the whole file at once; read it one line at a time and split each line:

with open(filepath) as f:
    for line in f:
        # strip() drops the trailing newline before splitting on commas
        print(line.strip().split(','))

You could also split on newlines first, then loop over the lines and split each one on commas:

lines = [line.split(',') for line in somestring.splitlines()]
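
The line-by-line idea can be sketched as a small generator. Here io.StringIO stands in for an open file so the example runs without a real path, and split_lines is a hypothetical helper name of my own. Skipping blank lines is an extra assumption that the original snippets do not make:

```python
import io

def split_lines(lines):
    """Yield each non-empty line as a list of comma-separated fields."""
    for line in lines:
        line = line.strip()
        if line:  # assumption: blank lines should be ignored
            yield line.split(",")

# io.StringIO behaves like a text file opened for reading.
fake_file = io.StringIO("Washington,1564,524\n\nOregon,1546,523\n")
parsed = list(split_lines(fake_file))
print(parsed)
```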

But for comma-separated files, you are better off using the csv module:

import csv

# Python 3: open in text mode with newline='' (Python 2 used 'rb' here)
with open(filepath, newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)

This gives you rows like:

['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)']
['Washington', '1564', '524', '532', '508', '41.2000']
['NewHampshire', '1554', '520', '524', '510', '64.0000']
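
One reason to prefer csv over a plain split(',') is that it understands quoted fields. The sketch below uses made-up data (the "Washington, D.C." row is hypothetical and not part of the original file) and io.StringIO in place of a real file. The skipinitialspace=True option also drops the space that follows each comma in the header:

```python
import csv
import io

# Hypothetical data: a quoted field that itself contains a comma.
data = 'Name, Score\n"Washington, D.C.",1564\n'

# csv.reader accepts any iterable of lines, including a StringIO object.
reader = csv.reader(io.StringIO(data), skipinitialspace=True)
rows = list(reader)
print(rows)

# A naive split breaks the quoted field in two and keeps the quote characters.
naive = '"Washington, D.C.",1564'.split(',')
print(naive)
```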

Since the first line holds the headers, you could also use DictReader and get dictionaries that map each header to its value:

with open(filepath, newline='') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row)
        # address columns as: row['States'], row[' Total Score']
        # (note the leading space: the header has a space after each comma)

Each row is output as:

{' Writing': '508', ' Total Score': '1564', ' Critical Reading': '524', 'States': 'Washington', ' Mathematics': '532', ' Participation (%)': '41.2000'}
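
The keys above carry a leading space because the header line has a space after each comma. A possible way around that, sketched here on a shortened hypothetical sample, is to pass skipinitialspace=True, which DictReader forwards to the underlying reader. Converting the score to int is my own addition for illustration:

```python
import csv
import io

# Shortened, hypothetical sample with the same header style.
data = (
    "States, Total Score, Participation (%)\n"
    "Washington,1564,41.2000\n"
    "Oregon,1546,37.1000\n"
)

# skipinitialspace=True removes the space after each comma, so the
# keys become 'Total Score' rather than ' Total Score'.
reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)
scores = {row["States"]: int(row["Total Score"]) for row in reader}
print(scores)
```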


For splitting on more than one character, there is re.split:

>>> import re
>>> re.split("\n| ", "this is\na short\ntest...")
['this', 'is', 'a', 'short', 'test...']
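
Applied to the question's two delimiters, a sketch might look like this. The sample text is a shortened hypothetical stand-in, and the pattern `, *|\n` (a comma plus any following spaces, or a newline) is one possible choice rather than the only one:

```python
import re

# Shortened, hypothetical stand-in for the file contents.
text = "States, Total Score\nWashington,1564\nOregon,1546\n"

# strip() removes the final newline so no empty string ends the result.
fields = re.split(r", *|\n", text.strip())
print(fields)
```

This yields one flat list, so the row boundaries are lost; if each row should stay together, splitting line by line or using the csv module is probably the better fit.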

You can use split() from the re module, which lets you define the regex to split on.

Take a look at this: Python split string based on regular expression