How to split on two items in a string?
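I have a multi-line, comma-separated string (the one assigned to `strs` in the answers below) and want to split it on both the line breaks and the commas.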
You can use a list comprehension:
    >>> strs = '''States, Total Score, Critical Reading, Mathematics, Writing, Participation (%)
    Washington,1564,524,532,508,41.2000
    NewHampshire,1554,520,524,510,64.0000
    Massachusetts,1547,512,526,509,72.1000
    Oregon,1546,523,524,499,37.1000
    Vermont,1546,519,512,506,64.0000
    Arizona,1544,519,525,500,22.4000
    Connecticut,1536,509,514,513,71.2000
    Alaska,1524,518,515,491,32.7000
    Virginia,1521,512,512,497,56.0000
    California,1517,501,516,500,37.5000
    NewJersey,1506,495,514,497,69.0000
    Maryland,1502,501,506,495,56.7000
    NorthCarolina,1485,497,511,477,45.5000
    RhodeIsland,1477,494,495,488,60.8000
    Indiana,1476,494,505,477,52.0000
    Florida,1473,496,498,479,44.7000
    Pennsylvania,1473,492,501,480,62.3000
    Nevada,1470,496,501,473,25.9000
    Delaware,1469,493,495,481,59.2000
    Texas,1462,484,505,473,41.5000
    NewYork,1461,484,499,478,59.6000
    Hawaii,1458,483,505,470,47.1000
    Georgia,1453,488,490,475,46.5000
    SouthCarolina,1447,484,495,468,40.7000
    Maine,1389,468,467,454,87.1000
    Iowa,1798,603,613,582,2.7000
    Minnesota,1781,594,607,580,6.0000
    Wisconsin,1778,595,604,579,3.8000
    Missouri,1768,593,595,580,3.6000
    Michigan,1766,585,605,576,3.8000
    SouthDakota,1766,592,603,571,2.0000
    Illinois,1762,585,600,577,4.6700
    Kansas,1752,590,595,567,4.7000
    Nebraska,1746,585,593,568,3.9000
    NorthDakota,1733,580,594,559,3.4000
    Kentucky,1713,575,575,563,5.0000
    Tennessee,1712,576,571,565,6.4000
    Colorado,1695,568,572,555,14.1000
    Arkansas,1684,566,566,552,3.5000
    Oklahoma,1684,569,568,547,3.8000
    Wyoming,1683,570,567,546,3.6000
    Utah,1674,568,559,547,4.5000
    Mississippi,1666,566,548,552,2.2000
    Louisiana,1652,555,550,547,4.0000
    Alabama,1650,556,550,544,5.4000
    NewMexico,1636,553,549,534,7.1000
    Ohio,1609,538,548,522,17.2000
    Idaho,1601,543,541,517,14.6000
    Montana,1593,538,538,517,20.0000
    West Virginia,1522,515,507,500,13.2000
    '''
    >>> [y for x in strs.splitlines() for y in x.split(",")]
    ['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)',
     'Washington', '1564', '524', '532', '508', '41.2000',
     'NewHampshire', '1554', '520', '524', '510', '64.0000',
     'Massachusetts', '1547', '512', '526', '509', '72.1000',
     'Oregon', '1546', '523', '524', '499', '37.1000',
     'Vermont', '1546', '519', '512', '506', '64.0000',
     'Arizona', '1544', '519', '525', '500', '22.4000',
     'Connecticut', '1536', '509', '514', '513', '71.2000',
     'Alaska', '1524', '518', '515', '491', '32.7000',
     'Virginia', '1521', '512', '512', '497', '56.0000',
     'California', '1517', '501', '516', '500', '37.5000',
     'NewJersey', '1506', '495', '514', '497', '69.0000',
     'Maryland', '1502', '501', '506', '495', '56.7000',
     'NorthCarolina', '1485', '497', '511', '477', '45.5000',
     'RhodeIsland', '1477', '494', '495', '488', '60.8000',
     'Indiana', '1476', '494', '505', '477', '52.0000',
     'Florida', '1473', '496', '498', '479', '44.7000',
     'Pennsylvania', '1473', '492', '501', '480', '62.3000',
     'Nevada', '1470', '496', '501', '473', '25.9000',
     'Delaware', '1469', '493', '495', '481', '59.2000',
     'Texas', '1462', '484', '505', '473', '41.5000',
     'NewYork', '1461', '484', '499', '478', '59.6000',
     'Hawaii', '1458', '483', '505', '470', '47.1000',
     'Georgia', '1453', '488', '490', '475', '46.5000',
     'SouthCarolina', '1447', '484', '495', '468', '40.7000',
     'Maine', '1389', '468', '467', '454', '87.1000',
     'Iowa', '1798', '603', '613', '582', '2.7000',
     'Minnesota', '1781', '594', '607', '580', '6.0000',
     'Wisconsin', '1778', '595', '604', '579', '3.8000',
     'Missouri', '1768', '593', '595', '580', '3.6000',
     'Michigan', '1766', '585', '605', '576', '3.8000',
     'SouthDakota', '1766', '592', '603', '571', '2.0000',
     'Illinois', '1762', '585', '600', '577', '4.6700',
     'Kansas', '1752', '590', '595', '567', '4.7000',
     'Nebraska', '1746', '585', '593', '568', '3.9000',
     'NorthDakota', '1733', '580', '594', '559', '3.4000',
     'Kentucky', '1713', '575', '575', '563', '5.0000',
     'Tennessee', '1712', '576', '571', '565', '6.4000',
     'Colorado', '1695', '568', '572', '555', '14.1000',
     'Arkansas', '1684', '566', '566', '552', '3.5000',
     'Oklahoma', '1684', '569', '568', '547', '3.8000',
     'Wyoming', '1683', '570', '567', '546', '3.6000',
     'Utah', '1674', '568', '559', '547', '4.5000',
     'Mississippi', '1666', '566', '548', '552', '2.2000',
     'Louisiana', '1652', '555', '550', '547', '4.0000',
     'Alabama', '1650', '556', '550', '544', '5.4000',
     'NewMexico', '1636', '553', '549', '534', '7.1000',
     'Ohio', '1609', '538', '548', '522', '17.2000',
     'Idaho', '1601', '543', '541', '517', '14.6000',
     'Montana', '1593', '538', '538', '517', '20.0000',
     'West Virginia', '1522', '515', '507', '500', '13.2000']
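The header fields in that output keep their leading spaces (`' Total Score'`). If that is unwanted, a small variation of the same comprehension can trim each field. This is only a sketch: `sample` and `split_flat` are made-up names, and the three-line sample stands in for the full string.

```python
# Hypothetical short sample in the same layout as the full string.
sample = """States, Total Score, Critical Reading
Washington,1564,524
NewHampshire,1554,520
"""

def split_flat(text):
    """Split on line breaks, then on commas, and trim whitespace around each field."""
    return [field.strip() for line in text.splitlines() for field in line.split(",")]

fields = split_flat(sample)
print(fields)
```

The nested `for` clauses read left to right like nested loops: the outer one walks the lines, the inner one walks the comma-separated fields of each line.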
If you need each row kept in its own list:
    >>> [x.split(",") for x in strs.splitlines()]
    [['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)'],
     ['Washington', '1564', '524', '532', '508', '41.2000'],
     ['NewHampshire', '1554', '520', '524', '510', '64.0000'],
     ['Massachusetts', '1547', '512', '526', '509', '72.1000'],
     ['Oregon', '1546', '523', '524', '499', '37.1000'],
     ['Vermont', '1546', '519', '512', '506', '64.0000'],
     ['Arizona', '1544', '519', '525', '500', '22.4000'],
     ['Connecticut', '1536', '509', '514', '513', '71.2000'],
     ['Alaska', '1524', '518', '515', '491', '32.7000'],
     ['Virginia', '1521', '512', '512', '497', '56.0000'],
     ['California', '1517', '501', '516', '500', '37.5000'],
     ['NewJersey', '1506', '495', '514', '497', '69.0000'],
     ['Maryland', '1502', '501', '506', '495', '56.7000'],
     ['NorthCarolina', '1485', '497', '511', '477', '45.5000'],
     ['RhodeIsland', '1477', '494', '495', '488', '60.8000'],
     ['Indiana', '1476', '494', '505', '477', '52.0000'],
     ['Florida', '1473', '496', '498', '479', '44.7000'],
     ['Pennsylvania', '1473', '492', '501', '480', '62.3000'],
     ['Nevada', '1470', '496', '501', '473', '25.9000'],
     ['Delaware', '1469', '493', '495', '481', '59.2000'],
     ['Texas', '1462', '484', '505', '473', '41.5000'],
     ['NewYork', '1461', '484', '499', '478', '59.6000'],
     ['Hawaii', '1458', '483', '505', '470', '47.1000'],
     ['Georgia', '1453', '488', '490', '475', '46.5000'],
     ['SouthCarolina', '1447', '484', '495', '468', '40.7000'],
     ['Maine', '1389', '468', '467', '454', '87.1000'],
     ['Iowa', '1798', '603', '613', '582', '2.7000'],
     ['Minnesota', '1781', '594', '607', '580', '6.0000'],
     ['Wisconsin', '1778', '595', '604', '579', '3.8000'],
     ['Missouri', '1768', '593', '595', '580', '3.6000'],
     ['Michigan', '1766', '585', '605', '576', '3.8000'],
     ['SouthDakota', '1766', '592', '603', '571', '2.0000'],
     ['Illinois', '1762', '585', '600', '577', '4.6700'],
     ['Kansas', '1752', '590', '595', '567', '4.7000'],
     ['Nebraska', '1746', '585', '593', '568', '3.9000'],
     ['NorthDakota', '1733', '580', '594', '559', '3.4000'],
     ['Kentucky', '1713', '575', '575', '563', '5.0000'],
     ['Tennessee', '1712', '576', '571', '565', '6.4000'],
     ['Colorado', '1695', '568', '572', '555', '14.1000'],
     ['Arkansas', '1684', '566', '566', '552', '3.5000'],
     ['Oklahoma', '1684', '569', '568', '547', '3.8000'],
     ['Wyoming', '1683', '570', '567', '546', '3.6000'],
     ['Utah', '1674', '568', '559', '547', '4.5000'],
     ['Mississippi', '1666', '566', '548', '552', '2.2000'],
     ['Louisiana', '1652', '555', '550', '547', '4.0000'],
     ['Alabama', '1650', '556', '550', '544', '5.4000'],
     ['NewMexico', '1636', '553', '549', '534', '7.1000'],
     ['Ohio', '1609', '538', '548', '522', '17.2000'],
     ['Idaho', '1601', '543', '541', '517', '14.6000'],
     ['Montana', '1593', '538', '538', '517', '20.0000'],
     ['West Virginia', '1522', '515', '507', '500', '13.2000']]
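Once the rows are in a list of lists, the first row can be peeled off as the header and the rest transposed into columns. A minimal sketch, assuming the first line is always the header; `sample`, `header`, `body` and `columns` are illustrative names, not part of the answer above.

```python
# Hypothetical short sample in the same layout as the full string.
sample = """States, Total Score, Critical Reading
Washington,1564,524
Oregon,1546,523
"""

# One list per line, with whitespace trimmed from every cell.
rows = [[cell.strip() for cell in line.split(",")] for line in sample.splitlines()]
header, body = rows[0], rows[1:]

# zip(*body) transposes rows into columns; pair each column with its header name.
columns = dict(zip(header, zip(*body)))
print(columns["Total Score"])
```

All values are still strings at this point; converting them with `int()` or `float()` is a separate step.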
You can also use `itertools.chain`:
    >>> from itertools import chain
    >>> for elem in chain(*(x.split(",") for x in strs.splitlines())):
    ...     print(elem)
    ...
    States
     Total Score
     Critical Reading
     Mathematics
     Writing
     Participation (%)
    Washington
    ...
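A closely related option is `chain.from_iterable`, which takes the generator directly instead of unpacking it with `*`, so the per-line lists are produced only as they are consumed. A small sketch with a made-up `sample` string:

```python
from itertools import chain

# Hypothetical stand-in for the full string.
sample = "a,b\nc,d\ne,f"

# from_iterable pulls one split line at a time from the generator.
fields = chain.from_iterable(line.split(",") for line in sample.splitlines())

result = list(fields)
print(result)  # ['a', 'b', 'c', 'd', 'e', 'f']
```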
Don't read the whole file at once; read it one line at a time and split each line:

    with open(filepath) as f:
        for line in f:
            print(line.strip().split(','))
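The same loop can be wrapped in a generator so callers get rows one at a time. This is a sketch, not the answerer's code: `iter_rows` is a hypothetical helper, and the temporary file exists only so the example runs on its own.

```python
import os
import tempfile

def iter_rows(path):
    """Yield each non-blank line of the file at `path` as a list of fields."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield line.split(",")

# Write a small made-up file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("States,Total Score\nWashington,1564\n\nOregon,1546\n")
    path = tmp.name

rows = list(iter_rows(path))
os.remove(path)
print(rows)
```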
You can also split on the line breaks first, then loop over the lines and split each one on commas:

    lines = [line.split(',') for line in somestring.splitlines()]
But for comma-separated files it is better to use the `csv` module:

    import csv

    # On Python 3, open the file in text mode with newline='' for the csv module.
    with open(filepath, newline='') as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            print(row)

This gives you rows like the following:

    ['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)']
    ['Washington', '1564', '524', '532', '508', '41.2000']
    ['NewHampshire', '1554', '520', '524', '510', '64.0000']
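The leading spaces in the header row come from the space after each comma in the data. The `csv` module's `skipinitialspace` option drops whitespace that immediately follows a delimiter. A short sketch, using `io.StringIO` and a made-up `data` string in place of a real file:

```python
import csv
import io

# Hypothetical two-line sample; io.StringIO stands in for an open file.
data = "States, Total Score, Critical Reading\nWashington,1564,524\n"

# skipinitialspace=True ignores whitespace right after each delimiter.
reader = csv.reader(io.StringIO(data), skipinitialspace=True)
rows = list(reader)
print(rows)
```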
Since the first line holds the headers, you can also use `csv.DictReader`:

    with open(filepath, newline='') as f:
        reader = csv.DictReader(f, delimiter=',')
        for row in reader:
            print(row)
            # address columns as row['States'], row[' Total Score']
            # (the keys keep the leading space from the header line)

On Python 3.8 and later, each row prints as a plain dict in column order:

    {'States': 'Washington', ' Total Score': '1564', ' Critical Reading': '524', ' Mathematics': '532', ' Writing': '508', ' Participation (%)': '41.2000'}
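Every value in those rows is still a string. A possible next step is to convert the numeric columns while reading. This sketch assumes the column names from the data above; the output keys (`state`, `total`, `participation`) and the `data` string are made up for illustration.

```python
import csv
import io

# Hypothetical sample with three of the original columns.
data = (
    "States, Total Score, Participation (%)\n"
    "Washington,1564,41.2000\n"
    "Oregon,1546,37.1000\n"
)

# skipinitialspace=True also cleans the header names used as dict keys.
reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)

records = [
    {
        "state": row["States"],
        "total": int(row["Total Score"]),
        "participation": float(row["Participation (%)"]),
    }
    for row in reader
]
print(records[0])
```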
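For splitting on more than one character there is `re.split`:

    >>> import re
    >>> # split on a space or a comma
    >>> re.split(r" |,", "this is a short test...")
    ['this', 'is', 'a', 'short', 'test...']

You can use `split()` from the `re` module, which lets you define a regex to split on.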
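Applied to the data in the question, one pattern can cover both delimiters at once. A minimal sketch on a made-up three-line `sample`; the pattern is one reasonable choice, not the only one.

```python
import re

# Hypothetical short sample in the same layout as the full string.
sample = "States, Total Score\nWashington,1564\nOregon,1546\n"

# Split on a comma (plus any spaces after it) or on a newline.
# strip() removes the trailing newline so no empty field appears at the end.
parts = re.split(r", *|\n", sample.strip())
print(parts)
```

Note that this gives one flat list; the row boundaries are lost, so the line-by-line approaches are a better fit when rows matter.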
See also: Python split string based on regular expression