How to convert a list of strings into a dictionary with keys and values matching
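I have this list in a file:
Alabama
4802982
9
Alaska
721523
3
Arizona
6412700
11
Arkansas
2926229
6
California
37341989
55
Colorado
5044930
9
(except that it continues for every state). I need to create a dictionary with the state names as keys and the population and electoral votes (the first and second numbers) as a list of values.
This is my function so far: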
    def make_elector_dictionary(file):
        dic = {}
        try:
            infile = open(file,'r')
        except IOError:
            print('file not found')
        else:
            for line in infile:
                line = line.strip()
                dic[line] = ()
            print(dic)
Try this:

    s = "Alabama 4802982 9 Alaska 721523 3 Arizona 6412700 11 Arkansas 2926229 6 California 37341989 55 Colorado 5044930 9"
    l = s.split()
    dictionaryYouWant = {l[index]: [l[index+1], l[index+2]] for index in range(0, len(l), 3)}

This gives:

    {'Alabama': ['4802982', '9'], 'Alaska': ['721523', '3'], 'Arizona': ['6412700', '11'], 'Arkansas': ['2926229', '6'], 'California': ['37341989', '55'], 'Colorado': ['5044930', '9']}
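If the file really holds one item per line, as in the question, the same group-by-three idea can be applied to the lines themselves instead of to whitespace-separated tokens. The following is only a minimal sketch: the function name `parse_state_lines` and the inline sample text are assumptions made for illustration, the "New Hampshire" figures are placeholders, and it assumes every state occupies exactly three non-empty lines.

```python
def parse_state_lines(text):
    """Group non-empty lines in threes: name, population, electoral votes."""
    # Keep whole lines, so a multi-word name such as "New Hampshire" stays intact.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Passing the same iterator three times makes zip consume the lines in triples.
    it = iter(lines)
    return {name: [int(pop), int(votes)] for name, pop, votes in zip(it, it, it)}


# Hypothetical sample; the New Hampshire numbers are placeholders, not real data.
sample = """Alabama
4802982
9
Alaska
721523
3
New Hampshire
1000000
4
"""

result = parse_state_lines(sample)
print(result)
```

Note that `zip` silently drops an incomplete trailing group, so a file whose line count is not a multiple of three would lose its last state rather than raise an error.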
The following should give you roughly what you want:

    def make_elector_dictionary(file):

        # Open and read the entire file
        try:
            with open(file, 'r') as infile:
                raw_data = infile.read()
        except IOError:
            print('file not found')
            return

        # Split the text into a list on any whitespace (spaces or newlines)
        raw_data = raw_data.split()

        # Rearrange the data into a dictionary of dictionaries
        processed_data = {raw_data[i]: {'pop': int(raw_data[i+1]), 'electoral_votes': int(raw_data[i+2])}
                          for i in range(0, len(raw_data), 3)}

        return processed_data

    print(make_elector_dictionary('data.txt'))
This gives:

    {'Arizona': {'pop': 6412700, 'electoral_votes': 11}, 'Arkansas': {'pop': 2926229, 'electoral_votes': 6}, 'California': {'pop': 37341989, 'electoral_votes': 55}, 'Colorado': {'pop': 5044930, 'electoral_votes': 9}, 'Alabama': {'pop': 4802982, 'electoral_votes': 9}, 'Alaska': {'pop': 721523, 'electoral_votes': 3}}
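Or you could use

    processed_data = {raw_data[i]: [int(raw_data[i+1]), int(raw_data[i+2])]
                      for i in range(0, len(raw_data), 3)}

if you want the dictionary values to be lists rather than dictionaries. Whether this method works depends on the details of your data file. For example, if "New Hampshire" is written in your data file with a space between "New" and "Hampshire", then "New" will be interpreted by the function as a state name, and you will get a ValueError when it tries to pass "Hampshire" to int as the population. In that case you will need somewhat more sophisticated parsing to make it work; regular expressions are probably the best option. You could do something like this: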
    import re

    processed_data = {name: [int(pop), int(votes)]
                      for _, name, pop, votes in re.findall(r'(\W|^)([a-zA-Z ]+)\s+(\d+)\s+(\d+)', raw_data)}
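To see the difference on a multi-word name, here is a small self-contained sketch. The sample string and its "New Hampshire" figures are made up for illustration, and the pattern is a variant of the one above rather than the same expression: it requires the name to start with a letter, so it cannot pick up a leading space. Treat it as one possible pattern under those assumptions, not the only correct one.

```python
import re

# Sample text; the "New Hampshire" figures are placeholders, not real data.
sample = "Alabama 4802982 9 New Hampshire 1000000 4 Alaska 721523 3"

# Naive whitespace split: "New" is taken as a state name and
# int("Hampshire") raises ValueError.
tokens = sample.split()
try:
    naive = {tokens[i]: [int(tokens[i + 1]), int(tokens[i + 2])]
             for i in range(0, len(tokens), 3)}
except ValueError:
    naive = None

# One or more words separated by single spaces, followed by two integers.
pattern = re.compile(r'([A-Za-z]+(?: [A-Za-z]+)*)\s+(\d+)\s+(\d+)')
parsed = {name: [int(pop), int(votes)]
          for name, pop, votes in pattern.findall(sample)}
print(parsed)
```

This pattern only accepts letters and single spaces in a name, so entries containing periods or hyphens would need a wider character class.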
Remember