How to extract a text file into a dictionary
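I would like to know how to extract a text file into a dictionary in Python. The text file is formatted as shown below, and it should be extracted so that, for example, the object Earth is a key and its radius, period, and all other attributes are stored under that key.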
```
RootObject: Sun

Object: Sun
Satellites: Mercury,Venus,Earth,Mars,Jupiter,Saturn,Uranus,Neptune,Ceres,Pluto,Haumea,Makemake,Eris
Radius: 20890260
Orbital Radius: 0

Object: Earth
Orbital Radius: 77098290
Period: 365.256363004
Radius: 6371000.0
Satellites: Moon

Object: Moon
Orbital Radius: 18128500
Radius: 1737000.10
Period: 27.321582
```
```python
nk = """
RootObject: Sun

Object: Sun
Satellites: Mercury,Venus,Earth,Mars,Jupiter,Saturn,Uranus,Neptune,Ceres,Pluto,Haumea,Makemake,Eris
Radius: 20890260
Orbital Radius: 0

Object: Earth
Orbital Radius: 77098290
Period: 365.256363004
Radius: 6371000.0
Satellites: Moon

Object: Moon
Orbital Radius: 18128500
Radius: 1737000.10
Period: 27.321582
"""

my_test_dict = {}
for x in nk.splitlines():
    if ':' in x:
        # Split each "Key: value" line once and trim surrounding whitespace.
        key, value = [part.strip() for part in x.split(':', 1)]
        if key == 'RootObject':
            root_obj = value
        elif key == 'Object':
            # Start a new entry; later attribute lines are stored under it.
            my_test_dict[value] = {}
            current_dict = value
            if value != root_obj:
                # The parent is whichever earlier object lists this one as a satellite.
                for x1 in my_test_dict:
                    if 'Satellites' in my_test_dict[x1]:
                        if value in my_test_dict[x1]['Satellites'].split(','):
                            my_test_dict[value]['RootObject'] = x1
        else:
            my_test_dict[current_dict][key] = value

print(my_test_dict)
```
Output:
```python
{
    'Sun': {
        'Satellites': 'Mercury,Venus,Earth,Mars,Jupiter,Saturn,Uranus,Neptune,Ceres,Pluto,Haumea,Makemake,Eris',
        'Orbital Radius': '0',
        'Radius': '20890260'
    },
    'Moon': {
        'Orbital Radius': '18128500',
        'Radius': '1737000.10',
        'Period': '27.321582',
        'RootObject': 'Earth'
    },
    'Earth': {
        'Satellites': 'Moon',
        'Orbital Radius': '77098290',
        'Radius': '6371000.0',
        'Period': '365.256363004',
        'RootObject': 'Sun'
    }
}
```
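The parent lookup in this answer (finding which object lists the current one among its Satellites) can also be done as a separate pass once all objects are parsed. The following is a minimal sketch of that idea, not part of the original answer; the function name `find_parents` and the shortened sample dictionary are assumptions made for illustration.

```python
def find_parents(objects):
    """Map each satellite name to the name of the object that lists it."""
    parents = {}
    for name, attrs in objects.items():
        # "Satellites" is a comma-separated string; objects without it have none.
        satellites = attrs.get("Satellites", "")
        for satellite in satellites.split(","):
            satellite = satellite.strip()
            if satellite:
                parents[satellite] = name
    return parents


# Shortened sample data in the same shape as the output above (illustrative only).
objects = {
    "Sun": {"Satellites": "Mercury,Venus,Earth", "Radius": "20890260"},
    "Earth": {"Satellites": "Moon", "Radius": "6371000.0"},
    "Moon": {"Radius": "1737000.10"},
}

parents = find_parents(objects)
print(parents["Moon"])   # Earth
print(parents["Earth"])  # Sun
```

Doing the lookup after parsing means it does not depend on a parent appearing in the file before its satellites.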
With a modification of one of the answers above, you get something like this:
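```python
def read_next_object(file):
    """Yield one dict per "Object:" block in the file."""
    obj = {}
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Strip both parts so values do not keep the space after the colon.
        key, val = [part.strip() for part in line.split(":", 1)]
        if key in obj and key == "Object":
            # A second "Object" key means the previous block is complete.
            yield obj
            obj = {}
        obj[key] = val
    yield obj


planets = {}
with open("test.txt", 'r') as f:
    for obj in read_next_object(f):
        planets[obj["Object"]] = obj

print(planets)
```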
Edit:

```python
print(planets["Sun"]["Radius"])
```

should print the value 20890260.
The full output of the code above looks like this:
```python
{   'Earth': {   'Object': 'Earth',
                 'Orbital Radius': '77098290',
                 'Period': '365.256363004',
                 'Radius': '6371000.0',
                 'Satellites': 'Moon'},
    'Moon': {   'Object': 'Moon',
                'Orbital Radius': '18128500',
                'Period': '27.321582',
                'Radius': '1737000.10'},
    'Sun': {   'Object': 'Sun',
               'Orbital Radius': '0',
               'Radius': '20890260',
               'RootObject': 'Sun',
               'Satellites': 'Mercury,Venus,Earth,Mars,Jupiter,Saturn,Uranus,Neptune,Ceres,Pluto,Haumea,Makemake,Eris'}}
```
Assuming you want comma-separated values to be stored as lists, try:
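```python
my_file = "test.txt"  # path to the input file (assumed; adjust as needed)

mydict = {}
with open(my_file, 'r') as the_file:
    for line in the_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Strip both parts so keys and values carry no stray whitespace or newline.
        key, val = [part.strip() for part in line.split(":", 1)]
        val = val.split(",")
        # Keep a list only when there is more than one element.
        mydict[key] = val if len(val) > 1 else val[0]
```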
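Note that the snippet above stores everything in one flat dictionary, so a key such as Radius is overwritten by each later object. As a rough sketch of how the list splitting could be combined with per-object grouping, the code below builds a nested dictionary keyed by object name. The function name `parse_objects` and the shortened sample string are assumptions for illustration, and the sketch assumes every attribute line follows an Object line.

```python
def parse_objects(text):
    """Parse 'Key: value' lines into {object_name: {attribute: value}}."""
    objects = {}
    current = None  # attribute dict of the most recent "Object:" line
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, val = (part.strip() for part in line.split(":", 1))
        if key == "Object":
            current = objects.setdefault(val, {})
        elif current is not None:
            # Store comma-separated values as a list, single values as a string.
            parts = val.split(",")
            current[key] = parts if len(parts) > 1 else val
        # Lines before the first "Object:" (e.g. RootObject) are ignored here.
    return objects


# Shortened sample input in the question's format (illustrative only).
sample = """
RootObject: Sun

Object: Sun
Satellites: Mercury,Venus,Earth
Radius: 20890260

Object: Earth
Radius: 6371000.0
Satellites: Moon
"""

planets = parse_objects(sample)
print(planets["Sun"]["Satellites"])  # ['Mercury', 'Venus', 'Earth']
print(planets["Earth"]["Radius"])    # 6371000.0
```

As in the snippet above, a single satellite such as Moon stays a plain string and numeric values remain strings; always returning a list for Satellites, or converting numbers with `float`, would be a small change if that is preferred.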