Python lxml iterparse() is skipping the first event
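I am using iterparse() from Python's lxml to parse a large XML file and extract the relevant data. This works fine, except for the first event: the data of the first node is not captured. The same thing happens when I try to get the "way" tags (not in this snippet). Why is the first event's element not captured?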

    tree = etree.iterparse(state_file_xml, events=("start","end"),tag=('node'))
    context = iter(tree)
    event, root = context.next()

    nodes = {}

    for event, elem in context:

        if ((event == 'end') and (elem.tag == 'node')) :

            id = elem.get("id")
            lat = float(elem.get("lat"))
            lon = float(elem.get("lon"))
            nodes[id] = [lat,lon]

My XML file looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <osm version="0.6" generator="Overpass API 0.7.55.4 3079d8ea">
    <note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
    <meta osm_base="2018-11-09T21:23:02Z"/>

      <way id="46916568">
        <nd ref="286427634"/>
        <nd ref="3371562694"/>
        <nd ref="3371562693"/>
        <nd ref="1044837456"/>
        <nd ref="1299487829"/>
        <nd ref="1299487860"/>
        <nd ref="284132018"/>
        <tag k="highway" v="secondary"/>
        <tag k="lit" v="yes"/>
        <tag k="maxspeed" v="50"/>
        <tag k="name" v="Zürcherstrasse"/>
        <tag k="surface" v="asphalt"/>
      </way>
      <node id="30228243" lat="47.4030908" lon="8.4049015"/>
      <node id="283533527" lat="47.4016971" lon="8.4036696"/>
      <node id="284132018" lat="47.4034413" lon="8.4042634"/>
      <node id="286427571" lat="47.4037481" lon="8.4058661"/>
      <node id="286427634" lat="47.4043045" lon="8.4032429"/>
      <node id="318217124" lat="47.4044289" lon="8.4054211"/>
      <node id="428076175" lat="47.4027948" lon="8.4045078"/>
      <node id="460527594" lat="47.4027445" lon="8.4055605"/>
      <node id="460527973" lat="47.4029993" lon="8.4040697"/>
      <node id="984783907" lat="47.4027808" lon="8.4054934"/>

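The call to next(context) does not return the root element. Because tag=('node') filters the events, the first event it consumes already belongs to the first node:

    In [14]: tree = etree.iterparse(state_file_xml, events=("start","end"),tag=('node'))

    In [15]: context = iter(tree)

    In [16]: event, root = next(context)

    In [17]: root.attrib
    Out[17]: {'id': '30228243', 'lon': '8.4049015', 'lat': '47.4030908'}
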
(I changed context.next() to next(context).)
By the way, [...]
And since you only need to process each node element once, you can listen for the end events only:
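
    import lxml.etree as ET

    # state_file_xml is the path to the XML file, as in the question.
    # Only "end" events for <node> elements are reported.
    context = ET.iterparse(state_file_xml, events=("end",), tag="node")

    nodes = {}

    # No next(context) call before the loop, so no event is consumed early.
    for event, elem in context:
        node_id = elem.get("id")
        lat = float(elem.get("lat"))
        lon = float(elem.get("lon"))
        nodes[node_id] = [lat, lon]
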
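To check both points without the full file, here is a minimal, self-contained sketch. It assumes lxml is installed and uses a shortened three-node sample held in memory; SAMPLE_XML, first_filtered_event and collect_nodes are hypothetical names made up for this illustration and do not appear in the question. The first function shows which element the first filtered event belongs to, and the second is the end-only loop from above.

```python
import io

from lxml import etree

# Hypothetical three-node sample, shortened from the OSM file in the question.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="30228243" lat="47.4030908" lon="8.4049015"/>
  <node id="283533527" lat="47.4016971" lon="8.4036696"/>
  <node id="284132018" lat="47.4034413" lon="8.4042634"/>
</osm>
"""


def first_filtered_event(xml_bytes):
    """Return (event, tag, id) for the first event seen when tag="node" is set."""
    context = iter(
        etree.iterparse(io.BytesIO(xml_bytes), events=("start", "end"), tag="node")
    )
    # With the tag filter active, <osm> never produces an event,
    # so this call hands back the first <node>, not the root.
    event, elem = next(context)
    return event, elem.tag, elem.get("id")


def collect_nodes(xml_bytes):
    """Collect {id: [lat, lon]} for every <node>, using end events only."""
    nodes = {}
    for _event, elem in etree.iterparse(
        io.BytesIO(xml_bytes), events=("end",), tag="node"
    ):
        nodes[elem.get("id")] = [float(elem.get("lat")), float(elem.get("lon"))]
    return nodes


print(first_filtered_event(SAMPLE_XML))
print(collect_nodes(SAMPLE_XML))
```

If the root element is really needed (for example to call root.clear() on very large files), one option is to drop the tag argument and compare elem.tag inside the loop instead, so that the first start event belongs to osm again.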