python find index of the first duplicate within a list
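I want to iterate through a list and find the index at which the first item in the list is matched again. My result should print only the items before that first match.
Here is what I mean: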
```
.APT 5B APT 5B .
.BUSINESS JOEY BUSINESS.
. 1ST FL .
. NATE JR SAM .
. JOE 7 .
. .
.2ND FLR TOM 2ND FLR .
.A1 2FL APT 71E .
.APT E205 APT 1R .
. CONSTRUCTION .
.APT 640 APT 545.
.PART1 SYNC PART2 .
. NATE JR SAM .
```
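The problem I'm having is that even after the first match is found, the program keeps adding items to the dictionary, so data I want to ignore gets appended.
Here is what I have: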
```python
dictt = {}

with open(path + 'sample33.txt', 'rb') as txtin:
    for line in txtin:
        part2 = line[1:29].split()
        uniq = []
        print '%r' % part2
        for key in part2:
            if key not in dictt:
                dictt[key] = key
                uniq.append(key)
        dictt = {}
        print ' '.join(uniq)
```
Results:
```
['APT', '5B', 'APT', '5B']
APT 5B
['BUSINESS', 'JOEY', 'BUSINESS']
BUSINESS JOEY
['1ST', 'FL']
1ST FL
['NATE', 'JR', 'SAM']
NATE JR SAM
['JOE', '7']
JOE 7
[]

['2ND', 'FLR', 'TOM', '2ND', 'FLR']
2ND FLR TOM
['A1', '2FL', 'APT', '71E']
A1 2FL APT 71E
['APT', 'E205', 'APT', '1R']
APT E205 1R      # Would like to stop adding items after first 'APT' match
['CONSTRUCTION']
CONSTRUCTION
['APT', '640', 'APT', '545']
APT 640 545      # same here...
['PART1', 'SYNC', 'PART2']
PART1 SYNC PART2
['NATE', 'JR', 'SAM']
NATE JR SAM
[Finished in 0.1s]
```
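I hope I have explained the problem clearly and that someone can help me fine-tune this.
Thank you.
Edit #1: here is an example of what I would like to print: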
```
listt:
['APT', '640', 'APT', '1', '2', '3']
```
A match on 'APT' is found, so:
```
print:
APT 640
```
and ignore everything after the match.
Here you go:
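```python
import re

with open('your_file.txt') as f:
    for x in f:
        line = re.findall(r'\w+', x.strip())
        print(line)
        try:
            # Cut the line where its first word shows up again.
            print(" ".join(line[:line[1:].index(line[0]) + 1]))
        except (ValueError, IndexError):
            # No repeat of the first word (or an empty line): print it all.
            print(" ".join(line))
```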
Output:
```
['APT', '5B', 'APT', '5B']
APT 5B
['BUSINESS', 'JOEY', 'BUSINESS']
BUSINESS JOEY
['1ST', 'FL']
1ST FL
['NATE', 'JR', 'SAM']
NATE JR SAM
['JOE', '7']
JOE 7
[]

['2ND', 'FLR', 'TOM', '2ND', 'FLR']
2ND FLR TOM
['A1', '2FL', 'APT', '71E']
A1 2FL APT 71E
['APT', 'E205', 'APT', '1R']
APT E205      # not printing after match
['CONSTRUCTION']
CONSTRUCTION
['APT', '640', 'APT', '545']
APT 640      # not printing after match
['PART1', 'SYNC', 'PART2']
PART1 SYNC PART2
['NATE', 'JR', 'SAM']
NATE JR SAM
```
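The approach in this answer only looks for a repeat of the first word on each line. As a rough sketch of a more general variant, you could remember every word seen so far in a set and stop at the first word that repeats any earlier one. The helper names `first_duplicate_index` and `truncate_at_first_duplicate` are made up for illustration, and the code assumes Python 3:

```python
def first_duplicate_index(tokens):
    """Return the index of the first token that repeats an earlier one, or None."""
    seen = set()
    for i, token in enumerate(tokens):
        if token in seen:
            return i
        seen.add(token)
    return None


def truncate_at_first_duplicate(tokens):
    """Keep only the tokens that come before the first repeated token."""
    idx = first_duplicate_index(tokens)
    return tokens if idx is None else tokens[:idx]


# Sample lists taken from the question; an empty list is handled too.
for tokens in (['APT', '640', 'APT', '545'],
               ['A1', '2FL', 'APT', '71E'],
               []):
    print(' '.join(truncate_at_first_duplicate(tokens)))
```

For the sample data in the question this should give the same lines as the output above. The two approaches differ only when a word other than the first one is the first to repeat.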
I'm not sure I fully understand what you need, but this might help.
```python
import os


def read_text(name_file, string):
    result = [0, 0]  # returned unchanged if nothing matches

    with open(name_file) as f:
        # Flatten the file into a single list of whitespace-separated words.
        read_temp = [word for line in f for word in line.split()]

    for index_str, s in enumerate(read_temp):
        if string in s and index_str + 1 < len(read_temp):
            # Keep the first matching word and the word that follows it.
            result[0] = read_temp[index_str]
            result[1] = read_temp[index_str + 1]
            break

    return result


os.chdir('Path to your .txt')

result_list = read_text("your_file.txt", "APT")  # "APT" or whatever string you need to find.

print(result_list)
```
Output:
```
['APT', '5B']
```
If what you are concerned about is removing duplicate entries from your list, then `set` comes to the rescue.
```python
uniqlist = list(set(dupelist))
```
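One caveat: a `set` does not keep the original order, so `list(set(dupelist))` may return the words rearranged. If the order matters, one possible sketch, assuming Python 3.7+ (where `dict` preserves insertion order) and using a sample `dupelist` borrowed from the question, is:

```python
dupelist = ['APT', '640', 'APT', '545']  # sample data from the question

# dict.fromkeys keeps the first occurrence of each item, in original order.
uniqlist = list(dict.fromkeys(dupelist))
print(uniqlist)  # ['APT', '640', '545']
```

Note that this removes duplicates but does not stop at the first match, so it yields `APT 640 545` rather than the `APT 640` the question asks for.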
I should also mention that another post covers removing duplicates from a list:
Python unique list using set