Map a function by key path in nested dict including slices, wildcards and ragged hierarchies
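This question is an extension of the ones here and here.
What is a good way to map a function at a specified key path in nested dicts, including the following path specifications:
If it makes things simpler, you can assume that only dicts are nested, with no lists of dicts, since the former can be obtained via
However, the hierarchy can be ragged, for example: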
```python
data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
```
I would like to be able to specify the key path like this:
```python
map_at(f, ['*',['b','c'],'d'])
```
which would return:
```python
{0: {'a': 1, 'b': 2},
 1: {'a': 10, 'c': 13},
 2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
 3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
```
Here, f is applied to the values at key paths [2, 'b', 'd'] and [3, 'c', 'd'].
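Slices would be specified as, for example
I think the path specification is unambiguous, although it could be generalized to, for example, match key path prefixes (in which case,
Can this be implemented with comprehensions and recursion, or does it require heavy lifting to catch
Please do not suggest Pandas as an alternative.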
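I'm not a big fan of pseudocode, but in this kind of situation you need to write down an algorithm. Here is my understanding of your requirements:
- If path_pattern is not empty:
  - If data is terminal, it's a failure: we did not match the full path_pattern, so there is no reason to apply the function. Just return data.
  - Otherwise, we have to explore every path in data, consuming the head of path_pattern where possible. That is, return a dict data key -> map_at(func, new_path, data value), where new_path is the tail of path_pattern if the key matches the head, and path_pattern itself otherwise.
- Otherwise (path_pattern is empty), it's a success:
  - If data is terminal, return func(data).
  - Otherwise, find the leaves and apply func: return a dict data key -> map_at(func, [], data value).
Notes:
- I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;
- this is an eager algorithm: the head of the path is always consumed when possible;
- if the path is fully consumed, every terminal should be mapped;
- it is a simple DFS, so I guess it is possible to write an iterative version with a stack.
Here is the code: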
```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # EDIT: avoid "break" in the dict comprehension if pattern is not a list.
            return False

    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in data.items()}
        except AttributeError:  # fail: terminal data but path_pattern was not consumed
            return data
    else:  # success: path_pattern is empty.
        try:  # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in data.items()}
        except AttributeError:  # a leaf: map it
            return func(data)
```
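As a quick sanity check, the sketch below repeats the function above and runs it on the question's data. The function f (multiply by 10) and the dict called deep are made up for illustration and are not from the original post. The second call illustrates the first note: the pattern is not anchored, so keys that do not match the current head are simply descended into with the pattern unchanged.

```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False

    if path_pattern:
        head, *tail = path_pattern
        try:
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in data.items()}
        except AttributeError:
            return data
    else:
        try:
            return {k: map_at(func, [], v) for k, v in data.items()}
        except AttributeError:
            return func(data)


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

f = lambda x: x * 10  # hypothetical function, for illustration only

result = map_at(f, ['*', ['b', 'c'], 'd'], data)
print(result[2]['b'])  # {'d': 1000, 'e': 101}
print(result[3]['c'])  # {'d': 3000}

# The pattern *-b-d also matches the deeper path 0-a-b-c-d-e:
deep = {0: {'a': {'b': {'c': {'d': {'e': 1}}}}}}
print(map_at(f, ['*', 'b', 'd'], deep))  # {0: {'a': {'b': {'c': {'d': {'e': 10}}}}}}
```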
Note that
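As you can see, you never escape from the second case: if path_pattern is empty, every leaf below that point gets mapped, whatever happens. This is clearer in the following version: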
```python
def map_all_leaves(func, data):
    """Apply func to all leaves"""
    try:
        return {k: map_all_leaves(func, v) for k, v in data.items()}
    except AttributeError:
        return func(data)

def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # EDIT: avoid "break" in the dict comprehension if pattern is not a list.
            return False

    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in data.items()}
        except AttributeError:  # fail: terminal data but path_pattern is not consumed
            return data
    else:
        # success: path_pattern is empty, so map every leaf below this point
        return map_all_leaves(func, data)
```
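The note about an iterative version can be sketched with an explicit stack, as below. This is only one possible arrangement and not code from the original answer: the name map_at_iter and the root holder dict are assumptions, and it tests for dicts with isinstance instead of catching AttributeError.

```python
def map_at_iter(func, path_pattern, data):
    """Iterative DFS variant of the recursive map_at (illustrative sketch)."""
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False

    root = {}  # holder dict, so the top level needs no special case
    stack = [(list(path_pattern), data, root, 'result')]
    while stack:
        pattern, node, parent, key = stack.pop()
        if isinstance(node, dict):
            copy = {}
            parent[key] = copy
            # push in reverse so that children are popped in their original order
            for k, v in reversed(list(node.items())):
                if pattern and matches(pattern[0], k):
                    child_pattern = pattern[1:]  # consume the head
                else:
                    child_pattern = pattern      # keep looking further down
                stack.append((child_pattern, v, copy, k))
        else:
            # a leaf: map it only if the whole pattern was consumed
            parent[key] = node if pattern else func(node)
    return root['result']


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

result = map_at_iter(lambda x: x * 10, ['*', ['b', 'c'], 'd'], data)
print(result[2]['b'], result[3]['c'])  # {'d': 1000, 'e': 101} {'d': 3000}
```

Each stack entry carries the remaining pattern together with the place where the mapped value has to be stored, which replaces the return values of the recursive version.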
EDIT
If you want to handle lists, you can try this:
```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # EDIT: avoid "break" in the dict comprehension if pattern is not a list.
            return False

    def get_items(data):
        # dicts yield (key, value) pairs, other iterables yield (index, value) pairs
        if isinstance(data, (str, bytes)):
            # strings are iterable but must count as leaves, otherwise the recursion never ends
            raise TypeError("strings are treated as leaves")
        try:
            return data.items()
        except AttributeError:
            return enumerate(data)  # raises TypeError if data is not iterable

    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in get_items(data)}
        except TypeError:  # fail: terminal data but path_pattern was not consumed
            return data
    else:  # success: path_pattern is empty.
        try:  # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in get_items(data)}
        except TypeError:  # a leaf: map it
            return func(data)
```
The idea is simple:
```python
>>> list(enumerate(['a', 'b']))
[(0, 'a'), (1, 'b')]
>>> list({0:'a', 1:'b'}.items())
[(0, 'a'), (1, 'b')]
```
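Hence, get_items yields (key, value) pairs for a dict and (index, value) pairs for a list.
The drawback is that lists are converted to dicts in the process: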
```python
>>> data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
>>> map_at(type,['*',['b','c'],'d'],data2)
{0: {'a': 1, 'b': 2}, 1: {'a': 10, 'c': 13}, 2: {'a': 20, 'b': {'d': <class 'int'>, 'e': 101}, 'c': 23}, 3: {'a': 30, 'b': 31, 'c': {'d': <class 'int'>}}}
```
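If keeping lists as lists matters, one possible variation is to rebuild each container with its own type instead of always building a dict. The sketch below is not part of the original answer: the name map_at_keep_lists is hypothetical, and it assumes that only dict and list count as containers (anything else, strings included, is a leaf).

```python
def map_at_keep_lists(func, path_pattern, data):
    """Like map_at, but a list in the input stays a list in the output (sketch)."""
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False

    if isinstance(data, dict):
        items = list(data.items())
    elif isinstance(data, list):
        items = list(enumerate(data))  # list indices play the role of keys
    else:
        # a leaf: map it only if the whole pattern was consumed
        return data if path_pattern else func(data)

    head, tail = (path_pattern[0], path_pattern[1:]) if path_pattern else (None, [])
    mapped = [
        (k, map_at_keep_lists(
            func, tail if path_pattern and matches(head, k) else path_pattern, v))
        for k, v in items
    ]
    # rebuild the same kind of container that was passed in
    return dict(mapped) if isinstance(data, dict) else [v for _, v in mapped]


data2 = [{'a': 1, 'b': 2},
         {'a': 10, 'c': 13},
         {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
         {'a': 30, 'b': 31, 'c': {'d': 300}}]

result = map_at_keep_lists(type, ['*', ['b', 'c'], 'd'], data2)
print(type(result).__name__)  # list
print(result[2]['b'])         # {'d': <class 'int'>, 'e': 101}
```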
EDIT
Since you are looking for something like XPath for JSON, you could try https://pypi.org/project/jsonpath/ or https://pypi.org/project/jsonpath-rw/. (I have not tested those libs.)
I think you might enjoy this refreshing generator implementation:
```python
def select(sel = [], d = {}, res = []):
    # (base case: no selector)
    if not sel:
        yield (res, d)

    # (inductive: a selector) non-dict
    elif not isinstance(d, dict):
        return

    # (inductive: a selector, a dict) wildcard selector
    elif sel[0] == '*':
        for (k, v) in d.items():
            yield from select(sel[1:], v, [*res, k])

    # (inductive: a selector, a dict) list selector
    elif isinstance(sel[0], list):
        for s in sel[0]:
            yield from select([s, *sel[1:]], d, res)

    # (inductive: a selector, a dict) single selector
    elif sel[0] in d:
        yield from select(sel[1:], d[sel[0]], [*res, sel[0]])

    # (inductive: single selector not in dict) no match
    else:
        return
```
It works like this:
```python
data = {
    0: {'a': 1, 'b': 2},
    1: {'a': 10, 'c': 13},
    2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
    3: {'a': 30, 'b': 31, 'c': {'d': 300}},
}

for (path, v) in select(['*',['b','c'],'d'], data):
    print(path, v)

# [2, 'b', 'd'] 100
# [3, 'c', 'd'] 300
```
Because select is a generator, its output can be passed straight to map:
```python
s = select(['*',['b','c'],'d'], data)

work = lambda r: f"path: {r[0]}, value: {r[1]}"

for x in map(work, s):
    print(x)

# path: [2, 'b', 'd'], value: 100
# path: [3, 'c', 'd'], value: 300
```
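The generator only reports where the matches are. To get back to the map_at of the question, one option is to write the mapped values into a copy of the data at the selected paths. The following is a sketch under assumptions, not part of the original answer: map_at_via_select is a hypothetical name, select is repeated in compact form so the block runs on its own, and a deep copy is used so that the input stays untouched.

```python
import copy

def select(sel, d, res=()):
    # compact restatement of the generator above
    if not sel:
        yield (list(res), d)
    elif not isinstance(d, dict):
        return
    elif sel[0] == '*':
        for k, v in d.items():
            yield from select(sel[1:], v, [*res, k])
    elif isinstance(sel[0], list):
        for s in sel[0]:
            yield from select([s, *sel[1:]], d, res)
    elif sel[0] in d:
        yield from select(sel[1:], d[sel[0]], [*res, sel[0]])

def map_at_via_select(func, sel, data):
    if not sel:
        return func(data)          # empty selector: the root itself is the match
    result = copy.deepcopy(data)   # never mutate the caller's data
    for path, value in select(sel, data):
        *parents, last = path
        node = result
        for key in parents:        # walk down to the dict holding the match
            node = node[key]
        node[last] = func(value)
    return result


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

result = map_at_via_select(lambda x: x * 10, ['*', ['b', 'c'], 'd'], data)
print(result[2]['b'], result[3]['c'])  # {'d': 1000, 'e': 101} {'d': 3000}
```

Unlike the recursive map_at from the other answer, this applies the function to whatever sits at the end of a matched path, even if it is a dict, and the selector has to match from the root.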
This isn't very simple and isn't very efficient, but it should work:
```python
def map_at(f, kp, d):
    return map_at0(f, kp, d, 0)

def slice_contains(s, i):  # no negative-index support
    a = s.start or 0
    # slice objects expose start, stop and step (there is no "end" attribute)
    return i >= a and (s.stop is None or i < s.stop) and \
        not (i - a) % (s.step or 1)

def map_at0(f, kp, d, i):
    if i == len(kp): return f(d)
    if not isinstance(d, dict): return d  # no such path here
    ret = {}
    p = kp[i]
    if isinstance(p, str) and p != '*': p = p,
    for j, (k, v) in enumerate(sorted(d.items())):
        if p == '*' or (slice_contains(p, j) if isinstance(p, slice) else k in p):
            v = map_at0(f, kp, v, i + 1)
        ret[k] = v
    return ret
```
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied).
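To make the slice handling and the copying behaviour concrete, here is a small self-contained run. It restates the functions in condensed form, assuming the standard slice attributes start, stop and step; the multiply-by-10 function is made up for illustration. A slice selects keys by their position in sorted order, so slice(0, 3) covers the keys 0, 1 and 2 here.

```python
def slice_contains(s, i):  # no negative-index support
    a = s.start or 0
    return i >= a and (s.stop is None or i < s.stop) and not (i - a) % (s.step or 1)

def map_at0(f, kp, d, i=0):
    if i == len(kp):
        return f(d)
    if not isinstance(d, dict):
        return d  # no such path here
    p = kp[i]
    if isinstance(p, str) and p != '*':
        p = (p,)  # wrap a single key so that "k in p" works
    ret = {}
    for j, (k, v) in enumerate(sorted(d.items())):
        if p == '*' or (slice_contains(p, j) if isinstance(p, slice) else k in p):
            v = map_at0(f, kp, v, i + 1)
        ret[k] = v
    return ret


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

result = map_at0(lambda x: x * 10, [slice(0, 3), 'a'], data)
print(result[0], result[3])               # {'a': 10, 'b': 2} {'a': 30, 'b': 31, 'c': {'d': 300}}

print(result[0] is data[0])               # False: an expanded dict is copied
print(result[3] is data[3])               # True: outside the slice, returned by reference
print(result[2]['b'] is data[2]['b'])     # True: unmatched sub-dict, returned by reference
```

Because unmatched sub-dicts are shared with the input, mutating them in the result would also change the original data.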