How to “perfectly” override a dict?
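How can I make a subclass of `dict` as "perfect" as possible? The end goal is to have a simple dict in which the keys are lowercase.
It seems that there should be some tiny set of primitives I can override to make this work, but according to all my research and attempts it seems like this isn't the case: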
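- If I override `__getitem__`/`__setitem__`, then `get`/`set` don't work. How can I make them work? Surely I don't need to implement them individually?
- Am I preventing pickling from working, and do I need to implement `__setstate__` etc.?
- Do I need `repr`, `update` and `__init__`?
- Should I just use `MutableMapping` (it seems one shouldn't use `UserDict` or `DictMixin`)? If so, how? The docs aren't exactly enlightening.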
Here is my first go at it:
```python
class arbitrary_dict(dict):
    """A dictionary that applies an arbitrary key-altering function
       before accessing the keys."""

    def __keytransform__(self, key):
        return key

    # Overridden methods. List from
    # https://stackoverflow.com/questions/2390827/how-to-properly-subclass-dict

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    # Note: I'm using dict directly, since super(dict, self) doesn't work.
    # I'm not sure why, perhaps dict is not a new-style class.

    def __getitem__(self, key):
        return dict.__getitem__(self, self.__keytransform__(key))

    def __setitem__(self, key, value):
        return dict.__setitem__(self, self.__keytransform__(key), value)

    def __delitem__(self, key):
        return dict.__delitem__(self, self.__keytransform__(key))

    def __contains__(self, key):
        return dict.__contains__(self, self.__keytransform__(key))


class lcdict(arbitrary_dict):
    def __keytransform__(self, key):
        return str(key).lower()
```
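Here is a minimal sketch of why `get` and the constructor ignore the overrides. The class name `LowerKeysNaive` is hypothetical. In CPython, `dict.get`, `dict.update` and `dict.__init__` work on the underlying hash table directly and never call an overridden `__getitem__` or `__setitem__`:

```python
class LowerKeysNaive(dict):
    """Hypothetical subclass that only overrides item access."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())

    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)


d = LowerKeysNaive()
d['Foo'] = 1             # goes through __setitem__: stored as 'foo'
print(d['FOO'])          # 1: __getitem__ lowercases the lookup
print(d.get('FOO'))      # None: dict.get never calls __getitem__

e = LowerKeysNaive(Bar=2)  # dict.__init__ never calls __setitem__
print(list(e))             # ['Bar']: the key was stored unchanged
```

With this approach, every `dict` method that touches keys has to be wrapped individually.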
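You can write a dict-like object quite easily with ABCs (abstract base classes). It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.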
```python
# collections.MutableMapping was removed in Python 3.10; use collections.abc
from collections.abc import MutableMapping


class TransformedDict(MutableMapping):
    """A dictionary that applies an arbitrary key-altering
       function before accessing the keys"""

    def __init__(self, *args, **kwargs):
        self.store = dict()
        self.update(dict(*args, **kwargs))  # use the free update to set keys

    def __getitem__(self, key):
        return self.store[self.__keytransform__(key)]

    def __setitem__(self, key, value):
        self.store[self.__keytransform__(key)] = value

    def __delitem__(self, key):
        del self.store[self.__keytransform__(key)]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def __keytransform__(self, key):
        return key
```
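To see the "it tells you if you missed a method" part in action, here is a hypothetical `MutableMapping` subclass that forgets `__iter__` and `__len__`. Instantiating it raises `TypeError` and names the missing abstract methods. The exact wording of the message varies between Python versions.

```python
from collections.abc import MutableMapping


class Incomplete(MutableMapping):
    """Hypothetical mapping that forgets to implement __iter__ and __len__."""

    def __init__(self):
        self.store = {}

    def __getitem__(self, key):
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]


try:
    Incomplete()
except TypeError as exc:
    print(exc)  # the message lists the missing __iter__ and __len__
```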
You get a few methods for free from the ABC:
```python
class MyTransformedDict(TransformedDict):

    def __keytransform__(self, key):
        return key.lower()


s = MyTransformedDict([('Test', 'test')])

assert s.get('TEST') is s['test']   # free get
assert 'TeSt' in s                  # free __contains__
                                    # free setdefault, __eq__, and so on

import pickle
# works too since we just use a normal dict
assert pickle.loads(pickle.dumps(s)) == s
```
I wouldn't subclass `dict` directly.
How can I make as "perfect" a subclass of dict as possible?
The end goal is to have a simple dict in which the keys are lowercase.
If I override `__getitem__`/`__setitem__`, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?
Am I preventing pickling from working, and do I need to implement `__setstate__` etc.?
Do I need repr, update and `__init__`?
Should I just use `MutableMapping` (it seems one shouldn't use `UserDict` or `DictMixin`)? If so, how? The docs aren't exactly enlightening.
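The accepted answer would be my first approach. It has some issues, though, and no one has addressed the alternative, which is actually subclassing `dict`, so I'm going to do that here.
This seems like a rather simple request to me: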
How can I make as "perfect" a subclass of dict as possible?
The end goal is to have a simple dict in which the keys are lowercase.
The accepted answer doesn't actually subclass `dict`, as this check shows:
```python
>>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
False
```
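Ideally, any type-checking code would test for the interface we expect, or an abstract base class. However, if our data objects are passed into functions that test for `dict` specifically, those checks will fail.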
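This sketch shows the difference between the two kinds of check, using a hypothetical minimal `MutableMapping` wrapper. `dict` itself is registered as a virtual subclass of `collections.abc.MutableMapping`, so an interface check accepts both objects, while `isinstance(obj, dict)` accepts only the real dict:

```python
from collections.abc import Mapping, MutableMapping


class Wrapper(MutableMapping):
    """Hypothetical minimal MutableMapping around a plain dict."""

    def __init__(self, *args, **kwargs):
        self.store = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


def describe(obj):
    """Return (concrete dict check, interface check) for obj."""
    return isinstance(obj, dict), isinstance(obj, Mapping)


print(describe(Wrapper(a=1)))  # (False, True)
print(describe({'a': 1}))      # (True, True)
```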
Other quibbles one might make:
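- The accepted answer is also missing the classmethod `fromkeys`.
- The accepted answer also has a redundant `__dict__`, and therefore takes up more space in memory:
```python
>>> s.foo = 'bar'
>>> s.__dict__
{'foo': 'bar', 'store': {'test': 'test'}}
```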
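The per-instance `__dict__` exists because the class does not declare `__slots__`. The sketch below uses two hypothetical `dict` subclasses to show what an empty `__slots__` changes. The `collections.abc` base classes also declare `__slots__ = ()`, so a `MutableMapping` subclass could avoid the `__dict__` the same way with `__slots__ = ('store',)`.

```python
class PlainSubclass(dict):
    pass


class SlottedSubclass(dict):
    __slots__ = ()  # no per-instance __dict__ is created


plain, slotted = PlainSubclass(), SlottedSubclass()
print(hasattr(plain, '__dict__'))    # True
print(hasattr(slotted, '__dict__'))  # False

plain.foo = 'bar'  # stored in the instance __dict__
try:
    slotted.foo = 'bar'  # nowhere to store arbitrary attributes
except AttributeError as exc:
    print('AttributeError:', exc)
```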
Actually subclassing `dict`
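We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form, if they are strings.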
If I override `__getitem__`/`__setitem__`, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?
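Well, having to implement them each individually is the downside of this approach and the upside of using `MutableMapping`.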
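First, let's factor out the differences between Python 2 and 3. We also create a singleton (`_RaiseKeyError`) so that `pop` can tell whether it actually received a default argument, and write a helper that lowercases keys only if they are strings: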
```python
from itertools import chain
try:              # Python 2
    str_base = basestring
    items = 'iteritems'
except NameError: # Python 3
    str_base = str, bytes, bytearray
    items = 'items'

_RaiseKeyError = object() # singleton for no-default behavior

def ensure_lower(maybe_str):
    """dict keys can be any hashable object - only call lower if str"""
    return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str
```
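Now we implement the class. I'm using `super` with explicit arguments (`super(LowerDict, self)`) so that this code works in both Python 2 and 3: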
```python
class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
    __slots__ = () # no __dict__ - that would be redundant
    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, items):
            mapping = getattr(mapping, items)()
        return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
    def __init__(self, mapping=(), **kwargs):
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(ensure_lower(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(ensure_lower(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(ensure_lower(k))
    def get(self, k, default=None):
        return super(LowerDict, self).get(ensure_lower(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(ensure_lower(k), default)
    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super(LowerDict, self).pop(ensure_lower(k))
        return super(LowerDict, self).pop(ensure_lower(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(ensure_lower(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())
```
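We use an almost boilerplate approach for any method or special method that references a key. Every other method comes for free through inheritance: `len`, `clear`, `items`, `keys`, `popitem` and `values`.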
(Note that in Python 2 the use of … is deprecated.)
Here's some usage:
```python
>>> ld = LowerDict(dict(foo='bar'))
>>> ld['FOO']
'bar'
>>> ld['foo']
'bar'
>>> ld.pop('FoO')
'bar'
>>> ld.setdefault('Foo')
>>> ld
LowerDict({'foo': None})
>>> ld.get('Bar')
>>> ld.setdefault('Bar')
>>> ld
LowerDict({'foo': None, 'bar': None})
>>> ld.popitem()
('bar', None)
```
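If Python 2 support isn't needed, the same approach can be written more compactly. This version uses zero-argument `super()` and drops the `str_base`/`items` shims. It is a sketch under that assumption. The names `LowerDict3`, `_lower` and `_MISSING` are hypothetical:

```python
from itertools import chain

_MISSING = object()  # sentinel so pop() can tell "no default" from default=None


def _lower(key):
    """Lowercase str/bytes keys; leave any other hashable key unchanged."""
    return key.lower() if isinstance(key, (str, bytes)) else key


class LowerDict3(dict):
    __slots__ = ()  # no per-instance __dict__

    @staticmethod
    def _process_args(mapping=(), **kwargs):
        # like dict(): accept a mapping or an iterable of pairs, plus kwargs
        if hasattr(mapping, 'items'):
            mapping = mapping.items()
        return ((_lower(k), v) for k, v in chain(mapping, kwargs.items()))

    def __init__(self, mapping=(), **kwargs):
        super().__init__(self._process_args(mapping, **kwargs))

    def __getitem__(self, k):
        return super().__getitem__(_lower(k))

    def __setitem__(self, k, v):
        super().__setitem__(_lower(k), v)

    def __delitem__(self, k):
        super().__delitem__(_lower(k))

    def __contains__(self, k):
        return super().__contains__(_lower(k))

    def get(self, k, default=None):
        return super().get(_lower(k), default)

    def setdefault(self, k, default=None):
        return super().setdefault(_lower(k), default)

    def pop(self, k, default=_MISSING):
        if default is _MISSING:
            return super().pop(_lower(k))  # raises KeyError if missing
        return super().pop(_lower(k), default)

    def update(self, mapping=(), **kwargs):
        super().update(self._process_args(mapping, **kwargs))

    def copy(self):  # dict.copy() would return a plain dict
        return type(self)(self)

    @classmethod
    def fromkeys(cls, keys, v=None):
        return super().fromkeys((_lower(k) for k in keys), v)

    def __repr__(self):
        return f'{type(self).__name__}({super().__repr__()})'


ld = LowerDict3(Foo='bar')
ld.update({'BaZ': 1})
print(ld)             # LowerDict3({'foo': 'bar', 'baz': 1})
print(ld.get('FOO'))  # bar
```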
Am I preventing pickling from working, and do I need to implement `__setstate__` etc.?
Pickling
And the dict subclass pickles just fine:
```python
>>> import pickle
>>> pickle.dumps(ld)
b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
>>> pickle.loads(pickle.dumps(ld))
LowerDict({'foo': None})
>>> type(pickle.loads(pickle.dumps(ld)))
<class '__main__.LowerDict'>
```
Do I need repr, update and `__init__`?
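We defined `update` and `__init__`, because the inherited `dict` versions would store keys without lowercasing them. Without a `__repr__` of our own, however, we just get the plain dict representation: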
```python
>>> ld # without __repr__ defined for the class, we get this
{'foo': None}
```
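However, writing a `__repr__` for which `eval(repr(obj)) == obj` holds improves the debuggability of your code: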
```python
>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True
```
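As you can see, this is exactly what we need to recreate an equivalent object, and it is the kind of output that might show up in our logs or tracebacks: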
```python
>>> ld
LowerDict({'a': 1, 'b': 2, 'c': 3})
```
Conclusion
Should I just use `MutableMapping` (it seems one shouldn't use `UserDict` or `DictMixin`)? If so, how? The docs aren't exactly enlightening.
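Yes, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and only if there were issues with it would I look at my answer, since mine is a little more complicated and has no ABC to help me get the interface right.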
Premature optimization means going for greater complexity in search of performance.
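I should add that there was a push to put a similar dictionary into … Alternatively, transform the key explicitly wherever you access the dict: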
```python
my_dict[transform(key)]
```
It should be much easier to debug.
Compare and contrast
Using …
In both approaches, we get a free …
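- Subclassing `MutableMapping` is simpler, with fewer opportunities for bugs, but it is slower, takes more memory (see the redundant `__dict__` above), and fails `isinstance(x, dict)`.
- Subclassing `dict` is faster, uses less memory, and passes `isinstance(x, dict)`, but it is more complex to implement.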
Which is more perfect? That depends on your definition of perfect.
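The speed claims are easy to check on your own interpreter. Below is a rough `timeit` sketch that compares `.get` on two hypothetical minimal lowercasing classes. Results depend on the machine and Python version, so no numbers are given here:

```python
import timeit
from collections.abc import MutableMapping


class WrapperDict(MutableMapping):
    """Hypothetical MutableMapping that lowercases str keys into a backing dict."""

    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))  # MutableMapping.update -> __setitem__

    def __getitem__(self, key):
        return self.store[key.lower()]

    def __setitem__(self, key, value):
        self.store[key.lower()] = value

    def __delitem__(self, key):
        del self.store[key.lower()]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


class SubclassDict(dict):
    """Hypothetical dict subclass; only the methods used below are overridden."""
    __slots__ = ()

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def get(self, key, default=None):
        return super().get(key.lower(), default)


def time_get(mapping, n=100_000):
    """Seconds taken by n calls of mapping.get('KEY')."""
    return timeit.timeit(lambda: mapping.get('KEY'), number=n)


w = WrapperDict(key=1)
s = SubclassDict()
s['KEY'] = 1  # stored as 'key'
print('MutableMapping .get:', time_get(w))
print('dict subclass .get: ', time_get(s))
```

`Mapping.get` is implemented in Python on top of `__getitem__`, while the subclass delegates to the C `dict.get`. That difference is where any gap you measure would come from.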
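My requirements were a bit stricter:
- I had to retain the case information (the strings are paths to files displayed to the user, but it's a Windows app, so internally all operations must be case-insensitive)
- I needed keys to be as small as possible (it did make a difference in memory performance: it chopped 110 MB off 370). This meant that caching the lowercase version of the keys was not an option.
- I needed creation of the data structures to be as fast as possible (again a difference in performance, speed this time). I had to go with a builtin.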
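My initial thought was to replace our clunky Path class with a case-insensitive unicode subclass, but:
- it proved hard to get that right; see: A case insensitive string class in python
- it turned out that explicit dict key handling makes code verbose, messy and error-prone (structures are passed here and there, it is not clear whether they have `CIstr` instances as keys/elements, it's easy to forget, plus … is ugly)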
So I finally had to write that case-insensitive dict. Thanks to code by @aaronhall, it was made 10 times easier.
```python
# Python 2 code (uses unicode, basestring and iteritems)
from itertools import chain  # needed by LowerDict._process_args


class CIstr(unicode):
    """See https://stackoverflow.com/a/43122305/281545, especially for inlines"""
    __slots__ = () # does make a difference in memory performance

    #--Hash/Compare
    def __hash__(self):
        return hash(self.lower())
    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented
    def __ne__(self, other):
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented
    def __lt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() < other.lower()
        return NotImplemented
    def __ge__(self, other):
        if isinstance(other, CIstr):
            return self.lower() >= other.lower()
        return NotImplemented
    def __gt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() > other.lower()
        return NotImplemented
    def __le__(self, other):
        if isinstance(other, CIstr):
            return self.lower() <= other.lower()
        return NotImplemented
    #--repr
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(CIstr, self).__repr__())

def _ci_str(maybe_str):
    """dict keys can be any hashable object - only call CIstr if str"""
    return CIstr(maybe_str) if isinstance(maybe_str, basestring) else maybe_str

class LowerDict(dict):
    """Dictionary that transforms its keys to CIstr instances.
    Adapted from: https://stackoverflow.com/a/39375731/281545
    """
    __slots__ = () # no __dict__ - that would be redundant

    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, 'iteritems'):
            mapping = getattr(mapping, 'iteritems')()
        return ((_ci_str(k), v) for k, v in
                chain(mapping, getattr(kwargs, 'iteritems')()))
    def __init__(self, mapping=(), **kwargs):
        # dicts take a mapping or iterable as their optional first argument
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(_ci_str(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(_ci_str(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(_ci_str(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    def get(self, k, default=None):
        return super(LowerDict, self).get(_ci_str(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(_ci_str(k), default)
    __no_default = object()
    def pop(self, k, v=__no_default):
        if v is LowerDict.__no_default:
            # super will raise KeyError if no default and key does not exist
            return super(LowerDict, self).pop(_ci_str(k))
        return super(LowerDict, self).pop(_ci_str(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(_ci_str(k))
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((_ci_str(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(LowerDict, self).__repr__())
```
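The `CIstr` above is Python 2 code (`unicode`, `basestring`). Below is a minimal Python 3 sketch of the same idea, assuming only hashing and equality matter. The key keeps its original spelling for display, but hashes and compares case-insensitively. As in the original, it compares that way only against other `CIstr` instances, so a plain `str` key is still matched case-sensitively.

```python
class CIstr(str):
    """Python 3 sketch: keeps its original spelling but hashes and compares
    case-insensitively against other CIstr instances."""
    __slots__ = ()

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented


paths = {CIstr(r'C:\Users\Me\File.TXT'): 'size: 42'}
key = next(iter(paths))
print(key)                                    # C:\Users\Me\File.TXT (case kept)
print(paths[CIstr(r'c:\users\me\file.txt')])  # size: 42
```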
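Implicit vs. explicit is still a problem. Once the dust settles, though, I think a perfect solution is to rename attributes/variables to start with `ci`, plus a big fat doc comment explaining that ci stands for case-insensitive. That way, readers of the code must be fully aware that they are dealing with case-insensitive underlying data structures. This will hopefully fix some hard-to-reproduce bugs that I suspect boil down to case sensitivity.
Comments/corrections welcome :)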
All you will have to do is:
```python
class BatchCollection(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)  # self must be passed explicitly
```
or
```python
class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)
```
An example of how I use it personally:
```python
### EXAMPLE
from collections.abc import Iterable


class BatchCollection(dict):
    def __init__(self, inpt={}):
        # note: dict.__init__ does not go through __setitem__,
        # so items passed here are not validated
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        # accept only (database_name, table_name) keys with iterable values
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")
```
Note: tested only in Python 3
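One caveat with the example above: `dict.__init__`, like `dict.update` and `dict.setdefault`, does not call the overridden `__setitem__`, so items passed to the constructor skip the validation. The sketch below routes constructor items through `__setitem__` instead. The class name `ValidatedBatchCollection` and the switch to `ValueError` are assumptions, not part of the original:

```python
from collections.abc import Iterable


class ValidatedBatchCollection(dict):
    """Hypothetical BatchCollection variant that also validates constructor items."""

    def __init__(self, inpt=None):
        super().__init__()
        # dict.__init__ would bypass __setitem__, so insert items one by one
        for key, item in dict(inpt or {}).items():
            self[key] = item

    def __setitem__(self, key, item):
        if isinstance(key, tuple) and len(key) == 2 and isinstance(item, Iterable):
            super().__setitem__(key, item)
        else:
            raise ValueError(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")


ok = ValidatedBatchCollection({('db', 'table'): [1, 2, 3]})
try:
    ValidatedBatchCollection({'not-a-tuple': [1]})
except ValueError as exc:
    print('rejected:', exc)
```

`update` and `setdefault` would still bypass the check unless they are overridden as well.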
After trying out both of the top two suggestions, I've settled on a shady-looking middle route for Python 2.7. Maybe Python 3 is saner, but for me:
```python
class MyDict(MutableMapping):
    # ... the few __methods__ that mutablemapping requires
    # and then this monstrosity
    @property
    def __class__(self):
        return dict
```
I really hate it, but it seems to fit my needs, which are:
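- can override `**my_dict`
  - if you inherit from `dict`, this bypasses your code. Try it out.
  - this makes option 2 unacceptable for me at all times, as this is quite common in Python code
- masquerades as `isinstance(my_dict, dict)`
  - rules out `MutableMapping` alone, so option 1 is not enough
  - I heartily recommend option 1 if you don't need this; it's simple and predictable
- fully controllable behavior
  - so I cannot inherit from `dict`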
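Here is a sketch of the `__class__` trick on a minimal hypothetical `MutableMapping`, written for Python 3. `isinstance` falls back to the object's `__class__` attribute when the real type check fails, while `type()` still reports the true class:

```python
from collections.abc import MutableMapping


class MyDict(MutableMapping):
    """Minimal MutableMapping backed by a plain dict (hypothetical example)."""

    def __init__(self, *args, **kwargs):
        self.store = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    @property
    def __class__(self):
        # isinstance() consults obj.__class__ when the real type check fails
        return dict


m = MyDict(a=1)
print(isinstance(m, dict))  # True, via the __class__ property
print(type(m) is MyDict)    # True: type() still reports the real class
print(m.__class__)          # <class 'dict'>
```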
If you need to tell yourself apart from others, I personally use something like this (though I'd recommend better names):
```python
def __am_i_me(self):
    return True

@classmethod
def __is_it_me(cls, other):
    try:
        return other.__am_i_me()
    except Exception:
        return False
```
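As long as you only need to recognize yourself internally, this makes it hard to call `__am_i_me` by accident from outside the class, thanks to Python's name mangling (inside `MyDict` the name becomes `_MyDict__am_i_me`).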
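Here is a small sketch of what the name mangling does. It uses a hypothetical stand-in class named `MyDict`, plus a public `recognizes` helper added only for the demonstration:

```python
class MyDict:
    """Hypothetical stand-in class for the self-recognition helpers above."""

    def __am_i_me(self):
        return True

    @classmethod
    def __is_it_me(cls, other):
        try:
            # inside the class body this compiles to other._MyDict__am_i_me()
            return other.__am_i_me()
        except Exception:
            return False

    @classmethod
    def recognizes(cls, other):
        """Public wrapper, added only so the demo can call the mangled helper."""
        return cls.__is_it_me(other)


print(MyDict.recognizes(MyDict()))            # True
print(MyDict.recognizes({}))                  # False: {} has no _MyDict__am_i_me
print(hasattr(MyDict(), '_MyDict__am_i_me'))  # True: the mangled name exists
print(hasattr(MyDict(), '__am_i_me'))         # False: the plain name does not
```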
So far I have no complaints, aside from the seriously ugly `__class__` override.
As evidence: https://repl.it/repls/traumaticoughcockatoo
Basically: copy the current option 2, add a print statement to every method, then run this and watch the output:
```python
d = LowerDict()  # prints "init", or whatever your print statement said
print '------'
splatted = dict(**d)  # note that there are no prints here
```
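Here is a Python 3 version of the same experiment, using a log list instead of prints. Both class names are hypothetical. In CPython, `**` unpacking copies a `dict` subclass like this one directly without calling its overridden methods, whereas a non-dict mapping is read through `keys()` and `__getitem__`:

```python
from collections.abc import MutableMapping

log = []  # names of the overridden methods that actually ran


class SubclassedDict(dict):
    """Hypothetical dict subclass that logs item access."""

    def __getitem__(self, key):
        log.append('SubclassedDict.__getitem__')
        return super().__getitem__(key)


class WrappedDict(MutableMapping):
    """Hypothetical MutableMapping that logs item access."""

    def __init__(self, **kwargs):
        self.store = dict(kwargs)

    def __getitem__(self, key):
        log.append('WrappedDict.__getitem__')
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


splatted = dict(**SubclassedDict(a=1))
print(log)  # []: the dict subclass was copied without calling __getitem__
splatted = dict(**WrappedDict(a=1))
print(log)  # ['WrappedDict.__getitem__']
```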
You'll see similar behavior in other scenarios. Suppose your fake-…
This …