How does Dictionary use the Equatable protocol in Swift?

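To get to the bottom of this, I've been using a custom struct that implements the Hashable protocol. I'm trying to understand how many times the equality operator overload (==) gets called, depending on whether there are hash collisions while populating a Dictionary.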

Update

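@Matt wrote a much cleaner example of a custom struct that implements the Hashable protocol and shows how often hashValue and == get called. I'm copying his code below. To see my original example, check the edit history.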

struct S : Hashable {
    static func ==(lhs:S,rhs:S) -> Bool {
        print("called == for", lhs.id, rhs.id)
        return lhs.id == rhs.id
    }
    let id : Int
    var hashValue : Int {
        print("called hashValue for", self.id)
        return self.id
    }
    init(_ id:Int) {self.id = id}
}
var s = Set<S>()
for i in 1...5 {
    print("inserting", i)
    s.insert(S(i))
}

This produces the following output:

/*
inserting 1
called hashValue for 1
inserting 2
called hashValue for 2
called == for 1 2
called hashValue for 1
called hashValue for 2
inserting 3
called hashValue for 3
inserting 4
called hashValue for 4
called == for 3 4
called == for 1 4
called hashValue for 2
called hashValue for 3
called hashValue for 1
called hashValue for 4
called == for 3 4
called == for 1 4
inserting 5
called hashValue for 5
*/

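Since Hashable uses Equatable to tell hash collisions apart (I assume so, anyway), I would expect func ==() to be called only when there is a hash collision. However, there are no hash collisions at all in @matt's example above, and yet == is still being called. In my other experiments that forced hash collisions (see this question's edit history), == seemed to be called a random number of times.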

What's going on here?


I'm copying my answer from bugs.swift.org here. It talks about sets, but the details apply to dictionaries just the same.

In hashed collections, collisions can occur whenever the number of buckets is smaller than the keyspace. When you're creating a new Set without specifying a minimum capacity, the set might have only one bucket, so when you're inserting the second element, a collision occurs. The insert method will then decide if the storage should be grown, using something called a load factor. If the storage was grown, the existing elements have to be migrated over to the new storage buffer. That's when you're seeing all those extra calls to hashValue when inserting 4.
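The load-factor-driven growth described above can be sketched with a toy open-addressing set. This is a simplified model, not Swift's actual Set: the single starting bucket, the 0.75 maximum load factor, doubling on growth, linear probing and the identity hash (mirroring S.hashValue in the question) are all assumptions chosen for illustration. It counts hash calls so the rehashing during growth becomes visible.

```swift
// Toy open-addressing set of Ints, used only to illustrate load-factor-driven growth.
// NOT Swift's real Set: the starting bucket count, load factor, doubling policy,
// linear probing and identity hash are all assumptions for illustration.
struct ToyGrowingSet {
    private(set) var buckets: [Int?] = [nil]  // start with a single bucket
    private(set) var count = 0
    private(set) var hashCalls = 0
    let maxLoadFactor = 0.75                   // assumed value

    // Stand-in for hashValue (identity, like S in the question); counted on every call.
    private mutating func hash(_ x: Int) -> Int {
        hashCalls += 1
        return x
    }

    // Linear probing: first empty bucket at or after the home bucket.
    private func firstFree(in table: [Int?], hash h: Int) -> Int {
        let n = table.count
        var i = ((h % n) + n) % n
        while table[i] != nil { i = (i + 1) % n }
        return i
    }

    // Doubling the bucket count changes every home bucket, and hashes are not
    // stored, so every existing element has to be hashed again.
    private mutating func grow() {
        let old = buckets
        var table = [Int?](repeating: nil, count: old.count * 2)
        for case let x? in old {
            let h = hash(x)
            let slot = firstFree(in: table, hash: h)
            table[slot] = x
        }
        buckets = table
    }

    // Assumes distinct inputs (no duplicate check), to keep the focus on growth.
    mutating func insert(_ x: Int) {
        if Double(count + 1) > Double(buckets.count) * maxLoadFactor {
            grow()
        }
        let h = hash(x)
        let slot = firstFree(in: buckets, hash: h)
        buckets[slot] = x
        count += 1
    }
}

var toy = ToyGrowingSet()
for i in 1...5 { toy.insert(i) }
print(toy.count, toy.buckets.count, toy.hashCalls)  // prints "5 8 9"
```

With these assumed parameters the toy grows while inserting 2 and 4, so the 9 hash calls are 5 for the new keys plus 4 for rehashing existing elements. Real implementations pick their own starting capacity and load factor, so the exact points at which growth happens will differ.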

The reason you're still seeing more calls to == than you would expect if the number of buckets is equal to or greater than the number of elements has to do with an implementation detail of the bucket index calculation. The bits of the hashValue are mixed or "shuffled" before the modulo operation. This is to cut down on excessive collisions for types with bad hash algorithms.
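The bucket index calculation can be sketched as "scramble the hash bits, then reduce them to a bucket index". Swift's internal mixer is not reproduced here; as an assumed stand-in, this uses the 64-bit finalizer from MurmurHash3 (fmix64), and it assumes a power-of-two bucket count so the reduction is a bit mask rather than a modulo. The point is that after mixing, distinct hashValues such as 1 and 2 may land in the same bucket, and then only == can tell the keys apart.

```swift
// Sketch of "mix the hash bits, then reduce to a bucket index".
// The mixer is MurmurHash3's 64-bit finalizer (fmix64), used only as an example
// of bit mixing; it is an assumption, NOT the function Swift actually uses.
func mix(_ hashValue: Int) -> UInt64 {
    var h = UInt64(bitPattern: Int64(hashValue))
    h ^= h >> 33
    h = h &* 0xff51_afd7_ed55_8ccd
    h ^= h >> 33
    h = h &* 0xc4ce_b9fe_1a85_ec53
    h ^= h >> 33
    return h
}

// Assumes a power-of-two bucket count, so masking with (bucketCount - 1)
// is equivalent to taking the mixed value modulo bucketCount.
func bucketIndex(for hashValue: Int, bucketCount: Int) -> Int {
    precondition(bucketCount > 0 && bucketCount & (bucketCount - 1) == 0)
    return Int(mix(hashValue) & UInt64(bucketCount - 1))
}

let bucketCount = 8
for id in 1...5 {
    // Without mixing, ids 1...5 would fall into buckets 1...5 with no collisions.
    // With mixing, two different ids may share a bucket, and == has to decide.
    print("id", id, "-> bucket", bucketIndex(for: id, bucketCount: bucketCount))
}
```

The trade-off is the one the answer describes: a type whose hashValue is, say, always a multiple of 8 would put every key in bucket 0 under a plain mask, while mixing spreads those keys out, at the cost that well-behaved sequential hashes no longer map one-to-one onto buckets.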


Well, here's your answer:

https://bugs.swift.org/browse/sr-3330?focusedCommentID=19980&;page=com.atlassian.jira.plugin.system.issueTabbpanels:comment tabbpanel_comment-19980

What's actually happening:

  • We hash a value only once on insertion.
  • We don't use hashes for comparison of elements, only ==. Using hashes for comparison is only reasonable if you store the hashes, but
    that means more memory usage for every Dictionary. A compromise that
    needs evaluation.
  • We try to insert the element before evaluating if the Dictionary can fit that element. This is because the element might already be in the
    Dictionary, in which case we don't need any more capacity.
  • When we resize the Dictionary, we have to rehash everything, because we didn't store the hashes.
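The four points above can be turned into a small model of insertion. Again this is a toy, not Swift's actual code; the identity hash, linear probing, single starting bucket, doubling and 0.75 load factor are assumptions. It logs every hash and == call: the key is hashed once, the probe compares only with ==, capacity is checked only after the key is known to be absent, and a resize rehashes every element before the key is hashed and searched for again.

```swift
// Toy open-addressing set of Ints following the steps quoted above.
// A simplified model for illustration, NOT Swift's real Set/Dictionary code.
struct ToyHashSet {
    private(set) var buckets: [Int?] = [nil]  // assumed: start with one bucket
    private(set) var count = 0
    private(set) var log: [String] = []        // records every hash and == call
    let maxLoadFactor = 0.75                   // assumed value

    // Stand-in for hashValue (identity, like S in the question).
    private mutating func hash(_ x: Int) -> Int {
        log.append("hash \(x)")
        return x
    }

    // Stand-in for the user-defined ==.
    private mutating func equal(_ a: Int, _ b: Int) -> Bool {
        log.append("== \(a) \(b)")
        return a == b
    }

    private func homeBucket(_ h: Int) -> Int {
        let n = buckets.count
        return ((h % n) + n) % n
    }

    // Probe from the home bucket comparing with == only: hashes are not stored,
    // so there is nothing to compare them against.
    private mutating func find(_ x: Int, hash h: Int) -> (found: Bool, index: Int) {
        var i = homeBucket(h)
        while let existing = buckets[i] {
            if equal(existing, x) { return (true, i) }
            i = (i + 1) % buckets.count
        }
        return (false, i)
    }

    // Resize: every element is rehashed and placed in the first free bucket (no ==).
    private mutating func grow() {
        let old = buckets
        buckets = [Int?](repeating: nil, count: old.count * 2)
        for case let x? in old {
            let h = hash(x)
            var i = homeBucket(h)
            while buckets[i] != nil { i = (i + 1) % buckets.count }
            buckets[i] = x
        }
    }

    // Returns true if x was newly inserted.
    @discardableResult
    mutating func insert(_ x: Int) -> Bool {
        let h = hash(x)                      // hash the key once
        var slot = find(x, hash: h)          // search, comparing with == only
        if slot.found { return false }       // already present: no extra capacity needed
        if Double(count + 1) > Double(buckets.count) * maxLoadFactor {
            grow()                           // resize: rehash every element
            let h2 = hash(x)                 // hash the key again
            slot = find(x, hash: h2)         // search again in the new buffer
        }
        buckets[slot.index] = x
        count += 1
        return true
    }
}

var toy = ToyHashSet()
for k in [1, 3, 5, 3] { toy.insert(k) }
print(toy.log.joined(separator: ", "))
// prints "hash 1, hash 1, hash 3, == 1 3, hash 1, hash 3, hash 5, == 1 5, hash 3, == 3 3"
```

Inserting 3 shows the pattern from the trace: one hash of the key, an == against the occupant of its bucket, a rehash of every existing element on resize, then a second hash of the key. The final 3 is found by == alone, so no capacity check or growth happens.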

So what you're seeing is:

  • one hash of the search key
  • some =='s (searching for a space)
  • hashes of every element in the collection (resize)
  • one hash of the search key (actually totally wasteful, but not a big deal considering it only happens after an O(n) reallocation)
  • some =='s (searching for a space in the new buffer)

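We were all completely wrong. They never compare hash values; only == decides whether this is a distinct key. Then there's a second round of calls when the collection grows.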
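To observe the same behavior on a current toolchain, the counting struct from the question can be written against the Swift 4.2+ Hashable requirement hash(into:), since implementing hashValue directly is deprecated. This sketch compares an empty Set with one that calls reserveCapacity(_:) first; CountedKey and countCalls are made-up names for the example. Exact counts are deliberately not shown: they depend on the Swift version, and since Swift 4.2 hashing is seeded per process, so the number of == calls can change from run to run.

```swift
// Counts how often Set calls hash(into:) and == on a key type.
// Exact numbers vary with the Swift version and per-process hash seeding.
struct CountedKey: Hashable {
    static var hashCalls = 0
    static var equalityCalls = 0
    let id: Int

    static func == (lhs: CountedKey, rhs: CountedKey) -> Bool {
        equalityCalls += 1
        return lhs.id == rhs.id
    }

    func hash(into hasher: inout Hasher) {
        CountedKey.hashCalls += 1
        hasher.combine(id)
    }
}

// Inserts ids 1...5, optionally reserving capacity first, and returns the counts.
func countCalls(reserving capacity: Int?) -> (hash: Int, equal: Int) {
    CountedKey.hashCalls = 0
    CountedKey.equalityCalls = 0
    var set = Set<CountedKey>()
    if let capacity = capacity { set.reserveCapacity(capacity) }
    for i in 1...5 { set.insert(CountedKey(id: i)) }
    return (CountedKey.hashCalls, CountedKey.equalityCalls)
}

// Without reserving, growth forces existing elements to be rehashed;
// reserving capacity up front should avoid those resizes.
print("no reserve:", countCalls(reserving: nil))
print("reserve 5: ", countCalls(reserving: 5))
```

If the number of keys is known in advance, reserving capacity (or using Set(minimumCapacity:) / Dictionary(minimumCapacity:)) should remove the rehash-on-growth calls, leaving roughly one hash per insertion plus whatever == calls bucket collisions cause.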