What is the best algorithm for an overridden System.Object.GetHashCode?
In .NET, what is a sensible general approach to implementing an overridden GetHashCode for custom classes?
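I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It is fast and creates a pretty good hash that is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:
```csharp
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}
```
As noted in the comments, you may find it is better to pick a large prime to multiply by instead. Apparently 486187739 is good. Although most examples I have seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for instance, I have used numbers that apparently work well, but the initial value is not a prime. (The multiplication constant is prime, though. I don't know quite how important that is.)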
This is better than simply XORing the field hash codes together, because XOR has two weaknesses:
```
XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
```
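To make those two properties concrete, here is a minimal sketch. `XorHash` and `MultiplyAddHash` are hypothetical helper names introduced only for this illustration (they are not part of .NET), and the constants 17 and 23 are simply the ones suggested above.

```csharp
using System;

// Hypothetical helpers for illustration only; not part of .NET.
public static class HashCombineDemo
{
    // Naive combination: XOR the two component hashes.
    public static int XorHash(int a, int b) => a ^ b;

    // Multiply-and-add combination using the constants 17 and 23.
    public static int MultiplyAddHash(int a, int b)
    {
        unchecked // overflow simply wraps
        {
            int hash = 17;
            hash = hash * 23 + a;
            hash = hash * 23 + b;
            return hash;
        }
    }
}
```

With XOR, any pair of equal values hashes to 0 and swapping the two values changes nothing; the multiply-and-add version is sensitive to both the values and their order.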
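By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.
This page gives quite a few options. I think for most cases the above is "good enough", and it is incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of addition as the combining operation:
```csharp
// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}
```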
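One thing to be aware of: ideally you should prevent your equality-sensitive (and therefore hash-code-sensitive) state from changing after the object has been added to a collection that depends on the hash code.
As per the documentation: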
You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
- You can compute the hash code from fields that are not mutable; or
- You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
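The following sketch shows what can go wrong when that advice is ignored. `MutableKey` and `MutationDemo` are hypothetical names made up for this illustration; the only library type used is `HashSet<T>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: equality and hash code both depend on a mutable property.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj) => obj is MutableKey other && other.Id == Id;

    public override int GetHashCode() => Id;
}

public static class MutationDemo
{
    // Returns true if the set can still find the key after it was mutated in place.
    public static bool StillFoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // the hash code changes while the object is inside the set

        return set.Contains(key);
    }
}
```

The set remembers the hash code the key had when it was added, so the lookup after the mutation misses it even though the very same instance is still stored in the set.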
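Anonymous Type
Microsoft already provides a good generic hash code generator: just copy your property/field values to an anonymous type and hash it:
```csharp
new { PropA, PropB, PropC, PropD }.GetHashCode();
```
This works for any number of properties. It does not use boxing. It just uses the algorithm already implemented in the framework for anonymous types.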
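ValueTuple - Update for C# 7
As @cactuaroid mentions in the comments, a value tuple can be used. This saves a few keystrokes and, more importantly, executes purely on the stack (no garbage):
```csharp
(PropA, PropB, PropC, PropD).GetHashCode();
```
(Note: The original technique using anonymous types seems to create an object on the heap, i.e. garbage, since anonymous types are implemented as classes, though the compiler might optimize this away. It would be interesting to benchmark these options, but the tuple option should be superior.)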
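Because a value tuple also implements `Equals`, the same tuple of members can drive both halves of the equality contract, which keeps them from drifting apart. A minimal sketch, assuming C# 7 or later; the `Part` type and its members are hypothetical:

```csharp
using System;

// Hypothetical type for illustration only.
public sealed class Part : IEquatable<Part>
{
    public Part(string manufacturer, string partNumber, int quantity)
    {
        Manufacturer = manufacturer;
        PartNumber = partNumber;
        Quantity = quantity;
    }

    public string Manufacturer { get; }
    public string PartNumber { get; }
    public int Quantity { get; }

    // Equality and hashing are both defined over the same tuple of members.
    public bool Equals(Part other) =>
        other != null &&
        (Manufacturer, PartNumber, Quantity).Equals((other.Manufacturer, other.PartNumber, other.Quantity));

    public override bool Equals(object obj) => Equals(obj as Part);

    public override int GetHashCode() => (Manufacturer, PartNumber, Quantity).GetHashCode();
}
```

The tuple handles null members itself, so no explicit null checks are needed for the string properties.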
Here is my hash code helper. Its advantage is that it uses generic type arguments and therefore does not cause boxing:
```csharp
using System.Collections.Generic;

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            return 31 * arg1.GetHashCode() + arg2.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3,
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for that the order of items
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();
        }
    }
}
```
It also has an extension method to provide a fluent interface, so you can use it like this:
```csharp
public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}
```
or like this:
```csharp
public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}
```
I have a Hashing class in my helper library that I use for this purpose.
```csharp
/// <summary>
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}
```
Then you can simply use it as:
```csharp
public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}
```
I have not assessed its performance, so any feedback is welcome.
Here is my helper class, using Jon Skeet's implementation.
```csharp
using System.Collections.Generic;

public static class HashCode
{
    public const int Start = 17;

    public static int Hash<T>(this int hash, T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked((hash * 31) + h);
    }
}
```
Usage:
```csharp
public override int GetHashCode()
{
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}
```
If you want to avoid writing an extension method for System.Int32:
```csharp
using System.Collections.Generic;

public struct HashCode
{
    private readonly int _value;

    public HashCode(int value) => _value = value;

    public static HashCode Start { get; } = new HashCode(17);

    public static implicit operator int(HashCode hash) => hash._value;

    public HashCode Hash<T>(T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked(new HashCode((_value * 31) + h));
    }

    public override int GetHashCode() => _value;
}
```
It is still generic, it still avoids any heap allocation, and it is used in exactly the same way:
```csharp
public override int GetHashCode()
{
    // This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
    // And the result is implicitly converted to `Int32`.
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}
```
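Update after Martin's comment:
- See this answer regarding the default comparer's performance.
- See this question for a discussion about the hash codes of null values.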
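As a small illustration of the null-handling point, the sketch below combines two values through `EqualityComparer<T>.Default`, which returns 0 for a null reference instead of throwing. `NullSafeHash` is a hypothetical name used only here, and the constants 17 and 31 are the ones used in the helper above.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class NullSafeHash
{
    public static int Combine<T1, T2>(T1 first, T2 second)
    {
        unchecked // overflow simply wraps
        {
            int hash = 17;
            // The default comparer yields 0 for null and does not box value types.
            hash = hash * 31 + EqualityComparer<T1>.Default.GetHashCode(first);
            hash = hash * 31 + EqualityComparer<T2>.Default.GetHashCode(second);
            return hash;
        }
    }
}
```

Calling `NullSafeHash.Combine<string, int>(null, 5)` therefore works without a null check; calling `first.GetHashCode()` directly would throw a NullReferenceException.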
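Edit (May 2018):
In most cases where Equals() compares multiple fields, it does not really matter whether your GetHashCode() hashes on one field or on many. You just have to make sure that calculating the hash is really cheap (no allocations, please) and fast (no heavy computations and certainly no database connections), and that it provides a good distribution.
The heavy lifting should be part of the Equals() method; the hash should be a very cheap operation so that Equals() needs to be called on as few items as possible.
One final tip: do not rely on GetHashCode() being stable across multiple application runs. Many .NET types do not guarantee that their hash codes stay the same after a restart, so you should only use the value of GetHashCode() for in-memory data structures.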
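If a value does have to survive a restart (for example as a cache key written to disk), one option is to compute it with a fully specified algorithm rather than GetHashCode(). The sketch below uses 32-bit FNV-1a over UTF-8 bytes purely as an example of such an algorithm; `StableHash` is a hypothetical name and this is not something the answer above prescribes.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class StableHash
{
    // 32-bit FNV-1a over the UTF-8 bytes of the text.
    // The result depends only on the input, not on the process or runtime version.
    public static uint Fnv1a32(string text)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            unchecked // overflow simply wraps
            {
                hash = (hash ^ b) * prime;
            }
        }
        return hash;
    }
}
```

The allocation in `GetBytes` makes this a poor fit for GetHashCode() itself; it is meant only for values that must be persisted.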
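Up until recently my answer would have been very close to Jon Skeet's here. However, I recently started a project that used power-of-two hash tables, that is, hash tables where the size of the internal table is 8, 16, 32, etc. There is a good reason for favouring prime-number sizes, but power-of-two sizes have some advantages too.
And it pretty much sucked. So after a bit of experimentation and research I started re-hashing my hashes with the following:
```csharp
public static int ReHash(int source)
{
    unchecked
    {
        // x << n | x >> -n is a 64-bit left rotation by n:
        // C# masks the shift count of a ulong to its low 6 bits.
        ulong c = 0xDEADBEEFDEADBEEF + (ulong)source;
        ulong d = 0xE2ADBEEFDEADBEEF ^ c;
        ulong a = d += c = c << 15 | c >> -15;
        ulong b = a += d = d << 52 | d >> -52;
        c ^= b += a = a << 26 | a >> -26;
        d ^= c += b = b << 51 | b >> -51;
        a ^= d += c = c << 28 | c >> -28;
        b ^= a += d = d << 9 | d >> -9;
        c ^= b += a = a << 47 | a >> -47;
        d ^= c += b << 54 | b >> -54;
        a ^= d += c << 32 | c >> 32;
        a += d << 25 | d >> -25;
        return (int)(a >> 1);
    }
}
```
And then my power-of-two hash table didn't suck any more.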
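This disturbed me though, because the above shouldn't work. Or more precisely, it shouldn't work unless the original hash code was poor in a very particular way.
Re-mixing a hash code can't improve a great hash code, because the only possible effect is that we introduce a few more collisions.
Re-mixing a hash code can't improve a terrible hash code, because the only possible effect is that we change, say, a large number of collisions on value 53 into a large number of collisions on value 183487291.
Re-mixing a hash code can only improve a hash code that does at least fairly well at avoiding absolute collisions throughout its range (2^32 possible values) but badly at avoiding collisions once it is reduced modulo the table size for actual use in a hash table. While the simpler modulo of a power-of-two table made this more apparent, it was also having a negative effect with the more common prime-number tables; that just wasn't as obvious (the extra work in re-hashing would outweigh the benefit, but the benefit would still be there).
Edit: I was also using open addressing, which would also have increased the sensitivity to collisions, perhaps more so than the fact that the table size was a power of two.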
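The effect of the table size is easy to reproduce with hash codes that are all distinct but share their low bits. In this sketch `BucketDemo` and its members are hypothetical names, and the input (multiples of 16) is a contrived worst case rather than data from the project described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class BucketDemo
{
    // Bucket index for a power-of-two table: keep only the low bits.
    public static int PowerOfTwoBucket(int hash, int tableSize) => hash & (tableSize - 1);

    // Bucket index for an arbitrary (e.g. prime) table size.
    public static int ModuloBucket(int hash, int tableSize) => (int)((uint)hash % (uint)tableSize);

    // Counts how many distinct buckets the hashes 0, 16, 32, ... occupy.
    public static int DistinctBuckets(Func<int, int, int> bucketOf, int tableSize, int count)
    {
        var used = new HashSet<int>();
        for (int i = 0; i < count; i++)
        {
            used.Add(bucketOf(i * 16, tableSize));
        }
        return used.Count;
    }
}
```

With 100 such hashes, a 16-slot power-of-two table puts every item in a single bucket, while a 17-slot table spreads them over all 17 buckets. The hash codes have no absolute collisions at all, yet they behave terribly once reduced for the power-of-two table.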
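Disturbingly, this also applied to hash code implementations in .NET itself (or in the study here).
All the GetHashCode() implementations I had coded in the past, and indeed used as the basis of answers on this site, were much worse than I had thought. Much of the time they were "good enough" for many uses, but I wanted something better.
So I put that project to one side (it was a pet project anyway) and started looking at how to produce a good, well-distributed hash code in .NET quickly.
In the end I settled on porting SpookyHash to .NET. Indeed, the code above is a fast-path version of using SpookyHash to produce a 32-bit output from a 32-bit input.
Now, SpookyHash is not a nice, quick-to-remember piece of code. My port of it is even less so, because I hand-inlined a lot of it for better speed*. But that is what code reuse is for.
Then I put that project to one side too, because just as the original project had raised the question of how to produce a better hash code, this project raised the question of how to produce a better .NET memcpy.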
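Then I came back and produced a lot of overloads to easily feed just about all of the native types (with one exception†) into a hash code.
It is fast, for which Bob Jenkins deserves most of the credit, because the original code I ported from is faster still, especially on 64-bit machines, which the algorithm is optimised for‡.
The full code can be seen at https://bitbucket.org/JonHanna/spookilysharp/src but consider the code above a simplified version of it.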
However, since it is now already written, one can make use of it more easily:
```csharp
public override int GetHashCode()
{
    var hash = new SpookyHash();
    hash.Update(field1);
    hash.Update(field2);
    hash.Update(field3);
    return hash.Final().GetHashCode();
}
```
It also takes seed values, so if you need to deal with untrusted input and want to protect against hash DoS attacks, you can set a seed based on uptime or similar and make the results unpredictable to attackers:
```csharp
private static long hashSeed0 = Environment.TickCount;
private static long hashSeed1 = DateTime.Now.Ticks;
public override int GetHashCode()
{
    //produce different hashes every time this application is restarted
    //but remain consistent in each run, so attackers have a harder time
    //DoSing the hash tables.
    var hash = new SpookyHash(hashSeed0, hashSeed1);
    hash.Update(field1);
    hash.Update(field2);
    hash.Update(field3);
    return hash.Final().GetHashCode();
}
```
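*A big surprise here concerned hand-inlining a rotation method that returned …
†From the .NET perspective, …
‡By way of comparison: if used on a string, SpookyHash on 64 bits is … than … on 32 bits …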
This is a good one:
```csharp
/// <summary>
/// Helper class for generating hash codes suitable
/// for use in hashing algorithms and data structures like a hash table.
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
            var num = 0x7e53a269;
            num = (-1521134295 * num) + key1;
            num += (num << 10);
            num ^= (num >> 6);

            num = ((-1521134295 * num) + key2);
            num += (num << 10);
            num ^= (num >> 6);

            return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data
    /// structures like a hash table.
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data
    /// structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data
    /// structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}
```
Here is how to use it:
```csharp
private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}
```
Here is another fluent implementation of the algorithm posted above by Jon Skeet, but one that involves no allocations or boxing operations:
```csharp
public static class Hash
{
    public const int Base = 17;

    public static int HashObject(this int hash, object obj)
    {
        unchecked { return hash * 23 + (obj == null ? 0 : obj.GetHashCode()); }
    }

    public static int HashValue<T>(this int hash, T value)
        where T : struct
    {
        unchecked { return hash * 23 + value.GetHashCode(); }
    }
}
```
Usage:
```csharp
public class MyType<T>
{
    public string Name { get; set; }

    public string Description { get; set; }

    public int Value { get; set; }

    public IEnumerable<T> Children { get; set; }

    public override int GetHashCode()
    {
        return Hash.Base
            .HashObject(this.Name)
            .HashObject(this.Description)
            .HashValue(this.Value)
            .HashObject(this.Children);
    }
}
```
Thanks to the generic type constraint, the compiler will ensure that HashValue is not called with a class.
Here is my simplistic approach. I am using the classic builder pattern for this. It is type-safe (no boxing/unboxing) and also compatible with .NET 2.0 (no extension methods etc.).
It is used like this:
```csharp
public override int GetHashCode()
{
    HashBuilder b = new HashBuilder();
    b.AddItems(this.member1, this.member2, this.member3);
    return b.Result;
}
```
And here is the actual builder class:
```csharp
internal class HashBuilder
{
    private const int Prime1 = 17;
    private const int Prime2 = 23;
    private int result = Prime1;

    public HashBuilder()
    {
    }

    public HashBuilder(int startHash)
    {
        this.result = startHash;
    }

    public int Result
    {
        get
        {
            return this.result;
        }
    }

    public void AddItem<T>(T item)
    {
        unchecked
        {
            this.result = this.result * Prime2 + item.GetHashCode();
        }
    }

    public void AddItems<T1, T2>(T1 item1, T2 item2)
    {
        this.AddItem(item1);
        this.AddItem(item2);
    }

    public void AddItems<T1, T2, T3>(T1 item1, T2 item2, T3 item3)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
    }

    public void AddItems<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3,
        T4 item4)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
    }

    public void AddItems<T1, T2, T3, T4, T5>(T1 item1, T2 item2, T3 item3,
        T4 item4, T5 item5)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
        this.AddItem(item5);
    }

    public void AddItems<T>(params T[] items)
    {
        foreach (T item in items)
        {
            this.AddItem(item);
        }
    }
}
```
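As of https://github.com/dotnet/coreclr/pull/14863, there is a new way to generate hash codes that is super simple! Just write
```csharp
public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);
```
This generates a quality hash code without you having to worry about the implementation details.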
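When the number of values is not fixed, for example when a sequence takes part in equality, the same `System.HashCode` struct can also be fed one value at a time through `Add` and `ToHashCode`. A minimal sketch, assuming a runtime that ships `System.HashCode` (.NET Core 2.1 or later); the `Order` type and its members are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
public sealed class Order : IEquatable<Order>
{
    public Order(int id, params string[] lines)
    {
        Id = id;
        Lines = lines.ToArray(); // private copy, so the hashed state cannot change
    }

    public int Id { get; }
    public IReadOnlyList<string> Lines { get; }

    public bool Equals(Order other) =>
        other != null && Id == other.Id && Lines.SequenceEqual(other.Lines);

    public override bool Equals(object obj) => Equals(obj as Order);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Id);
        foreach (string line in Lines)
        {
            hash.Add(line); // order-sensitive, matching SequenceEqual above
        }
        return hash.ToHashCode();
    }
}
```

Hashing every element keeps the hash consistent with `Equals`, at the cost of walking the whole list on each call; for long collections it may be reasonable to hash only a few elements or the count.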
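ReSharper users can let ReSharper generate GetHashCode for them.
```csharp
// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}
```
This is pretty much the same as nightcoder's solution, except that it is easier to raise the primes if you want to.
PS: This is one of those times where you puke a little in your mouth, knowing that this could be refactored into one method with 9 defaults, but it would be slower, so you just close your eyes and try to forget about it.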
```csharp
/// <summary>
/// Try not to look at the source code. It works. Just rely on it.
/// </summary>
public static class HashHelper
{
    private const int PrimeOne = 17;
    private const int PrimeTwo = 23;

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();
            hash = hash * PrimeTwo + arg10.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();

            return hash;
        }
    }
}
```
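Most of my work is done with database connectivity, which means that my classes all have a unique identifier from the database. I always use the ID from the database to generate the hash code.
```csharp
// Unique ID from database
private int _id;

...
{
  return _id.GetHashCode();
}
```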
If we have no more than 8 properties (hopefully), here is another alternative.
It means we can simply do this:
```csharp
// Yay, no allocations and no custom implementations!
public override int GetHashCode() => (this.PropA, this.PropB).GetHashCode();
```
Let's take a look at what .NET Core currently does for the GetHashCode of ValueTuple.
This is from ValueTuple:
```csharp
internal static int CombineHashCodes(int h1, int h2)
{
    return HashHelpers.Combine(HashHelpers.Combine(HashHelpers.RandomSeed, h1), h2);
}

internal static int CombineHashCodes(int h1, int h2, int h3)
{
    return HashHelpers.Combine(CombineHashCodes(h1, h2), h3);
}
```
And this is from HashHelpers:
```csharp
public static readonly int RandomSeed = Guid.NewGuid().GetHashCode();

public static int Combine(int h1, int h2)
{
    unchecked
    {
        // RyuJIT optimizes this to use the ROL instruction
        // Related GitHub pull request: dotnet/coreclr#1830
        uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
        return ((int)rol5 + h1) ^ h2;
    }
}
```
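In English:
- Left rotate (circular shift) h1 by 5 positions.
- Add the result and h1 together.
- XOR the result with h2.
- Start by performing the above operation on { static random seed, h1 }.
- For each further item, perform the operation on the previous result and the next item (e.g. h2).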
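The steps above can be traced by hand with a fixed seed. The sketch below copies the rotate/add/XOR step into a standalone helper; `Rol5Demo` and `CombineAll` are hypothetical names, and a caller-supplied seed stands in for the random one so that the arithmetic is reproducible.

```csharp
// Hypothetical helper for illustration only; it mirrors the steps listed above.
public static class Rol5Demo
{
    // One step: rotate h1 left by 5 bits, add h1, then XOR with h2.
    public static int Combine(int h1, int h2)
    {
        unchecked // overflow simply wraps
        {
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }

    // Folds any number of component hashes, starting from the given seed.
    public static int CombineAll(int seed, params int[] hashes)
    {
        int result = seed;
        foreach (int h in hashes)
        {
            result = Combine(result, h);
        }
        return result;
    }
}
```

For example, with seed 0 and components 1 and 2: the first step gives (0 + 0) ^ 1 = 1, and the second gives (32 + 1) ^ 2 = 35. For small inputs the step behaves like multiplying by 33 and then XORing in the next value.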
It would be nice to know more about the properties of this ROL-5 hash code algorithm.
Regrettably, for our own …
Using the implementation selected as the answer above, I ran into a problem with floats and decimals.
This test fails (floats; the hash is the same even though I switched 2 values to be negative):
```csharp
var obj1 = new { A = 100m, B = 100m, C = 100m, D = 100m};
var obj2 = new { A = 100m, B = 100m, C = -100m, D = -100m};
var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));
```
But this test passes (with ints):
```csharp
var obj1 = new { A = 100m, B = 100m, C = 100, D = 100};
var obj2 = new { A = 100m, B = 100m, C = -100, D = -100};
var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));
```
I changed my implementation to not use GetHashCode for the primitive types, and it seems to work better:
```csharp
private static int InternalComputeHash(params object[] obj)
{
    unchecked
    {
        var result = (int)SEED_VALUE_PRIME;
        for (uint i = 0; i < obj.Length; i++)
        {
            var currval = result;
            var nextval = DetermineNextValue(obj[i]);
            result = (result * MULTIPLIER_VALUE_PRIME) + nextval;
        }
        return result;
    }
}

private static int DetermineNextValue(object value)
{
    unchecked
    {
        int hashCode;
        if (value is short
            || value is int
            || value is byte
            || value is sbyte
            || value is uint
            || value is ushort
            || value is ulong
            || value is long
            || value is float
            || value is double
            || value is decimal)
        {
            return Convert.ToInt32(value);
        }
        else
        {
            return value != null ? value.GetHashCode() : 0;
        }
    }
}
```
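One caveat worth checking before adopting this approach: `Convert.ToInt32` rounds fractional values and throws an OverflowException for values outside the int range, even inside an `unchecked` block. The sketch below only demonstrates that documented behaviour; `ConvertCaveats` is a hypothetical name and is not part of the answer's code.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ConvertCaveats
{
    // Mirrors the Convert.ToInt32(object) call used for primitive values above.
    public static int Reduce(object value) => Convert.ToInt32(value);

    // Reports whether reducing the value to an int throws an OverflowException.
    public static bool ThrowsOverflow(object value)
    {
        try
        {
            Convert.ToInt32(value);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

So 1.2 and 1.4 both reduce to 1 and will collide, and a value such as `long.MaxValue` would make the hash computation throw. Whether that matters depends on the data being hashed.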
Microsoft leads the way with several approaches to hashing...
```csharp
//for classes that contain a single int value
return this.value;

//for classes that contain multiple int value
return x ^ y;

//for classes that contain single number bigger than int
return ((int)value ^ (int)(value >> 32));

//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();

//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode();
```
I can guess that for multiple big ints you can use this:
```csharp
int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;
```
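A short sketch of the folding step and of what the plain XOR combination implies. `FoldDemo` and its members are hypothetical names used only for this illustration.

```csharp
// Hypothetical helper for illustration only.
public static class FoldDemo
{
    // Folds a 64-bit value into 32 bits by XORing its upper and lower halves.
    public static int Fold(long value) => unchecked((int)value ^ (int)(value >> 32));

    // Combines several 64-bit values by XORing their folded forms.
    public static int XorCombine(params long[] values)
    {
        int result = 0;
        foreach (long value in values)
        {
            result ^= Fold(value);
        }
        return result;
    }
}
```

Folding keeps every bit involved, but a value whose two halves are equal (such as 0x0000000100000001) folds to 0, and the XOR combination ignores order and cancels out repeated values. That is cheap and sometimes acceptable, but it produces more collisions than a multiply-and-add combination.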
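The same goes for multiple types: reduce each one to an int first, then XOR the int values together.
For those who use a hash as an ID (I mean a unique value): a hash is naturally limited in size, for example 16 bytes for MD5 and only 4 bytes for GetHashCode.
You can turn many different values into a hash value and some of them will come out the same, so do not use it as an identifier. (Maybe some day I am going to use your component.)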