Is “==” in sorted array not faster than unsorted array?
Note: I think the alleged duplicate question is mostly about the "<" and ">" comparisons, not "==", so it does not answer my question about the performance of the "==" operator.

For a long time I have believed that "processing" a sorted array should be faster than an unsorted one. At first I thought that using "==" on a sorted array should be faster than on an unsorted array because of (I guessed) how branch prediction works:
UNSORTEDARRAY:

```
5   == 100 F
43  == 100 F
100 == 100 T
250 == 100 F
6   == 100 F
(other elements to check)
```

SORTEDARRAY:

```
5   == 100 F
6   == 100 F
43  == 100 F
100 == 100 T
(no need to check other elements, so all are F)
```
So my guess was that SORTEDARRAY should be faster than UNSORTEDARRAY, but today I generated the two arrays in a header and tested them, and branch prediction did not seem to work the way I expected.

I generated an unsorted array and a sorted array to test:
```cpp
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <string>
using namespace std;

int main() {
    srand(time(NULL));
    int UNSORTEDARRAY[524288];
    int SORTEDARRAY[sizeof(UNSORTEDARRAY)/sizeof(int)];
    // fill both arrays with the same random values, then sort one copy
    for (int i = 0; i < sizeof(SORTEDARRAY)/sizeof(int); i++) {
        SORTEDARRAY[i] = UNSORTEDARRAY[i] = rand();
    }
    sort(SORTEDARRAY, SORTEDARRAY + sizeof(SORTEDARRAY)/sizeof(int));
    // emit both arrays as const int definitions into number.h
    string u = "const int UNSORTEDARRAY[]={";
    string s = "const int SORTEDARRAY[]={";
    for (int i = 0; i < sizeof(UNSORTEDARRAY)/sizeof(int); i++) {
        u += to_string(UNSORTEDARRAY[i]) + ",";
        s += to_string(SORTEDARRAY[i]) + ",";
    }
    u.erase(u.end() - 1);
    s.erase(s.end() - 1);
    u += "};\n";
    s += "};\n";
    ofstream out("number.h");
    string code = u + s;
    out << code;
    out.close();
    return 0;
}
```
So to test, I just count how many values are == RAND_MAX/2, as shown below:
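The original test snippet is not reproduced here; a minimal sketch of a loop that counts matches against RAND_MAX/2 and times it with clock() would look roughly like this (details assumed, not the asker's exact code):

```cpp
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include "number.h"   // the generated UNSORTEDARRAY / SORTEDARRAY

int main() {
    int count = 0;
    clock_t start = clock();
    for (int i = 0; i < sizeof(SORTEDARRAY)/sizeof(int); i++) {
        // swap in UNSORTEDARRAY, or > instead of ==, for the other test cases
        if (SORTEDARRAY[i] == RAND_MAX / 2) {
            count++;
        }
    }
    printf("%f %d\n", double(clock() - start) / CLOCKS_PER_SEC, count);
    return 0;
}
```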
Run 3 times:

UNSORTEDARRAY:

```
0.005376
0.005239
0.005220
```

SORTEDARRAY:

```
0.005334
0.005120
0.005223
```
That seems like a negligible performance difference, so I didn't believe it and tried changing "SORTEDARRAY[i]==RAND_MAX/2" to "SORTEDARRAY[i]>RAND_MAX/2" to see whether it makes a difference:
UNSORTEDARRAY:

```
0.008407
0.008363
0.008606
```

SORTEDARRAY:

```
0.005306
0.005227
0.005146
```
This time the difference is huge.

So is "==" on a sorted array not faster than on an unsorted array? And if so, why is ">" on a sorted array faster than on an unsorted array, but "==" is not?
One thing that immediately comes to mind is the CPU's branch prediction algorithm.

With the > comparison, the branch in the sorted array behaves very regularly: the condition is false for every element below RAND_MAX/2 and then true for every element above it, one long run of each. Even the simplest branch predictor handles that pattern almost perfectly.

In the unsorted array, the outcome of > is essentially random from one element to the next, which defeats any branch predictor.

This is what makes the sorted version faster.

With the == comparison, the condition is false for practically every element (only a value exactly equal to RAND_MAX/2 makes it true), so the branch is trivially predictable whether the array is sorted or not, and the timings come out essentially the same.
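To make that concrete, here is a small self-contained simulation (an illustrative sketch, not code from the question; the 2-bit saturating counter is a textbook stand-in, not whatever predictor a real CPU uses). It counts how often such a predictor would mispredict the == and > branches on sorted and unsorted copies of the same random data: == mispredicts almost never in either case, while > mispredicts roughly half the time on unsorted data and almost never on sorted data.

```cpp
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Fraction of mispredicted branches for a stream of taken/not-taken outcomes,
// using a single 2-bit saturating counter (0..3, >=2 predicts "taken").
static double misprediction_rate(const std::vector<bool>& taken) {
    int counter = 1;
    long misses = 0;
    for (bool t : taken) {
        bool predict = counter >= 2;
        if (predict != t) ++misses;
        counter = t ? std::min(counter + 1, 3) : std::max(counter - 1, 0);
    }
    return double(misses) / taken.size();
}

int main() {
    const int N = 524288;
    std::vector<int> unsorted(N);
    for (int& x : unsorted) x = rand();
    std::vector<int> sorted = unsorted;
    std::sort(sorted.begin(), sorted.end());

    // Build the branch-outcome stream for either condition over an array.
    auto outcomes = [](const std::vector<int>& v, bool greater) {
        std::vector<bool> r;
        r.reserve(v.size());
        for (int x : v) r.push_back(greater ? x > RAND_MAX / 2 : x == RAND_MAX / 2);
        return r;
    };

    printf("==  unsorted: %.4f%% mispredicted\n", 100 * misprediction_rate(outcomes(unsorted, false)));
    printf("==  sorted  : %.4f%% mispredicted\n", 100 * misprediction_rate(outcomes(sorted, false)));
    printf(">   unsorted: %.4f%% mispredicted\n", 100 * misprediction_rate(outcomes(unsorted, true)));
    printf(">   sorted  : %.4f%% mispredicted\n", 100 * misprediction_rate(outcomes(sorted, true)));
    return 0;
}
```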
Note: I'm posting this as an answer because it is too long for a comment.

The effect here is exactly what is already explained in detail in the many answers to the question this one was marked a duplicate of: in that case, processing a sorted array was faster because of branch prediction.

Here, the culprit is once again branch prediction.

The moral:
> I believe "processing" a sorted array should be faster than [an] unsorted array.
You need to know why. This isn't some magical rule, and it isn't always true.

The == comparison depends far less on the data's ordering than > does: sorted or not, a test against RAND_MAX/2 with == is false for virtually every element, so the branch is almost perfectly predictable in both cases. With >, sorting the data turns a random mix of outcomes into one long run of false followed by one long run of true, which is exactly the pattern a predictor handles best.

You can see this directly with perf stat; compare the branch-misses counts for the four runs below (proc-eq does the == test, proc-gt the > test, and the sorted runs pipe the input through sort -n):
```
jason@io /tmp $ lz4 -d ints | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5226.932577      task-clock (msec)         #    0.953 CPUs utilized
                31      context-switches          #    0.006 K/sec
                24      cpu-migrations            #    0.005 K/sec
             3,479      page-faults               #    0.666 K/sec
    15,763,486,767      cycles                    #    3.016 GHz
     4,238,973,549      stalled-cycles-frontend   #   26.89% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,522,072,416      instructions              #    2.00  insns per cycle
                                                  #    0.13  stalled cycles per insn
     8,515,545,178      branches                  # 1629.167 M/sec
        10,261,743      branch-misses             #    0.12% of all branches

       5.483071045 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5536.031410      task-clock (msec)         #    0.348 CPUs utilized
               198      context-switches          #    0.036 K/sec
                21      cpu-migrations            #    0.004 K/sec
             3,604      page-faults               #    0.651 K/sec
    16,870,541,124      cycles                    #    3.047 GHz
     5,300,218,855      stalled-cycles-frontend   #   31.42% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,526,006,118      instructions              #    1.87  insns per cycle
                                                  #    0.17  stalled cycles per insn
     8,516,336,829      branches                  # 1538.347 M/sec
        10,980,571      branch-misses             #    0.13% of all branches

jason@io /tmp $ lz4 -d ints | perf stat ./proc-gt >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-gt':

       5293.065703      task-clock (msec)         #    0.957 CPUs utilized
                38      context-switches          #    0.007 K/sec
                50      cpu-migrations            #    0.009 K/sec
             3,466      page-faults               #    0.655 K/sec
    15,972,451,322      cycles                    #    3.018 GHz
     4,350,726,606      stalled-cycles-frontend   #   27.24% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,537,365,299      instructions              #    1.97  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,515,606,640      branches                  # 1608.823 M/sec
        15,241,198      branch-misses             #    0.18% of all branches

       5.532285374 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-gt >/dev/null

      15.930144154 seconds time elapsed

 Performance counter stats for './proc-gt':

       5203.873321      task-clock (msec)         #    0.339 CPUs utilized
                 7      context-switches          #    0.001 K/sec
                22      cpu-migrations            #    0.004 K/sec
             3,459      page-faults               #    0.665 K/sec
    15,830,273,846      cycles                    #    3.042 GHz
     4,456,369,958      stalled-cycles-frontend   #   28.15% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,540,409,224      instructions              #    1.99  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,516,186,042      branches                  # 1636.509 M/sec
        10,205,058      branch-misses             #    0.12% of all branches

      15.365528326 seconds time elapsed
```
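The proc-eq and proc-gt binaries themselves aren't shown above; presumably each is just a loop that reads integers from stdin and counts how many satisfy the condition. A minimal sketch of what such a program could look like (an assumption for illustration, not the actual code behind the numbers above):

```cpp
// Hypothetical stand-in for a proc-eq style program: read integers from
// stdin and count how many equal RAND_MAX/2.  A proc-gt variant would use
// `value > RAND_MAX / 2` instead.
#include <cstdio>
#include <cstdlib>

int main() {
    long count = 0;
    int value;
    while (scanf("%d", &value) == 1) {
        if (value == RAND_MAX / 2) {
            count++;
        }
    }
    printf("%ld\n", count);
    return 0;
}
```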