Finding a repeated element in an array in O(n log^2(n)) time

algorithm big-o search sorting

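I am working on an algorithm to find the first repeated element in an array of numbers. My understanding is that I should sort the array (using merge sort, in O(n log n)) and then traverse the array while performing a binary search for each element (O(log n)). So the total running time of my algorithm is O(n log n) + O(log n) = O(n log n). My question is what I am missing that makes it O(n log^2 n). I don't understand why the logarithm is squared.

Problem statement: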

You receive a list of integers, one at a time, a1,a2,... and I want
you to find the first time an ai comes up that has occurred
previously, i.e. find the smallest i such that all the a1,a2,...,ai-1
are distinct but ai=aj for some 1≤j<i. Describe an O(i log2 i) time
algorithm to find the smallest i. ( Give an algorithm in recursive
form and show its running time using the Master Theorem. )


Related discussion

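Your algorithm is incorrect, because it runs slower than the actual problem requires.

The problem specifically asks for O(i log i) running time, where i is the position of the first repeat. If you run your algorithm once the stream of numbers has ended, you are too slow (and the stream might be infinite!). If you rerun your algorithm for every new number, it is also too slow (1 log 1 + 2 log 2 + 3 log 3 + ... + i log i is not in O(i log i)).

You can do this by maintaining a set (based on a self-balancing BST), and when a new element comes along: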
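To see why rerunning an O(k log k) procedure on every prefix is too slow, the sum can be bounded from both sides. This is only a short sketch of the arithmetic, with logarithms taken to base 2:

```latex
\sum_{k=1}^{i} k \log k
  \;\ge\; \sum_{k=\lceil i/2 \rceil}^{i} k \log k
  \;\ge\; \frac{i}{2} \cdot \frac{i}{2} \log \frac{i}{2}
  \;=\; \Omega(i^{2} \log i),
\qquad
\sum_{k=1}^{i} k \log k \;\le\; i \cdot i \log i \;=\; O(i^{2} \log i).
```

So the total is \Theta(i^2 \log i), which grows faster than i \log i by a factor of about i.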

check if it is in the set
if it is:
    abort - found the first dupe
otherwise:
    add it.
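The steps above can be sketched in Python as follows. This is a minimal sketch, not the answerer's own code: it swaps in Python's built-in hash-based `set` for the BST-backed set, so membership tests and insertions take expected O(1) time rather than a guaranteed O(log i). The name `first_repeat` is made up for illustration.

```python
from typing import Hashable, Iterable, Optional, Tuple


def first_repeat(stream: Iterable[Hashable]) -> Optional[Tuple[int, Hashable]]:
    """Return (i, value) for the first element that occurred earlier (i is 1-based).

    Returns None if a finite stream ends without any repeat.
    """
    seen = set()  # hash-based stand-in for the BST-backed set
    for i, value in enumerate(stream, start=1):
        if value in seen:
            # Stop at the first dupe; the rest of the stream is never read.
            return i, value
        seen.add(value)
    return None
```

For example, `first_repeat([3, 1, 4, 1, 5])` returns `(4, 1)`. Because the loop stops at the first repeat, it also terminates on an unbounded iterator such as `itertools.cycle([7, 8])`.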

Also note that the requirement (as you stated it) is O(i log_2 i), not O(i*log^2(i)).

Use a BST as follows:

Initialize a Binary Search Tree

For each new element that comes along, insert it in the tree and update the tree

Whenever you find an element that was already present in the BST, you have found the smallest index i
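A possible sketch of this tree-based approach in Python is shown below. Python's standard library has no balanced BST, so this example assumes a treap (a BST with random heap priorities), which is balanced only in expectation; an AVL or red-black tree would give the worst-case guarantee instead. The names `Node`, `insert`, and `first_repeat_index` are hypothetical.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random priority keeps expected depth O(log n)
        self.left = None
        self.right = None


def rotate_right(node):
    left = node.left
    node.left, left.right = left.right, node
    return left


def rotate_left(node):
    right = node.right
    node.right, right.left = right.left, node
    return right


def insert(node, key):
    """Insert key into the subtree; return (new_subtree_root, was_already_present)."""
    if node is None:
        return Node(key), False
    if key == node.key:
        return node, True  # duplicate found, tree left unchanged
    if key < node.key:
        node.left, found = insert(node.left, key)
        if not found and node.left.priority > node.priority:
            node = rotate_right(node)  # restore the max-heap order on priorities
    else:
        node.right, found = insert(node.right, key)
        if not found and node.right.priority > node.priority:
            node = rotate_left(node)
    return node, found


def first_repeat_index(stream):
    """Return the smallest 1-based i such that a_i equals some earlier a_j, else None."""
    root = None
    for i, value in enumerate(stream, start=1):
        root, found = insert(root, value)
        if found:
            return i
    return None
```

The search and the insertion share one recursive descent, so each new element costs a single pass down the tree.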


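With a self-balancing tree, inserting the k'th element costs O(log k), so handling the first k elements costs O(k log k) in total.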
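The total can be checked by summing the per-insertion costs. This is a short derivation that assumes a balanced tree, whose height is O(log k) when it holds k elements, and takes i as the position of the first repeat:

```latex
\sum_{k=1}^{i} \log k \;=\; \log(i!) \;\le\; i \log i,
\qquad
\sum_{k=1}^{i} \log k \;\ge\; \sum_{k=\lceil i/2 \rceil}^{i} \log k \;\ge\; \frac{i}{2} \log \frac{i}{2},
```

so processing the first i elements takes \Theta(i \log i) time overall.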

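Finding a repeated element takes only O(n log n) time.

Once the array is sorted, repeated elements are adjacent to each other; there is no need to perform a binary search.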
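As a rough illustration of the sort-then-scan idea for the batch case, here is a small Python sketch. The function name `find_duplicate_sorted` is made up for this example.

```python
from typing import List, Optional


def find_duplicate_sorted(values: List[int]) -> Optional[int]:
    """Return some duplicated value, or None if all values are distinct."""
    ordered = sorted(values)  # O(n log n); leaves the input list untouched
    # After sorting, equal values sit next to each other, so one linear pass suffices.
    for prev, cur in zip(ordered, ordered[1:]):
        if prev == cur:
            return cur
    return None
```

Note that sorting discards arrival order: this returns the smallest duplicated value, which is not necessarily the first repeat in the original sequence. For `[3, 1, 3, 1]` it returns `1`, although `3` is the first value to repeat.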

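Edit, based on the edited question:

If an online streaming algorithm is needed rather than a batch job, you can keep a balanced search tree of the elements encountered so far. This should give O(i log i) performance on the first i elements.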
