Getting the nth element of a set several times in Java

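Is there an efficient way to get the nth element of a set in Java?
I know of two ways:
- iterating until I reach the required element
- converting the set to an ArrayList and getting the element from that ArrayList
The question is whether there is any other way to get the nth element quickly. I mainly need this for TreeSets.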
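
For reference, the two approaches listed above can be sketched as follows. This is only an illustration: the class name `NthElementDemo` and the helper `nthByIteration` are made up for this example and do not come from any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NthElementDemo {

    /** Approach 1: walk the iterator n steps. O(n) time per call, no extra memory. */
    static <T> T nthByIteration(Set<T> set, int n) {
        if (n < 0 || n >= set.size()) {
            throw new IndexOutOfBoundsException("index " + n + ", size " + set.size());
        }
        Iterator<T> it = set.iterator();
        for (int i = 0; i < n; i++) {
            it.next(); // skip the first n elements
        }
        return it.next();
    }

    public static void main(String[] args) {
        Set<String> set = new TreeSet<>(Arrays.asList("delta", "alpha", "charlie", "bravo"));

        // Approach 1: iterate up to index 2 (sorted order: alpha, bravo, charlie, delta)
        System.out.println(nthByIteration(set, 2)); // prints "charlie"

        // Approach 2: copy once in O(size), then every lookup is O(1).
        // The copy is a snapshot and goes stale as soon as the set is modified.
        List<String> snapshot = new ArrayList<>(set);
        System.out.println(snapshot.get(2)); // prints "charlie"
    }
}
```

The copy only pays off when many lookups happen between modifications, which is exactly what the constantly changing set described here rules out.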

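Edit:
For example, if I want to pick 1000 random elements from a 10 000 000 element TreeMap or TreeSet very frequently (i.e. every 2-3 seconds), then cloning it to an ArrayList every time is very inefficient, and iterating over that many elements is inefficient as well.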

Comments

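Unless the set is an ordered one (e.g. a TreeSet), there is no notion of "nth". On the other hand, TreeSet doesn't provide indexed access, because it would need to iterate anyway (much like accessing the nth element of a linked list requires iteration), so I'd just keep using the iteration approach, or keep track of which element is the nth one during set-modifying operations.
Is there a way to get it by using some kind of special key with custom compareTo, equals and hashCode methods?
Even if that were possible (and almost anything is), you wouldn't really gain anything: you'd have to know the exact layout of the tree, which index corresponds to the root node, whether the tree is inverted or not, etc., and you'd have to track the lookup steps and compute the current index from them. In the end you'd probably do as much work as when iterating, if not more. On top of that, using such a special key would be very error-prone and could actually corrupt your tree, so I wouldn't do it.
I'd like to see a solution for it first, and then we can think about what it gains :) By the way, for example, if I want to pick 1000 random elements from a 10 000 000 element TreeSet very frequently, then cloning it to an ArrayList every time is very inefficient, and iterating over that many elements is inefficient as well.
"I'd like to see a solution for it first" - I'd rather not invest in a solution that is bound to do more harm than good :) I'll add an answer for the challenges you're facing (at least some of them), but the best option might be to take a step back and think about whether a set is the right structure at all. Could you edit your question and add some more requirements, such as what you need, why you need to use a set (or why you think you do), etc.?
Does the set change its structure over time, or are its elements "fixed"? (I mean, do you add or remove elements over time?)
Yes, of course I intend to add/remove elements all the time. If I had a fixed amount of data, I could use an array instead of a set :D
If the real problem is selecting n elements at random from a collection of N elements, then this might help: stackoverflow.com/a/28655112/1441122
Yes, Stuart Marks, exactly, and that is the least efficient way. That solution is fine for small amounts of data, e.g. picking 20 random elements out of 150, but definitely not for large amounts.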

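If you are sure you need n elements from random positions in the set (something like statistical sampling), then you may want to consider iterating through the set just once and picking samples with the desired probability as you go. This is more efficient than looking up each position separately, because you iterate over the set only once.

The following program demonstrates the idea: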

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class SamplingFromSet {

    private static final Random random = new Random();

    public static void main(String[] args) {
        Set<String> population = new TreeSet<>();

        // Populate the set
        final int popSize = 17;
        for (int i = 0; i < popSize; i++) {
            population.add(getRandomString());
        }

        List<String> sample = sampleFromPopulation(population, 3 /* sampleSize */);

        System.out.println("population is");
        System.out.println(population);
        System.out.println("sample is");
        System.out.println(sample);
    }

    /**
     * Pick some samples from a population in a single pass.
     *
     * @param population the set to sample from
     * @param sampleSize the expected (not guaranteed) number of samples
     * @return the picked elements, in iteration order
     */
    private static <T> List<T> sampleFromPopulation(Set<T> population, int sampleSize) {
        // Each element is kept independently with this probability
        double sampleProb = ((double) sampleSize) / population.size();
        List<T> sample = new ArrayList<>();
        for (T element : population) {
            if (random.nextDouble() < sampleProb) {
                // Lucky draw!
                sample.add(element);
            }
        }
        return sample;
    }

    private static String getRandomString() {
        return String.valueOf(random.nextInt());
    }
}

Output of the program:

population is
[-1488564139, -1510380623, -1980218182, -354029751, -564386445, -57285541, -753388655, -775519772, 1538266464, 2006248253, 287039585, 386398836, 435619764, 48109172, 580324150, 64275438, 860615531]
sample is
[-57285541, -753388655, 386398836]

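Update

The program above has a caveat, however: since the picking is done by probability during a single pass through the set, the returned sample may, depending on your luck of the day, contain fewer or more elements than specified. This can be remedied with a slight change to the procedure, which uses a slightly different method signature: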

/**
 * Pick some samples from a population
 * @param population the set to sample from
 * @param sampleSize number of samples
 * @param exactSize a boolean to control whether or not
 *        the returned sample list must be of the exact size
 *        specified
 * @return the picked elements
 */
private static <T> List<T> sampleFromPopulation(Set<T> population,
                                                int sampleSize,
                                                boolean exactSize);

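Guarding against oversampling

In the single pass through the population we oversample, and if we do end up with too many samples, we drop some of them at the end.

Guarding against undersampling

Also note that even with oversampling there is a non-zero probability that, at the end of the single pass through the population, we still end up with fewer samples than desired. If that happens (unlikely), we recursively call the same method to try again. (The recursion terminates with probability approaching one, because it is very unlikely that repeated recursive calls keep coming up undersampled.)

The following code implements the new sampleFromPopulation() method: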
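
As a rough check on the "unlikely" claim, here is a back-of-the-envelope estimate. It assumes the 1.2 oversampling multiplier used in the implementation and the figures from the question (N = 10 000 000, n = 1000), and it relies on a normal approximation, so treat the result as an order of magnitude only. Each element is kept independently with probability p, so the number of picked samples X is binomial:

$$
X \sim \mathrm{Binomial}(N, p), \qquad p = \frac{1.2\,n}{N}, \qquad \mathbb{E}[X] = Np = 1200, \qquad \sigma = \sqrt{Np(1-p)} \approx 34.6
$$

$$
P(X < n) \approx \Phi\!\left(\frac{1000 - 1200}{34.6}\right) \approx \Phi(-5.8) \sim 10^{-9}
$$

So for large samples a retry should almost never happen. For tiny samples, such as 3 out of 17 in the demo program, the count fluctuates much more relative to its mean, and retries are far more common.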

// Additional imports needed by this version:
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedList;

private static <T> List<T> sampleFromPopulation(Set<T> population,
                                                int sampleSize,
                                                boolean exactSize) {
    int popSize = population.size();
    if (exactSize && sampleSize > popSize) {
        // Without this check the retry at the bottom could never succeed
        throw new IllegalArgumentException("sampleSize exceeds population size");
    }
    double sampleProb = ((double) sampleSize) / popSize;

    final double OVER_SAMPLING_MULTIPLIER = 1.2;
    if (exactSize) {
        /*
         * Oversample to improve the chance of getting enough
         * samples (if we then have too many, we drop the excess
         * later)
         */
        sampleProb = sampleProb * OVER_SAMPLING_MULTIPLIER;
    }
    // Linked list: removing a node does not shift the remaining elements
    List<T> sample = new LinkedList<>();
    for (T element : population) {
        if (random.nextDouble() < sampleProb) {
            // Lucky draw!
            sample.add(element);
        }
    }
    int samplesTooMany = sample.size() - sampleSize;
    if (!exactSize || samplesTooMany == 0) {
        return sample;
    } else if (samplesTooMany > 0) {
        // Draw distinct random indexes to drop; the set ignores repeated draws
        Set<Integer> indexesToRemoveAsSet = new HashSet<>();
        while (indexesToRemoveAsSet.size() < samplesTooMany) {
            indexesToRemoveAsSet.add(random.nextInt(sample.size()));
        }
        List<Integer> indexesToRemoveAsList = new ArrayList<>(indexesToRemoveAsSet);
        // Descending order, so a removal never shifts an index still to be removed
        indexesToRemoveAsList.sort(Collections.reverseOrder());
        for (Integer index : indexesToRemoveAsList) {
            sample.remove((int) index); // remove by index (not by element)
        }
        return sample;
    } else {
        /*
         * We were unlucky: even with oversampling we still got
         * fewer samples than specified, so we call this very same
         * method again recursively
         */
        return sampleFromPopulation(population, sampleSize, exactSize);
    }
}
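
An alternative worth considering, which is not what the code above does: reservoir sampling (often called Algorithm R) returns exactly the requested number of elements in a single pass, with no oversampling and no retry. Below is a minimal sketch; the class name `ReservoirSampling` and the method `sample` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

final class ReservoirSampling {

    /** Returns exactly sampleSize elements, each subset being equally likely. */
    static <T> List<T> sample(Set<T> population, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > population.size()) {
            throw new IllegalArgumentException("sampleSize must be in [0, population size]");
        }
        List<T> reservoir = new ArrayList<>(sampleSize);
        int seen = 0; // number of elements visited before the current one
        for (T element : population) {
            if (seen < sampleSize) {
                // Fill the reservoir with the first sampleSize elements
                reservoir.add(element);
            } else {
                // Keep the new element with probability sampleSize / (seen + 1),
                // evicting a uniformly chosen slot
                int j = rng.nextInt(seen + 1); // uniform in [0, seen]
                if (j < sampleSize) {
                    reservoir.set(j, element);
                }
            }
            seen++;
        }
        return reservoir;
    }

    public static void main(String[] args) {
        Set<Integer> population = new TreeSet<>();
        for (int i = 0; i < 17; i++) {
            population.add(i);
        }
        List<Integer> picked = sample(population, 3, new Random());
        System.out.println(picked); // always 3 distinct elements; which ones varies per run
    }
}
```

Like the probabilistic version, this still walks the whole set on every call, so it addresses the exact-size problem and not the cost of iterating over 10 000 000 elements. The result is also not in sorted order, because later elements overwrite arbitrary slots.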