Sieve of Eratosthenes - Finding Primes Python
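Just to clarify, this is not a homework problem :)
I wanted to find primes for a math application I am building and came across the Sieve of Eratosthenes.
I have written an implementation of it in Python, but it is terribly slow. For example, finding all primes below 2 million takes more than 20 minutes (I stopped it at that point). How can I speed this up?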

    def primes_sieve(limit):
        limitn = limit+1
        primes = list(range(2, limitn))  # a real list, so that .remove() works on Python 3 too

        for i in primes:
            factors = range(i, limitn, i)
            for f in factors[1:]:
                if f in primes:          # linear search through the list
                    primes.remove(f)     # another linear pass, plus shifting the tail
        return primes

    print(primes_sieve(2000))

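Update:
I ended up profiling this code and found that a lot of the time was spent removing elements from the list. That is understandable: removal has to traverse the list (in the worst case, all of it) to find the element, delete it, and then readjust the list (perhaps some copying goes on?). Anyway, I swapped the list for a dictionary. My new implementation: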

    def primes_sieve1(limit):
        limitn = limit+1
        primes = dict()
        for i in range(2, limitn): primes[i] = True

        for i in primes:
            factors = range(i,limitn, i)
            for f in factors[1:]:
                primes[f] = False
        return [i for i in primes if primes[i]==True]

    print(primes_sieve1(2000000))

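For anyone who wants to reproduce the profiling step from the update, here is a minimal sketch using the standard-library cProfile and pstats modules. The function is a Python 3 restatement of the question's first version, and profile_report is a hypothetical helper name, not something from the original post.

```python
import cProfile
import io
import pstats

def primes_sieve(limit):
    """List-removal version from the question, restated for Python 3."""
    primes = list(range(2, limit + 1))
    for i in primes:
        for f in range(2 * i, limit + 1, i):
            if f in primes:
                primes.remove(f)
    return primes

def profile_report(limit=2000):
    """Run primes_sieve under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = primes_sieve(limit)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time and keep only the top few entries.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()

result, report = profile_report()
print(report)
```

On a run like this, the list's remove calls should appear in the report with a large call count, which is consistent with what the update describes.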
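You're not quite implementing the correct algorithm:
In your first example, primes_sieve does not keep a list of primality flags to strike out, as the algorithm does; it keeps removing items from a list of integers, which is very expensive.
In the second example, primes_sieve1 keeps a dictionary of primality flags, which is a step in the right direction, but it strikes out the multiples of every number, including numbers already marked as composite, instead of only the multiples of primes.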
The correct algorithm (with a list instead of a dictionary) looks something like:

    def primes_sieve2(limit):
        a = [True] * limit                          # Initialize the primality list
        a[0] = a[1] = False

        for (i, isprime) in enumerate(a):
            if isprime:
                yield i
                for n in range(i*i, limit, i):     # Mark factors non-prime
                    a[n] = False

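(Note that this also includes the optimization of starting the non-prime marking at the prime's square (i*i) instead of at its double.)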
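To see why starting at the square is safe: any composite below i*i has a prime factor smaller than i, so it has already been struck out by the time i is reached. A quick sanity check, as a sketch (sieve_from and its start_factor parameter are hypothetical names made up for this comparison):

```python
def sieve_from(limit, start_factor):
    """Sieve the numbers below `limit`; marking for prime i starts at start_factor(i)."""
    flags = [True] * limit
    flags[0:2] = [False] * min(2, limit)   # 0 and 1 are not prime
    for i in range(2, limit):
        if flags[i]:
            for n in range(start_factor(i), limit, i):
                flags[n] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]

from_double = sieve_from(1000, lambda i: 2 * i)   # start marking at 2*i
from_square = sieve_from(1000, lambda i: i * i)   # start marking at i*i
print(from_double == from_square)
```

Both variants should give the same list; the second one simply skips marking work that earlier primes have already done.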

    def eratosthenes(n):
        multiples = []
        for i in range(2, n+1):
            if i not in multiples:
                print (i)
                for j in range(i*i, n+1, i):
                    multiples.append(j)

    eratosthenes(100)

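Removing an item from the beginning of an array (list) requires moving all of the items after it down. That means that removing every element from a list in this way, starting from the front, is an O(n^2) operation.
You can do this much more efficiently with sets: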

    def primes_sieve(limit):
        limitn = limit+1
        not_prime = set()
        primes = []

        for i in range(2, limitn):
            if i in not_prime:
                continue

            for f in range(i*2, limitn, i):
                not_prime.add(f)

            primes.append(i)

        return primes

    print(primes_sieve(1000000))

... or alternatively, avoid having to rearrange the list:

    def primes_sieve(limit):
        limitn = limit+1
        not_prime = [False] * limitn
        primes = []

        for i in range(2, limitn):
            if not_prime[i]:
                continue
            for f in range(i*2, limitn, i):   # on Python 2, xrange avoids building a list here
                not_prime[f] = True

            primes.append(i)

        return primes

Much faster:

    import time

    def get_primes(n):
        m = n+1
        #numbers = [True for i in range(m)]
        numbers = [True] * m #EDIT: faster
        for i in range(2, int(n**0.5 + 1)):
            if numbers[i]:
                for j in range(i*i, m, i):
                    numbers[j] = False
        primes = []
        for i in range(2, m):
            if numbers[i]:
                primes.append(i)
        return primes

    start = time.time()
    primes = get_primes(10000)
    print(time.time() - start)
    print(get_primes(100))

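By combining the contributions of many enthusiasts (including Glenn Maynard and MrHIDEn from the comments above), I came up with the following piece of code in Python 2: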

    import math  # needed for math.sqrt below

    def simpleSieve(sieveSize):
        #creating Sieve.
        sieve = [True] * (sieveSize+1)
        # 0 and 1 are not considered prime.
        sieve[0] = False
        sieve[1] = False
        for i in xrange(2,int(math.sqrt(sieveSize))+1):
            if sieve[i] == False:
                continue
            for pointer in xrange(i**2, sieveSize+1, i):
                sieve[pointer] = False
        # Sieve is left with prime numbers == True
        primes = []
        for i in xrange(sieveSize+1):
            if sieve[i] == True:
                primes.append(i)
        return primes

    sieveSize = input()
    primes = simpleSieve(sieveSize)

The computation time on my machine for inputs of 10 raised to the listed power is:
- 3: 0.3 ms
- 4: 2.4 ms
- 5: 23 ms
- 6: 0.26 s
- 7: 3.1 s
- 8: 33 s
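A table like the one above could be produced with a small timing harness. The following is a sketch in Python 3, not the code that generated those numbers: simple_sieve is a Python 3 restatement of the sieve above, time_powers_of_ten is a hypothetical helper, and the timings will differ from machine to machine.

```python
import math
import time

def simple_sieve(sieve_size):
    """Return all primes <= sieve_size."""
    sieve = [True] * (sieve_size + 1)
    sieve[0] = False
    if sieve_size >= 1:
        sieve[1] = False
    for i in range(2, math.isqrt(sieve_size) + 1):
        if sieve[i]:
            for pointer in range(i * i, sieve_size + 1, i):
                sieve[pointer] = False
    return [i for i, flag in enumerate(sieve) if flag]

def time_powers_of_ten(max_exponent=6):
    """Map each exponent k in 3..max_exponent to the seconds taken for 10**k."""
    timings = {}
    for exponent in range(3, max_exponent + 1):
        start = time.perf_counter()
        simple_sieve(10 ** exponent)
        timings[exponent] = time.perf_counter() - start
    return timings

for exponent, seconds in time_powers_of_ten(5).items():
    print(f"10^{exponent}: {seconds * 1000:.2f} ms")
```

time.perf_counter is used here because it is intended for measuring short intervals; for more careful measurements the timeit module is another option.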
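I realise this doesn't really answer the question of how to generate primes quickly, but perhaps some will find this alternative interesting: because Python provides lazy evaluation via generators, Eratosthenes' sieve can be implemented exactly as stated: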

    def intsfrom(n):
        while True:
            yield n
            n += 1

    def sieve(ilist):
        p = next(ilist)
        yield p
        # filter out multiples of p, then recurse on what is left
        for q in sieve(n for n in ilist if n%p != 0):
            yield q


    try:
        for p in sieve(intsfrom(2)):
            print(p, end=' ')

        print('')
    except RuntimeError as e:   # RecursionError is a subclass of RuntimeError
        print(e)

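The try block is there because the algorithm runs until it blows the stack; without the try block, the traceback pushes the actual output you want to see off the screen.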
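If you only want the first few values rather than running until the recursion limit is hit, one option is to cut the generator off with itertools.islice. A minimal sketch, restating the two generators above in Python 3 form (first_primes and the count of 20 are just illustrative choices):

```python
import itertools

def intsfrom(n):
    """Yield n, n+1, n+2, ... forever."""
    while True:
        yield n
        n += 1

def sieve(ilist):
    """Yield the head of the stream, then recurse on the stream without its multiples."""
    p = next(ilist)
    yield p
    yield from sieve(n for n in ilist if n % p != 0)

# Take only the first 20 values, so the nesting depth stays small.
first_primes = list(itertools.islice(sieve(intsfrom(2)), 20))
print(first_primes)
```

Each prime found adds another nested generator, so this only stays practical for a modest number of values.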
I think this is the shortest code for finding primes with the Eratosthenes method:
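
    def prime(r):
        n = range(2,r)
        while len(n)>0:
            yield n[0]
            # keep only the numbers that are not multiples of the current head
            n = [x for x in n if x not in range(n[0],r,n[0])]

    r = 100  # example upper bound (exclusive); the original leaves r undefined
    print(list(prime(r)))
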
My fastest implementation:
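
    N = 1000000  # example bound (exclusive); the original leaves N undefined
    isprime = [True]*N
    isprime[0] = isprime[1] = False
    for i in range(4, N, 2):            # strike out the even numbers above 2
        isprime[i] = False
    for i in range(3, N, 2):            # only odd candidates remain
        if isprime[i]:
            for j in range(i*i, N, 2*i):   # step 2*i skips the even multiples
                isprime[j] = False
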

    import math

    def sieve(n):
        primes = [True]*n
        primes[0] = False
        primes[1] = False
        for i in range(2,int(math.sqrt(n))+1):
            j = i*i
            while j < n:
                primes[j] = False
                j = j+i
        return [x for x in range(n) if primes[x] == True]

I figured it must be possible to simply use the empty list as the terminating condition for the loop, and came up with this:

    limit = 100
    ints = list(range(2, limit))   # Will end up empty

    while len(ints) > 0:
        prime = ints[0]
        print(prime)
        ints.remove(prime)
        i = 2
        multiple = prime * i
        while multiple <= limit:
            if multiple in ints:
                ints.remove(multiple)
            i += 1
            multiple = prime * i

I prefer NumPy because of its speed.

    import numpy as np

    # Find all prime numbers using Sieve of Eratosthenes
    def get_primes1(n):
        m = int(np.sqrt(n))
        is_prime = np.ones(n, dtype=bool)
        is_prime[:2] = False  # 0 and 1 are not primes
        for i in range(2, m + 1):  # include m itself, otherwise m*m is never struck out
            if is_prime[i] == False:
                continue
            is_prime[i*i::i] = False
        return np.nonzero(is_prime)[0]

    # Find all prime numbers using brute-force.
    def isprime(n):
        ''' Check if integer n is a prime '''
        n = abs(int(n))  # n is a positive integer
        if n < 2:  # 0 and 1 are not primes
            return False
        if n == 2:  # 2 is the only even prime number
            return True
        if not n & 1:  # all other even numbers are not primes
            return False
        # Range starts with 3 and only needs to go up the square root
        # of n for all odd numbers
        for x in range(3, int(n**0.5)+1, 2):
            if n % x == 0:
                return False
        return True

    # To apply a function to a numpy array, one has to vectorize the function
    def get_primes2(n):
        vectorized_isprime = np.vectorize(isprime)
        a = np.arange(n)
        return a[vectorized_isprime(a)]

Check the output:

    n = 100
    print(get_primes1(n))
    print(get_primes2(n))

    [ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
    [ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

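Compare the speed of the Sieve of Eratosthenes and brute force in a Jupyter Notebook. For one million elements, the Sieve of Eratosthenes is 539 times faster than brute force.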

    %timeit get_primes1(1000000)
    %timeit get_primes2(1000000)

    4.79 ms ± 90.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    2.58 s ± 31.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

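Here's a version that's a bit more memory-efficient (and a proper sieve, not trial division). Basically, instead of keeping an array of all the numbers and crossing out those that aren't prime, this keeps an array of counters, one for each prime it has discovered, and leap-frogs them ahead of the putative prime. That way, it uses storage proportional to the number of primes found, not to the value of the highest prime.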

    import itertools

    def primes():

        class counter:
            def __init__(this, n):
                # isVirgin means it's never been incremented
                this.n, this.current, this.isVirgin = n, n*n, True

            def advancePast(this, n):  # return True if the counter advanced
                if this.current > n:
                    return False
                this.current += this.n  # pre: this.current == n; post: this.current > n.
                this.isVirgin = False   # when it's gone, it's gone
                return True

        yield 1  # emitted first, although 1 is not prime by the usual definition
        multiples = []
        for n in itertools.count(2):
            isPrime = True
            for m in multiples:
                # If this counter is virgin and already past n, so are all the
                # subsequent counters; no need to iterate further. A plain break
                # is used because raising StopIteration inside a generator
                # expression becomes a RuntimeError on Python 3.7+.
                if m.isVirgin and m.current > n:
                    break
                if m.advancePast(n):
                    isPrime = False
            if isPrime:
                yield n
                multiples.append(counter(n))

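You'll note that primes() is a generator, so you can take as many values from it as you need, for example the first n: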

    import itertools

    # n is the number of values to take; it must be defined beforehand
    for k in itertools.islice (primes(), n):
        print (k)

And, for completeness, here's a timer to measure the performance:

    import time

    def timer ():
        t, k = time.process_time(), 10
        for p in primes():
            if p>k:
                print (time.process_time()-t, " to", p, " ")
                k *= 10
                if k>100000: return

Just in case you're wondering, I also [...]
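The idea of keeping state per discovered prime, rather than per number, can also be sketched with a dictionary that maps each upcoming composite to the primes that produce it. This is a different, commonly used formulation of an incremental sieve, not the counter-based code above, and the name incremental_primes is made up for illustration.

```python
import itertools

def incremental_primes():
    """Yield primes indefinitely, storing one pending composite per known prime."""
    composites = {}  # upcoming composite -> list of primes that divide it
    for n in itertools.count(2):
        if n not in composites:
            # n was never scheduled as a multiple of a smaller prime, so it is prime.
            yield n
            composites[n * n] = [n]   # first multiple not already covered by smaller primes
        else:
            # n is composite: move each of its primes on to its next multiple.
            for p in composites.pop(n):
                composites.setdefault(n + p, []).append(p)

first_ten = list(itertools.islice(incremental_primes(), 10))
print(first_ten)
```

As with the counter version, memory grows with the number of primes found so far rather than with the largest number tested.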
My implementation:
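
    import math

    n = 100
    marked = {}
    # + 1 so that the square root itself is included; otherwise, for some n,
    # the square of the largest needed prime would be printed as a prime
    for i in range(2, int(math.sqrt(n)) + 1):
        if not marked.get(i):
            for x in range(i * i, n, i):
                marked[x] = True

    for i in range(2, n):
        if not marked.get(i):
            print(i)
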
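A simple speed hack: when you define the variable "primes", set the step to 2 to skip all even numbers automatically, and set the starting point to 1.
Then you can optimize further: instead of for i in primes, use for i in primes[:round(len(primes) ** 0.5)]. That will dramatically improve performance. In addition, you can eliminate numbers ending in 5 to speed things up further.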
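One way to apply the "skip the even numbers" idea to a flag-based sieve is sketched below. Note the assumptions: this stores flags for odd numbers only and stops at the square root of the limit (rather than slicing a list by the square root of its length), and it does not implement the "ending in 5" shortcut. odd_sieve is a hypothetical name.

```python
import math

def odd_sieve(limit):
    """Return all primes <= limit, keeping flags for odd numbers only."""
    if limit < 2:
        return []
    # flags[k] stands for the odd number 2*k + 1; index 0 is the number 1.
    size = (limit + 1) // 2
    flags = [True] * size
    flags[0] = False  # 1 is not prime
    for i in range(3, math.isqrt(limit) + 1, 2):
        if flags[i // 2]:
            # Odd multiples of i are 2*i apart, which is a step of i in index space.
            for k in range(i * i // 2, size, i):
                flags[k] = False
    return [2] + [2 * k + 1 for k in range(1, size) if flags[k]]

print(odd_sieve(30))
```

Storing only the odd numbers roughly halves both the memory and the marking work compared with a sieve over every integer.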