Security: is Python's random number generation easily reproducible?

I was reading about Python's random module in the standard library docs. When I set the seed and produced some random numbers, I was surprised:

import random

random.seed(1)
for i in range(5):
    print(random.random())  # print() with one argument works on Python 2 and 3

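The numbers produced were exactly the same as the sample in the article. I think it's safe to say that the algorithm is deterministic once the seed is set.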

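When the seed is not set, the standard library seeds with time.time(). Now suppose an online service uses random.random() to generate a captcha code. Could a hacker easily reproduce the captcha using the same random generator?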

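  • Assume the hacker knows the algorithm that converts the random numbers into a captcha code. Otherwise, this seems impossible.
  • Since random.seed() is called when the module is imported, I assume that for a web application the time used as the seed is roughly the time the request was sent (within a few seconds), so it could easily be calibrated with a few attempts?
  • Am I worrying too much, or is this a real weakness?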
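The scenario in the question can be sketched as follows. This is only an illustration under stated assumptions: the service is assumed to seed with a whole-second timestamp, and `make_captcha` and `recover_seed` are hypothetical helpers, not taken from any real service.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits

def make_captcha(rng, length=6):
    """Hypothetical captcha scheme: pick `length` characters using `rng`."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def recover_seed(observed, approx_time, window=5):
    """Try every whole-second seed within `window` seconds of `approx_time`."""
    for candidate in range(approx_time - window, approx_time + window + 1):
        if make_captcha(random.Random(candidate)) == observed:
            return candidate
    return None

# Simulated victim, seeded with a whole-second timestamp (an assumption).
secret_seed = 1436617059
victim = random.Random(secret_seed)
observed = make_captcha(victim)

# The attacker only knows the time to within a few seconds.
found = recover_seed(observed, approx_time=secret_seed + 3)

# With the seed recovered, the attacker replays the sequence and predicts the next code.
attacker = random.Random(found)
make_captcha(attacker)              # skip the captcha already observed
predicted = make_captcha(attacker)
actual_next = make_captcha(victim)
print(found == secret_seed, predicted == actual_next)
```

The search only works here because the assumed seed space is tiny (eleven candidates). Whether a real deployment is exposed depends on how the generator is actually seeded.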


    The existing answers are good, but I'll add a few points.

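    Update:

    Actually, if you don't supply a seed, the random number generator is seeded with random bits from the system's random source. It only falls back to using the system time as the seed if the OS has no random source. Also note that recent versions of Python can use an improved seeding scheme. From the docs: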

    random.seed(a=None, version=2)

    Initialize the random number generator.

    If a is omitted or None, the current system time is used. If
    randomness sources are provided by the operating system, they are used
    instead of the system time (see the os.urandom() function for
    details on availability).

    If a is an int, it is used directly.

    With version 2 (the default), a str, bytes, or bytearray object gets
    converted to an int and all of its bits are used.

    With version 1 (provided for reproducing random sequences from older
    versions of Python), the algorithm for str and bytes generates a
    narrower range of seeds.

    Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.
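A small sketch of the seeding behaviour quoted above, assuming Python 3.2 or later. `sample` is a hypothetical helper introduced only for this illustration.

```python
import random

def sample(seed_value=None, version=2):
    """Return five two-digit numbers from a generator seeded with `seed_value`."""
    rng = random.Random()
    rng.seed(seed_value, version=version)
    return [rng.randint(10, 99) for _ in range(5)]

print(sample(1) == sample(1))            # int seed is used directly: reproducible
print(sample("demo") == sample("demo"))  # str seed is converted to an int: reproducible
print(sample(), sample())                # no seed: OS randomness (or the clock) is used
```

The first two comparisons hold because the same explicit seed always yields the same sequence. The last line is the only case where the output is expected to vary between runs.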

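    Generating captcha codes is not a high-security application compared with generating secret keys, especially keys that are meant to be used many times. As a corollary, the entropy needed to generate a captcha code is smaller than that needed for a cryptographic key.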

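    Bear in mind that the system time used to seed random is (probably) not the system time in seconds. It is more likely to be the time in microseconds or even nanoseconds, so, apart from the considerations Ned mentioned, an attacker can't easily determine the seed with a brute-force search.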
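To put rough numbers on that, here is a back-of-the-envelope sketch. The 10-second uncertainty window is an assumption chosen for illustration, and `candidate_seeds` is a hypothetical helper.

```python
def candidate_seeds(window_seconds, ticks_per_second):
    """Number of distinct time-based seeds inside the attacker's uncertainty window."""
    return window_seconds * ticks_per_second

WINDOW = 10  # assumed: attacker knows the seeding time to within 10 seconds

for label, ticks in [("seconds", 1), ("microseconds", 10**6), ("nanoseconds", 10**9)]:
    print(label, candidate_seeds(WINDOW, ticks))
```

The count grows in proportion to the clock resolution. How hard the search is in practice depends on how quickly an attacker can test each candidate, so a larger count raises the cost without ruling the attack out.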

    Here's a quick demo, running on Python 2.6.6 on a 2GHz Linux system.

    #!/usr/bin/env python
    ''' random seeding demo'''

    from __future__ import print_function
    import time
    from random import seed, randint, random

    def rf():
        return randint(10, 99)

    def put_time():
        print('%.15f' % time.time())

    r = range(10)
    a = []

    put_time()
    for i in r:
        seed()
        a.append([rf() for j in r])
    put_time()

    for row in a:
        print(row)

    Typical output

    1436617059.071794986724854
    1436617059.074091911315918
    [95, 25, 50, 75, 80, 38, 21, 26, 85, 82]
    [75, 96, 14, 13, 76, 53, 94, 68, 80, 66]
    [79, 33, 65, 86, 12, 32, 80, 83, 36, 42]
    [28, 47, 62, 21, 52, 30, 54, 62, 22, 28]
    [22, 40, 71, 36, 78, 64, 17, 33, 99, 43]
    [81, 15, 32, 15, 63, 57, 83, 67, 12, 62]
    [22, 56, 54, 55, 51, 56, 34, 56, 94, 16]
    [64, 82, 37, 80, 70, 91, 56, 41, 55, 12]
    [47, 37, 64, 14, 69, 65, 42, 17, 22, 17]
    [43, 43, 73, 82, 61, 55, 32, 52, 86, 74]

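    As you can see, less than 3 milliseconds elapse between the start and the end of the outer loop, yet all of the lists in a are completely different.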

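    Note that, in the Python version used for this demo, the seed passed to random.seed() can be any hashable object. When you pass a non-integer (such as a float, which is what the system time is), it is first hashed to create an integer.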

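    Still, there's no need to use only the system time as the seed: you can use SystemRandom or os.urandom() to get a seed. That way the seed is harder to predict, but you still get the speed of the Mersenne Twister. SystemRandom is a little slower than the Mersenne Twister because it has to make system calls. However, even urandom isn't totally safe.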
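A minimal sketch of both options, assuming Python 3. The 32-byte seed length and the names `fast_rng` and `os_rng` are illustrative choices, not taken from the answer.

```python
import os
import random

# Option 1: seed a Mersenne Twister instance from OS randomness.
# The seed is hard to guess, and later draws run at Mersenne Twister speed.
seed_bytes = os.urandom(32)  # 32 bytes is an arbitrary choice for this sketch
fast_rng = random.Random(int.from_bytes(seed_bytes, "big"))

# Option 2: SystemRandom asks the OS for fresh randomness on every call.
os_rng = random.SystemRandom()

print(fast_rng.randint(10, 99), os_rng.randint(10, 99))
```

Note that option 1 only hides the seed. The generator is still a Mersenne Twister, whose internal state can be reconstructed from enough observed outputs, so it should not be treated as cryptographically secure.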

    From the GNU urandom man page:

    The random number generator gathers environmental noise from device
    drivers and other sources into an entropy pool. The generator also
    keeps an estimate of the number of bits of noise in the entropy pool.
    From this entropy pool random numbers are created.

    When read, the /dev/random device will only return random bytes
    within the estimated number of bits of noise in the entropy pool.
    /dev/random should be suitable for uses that need very high quality
    randomness such as one-time pad or key generation. When the entropy
    pool is empty, reads from /dev/random will block until additional
    environmental noise is gathered.

    A read from the /dev/urandom device will not block waiting for more
    entropy. As a result, if there is not sufficient entropy in the
    entropy pool, the returned values are theoretically vulnerable to a
    cryptographic attack on the algorithms used by the driver. Knowledge
    of how to do this is not available in the current unclassified
    literature, but it is theoretically possible that such an attack may
    exist. If this is a concern in your application, use /dev/random
    instead.

    Usage

    If you are unsure about whether you should use
    /dev/random or /dev/urandom, then probably you want to use the latter.
    As a general rule, /dev/urandom should be used for everything except
    long-lived GPG/SSL/SSH keys.


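    It shouldn't be surprising that the sequence is deterministic after seeding. That's the whole point of seeding. random.random is known as a PRNG, a pseudo-random number generator. This is not unique to Python: every language's simple random source is deterministic in this way.

    And yes, people who are genuinely concerned about security will worry that an attacker could reproduce the sequence. That's why other sources of randomness are available, such as os.urandom, but they are more expensive.

    But the problem is not as bad as you describe: for web requests, a single process typically handles more than one request, so the module was initialized at some unknown point in the past, not at the time the web request was received.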


    Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

    Please see this answer for secure random.
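The "completely deterministic" point in the quoted documentation can be seen by saving and restoring the generator state. A short sketch, assuming the standard `getstate()` and `setstate()` methods of `random.Random`:

```python
import random

rng = random.Random()          # seeded from the OS, so the values differ per run
snapshot = rng.getstate()      # capture the full internal state

first = [rng.random() for _ in range(3)]

rng.setstate(snapshot)         # rewind to the captured state
replay = [rng.random() for _ in range(3)]

print(first == replay)         # the same state always yields the same outputs
```

Anyone who learns the internal state can therefore predict every later value, which is why the documentation rules this generator out for cryptographic use.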


    The Python documentation has this to say:

    Warning
    The pseudo-random generators of this module should not be used for
    security purposes. Use os.urandom() or SystemRandom if you require a
    cryptographically secure pseudo-random number generator.

    So, using it for CAPTCHA is unlikely to be a good idea.
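Following the documentation's advice, a captcha code could be drawn from SystemRandom instead. This is a minimal sketch: the alphabet, the code length, and the `captcha_code` helper are assumptions made for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumed character set
_secure = random.SystemRandom()                    # backed by os.urandom()

def captcha_code(length=6):
    """Return a captcha string whose characters come from the OS random source."""
    return "".join(_secure.choice(ALPHABET) for _ in range(length))

print(captcha_code())
```

SystemRandom has no seed or state to replay, so the time-based guessing discussed in the question does not apply to it. On Python 3.6 and later, the secrets module offers a similar choice() function for the same purpose.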