Computation of Randomness

Produxt Manager
The Startup
Published in
7 min readMay 9, 2020

--

Python random.random() on Linux.

If computers are deterministic, how the hell does it calculate randomness? After traversing through blogs and forums, I could not find all encompassing article that went through the steps from input to output for this:

import randomrandom.random()
>>0.08436837268585984

Explore with me as I dive into the concepts and the code on computing randomness. I am using Python 3.7 on Linux.

Disclaimer: Take this article as exploratory technical journalism, not as an authoritative source — I am more interested in concepts rather than precise details. I look at the happy path and do not cover any conditions throughout the process of generating a random number or error handling. Please feel free to point out any mistakes that need correcting, to clarify any confusing points, or to share interesting information that should be added. Thanks!

Adding random bytes to /dev/urandom

The first step of our journey is looking at how the computer can even get random numbers. There are pseudorandom number generators where there are a lot of processing that can simulate randomness but given enough runs, we can start to see some patterns and predictability emerging.

See the blue plotting in the below gif. This is from a pseudorandom number generator and given enough steps you can see a pattern here — randomness is not supposed to be predictable.

credit to Khan Academy: https://www.youtube.com/watch?v=GtOt7EBNEwQ&feature=youtu.be)

Do you recall cryptoquotes in your daily newspaper (when newspapers were a thing)? There are seemingly random characters in the quote. But knowing the patterns of English, you can decipher the message and find the generators for this pseudorandomness e.g. A is actually T, T is actually I.

There are better ways to produce randomness and a solution that is widely used is to Hardware Random Number Generators which we will explore here. There are a lot of steps from between your input, running Python’s random.random() function and Python’s output. We will first look on how to get a bunch of random data from the Hardware Random Number Generator to put into a file called /dev/urandom.

Workflow from bits to output

Hardware Random Number Generators collect information running through CPU or the motherboard. It will collect bytes that run through the CPU and motherboard from external sources that is supposed to generate noise, random data e.g. keyboard, mouse, audio, video. It is not so much generating random data. It is more like sampling data randomly at certain intervals through an oscillator e.g. 300 MHz. This sampling is then added to a file called an Entropy Pool.

Sampling bytes at certain intervals

Have you played the game with your spouse of coming up with your baby’s name? You probably sampled names you hear from all kinds of random sources, such as listening to some side conversation in a cafe, passing by signs on the highway and looking for interesting street names, etc. This sampling is very similar in that the external inputs are randomized instead internally generating a random name yourself.

There are more processes than just sampling, such as ensuring the sampled bytes are valid random bytes and controlling for biases, or changing the frequency in which you sample which is supposed to create more randomness. A lot of the design is not public knowledge. In the Linux code, it comments how it interfaces with Intel Hardware Random Number Generators but I could not find any precise details. I understand that this may be proprietary and a sense that if this knowledge is publicly known, then it would be easier to hack systems by understanding the generators. I did find some designs through OneRNG which is a USB random number generator that uses an avalanche diode to sample random bytes for the Entropy Pool. http://onerng.info/ . The premise of OneRNG is that the generator should verifiable in order to be secured. I would like to explore this as some point.

The Entropy Pool is not the entropy as you would suspect if you have studied Thermodynamics but Entropy as in the Information Theory sense from Claude Shannon. http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Noise i.e entropy is produced by environmental factors like mouse clicks which is supposed to be outside the deterministic framework of the computer.

Next, there is a Linux driver that is able to access the Entropy pool called /dev/hwrng and starts writing random bytes to a file called dev/urandom.

To see these random bytes, open Terminal and do the following command:

hexdump /dev/urandom
Note: This will keep running in your Terminal so use Command + Z to suspend the command

Another Note: I ran cat /dev/urandom and it did a lot of funky things.

The numbers on the left are the addresses in memory followed by the sampled random bytes in hexadecimal form.

There is another file called /dev/random that we will not go into but a note is that random bytes are constantly being consumed by other programs such as encryption. It is possible to exhaust, or use up, all of the random bytes in the /dev/random file or Entropy Pool. In this event, random numbers are then calculated through some algorithms such as SHA-1. It being deterministic, it is easy for hackers to reverse engineer the random byte used to create an encryption key. It is highly advisable in the community to use /dev/urandom. Others are trying to explicitly add other sources of random data that they keep feeding random bytes into Entropy Pool such USB, webcam, etc…

Now that we have random bytes in the /dev/urandom file, we are now on our next step of pulling those random bytes to produce a random number from our Python random.random() function.

Grabbing bytes from /dev/urandom

When you do random.random() in Python, it makes use of some Linux OS files to grab the random bytes from dev/devurandom.

random.random() is a function of the random library that is shipped with Python. Python, being built on top of C, can access some of the C files that runs on Linux, in this case the random.c file which is able to interact with the /dev/urandom file where our random bytes are.

In Linux, a program called random.c will grab some bytes from this file.

This module provides access to operating system functionality that is
standardized by the C Standard and the POSIX standard (a thinly
disguised Unix interface).

ef random(self):
“””Get the next random number in the range [0.0, 1.0).”””
return (int.from_bytes(_urandom(7), ‘big’) >> 3) * RECIP_BPF

os/_init.py

def urandom(n):
"""Return a string of n random bytes suitable for cryptographic use.

:type n: int
:rtype: bytes
"""
return b''

Then this goes to builtins.py

class int(object):
"""
int([x]) -> integer
int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments
are given. If x is a number, return x.__int__(). For floating point
numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base. The literal can be preceded by '+' or '-' and be surrounded
by whitespace. The base defaults to 10. Valid bases are 0 and 2-36.
Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
"""

Then this pass a cls into from_bytes


def from_bytes(cls, *args, **kwargs): # real signature unknown
"""
Return the integer represented by the given array of bytes.

bytes
Holds the array of bytes to convert. The argument must either
support the buffer protocol or be an iterable object producing bytes.
Bytes and bytearray are examples of built-in objects that support the
buffer protocol.
byteorder
The byte order used to represent the integer. If byteorder is 'big',
the most significant byte is at the beginning of the byte array. If
byteorder is 'little', the most significant byte is at the end of the
byte array. To request the native byte order of the host system, use
`sys.byteorder' as the byte order value.
signed
Indicates whether two's complement is used to represent the integer.
"""
pass

Then off to posix.py

def random(self):
"""Get the next random number in the range [0.0, 1.0)."""
return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF

Python processing of the random bytes

This gets 7 bytes from urandom, choose the first byte, shift 3 bits to the right

b’\x90/\x05\xf5hu\x11'

I am not sure why getting ‘Big’ Significant Bit is important, so if you know, let me know. Also, not sure why we shift 3 bits to the right.

Then we multiple by RECIP_BPF

BPF = 53        # Number of bits in a float
RECIP_BPF = 2**-BPF

2^-53

https://docs.python.org/2.5/ref/power.html

Finally, you will get your random number between 0 and 1!

random.random()
>>0.08436837268585984

There are a lot of questions that I would like to explore at some time.

How do Hardware Random Generators Work like OneRNG?

How does random.c handle the Entropy Pool?

What happens when you run out of random bytes while running a statistical experiment that runs let’s say 1,000,000 trials in a short amount of time?

There are more processing of the random bytes collected from the Entropy Pool such as the Mersenne Twister that Python uses? How does this work?

--

--

Produxt Manager
The Startup

What I write here is not my teaching, but my study; it is not a lesson for others, but for me. — Montaigne