Dr. Dobb's Digest May 2009

LibMTPRNG: A Multithreaded Pseudo Random Number Generator

A library that uses threads to generate pseudo-random values

By Matthew Davis and Sameer Niphadkar

The authors can be contacted at mdavig@gmu.edu and sniphadk@gmu.edu, respectively.

Random number generation (RNG) is an important and desired property for any Monte Carlo simulation or cryptography application. The strength of such systems depend on the appearance of randomness amongst a desired set of generated numbers. Since the random numbers required are often in high orders of magnitude, computer-based systems for RNG are widely used. However, these systems often fall short of the goal in producing an apparent patternless stream of values, even though they still might meet some statistical tests for fitness. Such fitness tests are used to analyze the randomness of numbers to ensure that the values are void of any discernible patterns. The numerous applications of randomness have led to many different methods of generating random data. These methods vary regarding the fitness, or how unpredictable/statistically random their generated values appear, and how quickly the values can be created.

In Practical Random Number Generation in Software, John Viega demonstrated that there is a large gap between the theory and practice for random number generation and proposed various solutions to the real world pseudo-RNGs (PRNG). Here we reference all random number generators as being RNG and do not try to further qualify them into being pseudo versus non-pseudo random number generators.

In this article we present the idea of using threaded concurrent systems for seedless random number generation. The motivation comes from the notion that each thread executes independently of its concurrent brethren, spawned by a single parent process. These threads are responsible for flipping the bits of a block of memory which is then used as a random value.

Broadly speaking, there are two forms of random number generators:

Generators that rely on mathematical algorithms, such as linear congruential generators. They are often provided via a specific platform's implementation of the Standard C99 rand() routine.
Generators that derive their pseudo-random values from external or seemingly random sources, such as radio waves or motions of the mouse.

It is not uncommon for RNGs generated from a deterministic environment to be based on mathematical algorithms, which inherently limit the randomness of the system. Such algorithms can replicate a pattern over a period of time and quantity, which can eventually compromise the randomness of the system. The computational algorithms that produce long sequences of apparently random results are in fact completely determined by a shorter initial value, known as a seed or key.

Here we explore the idea of using threads as a seedless means of producing pseudo-random values in a deterministic system without relying on the properties of a mathematical algorithm. We can thus uniformly eliminate the issues associated with seeded RNGs. Besides we rely on an implementation which is similar to Standard C99 rand() routine, so all that needs to be done is replacing the standard function calls with our library calls. Finally, non-determinism by massively parallel thread executions is a concept unexplored till date. Though there are RNGs which use parallel threads per seed -- for selective seed policy the idea of using a thread lifecycle for generating a random number itself is new.

Concurrent Systems and Non-Determinism

In a concurrent system, each thread, by itself, is a lightweight process. Such threads are scheduled to execute once a processor is available. Multiple threads can be executed in parallel, on a multiprocessor system since they all are a part of the same process. While these threads are executing, any of them can be stopped and the order in which they are paused may be completely random for every turn. Since there is no priority among the threads, the operation of stopping and restarting any number of threads is carried in a non-deterministic manner. Thus, for each of the stop and restart operations, no particular thread can be prototyped to follow a pattern, unless the operating system scheduler has some influence. In other words, the programmer cannot make any assumptions about the order in which threads are scheduled.

Non-Determinism

Non-determinism is an aspect linked to the operating system process scheduler. Most of the modern operating systems use a hybrid between the deterministic and non-deterministic scheduling techniques. Real time, processor affinity or priority processes, require deterministic scheduling, whereas other processes or threads use a fair non-deterministic algorithm.

In such "fair algorithms" each process is given a time quantum for its execution. This is usually done to load balance a system effectively or achieve a target quality of service so that no one thread keeps the CPU to itself. When it comes to using threads, since every thread is scheduled to run in a process, the order of scheduling is not given much importance. Most of the schedulers use a random thread pickup, or a round-robin ordering, to execute threads. In fact in the case of some of the modern multicore systems, multiple threads can be executed in parallel on different cores of the system. This parallel execution results in the threads being executed and scheduled in a random manner. Such non-determinism depends on the processor architecture, libraries used, concurrent programming methodology, number of threads used to get an idea of the parallel operation at that instant.

The concept of concurrent programming and threading outlines the concept for our RNG. Our goal is to use the non-deterministic thread behavior to manipulate a block of memory. This block will then be converted to an integer and outputs as a single random value. We rely on the fact that since each thread uniquely represents a bit boolean value and since the threads are arbitrarily executing from our pool per operation, the random number generated must also be independent of the previous output.

Operating System Scheduling

The non-determinism of how a thread is accessed, when only a shared mutex amongst the threads prevents them from accessing a single shared resource, is a function dependant on the operating system's kernel scheduler. This scheduling of computational resources is the sole purpose of a process scheduler. Likewise, there are numerous scheduling algorithms that kernels can utilize to determine which process has time to execute operations on the system. This scheduling of process executions provides a kernel the ability to multi-task. Adding to the non-determinism potential of a threaded application, is user input. Consider a concurrent application running on the same system that a hypothetical threaded application is simultaneously running on. If any one of the applications is awaiting user input, the non-determinism of the system is further complicated. Depending on how "quick" the user is to respond, and what the user might have to do, the kernel must determine if other precesses are to be executed while awaiting user input. The scheduling framework is one of the primary driving factors that our threaded RNG relies on.

On a similar note, it might also be possible to extract patterns of the scheduler and determine the operating system that was using our tool to generate random values.

While this patterning would obviously not demonstrate successful pseudo-randomness of our concept, it could be interesting for anyone exploring finger-printing of operating systems.

Multithreaded Pseudo-Random Number Generator Library (MTPRNG)

To test the concept of generating pseudo-random values via a threaded implementation, we introduce MTPRNG (short for "Multi Threaded Pseudo Random Generator"), a static library that other applications can use in place of the common Standard C99 implementation. MTPRNG is available for download from the project homepage.

MTPRNG is designed as a static library that other applications can utilize in place of the Standard C99 rand() routine. The design of this library was to create a "base-case" for RNG via threads. In other words, efficiency was not a goal, rather we developed the library to implement our concept in a straightforward manner while trying to maintain some level of simplicity. This simplicity was necessary to assure that we implemented the threading properly, as thread development can become rather complicated. It was also necessary that our implementation is free of deadlock or other resource starvations, as such weaknesses would tarnish the validity of our study. While we cannot vouch that our system is free from these weaknesses, it has been run to completion without any locks, and the use of a mutex helps prevent resource mishandling.

Again, we implemented our RNG in a similar fashion to that of the Standard C99 specification. This similarity provides us with another RNG having similar parameters from which we could perform our analysis against. The following is how the ISO/SEC9899:TC3 C99 Standard Draft defines what the rand() routine should return:

int rand(void);
The rand function computes a sequence of pseudo-random integers in range 0 to RAND MAX.
The implementation shall behave as if no library function calls the rand unction.
The rand function returns a pseudo-random integer.
The value of the RAND MAX macro shall be at least 32767.

MTPRNG utilizes the non-determinism property of a threaded application. The pool of random data, from which the RNG value is picked, is a shared resource that all the threads have access to. This data is randomized based on the number of bits that are to be returned. The C99 specification states that the rand() routine is to return an integer. Therefore, our implementation is setup to return an integer size of bits. The number of values returned by the C99 rand() is that of a signed integer from 0 to RAND MAX. An integer in the C99 specification is said to have a maximum value of 2¹⁵- 1 or 32767. The actual size an architecture uses for an integer is not defined, rather it must hold at least the number of bits needed to represent the standard's range of values (-32767 to 32767). In the case of Linux, an integer is of size 32-bits.

MTPRNG uses threads to represent each bit of a 32-bit integer. This is horrendous performance-wise, because a bit has two values 0 or 1. Therefore, we have two threads for each bit, one thread has a value of '0' while another has a value of '1' for that same bit. Since we wanted to maintain simplicity in how we implemented the RNG, we have 64 threads to represent 32 bits. When a random value is requested, a mutex on a shared object is released and the first 32 threads that see this mutex available, enter the critical section where they can manipulate the shared critical resource. In certain cases, a value will be set and unset from a thread and thread composing its opposing value. In our smaller sized testing, it did not seem that the critical reference was getting mangled, in other words, when a value was stating that it got set, the collective result by all threads produced the expected value. Therefore, on an x86 architecture running a Linux kernel, our critical references seemed atomic in nature.

Mutual Exclusion

Mutual exclusion is another property that is often considered in a concurrent environment. The context of mutual exclusion is usually associated with access to a shared resource. Mutexes are utilized to prevent a mis-ordering of access between threads on a shared resource. When a shared resource is accessed, the mutex allows for a thread to block other threads so that the outcome of the thread's operation against that resource is predictable, and not confused by another thread trying to access that resource at the same time. Therefore, the access to such a resource is said to be mutually exclusive between threads. Aside progress and bounded waiting, mutual exclusion is also considered one of the three properties a concurrent application must utilize properly in ordered to be considered "correct" when considering critical sections within applications (see Modern Multithreading: Implementing, Testing, and Debugging Multithreaded Java and C++/Pthreads/Win32 Programs by Richard H. Carver and Kuo-Chung Tai).

MTPRNG utilizes a mutex to atomize access to the pool of random data that is to be manipulated. Therefore, when the lock is released a single thread can enter its critical section and inject its bit value into the proper location of the random data block of memory. Once that thread has updated the state of the random number generation, the lock is released and another thread can do the same thing. Just before the mutex is released, the thread increments a counter allowing the parent thread to know how many threads have manipulated the random data pool. Once enough bits have set, in our case one half of the total number of threads access the critical section, then the parent thread will set the bounded wait variable signaling that all threads are to stop manipulating the random pool. This mutex prevents all of the threads accessing the random data, or more than one thread manipulating the data at a time. While, the data is supposed to be pseudo-random, inherent from the scheduler, the reason for the mutual exclusion was to assure that only one thread at a time has atomic access to the critical reference (e.g., the random data pool). Such exclusion was implemented to assure that the threads are summoned in a non-deterministic order, one at a time, to manipulate the pool. This ordering of threads by the scheduler might provide insight as to what kernel was used, if the random data is not truly random, but consists of a pattern.

Bounded Waiting

The parent thread monitors the request for a random value. When a value is requested, it sets a value that all threads are concurrently looking at. This action is effectively a bounded wait that the threads are performing. When the value is set, the threads are then allowed to enter their critical section, one at a time, to manipulate the random data pool. After the pool is filled, the random value is returned to the application requesting it, and the pool is reset.

Progress

The mutex and bounded waiting concepts outlined in the previous two sections are set in place to assure that the RNG will return a value and not an unexpected infinite delay. The atomicity and count of number of accesses to the pool, per random number request, are to aid progress of the threads so that they do not starve or cause a stall (i.e. live or deadlock). It is important to reduce any kind of starvation potentials, as stalling for a random number is not acceptable. The user requests data and not an excessive or never-ending deadlock on the shared resource.

Study

Once RNGs are constructed and implemented, they are usually submitted to empirical statistical tests that try detecting statistical deficiencies by looking for evidence against the hypothesis. A test is defined by a statistic T function of a fixed set of numbers, whose distribution is known. As per P. L'Ecuyer's paper "Random Numbers" in the International Encyclopedia of Social and Behavioral Sciences, there is no universal test or battery of tests that can guarantee, when passed, that a given generator is fully reliable for all kinds of applications. However, passing many tests may improve one's confidence in the RNG.

When it comes to MTPRNG, our model is by far anywhere near perfect. However, such a model deserves studying, as its flaws might be correctable, which could result in a well-rounded RNG. Likewise, the utility of such a concept might stretch outside the bounds of merely providing values, as it could be suited as a scheduler profiling mechanism. It has been observed that the randomness of the numbers our implementation produces varies over the type of underlying kernel being used. Since the same compiler and the library were being used on two separate systems with different kernels, the kernels do seem to play an important role in characterizing the randomness. For instance, the Linux kernel, which characterizes a single thread per-process and uses a Completely Fair Scheduler, generated considerable random values over a number of samples compared to the Solaris kernel, which uses its own fair scheduler. Such results elude to the notion that the fingerprinting of the operating system and library (both kernels used POSIX thread generation) can play an important role in the RNG model.

Though it has still not being tested, the authors are confident that the processor architectures would play an important part in the model as well.

Methodology

We conducted some tests to compare the randomness of our RNG to a standard RNG implemented with a seed and using algorithmic computations. We used two of the "tuftests" to analyze the randomness (see Some Difficult-to-pass Tests of Randomness, by George Marsaglia and Wai Wan Tsang). These tests are simplified variations of the battery of tests provided by the common diehard suite. Therefore, for a general perspective of our implementation, there was no reason to run the diehard set. We compare a sample distribution from our RNG with the standard provided by a number of presumably good RNGs -- "presumably good" meaning that they produce results similar to that of a well tested and accepted RNG as a standard. We decided to take 100,000 samples from our RNG and compare it with the lagged-Fibonacci RNG using the gcd() and bday() tuftests as a benchmark.

Results

The results of our tests are as follows: A probability-value or "p-value" of 0.0 suggests that the values analyzed are completely random, as opposed to the lesser fit maximum range of 1.0. It was clear that the lesser the number of samples per implementation resulted in a p-value further away from 1.0. For a larger sample size, our RNG could not clear the gcd() tuftest but so is the case with the linear congruential generator.

Table 1: Tuftest gcd() Results

We then carried out a test for 100,000 samples, which are displayed in Table 2. These results suggest that, as the number of samples reached 100,000, our randomness indicator was on 1.0. For the kernel specific analysis, the smaller sample size indicated a discrepancy in the results thus potentially providing a value to fingerprint the operating system.

Table 2: Generator for 100,000 Samples

We can thus discreetly claim that the threads scheduled on different kernels result in varied patterns of random numbers generated. It may be possible that the kernel schedulers thus play a role in the thread synchronization operation. However, more research needs to be conducted with this method of fingerprinting before we can come to an appropriate conclusion.

Table 3: Tuftest gcd() Linux vs. Solaris

The second tuftest conducted was the bday() test. This test is an extension of the iterated spacings test. Marsaglia defines the birthday test bday() to treat each 32-bit number from the RNG as a birthday. The test generates 232 RNG samples (birthdays), sorts them, and then computes the differences between values. The result provides a purported Poisson value with a mean of 4. This process is repeated to get a sample of 5000 from the Poisson distribution. Then a goodness-of-fit test is applied to this sample and the result is inserted into a chi-square distribution function to get a p-value [7]. In our case the bday() test showed mixed results, the duplicates expected and observed, varied based on the number of samples. For more duplicates the values were closer together compared to the fewer ones. The detailed results can be obtained from our original study (see 1CS-706: Random Number Generation via Threading). The original study, due to complications and time constraints, conducts tests on a set of values from MTPRNG using less than 100,000 samples.

Mere observation of a set of MTPRNG generated random values appears less than random. While such findings are not desired, we attribute the non-randomness to the kernel scheduler. Such conclusion, while we cannot say with complete certainty, might be useful as a kernel profiling/fingerprinting tool.

Conclusion and Other Work

It is clear that our thread-based seedless RNG is not perfect as it is an indirect function of the kernel scheduler, library, and underlying computational architecture. Though our implementation does not scale well with more samples, we can plan to carry further investigations on the per-thread lifecycle. By further exploring concepts in the concurrent programming realm, one can consider performing reachability testing on our threads. Such testing allows each thread to exercise a sequence of events based on various race-conditions. The various states might well provide a way of adding more nondeterminism to our RNG. The coordination of the generated random values with the state flow chart, per-thread, could provide hints as to how to increase the RNG's potential randomness. With the different "reachable" states, the library could then analyze the results and cull certain threading states that might produce less random data. There are some RNGs which use parallel threads per seed where multiple threads access the same generator or different generators. Then we have JAPARA which provides a selection of random number generator algorithms that allow the independent generation of random number streams on different threads, so there is no need for synchronized methods for generating the random numbers. Perhaps the simplest approach to constructing the parallel random number generator would be to set the seed once when the first generator object is instantiated and then all subsequent generator objects that are instantiated in other threads would be automatically seeded to ensure that the sequences for each generator instance do not overlap. This is the approach used in the GNU Scientific Library Reference Manual and Substream package. But all these approaches differ to our work since they all fall in the family of parallel seed based RNGs.

In summary, we used concurrency and thread synchronization as a means to generate random numbers. Although the implementation may not have strong randomness, it provides an initial concept to build upon. There are numerous methodologies by which we can enhance the design pattern of our library model, such as supporting an increase in the threads or using a combination of other RNGs.