Files in the top-level directory in any check-in

parallel
test1
dists.d
Makefile
prng-obsolete.d
prng.d
prng2.d
readme.md
test-obsolete.d
test.d
test1.d
test2-obsolete.d
test2.d
test3.d
test4.d

prng

This is a D port of the file MRG31k3p.java from project SSJ. All of the code that was ported was originally written in Java.

The original code can be found at https://github.com/umontreal-simul/ssj/blob/master/src/main/java/umontreal/ssj/rng/MRG31k3p.java.

Where noted, some supporting functions were taken from other parts of the SSJ project and ported to D as well.

Author: Lance Bachmeier Date: May 1, 2017

The license is Apache 2.0, which is the same as the original code.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Useless technical details

This started out with struct Stream in prng.d. That didn't work (I don't recall the problem, but passing by reference solved whatever it was). prng.d is now obsolete. prng2.d is what you should be using.

The Stream class holds the state of an individual generator. The state is defined by the six integers x11, x12, x13, x21, x22, and x23. The initial values come from the StreamGen struct that creates the Stream. There are also a bunch of constant terms, which I've made into enums since they'll never change.

When creating a multidimensional array in Java, the row index comes first. m[2][6] refers to the 7th element of the third row. We can do the same thing in D.

The class has a single method, gen, which returns the next U(0,1) draw and updates the state.

Basic usage from a D program

Save prng.d in your working directory. Create the file test1.d with this content:

import prng.gen;
import std.stdio;

void main() {
	// sg will create streams
	StreamGen sg;
	
	// Use sg to create two quality parallel streams
	Stream s1 = sg.createStream();
	Stream s2 = sg.createStream();
	
	// Take draws from the two streams
	foreach(_; 0..5) {
		writeln(s1.gen());
	}
	writeln("---");
	foreach(_; 0..5) {
		writeln(s2.gen());
	}
}

Compile, using the appropriate line for whichever of the three compilers you're using.

dmd test1.d prng.d
ldmd2 test1.d prng.d
gdc test1.d prng.d

Running the resulting executable, you can see the output. Let's verify that the parallel streams really are generated independently of one another. We'll output from the streams in alternating fashion.

Create the file test1a.d with this content:

import prng.gen;
import std.stdio;

void main() {
	// sg will create streams
	StreamGen sg;
	
	// Use sg to create two quality parallel streams
	Stream s1 = sg.createStream();
	Stream s2 = sg.createStream();
	
	// Take draws from the two streams
	foreach(_; 0..5) {
		writeln(s1.gen());
		writeln(s2.gen());
		writeln("---");
	}
}

Compile test1a.d and run it. You'll see the exact same numbers, but in a different order. That's the point of parallel random number generation. You can use one core for each stream simulataneously.

Generating random numbers in parallel

For "embarrassingly parallel" programs, which we often have in economics, D makes parallelizing your program nearly trivial.

Not embarrassingly parallel

An example of a problem that's not embarrassingly parallel is generating a single very long AR(1) series according to the model $$y_{t} = 0.9 y_{t-1} + varepsilon_{t}.$$ Suppose you want to generate one million observations. You couldn't do this by having each of two cores generate 500,000 observations and then stacking them. That wouldn't work, because the end of the first vector (observation 500,000 of the series) would partially determine the start of the second vector (observation 500,001 of the series). That means you'd either have to generate the entire first vector before starting on the second vector, or you'd have to make some kind of modification. The outcome of actions on one of the cores depends on the outcome on another core.

Embarrassingly parallel

What is an example of a problem that is embarrassingly parallel? Suppose that rather than generating one vector of one million observations, we wanted to generate 1000 vectors of one million observations. That can be trivially done in parallel. You can simultaneously generate 500 vectors on each of the two cores. If the individual samples you're generating are not independent of the outcome of all the other samples, you're doing something wrong.

An example of this would be using the bootstrap for inference. D makes it simple to implement a parallel version of a bootstrap.

std.parallelism

D's standard library includes std.parallel. To print out the total number of CPUs available on your computer, use the function totalCPUs:

import std.parallelism, std.stdio;

void main() {
	writeln(totalCPUs);
}

On my work machine, that returns 8 (four cores plus hyperthreading). Let's say we want to divide up the work of summing the values of an array across two of the CPUs. We'll create an array with two elements, each of which itself holds an array. The compiler will use two CPUs and pass one of the arrays to each CPU for summation.

import std.parallelism, std.stdio;

void main() {
	double[] arr1 = [1.0, 2, 3, 4, 5];
	double[] arr2 = [6.0, 7, 8, 9, 10];
	foreach(arr; parallel([arr1, arr2])) {
		double result = 0.0;
		foreach(val; arr) {
			result += val;
		}
		writeln("Sum of elements: ", result);
	}
}

With that out of the way, let's redo the example above, but we need to pass the RNG stream to each core to use to generate the sequence of random numbers.

import prng.gen;
import std.parallelism, std.stdio;

void main() {
	// sg will create streams
	StreamGen sg;
	
	// Use sg to create two quality parallel streams
	Stream s1 = sg.createStream();
	Stream s2 = sg.createStream();
	
	// Hold the two arrays of numbers
	double[][2] numbers;
	
	// Take draws from the two streams
	foreach(stream; parallel([s1, s2])) {
		double[] result;
		foreach(_; 0..5) {
			result ~= stream.gen();
		}
		numbers ~= result;
	}
	writeln(numbers);
}

You'll notice that the two arrays of generated numbers are the same as above. Due to the simple nature of this program, you'll probably get the same array of numbers listed first every time you run the program. In general, however, that is not the case. There's no guarantee of the order in which the individual cores will complete their work. That's why we wrote the program in such a way that it doesn't make any difference. You could make the program more sophisticated, by passing the id of the stream with the stream and then sorting. For the most part, for the problems we solve in economics, you're doing something wrong if the order of the output matters.

Unnamed Fossil Project