Parallel for-loops

The parallel for-loop is a central construct in parallel programming frameworks such as Scala parallel collections and OpenMP. In pseudocode, a parallel for-loop can be expressed as

parallel for i = 1 to n:
  f(i)

It simply computes \(f(i)\) for each \(i = 1, 2, \ldots, n\). The computations are performed in parallel as tasks, and they can start and finish in an arbitrary order. In an idealized setting, each \(f(i)\) is executed in its own task and there are enough threads to execute all of them at the same time. Thus the ideal work and span of a parallel for-loop are:

The work is the sum of the work of \(f(1), \ldots, f(n)\).
The span is the maximum of the spans of \(f(1), \ldots, f(n)\).
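As a worked illustration, the two rules can be written as formulas. Here \(W(\cdot)\) and \(S(\cdot)\) are notation assumed for this sketch (work and span), and the second pair of equations uses a hypothetical loop whose body \(f(i)\) is sequential, with work and span both equal to \(i\):

```latex
W_{\text{loop}} = \sum_{i=1}^{n} W\bigl(f(i)\bigr), \qquad
S_{\text{loop}} = \max_{1 \le i \le n} S\bigl(f(i)\bigr)

% hypothetical example: W(f(i)) = S(f(i)) = i
W_{\text{loop}} = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad
S_{\text{loop}} = \max_{1 \le i \le n} i = n
```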

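The time needed to create the parallel tasks that compute the values \(f(i)\) is usually ignored in the analysis. In the idealized computation DAG for a parallel for-loop, the computation forks into \(n\) independent branches, one for each \(f(i)\), which join again once all of them have finished.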

In a real setting, when there are more tasks than available threads, a scheduler executes the tasks in some order on the available threads. For example, one possible way to execute the 4 tasks of a parallel loop with two threads is to let each thread run two of the tasks one after the other, which turns the idealized DAG into two sequential chains of two tasks each.
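The following sketch shows a scheduler mapping more tasks than threads. It uses a plain Java fixed-size thread pool (`Executors.newFixedThreadPool`) as a stand-in scheduler; this is an assumption for illustration, not how Scala parallel collections schedule their tasks. The names `TwoThreadSchedule` and `run` are hypothetical. Four tasks are submitted to two threads, and each task reports which thread executed it:

```scala
import java.util.concurrent.{Callable, Executors}

object TwoThreadSchedule {
  // Runs `numTasks` independent tasks on a pool of `numThreads` threads and
  // returns, for each task index, the name of the thread that executed it.
  def run(numTasks: Int, numThreads: Int): Vector[String] = {
    val pool = Executors.newFixedThreadPool(numThreads)
    try {
      val futures = (0 until numTasks).map { _ =>
        pool.submit(new Callable[String] {
          def call(): String = Thread.currentThread().getName
        })
      }
      futures.map(_.get()).toVector // block until every task has finished
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val threads = run(numTasks = 4, numThreads = 2)
    threads.zipWithIndex.foreach { case (t, i) => println(s"task $i ran on $t") }
  }
}
```

Which task lands on which thread can change from run to run; at most two distinct thread names appear, because the pool never has more than two threads.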

In addition, a parallel framework such as Scala parallel collections may group several loop-body computations into a single task to reduce task-creation overhead.
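Grouping loop bodies into tasks can be sketched with Scala futures. This is a hypothetical helper (`ChunkedParFor.parFor` and its `chunkSize` parameter are not part of any library) that assumes the global execution context; it only illustrates the idea and is not how Scala parallel collections split their work:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ChunkedParFor {
  // Runs body(i) for i in 0 until n, grouping consecutive indices into chunks
  // of at most `chunkSize`, so one task is created per chunk instead of per index.
  // Returns the number of tasks created.
  def parFor(n: Int, chunkSize: Int)(body: Int => Unit): Int = {
    val chunks = (0 until n).grouped(chunkSize).toVector
    val tasks = chunks.map { chunk =>
      Future { chunk.foreach(body) } // the loop bodies of one chunk run sequentially
    }
    Await.result(Future.sequence(tasks), Duration.Inf) // wait for all chunks
    tasks.size
  }

  def main(args: Array[String]): Unit = {
    val out = new Array[Int](10)
    val numTasks = parFor(out.length, chunkSize = 4)(i => out(i) = i * i)
    println(s"$numTasks tasks: ${out.mkString(" ")}")
  }
}
```

With n = 10 and chunkSize = 4, the indices are grouped as 0 to 3, 4 to 7, and 8 to 9, so three tasks are created instead of ten. A larger chunk size means fewer tasks and less creation overhead, but also fewer opportunities to balance the load across threads.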

To use parallel collection methods such as foreach in Scala, one can use the classes in scala.collection.parallel. For instance, the plain parallel for-loop construct is easily obtained with ParRange:

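import scala.collection.parallel.CollectionConverters._ // needed for .par in Scala 2.13+ (scala-parallel-collections module)
(0 until n).par.foreach(i => f(i))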

This applies the function \(f\) in parallel to each \(i = 0, \ldots, n-1\). Again, the applications are done in parallel and finish in an arbitrary order:

scala> (0 until 20).par.foreach(i => print(s"$i "))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
scala> (0 until 20).par.foreach(i => print(s"$i "))
0 15 1 2 3 10 11 12 13 14 17 18 19 16 4 5 6 7 8 9 

Internally, scala.collection.parallel uses the Java fork/join framework to implement the parallel operations.
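To give an idea of how a parallel for-loop can be built on the fork/join framework, here is a minimal divide-and-conquer sketch using `java.util.concurrent.ForkJoinPool` and `RecursiveAction`. The class `ParForTask`, its `grain` parameter, and the halving strategy are assumptions for illustration; the actual implementation in scala.collection.parallel is more elaborate:

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveAction}

// Hypothetical fork/join parallel for-loop: executes body(i) for every i in [lo, hi).
class ParForTask(lo: Int, hi: Int, grain: Int, body: Int => Unit) extends RecursiveAction {
  override protected def compute(): Unit = {
    if (hi - lo <= grain) {
      // Small range: run the loop bodies sequentially inside this one task.
      var i = lo
      while (i < hi) { body(i); i += 1 }
    } else {
      // Large range: split it in half and process the halves in parallel.
      val mid = lo + (hi - lo) / 2
      val left = new ParForTask(lo, mid, grain, body)
      val right = new ParForTask(mid, hi, grain, body)
      left.fork()    // make the left half available for an idle worker to steal
      right.invoke() // process the right half in the current thread
      left.join()    // wait until the left half has finished
    }
  }
}

object ForkJoinParFor {
  private val pool = new ForkJoinPool() // parallelism defaults to the number of available processors

  // Fills an array with i * i; each iteration writes only its own element.
  def squares(n: Int): Array[Long] = {
    val out = new Array[Long](n)
    pool.invoke(new ParForTask(0, n, 64, i => out(i) = i.toLong * i))
    out
  }

  def main(args: Array[String]): Unit =
    println(squares(1000).sum) // 332833500
}
```

Ranges of at most `grain` indices run sequentially in one task, which is the same kind of grouping of loop bodies mentioned earlier; larger ranges are split recursively, so idle worker threads can steal the forked halves.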

As shown above, the bodies of a parallel for-loop are executed in parallel and in an arbitrary order. To ensure determinism (that is, that the same end result is always obtained), the executions must be independent:

if one execution modifies some mutable structure (variables, array elements, mutable lists, etc.), then no other execution may read or write the same structure.

In the previous example, the loop bodies are not independent because they write to the same console.
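The difference between dependent and independent loop bodies can be sketched as follows. This assumes Scala 2.13 or 3 with the scala-parallel-collections module available (on Scala 2.12 and earlier, .par is built in and the import must be dropped); the object and method names are hypothetical:

```scala
// Assumes Scala 2.13+ with the scala-parallel-collections module on the classpath.
import scala.collection.parallel.CollectionConverters._

object Independence {
  // NOT independent: every iteration reads and writes the same variable `count`,
  // so concurrent updates can be lost and the result may vary between runs.
  def racyCount(n: Int): Int = {
    var count = 0
    (0 until n).par.foreach(_ => count += 1)
    count
  }

  // Independent: iteration i writes only element i, and no iteration reads an
  // element written by another, so the result is always the same.
  def squares(n: Int): Array[Int] = {
    val out = new Array[Int](n)
    (0 until n).par.foreach(i => out(i) = i * i)
    out
  }

  def main(args: Array[String]): Unit = {
    println(racyCount(100000))         // may print a value below 100000
    println(squares(10).mkString(" ")) // 0 1 4 9 16 25 36 49 64 81
  }
}
```

In racyCount, `count += 1` is a read followed by a write, so two iterations can read the same old value and one increment is lost. squares, by contrast, satisfies the independence condition above.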