Numeric integrationΒΆ
In numeric integration, one aims to compute (an approximate) solution of a definite integral. One of the simplest method is to use the trapezoidal rule. With it a definite integral \(\int_a^b f(x) dx\) is approximated by computing the sum
where \(N > 0\) and \(x_k = a + \frac{k}{N}(b - a)\). That is, the interval \([a,b]\) is divided into \(N\) sub-intervals \([x_{k-1},x_k]\) and for each sub-interval the area \(\int_{x_{k-1}}^{x_k}f(x) dx\) is approximated with the formula \((x_k - x_{k-1})\frac{f(x_{k-1}) + f(x_k)}{2}\).
The example figure below shows the approximation of \(\int_{0.3}^{2.9} 2\sin(x) dx\) (the area shaded in blue) when \(N=4\): the approximating area of the sub-interval \([x_2,x_3]\) is shaded in red.
The larger the number \(N\) of sub-intervals is, the better is the approximation. Of course, if the method is applied in some safety-critical application, one should also take into account the precision of the numeric data type used in the implementation (32-bit floats, 64-bit doubles or something else).
Computing the sum is an embarrassingly parallel problem as each sub-interval area approximation \((x_k - x_{k-1})\frac{f(x_{k-1}) + f(x_k)}{2}\) can be computed independently in parallel. The Scala implementation methods below are as follows:
β
integrate
: the sequential reference version,β
integratePar1
: a straightforward but not so successful attempt to parallelize with.par
,map
, andsum
, andβ
integratePar2
: parallelization by dividing the sum index set \([1..N]\) into \(128 P\) sub-sets, where \(P\) is the number of available CPU cores, and then computing the sum for each sub-set in parallel, The amount \(128 P\) is heuristically selected instead of \(P\) so that if one thread gets interrupted because some other process is scheduled to run, the overall computation can proceed in other sub-sets. Observe that computing the final sum is not parallelized; with current multicore machines, \(P\) is small and the time of computing the sum of \(128 P\) values is negligible compared to the other computations.
// A helper to sum over a sub-set [start..end] of indices [1..n]
def integrateInterval(f: Double => Double,
a: Double, b: Double,
start: Int, end: Int,
n: Int): Double = {
require(0 <= start)
require(start < end)
require(end <= n)
val range = b - a
var xPrev = a + start * range / n
var fPrev = f(xPrev)
var sum: Double = 0.0
var i = start+1
while(i <= end) {
val xCurrent = a + i * range / n
var fCurrent = f(xCurrent)
sum += (xCurrent - xPrev) * (fPrev + fCurrent) / 2
xPrev = xCurrent
fPrev = fCurrent
i += 1
}
sum
}
def integrate(f: Double => Double, a: Double, b: Double, n: Int): Double = {
require(a <= b)
require(n > 0)
integrateInterval(f, a, b, 0, n, n)
}
def integratePar1(f: Double => Double, a: Double, b: Double, n: Int): Double = {
require(a <= b)
require(n > 0)
val range = b - a
(0 until n).par.map(i => {
val xCurr = a + i * range / n
val xNext = a + (i + 1) * range / n
(xNext - xCurr) * (f(xCurr) + f(xNext)) / 2
}).sum
}
def integratePar2(f: Double => Double, a: Double, b: Double, n: Int): Double = {
val P = Runtime.getRuntime.availableProcessors
val nofTasks = n min 128*P
(0 until nofTasks).par.map(t => {
val start = ((n.toLong * t) / nofTasks).toInt
val end = ((n.toLong * (t+1)) / nofTasks).toInt
integrateInterval(f, a, b, start, end, n)
}).sum
}
The plot below shows the running times of different methods when computing \(\int_0^1 \sqrt{1-x^2} dx\) for different values of \(N\). The results are obtained on a desktop machine with a quad-core Intel i5-2400 processor running at 3.1 GHz and Scala version 2.12.2.
The method integratePar1
is not included in the plot because it is
slower than the sequential version:
the map
method probably builds an intermediate array over which the final sum is then computed.
It also has to compute the \(x_k\) and \(f(x_k)\) values twice for each \(k\) while
the others are iterative and compute the values only once for most of the indices.
The parallel version integratePar2
obtains a near-linear speedup of roughly 3.7 when \(N = 5 000 000\).