\(\)
\(%\newcommand{\CLRS}{\href{https://mitpress.mit.edu/books/introduction-algorithms}{Cormen et al}}\)
\(\newcommand{\CLRS}{\href{https://mitpress.mit.edu/books/introduction-algorithms}{Introduction to Algorithms, 3rd ed.} (\href{http://libproxy.aalto.fi/login?url=http://site.ebrary.com/lib/aalto/Doc?id=10397652}{online via Aalto lib})}\)
\(\newcommand{\SW}{\href{http://algs4.cs.princeton.edu/home/}{Algorithms, 4th ed.}}\)
\(%\newcommand{\SW}{\href{http://algs4.cs.princeton.edu/home/}{Sedgewick and Wayne}}\)
\(\)
\(\newcommand{\Java}{\href{http://java.com/en/}{Java}}\)
\(\newcommand{\Python}{\href{https://www.python.org/}{Python}}\)
\(\newcommand{\CPP}{\href{http://www.cplusplus.com/}{C++}}\)
\(\newcommand{\ST}[1]{{\Blue{\textsf{#1}}}}\)
\(\newcommand{\PseudoCode}[1]{{\color{blue}\textsf{#1}}}\)
\(\)
\(%\newcommand{\subheading}[1]{\textbf{\large\color{aaltodgreen}#1}}\)
\(\newcommand{\subheading}[1]{\large{\usebeamercolor[fg]{frametitle} #1}}\)
\(\)
\(\newcommand{\Blue}[1]{{\color{flagblue}#1}}\)
\(\newcommand{\Red}[1]{{\color{aaltored}#1}}\)
\(\newcommand{\Emph}[1]{\emph{\color{flagblue}#1}}\)
\(\)
\(\newcommand{\Engl}[1]{({\em engl.}\ #1)}\)
\(\)
\(\newcommand{\Pointer}{\raisebox{-1ex}{\huge\ding{43}}\ }\)
\(\)
\(\newcommand{\Set}[1]{\{#1\}}\)
\(\newcommand{\Setdef}[2]{\{{#1}\mid{#2}\}}\)
\(\newcommand{\PSet}[1]{\mathcal{P}(#1)}\)
\(\newcommand{\Card}[1]{{\vert{#1}\vert}}\)
\(\newcommand{\Tuple}[1]{(#1)}\)
\(\newcommand{\Implies}{\Rightarrow}\)
\(\newcommand{\Reals}{\mathbb{R}}\)
\(\newcommand{\Seq}[1]{(#1)}\)
\(\newcommand{\Arr}[1]{[#1]}\)
\(\newcommand{\Floor}[1]{{\lfloor{#1}\rfloor}}\)
\(\newcommand{\Ceil}[1]{{\lceil{#1}\rceil}}\)
\(\newcommand{\Path}[1]{(#1)}\)
\(\)
\(%\newcommand{\Lg}{\lg}\)
\(\newcommand{\Lg}{\log_2}\)
\(\)
\(\newcommand{\BigOh}{O}\)
\(%\newcommand{\BigOh}{\mathcal{O}}\)
\(\newcommand{\Oh}[1]{\BigOh(#1)}\)
\(\)
\(\newcommand{\todo}[1]{\Red{\textbf{TO DO: #1}}}\)
\(\)
\(\newcommand{\NULL}{\textsf{null}}\)
\(\)
\(\newcommand{\Insert}{\ensuremath{\textsc{insert}}}\)
\(\newcommand{\Search}{\ensuremath{\textsc{search}}}\)
\(\newcommand{\Delete}{\ensuremath{\textsc{delete}}}\)
\(\newcommand{\Remove}{\ensuremath{\textsc{remove}}}\)
\(\)
\(\newcommand{\Parent}[1]{\mathop{parent}(#1)}\)
\(\)
\(\newcommand{\ALengthOf}[1]{{#1}.\textit{length}}\)
\(\)
\(\newcommand{\TRootOf}[1]{{#1}.\textit{root}}\)
\(\newcommand{\TLChildOf}[1]{{#1}.\textit{leftChild}}\)
\(\newcommand{\TRChildOf}[1]{{#1}.\textit{rightChild}}\)
\(\newcommand{\TNode}{x}\)
\(\newcommand{\TNodeI}{y}\)
\(\newcommand{\TKeyOf}[1]{{#1}.\textit{key}}\)
\(\)
\(\newcommand{\PEnqueue}[2]{{#1}.\textsf{enqueue}(#2)}\)
\(\newcommand{\PDequeue}[1]{{#1}.\textsf{dequeue}()}\)
\(\)
\(\newcommand{\Def}{\mathrel{:=}}\)
\(\)
\(\newcommand{\Eq}{\mathrel{=}}\)
\(\newcommand{\Asgn}{\mathrel{\leftarrow}}\)
\(%\newcommand{\Asgn}{\mathrel{:=}}\)
\(\)
\(%\)
\(% Heaps\)
\(%\)
\(\newcommand{\Downheap}{\textsc{downheap}}\)
\(\newcommand{\Upheap}{\textsc{upheap}}\)
\(\newcommand{\Makeheap}{\textsc{makeheap}}\)
\(\)
\(%\)
\(% Dynamic sets\)
\(%\)
\(\newcommand{\SInsert}[1]{\textsc{insert}(#1)}\)
\(\newcommand{\SSearch}[1]{\textsc{search}(#1)}\)
\(\newcommand{\SDelete}[1]{\textsc{delete}(#1)}\)
\(\newcommand{\SMin}{\textsc{min}()}\)
\(\newcommand{\SMax}{\textsc{max}()}\)
\(\newcommand{\SPredecessor}[1]{\textsc{predecessor}(#1)}\)
\(\newcommand{\SSuccessor}[1]{\textsc{successor}(#1)}\)
\(\)
\(%\)
\(% Union-find\)
\(%\)
\(\newcommand{\UFMS}[1]{\textsc{make-set}(#1)}\)
\(\newcommand{\UFFS}[1]{\textsc{find-set}(#1)}\)
\(\newcommand{\UFCompress}[1]{\textsc{find-and-compress}(#1)}\)
\(\newcommand{\UFUnion}[2]{\textsc{union}(#1,#2)}\)
\(\)
\(\)
\(%\)
\(% Graphs\)
\(%\)
\(\newcommand{\Verts}{V}\)
\(\newcommand{\Vtx}{v}\)
\(\newcommand{\VtxA}{v_1}\)
\(\newcommand{\VtxB}{v_2}\)
\(\newcommand{\VertsA}{V_\textup{A}}\)
\(\newcommand{\VertsB}{V_\textup{B}}\)
\(\newcommand{\Edges}{E}\)
\(\newcommand{\Edge}{e}\)
\(\newcommand{\NofV}{\Card{V}}\)
\(\newcommand{\NofE}{\Card{E}}\)
\(\newcommand{\Graph}{G}\)
\(\)
\(\newcommand{\SCC}{C}\)
\(\newcommand{\GraphSCC}{G^\text{SCC}}\)
\(\newcommand{\VertsSCC}{V^\text{SCC}}\)
\(\newcommand{\EdgesSCC}{E^\text{SCC}}\)
\(\)
\(\newcommand{\GraphT}{G^\text{T}}\)
\(%\newcommand{\VertsT}{V^\textup{T}}\)
\(\newcommand{\EdgesT}{E^\text{T}}\)
\(\)
\(%\)
\(% NP-completeness etc\)
\(%\)
\(\newcommand{\Poly}{\textbf{P}}\)
\(\newcommand{\NP}{\textbf{NP}}\)
\(\newcommand{\PSPACE}{\textbf{PSPACE}}\)
\(\newcommand{\EXPTIME}{\textbf{EXPTIME}}\)
Image filtering with convolution
In image processing,
one can filter an image by applying a kernel, also called a mask, to it with convolution.
Formally, a kernel \( K \) is a \( k \times l \) real matrix (where \( k,l > 0 \) are odd).
Applying the kernel to an \( n \times m \) image \( G \) results in the image \( H \)
defined as follows
$$H(x,y) = \sum_{i=1}^k\sum_{j=1}^l K(i,j)\, G(x-\Floor{k/2}+i-1,\ y-\Floor{l/2}+j-1)$$
for all \( \Floor{k/2} \le x < n-\Floor{k/2} \) and
\( \Floor{l/2} \le y < m-\Floor{l/2} \).
That is, the filtered value of each pixel is the weighted sum of
its neighbors.
The border pixels, for which the sum above is not defined, are handled in some other manner; for instance, they can simply be set to zero.
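As a concrete sketch, the definition above can be implemented sequentially as follows. This is a minimal illustrative version, assuming grayscale images represented as Array[Array[Double]]; the names are not from any particular library, and border pixels are simply left at zero:

```scala
// A minimal sketch of sequential convolution filtering.
// Assumes a grayscale image g (n rows, m columns) and an odd-sized kernel.
object Convolution {
  def filter(g: Array[Array[Double]], kernel: Array[Array[Double]]): Array[Array[Double]] = {
    val n = g.length          // image height
    val m = g(0).length       // image width
    val k = kernel.length     // kernel height (odd)
    val l = kernel(0).length  // kernel width (odd)
    val h = Array.ofDim[Double](n, m) // border pixels stay at 0.0
    // Interior pixels: floor(k/2) <= x < n - floor(k/2), similarly for y
    for (x <- k / 2 until n - k / 2; y <- l / 2 until m - l / 2) {
      var sum = 0.0
      // Weighted sum over the k-by-l neighborhood centered at (x, y)
      for (i <- 0 until k; j <- 0 until l)
        sum += kernel(i)(j) * g(x - k / 2 + i)(y - l / 2 + j)
      h(x)(y) = sum
    }
    h
  }
}
```

The two inner loops perform the \( \Oh{kl} \) work per pixel that the running-time analysis refers to.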
For instance, when we apply the \( 7 \times 7 \) Gaussian blur kernel (with the standard deviation \( \sigma=2.0 \))
$$K \approx \left(\begin{smallmatrix}0.005&0.009&0.013&0.015&0.013&0.009&0.005\\0.009&0.017&0.025&0.028&0.025&0.017&0.009\\0.013&0.025&0.036&0.041&0.036&0.025&0.013\\0.015&0.028&0.041&0.047&0.041&0.028&0.015\\0.013&0.025&0.036&0.041&0.036&0.025&0.013\\0.009&0.017&0.025&0.028&0.025&0.017&0.009\\0.005&0.009&0.013&0.015&0.013&0.009&0.005\end{smallmatrix}\right)$$
to the image on the left below,
the result is the image on the right (only the upper left corner of the \( 4032 \times 3024 \) image is shown).
Example: Gaussian blur filtering with a 7×7 kernel (σ = 2.0)
The straightforward sequential implementation of this approach has
a running time (both work and span) of \( \Oh{nmkl} \)
because \( \Oh{kl} \) operations are needed to compute the value of each of the \( nm \) pixels
(assuming that \( k \) and \( l \) are small compared to \( n \) and \( m \)).
Applying a kernel to an image is an embarrassingly parallel problem
as the value
$$H(x,y) = \sum_{i=1}^k\sum_{j=1}^l K(i,j)\, G(x-\Floor{k/2}+i-1,\ y-\Floor{l/2}+j-1)$$
of each result image pixel can be computed independently and thus concurrently.
A parallel version that computes the value of each result pixel
in parallel will have the following characteristics:
Work is \( \Oh{nmkl} \) because \( \Oh{kl} \) operations are needed to compute the value of each of the \( nm \) pixels.
Span is \( \Oh{kl} \) as this is the running time of computing the value of a single output pixel and the pixels are computed in parallel in the ideal case.
The amount of parallelism is thus \( \Oh{nm} \).
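A sketch of such a parallel version is below, again assuming Array[Array[Double]] images. The standard JDK's parallel IntStream is used here as a stand-in for a parallel for-loop over the rows (Scala's parallel collections, via .par, could be used in the same way); all names are illustrative:

```scala
import java.util.stream.IntStream

// Parallel variant of the convolution sketch: the rows of the output
// image are processed by concurrent tasks. Each task writes only its
// own row of h, so no synchronization is needed.
object ParConvolution {
  def filterPar(g: Array[Array[Double]], kernel: Array[Array[Double]]): Array[Array[Double]] = {
    val n = g.length
    val m = g(0).length
    val k = kernel.length
    val l = kernel(0).length
    val h = Array.ofDim[Double](n, m) // border pixels stay at 0.0
    // Parallel loop over the interior rows
    IntStream.range(k / 2, n - k / 2).parallel().forEach { (x: Int) =>
      for (y <- l / 2 until m - l / 2) {
        var sum = 0.0
        for (i <- 0 until k; j <- 0 until l)
          sum += kernel(i)(j) * g(x - k / 2 + i)(y - l / 2 + j)
        h(x)(y) = sum
      }
    }
    h
  }
}
```

Because every output pixel depends only on the read-only input image, the tasks are fully independent, which is what makes the problem embarrassingly parallel.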
Let’s now compare the performance of a straightforward Scala implementation
of the sequential version against the parallel version
that uses a parallel for-loop to parallelize the individual pixel computations.
Running times for a Gaussian \( 7 \times 7 \) kernel on a \( 4032 \times 3024 \) image,
on a quad-core desktop machine,
are as follows:
Sequential: 26.72s
Parallel: 7.01s
Speedup: 3.8
Of course,
easy parallelization is no excuse for not trying to improve an algorithm.
In the case of applying a Gaussian blur kernel to images,
further speedup can be obtained by noticing that
Gaussian blur is a separable filter.
For instance, the \( 7 \times 7 \) kernel
$$\left(\begin{smallmatrix}0.005&0.009&0.013&0.015&0.013&0.009&0.005\\0.009&0.017&0.025&0.028&0.025&0.017&0.009\\0.013&0.025&0.036&0.041&0.036&0.025&0.013\\0.015&0.028&0.041&0.047&0.041&0.028&0.015\\0.013&0.025&0.036&0.041&0.036&0.025&0.013\\0.009&0.017&0.025&0.028&0.025&0.017&0.009\\0.005&0.009&0.013&0.015&0.013&0.009&0.005\end{smallmatrix}\right) \approx v^T v$$
where
\( v \approx \left(\begin{smallmatrix}0.070&0.131&0.191&0.216&0.191&0.131&0.070\end{smallmatrix}\right) \).
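This factorization is easy to check numerically. A small illustrative snippet, using the rounded three-decimal values shown above:

```scala
// Numerical check that the 7x7 Gaussian kernel factors as the outer
// product v^T v of the 1x7 vector v from the text (3-decimal values).
object SeparabilityCheck {
  val v = Array(0.070, 0.131, 0.191, 0.216, 0.191, 0.131, 0.070)
  // Outer product v^T v: entry (i, j) is v(i) * v(j)
  val outer: Array[Array[Double]] = Array.tabulate(7, 7)((i, j) => v(i) * v(j))
  // e.g. the center entry is 0.216 * 0.216 = 0.046656, matching K's 0.047
}
```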
In general, for separable \( k \times l \) filters, one can first
apply a \( 1 \times l \) kernel to the original image and then
apply a \( k \times 1 \) kernel to the intermediate result.
That is, one first performs filtering horizontally on individual rows and
then filtering vertically on individual columns (or vice versa).
The sequential version has work and span \( \Oh{nmk+nml} \).
Again, both phases can be parallelized,
resulting in an algorithm having
the work \( \Oh{nmk+nml} \), span \( \Oh{k+l} \), and the amount of parallelism \( \Oh{nm} \).
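The two-pass scheme can be sketched sequentially as follows, assuming the \( k \times 1 \) and \( 1 \times l \) kernels are given as plain arrays; the names are illustrative and borders are again left at zero:

```scala
// A sketch of separable filtering: a horizontal 1-by-l pass over the
// rows followed by a vertical k-by-1 pass over the columns.
object SeparableConvolution {
  def filterSep(g: Array[Array[Double]], vk: Array[Double], vl: Array[Double]): Array[Array[Double]] = {
    val n = g.length
    val m = g(0).length
    val k = vk.length // height of the vertical kernel
    val l = vl.length // width of the horizontal kernel
    // Pass 1: filter each row with the 1-by-l kernel, O(nml) work
    val tmp = Array.ofDim[Double](n, m)
    for (x <- 0 until n; y <- l / 2 until m - l / 2) {
      var s = 0.0
      for (j <- 0 until l) s += vl(j) * g(x)(y - l / 2 + j)
      tmp(x)(y) = s
    }
    // Pass 2: filter each column of the intermediate image with the
    // k-by-1 kernel, O(nmk) work
    val h = Array.ofDim[Double](n, m)
    for (x <- k / 2 until n - k / 2; y <- l / 2 until m - l / 2) {
      var s = 0.0
      for (i <- 0 until k) s += vk(i) * tmp(x - k / 2 + i)(y)
      h(x)(y) = s
    }
    h
  }
}
```

Each pass only loops over one kernel dimension per pixel, which is where the \( \Oh{nmk+nml} \) bound comes from; parallelizing over the rows in pass 1 and over the rows of the output in pass 2 gives the span \( \Oh{k+l} \).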
The running times for applying a Gaussian \( 7 \times 7 \) kernel on a \( 4032 \times 3024 \) image, with a quad-core machine, are now as follows:
Sequential: 26.72s
Sequential separable: 4.30s
Parallel: 7.01s
Parallel separable: 1.50s