\(\) \(%\newcommand{\CLRS}{\href{https://mitpress.mit.edu/books/introduction-algorithms}{Cormen et al}}\) \(\newcommand{\CLRS}{\href{https://mitpress.mit.edu/books/introduction-algorithms}{Introduction to Algorithms, 3rd ed.} (\href{http://libproxy.aalto.fi/login?url=http://site.ebrary.com/lib/aalto/Doc?id=10397652}{online via Aalto lib})}\) \(\newcommand{\SW}{\href{http://algs4.cs.princeton.edu/home/}{Algorithms, 4th ed.}}\) \(%\newcommand{\SW}{\href{http://algs4.cs.princeton.edu/home/}{Sedgewick and Wayne}}\) \(\) \(\newcommand{\Java}{\href{http://java.com/en/}{Java}}\) \(\newcommand{\Python}{\href{https://www.python.org/}{Python}}\) \(\newcommand{\CPP}{\href{http://www.cplusplus.com/}{C++}}\) \(\newcommand{\ST}[1]{{\Blue{\textsf{#1}}}}\) \(\newcommand{\PseudoCode}[1]{{\color{blue}\textsf{#1}}}\) \(\) \(%\newcommand{\subheading}[1]{\textbf{\large\color{aaltodgreen}#1}}\) \(\newcommand{\subheading}[1]{\large{\usebeamercolor[fg]{frametitle} #1}}\) \(\) \(\newcommand{\Blue}[1]{{\color{flagblue}#1}}\) \(\newcommand{\Red}[1]{{\color{aaltored}#1}}\) \(\newcommand{\Emph}[1]{\emph{\color{flagblue}#1}}\) \(\) \(\newcommand{\Engl}[1]{({\em engl.}\ #1)}\) \(\) \(\newcommand{\Pointer}{\raisebox{-1ex}{\huge\ding{43}}\ }\) \(\) \(\newcommand{\Set}[1]{\{#1\}}\) \(\newcommand{\Setdef}[2]{\{{#1}\mid{#2}\}}\) \(\newcommand{\PSet}[1]{\mathcal{P}(#1)}\) \(\newcommand{\Card}[1]{{\vert{#1}\vert}}\) \(\newcommand{\Tuple}[1]{(#1)}\) \(\newcommand{\Implies}{\Rightarrow}\) \(\newcommand{\Reals}{\mathbb{R}}\) \(\newcommand{\Seq}[1]{(#1)}\) \(\newcommand{\Arr}[1]{[#1]}\) \(\newcommand{\Floor}[1]{{\lfloor{#1}\rfloor}}\) \(\newcommand{\Ceil}[1]{{\lceil{#1}\rceil}}\) \(\newcommand{\Path}[1]{(#1)}\) \(\) \(%\newcommand{\Lg}{\lg}\) \(\newcommand{\Lg}{\log_2}\) \(\) \(\newcommand{\BigOh}{O}\) \(%\newcommand{\BigOh}{\mathcal{O}}\) \(\newcommand{\Oh}[1]{\BigOh(#1)}\) \(\) \(\newcommand{\todo}[1]{\Red{\textbf{TO DO: #1}}}\) \(\) \(\newcommand{\NULL}{\textsf{null}}\) \(\) \(\newcommand{\Insert}{\ensuremath{\textsc{insert}}}\) \(\newcommand{\Search}{\ensuremath{\textsc{search}}}\) \(\newcommand{\Delete}{\ensuremath{\textsc{delete}}}\) \(\newcommand{\Remove}{\ensuremath{\textsc{remove}}}\) \(\) \(\newcommand{\Parent}[1]{\mathop{parent}(#1)}\) \(\) \(\newcommand{\ALengthOf}[1]{{#1}.\textit{length}}\) \(\) \(\newcommand{\TRootOf}[1]{{#1}.\textit{root}}\) \(\newcommand{\TLChildOf}[1]{{#1}.\textit{leftChild}}\) \(\newcommand{\TRChildOf}[1]{{#1}.\textit{rightChild}}\) \(\newcommand{\TNode}{x}\) \(\newcommand{\TNodeI}{y}\) \(\newcommand{\TKeyOf}[1]{{#1}.\textit{key}}\) \(\) \(\newcommand{\PEnqueue}[2]{{#1}.\textsf{enqueue}(#2)}\) \(\newcommand{\PDequeue}[1]{{#1}.\textsf{dequeue}()}\) \(\) \(\newcommand{\Def}{\mathrel{:=}}\) \(\) \(\newcommand{\Eq}{\mathrel{=}}\) \(\newcommand{\Asgn}{\mathrel{\leftarrow}}\) \(%\newcommand{\Asgn}{\mathrel{:=}}\) \(\) \(%\) \(% Heaps\) \(%\) \(\newcommand{\Downheap}{\textsc{downheap}}\) \(\newcommand{\Upheap}{\textsc{upheap}}\) \(\newcommand{\Makeheap}{\textsc{makeheap}}\) \(\) \(%\) \(% Dynamic sets\) \(%\) \(\newcommand{\SInsert}[1]{\textsc{insert}(#1)}\) \(\newcommand{\SSearch}[1]{\textsc{search}(#1)}\) \(\newcommand{\SDelete}[1]{\textsc{delete}(#1)}\) \(\newcommand{\SMin}{\textsc{min}()}\) \(\newcommand{\SMax}{\textsc{max}()}\) \(\newcommand{\SPredecessor}[1]{\textsc{predecessor}(#1)}\) \(\newcommand{\SSuccessor}[1]{\textsc{successor}(#1)}\) \(\) \(%\) \(% Union-find\) \(%\) \(\newcommand{\UFMS}[1]{\textsc{make-set}(#1)}\) \(\newcommand{\UFFS}[1]{\textsc{find-set}(#1)}\) \(\newcommand{\UFCompress}[1]{\textsc{find-and-compress}(#1)}\) \(\newcommand{\UFUnion}[2]{\textsc{union}(#1,#2)}\) \(\) \(\) \(%\) \(% Graphs\) \(%\) \(\newcommand{\Verts}{V}\) \(\newcommand{\Vtx}{v}\) \(\newcommand{\VtxA}{v_1}\) \(\newcommand{\VtxB}{v_2}\) \(\newcommand{\VertsA}{V_\textup{A}}\) \(\newcommand{\VertsB}{V_\textup{B}}\) \(\newcommand{\Edges}{E}\) \(\newcommand{\Edge}{e}\) \(\newcommand{\NofV}{\Card{V}}\) \(\newcommand{\NofE}{\Card{E}}\) \(\newcommand{\Graph}{G}\) \(\) \(\newcommand{\SCC}{C}\) \(\newcommand{\GraphSCC}{G^\text{SCC}}\) \(\newcommand{\VertsSCC}{V^\text{SCC}}\) \(\newcommand{\EdgesSCC}{E^\text{SCC}}\) \(\) \(\newcommand{\GraphT}{G^\text{T}}\) \(%\newcommand{\VertsT}{V^\textup{T}}\) \(\newcommand{\EdgesT}{E^\text{T}}\) \(\) \(%\) \(% NP-completeness etc\) \(%\) \(\newcommand{\Poly}{\textbf{P}}\) \(\newcommand{\NP}{\textbf{NP}}\) \(\newcommand{\PSPACE}{\textbf{PSPACE}}\) \(\newcommand{\EXPTIME}{\textbf{EXPTIME}}\)
\(\newcommand{\Price}[1]{p_{#1}}\) \(\newcommand{\Profit}[1]{\mathit{cut}(#1)}\) \(\) \(\newcommand{\LCSOf}[2]{\mathop{lcs}(#1,#2)}\) \(\newcommand{\LCSA}{X}\) \(\newcommand{\LCSAP}[1]{X_{#1}}\) \(\newcommand{\LCSB}{Y}\) \(\newcommand{\LCSBP}[1]{Y_{#1}}\) \(\newcommand{\LCSa}[1]{x_{#1}}\) \(\newcommand{\LCSb}[1]{y_{#1}}\) \(%\newcommand{\LCSO}{\mathit{Opt}}\) \(%\newcommand{\LCSo}[1]{\mathit{opt}_{#1}}\) \(\newcommand{\LCSO}{Z}\) \(\newcommand{\LCSo}[1]{z_{#1}}\) \(\) \(\newcommand{\GAct}[1]{a_{#1}}\) \(\newcommand{\GActs}{A}\) \(\newcommand{\GActss}[2]{A_{#1,#2}}\) \(\newcommand{\GSol}{A'}\) \(\newcommand{\GSols}[2]{A'_{#1,#2}}\) \(\newcommand{\GSolsi}[3]{A'_{#1,#2\text{ at }#3}}\) \(\newcommand{\GStart}[1]{s_{#1}}\) \(\newcommand{\GFinish}[1]{f_{#1}}\) \(\)

Greedy Algorithms

In the two last examples of dynamic programming, we had to consider many sub-problems to solve a problem. For instance, in the rodding cutting problem we had to consider all the possible ways of making the first cut. Similarly, in the LCS problem there were two potential ways for obtaining the optimal solution when the last symbols in the sequence were not the same.

However, sometimes it is possible to greedily select just one case and thus reduce the problem into one smaller sub-problem. We next consider one such simple greedy algorithm. In the graphs rounds, we’ll see other greedy algorithms when we study the minimum spanning tree algorithms.

Example IV: Activity-selection problem

Assume that we are given a finite set of activities \( \GActs = \Set{\GAct{1},…,\GAct{n}} \). Each activity \( \GAct{i} \) is associated with

  • ​ a start time \( \GStart{i} \), and

  • ​ a finish time \( \GFinish{i} \) with \( \GStart{i} < \GFinish{i} \).

An activity executes in the half-open interval \( [\GStart{i},\GFinish{i}) \). Two activities, \( \GAct{i} \) and \( \GAct{j} \) for \( i \neq j \), are said to be compatible if they do not execute at the same time, meaning that either \( \GFinish{i} \le \GStart{j} \) or \( \GFinish{j} \le \GStart{i} \). Our task is find a maximal compatible subset of \( \GActs \), i.e., a subset \( \GSol \subseteq \GActs \) of largest size such that all the activities in the set are mutually compatible.

Example

Consider the set of activities \( \GActs = \Set{\GAct{1},…,\GAct{11}} \) with $$\begin{array}{c|ccccccccccc} i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline \GStart{i} & 1 & 3 & 0 & 5 & 3 & 5 & 6 & 8 & 8 & 2 & 12 \\ \GFinish{i} & 4 & 5 & 6 & 7 & 9 & 9 & 10 & 11 & 12 & 14 & 16 \end{array}$$ A maximal compatible subset is \( \Set{\GAct{2},\GAct{4},\GAct{9},\GAct{11}} \).

_images/dynprog-activities.png

A dynamic programming formulation

In order to relate dynamic programming and greedy algorithms, let us first see a dynamic programming approach for solving the activity selection problem.

Assume that the set of activities is \( \GActs = \Set{\GAct{1},…,\GAct{n}} \). For obtaining the recursive definition for optimal solutions, let $$\GActss{l}{u} = \Setdef{\GAct{i}\in\GActs}{l \le \GStart{i} \land \GFinish{i} \le u}$$ be the set of activities that start at \( l \) or later and finish at latest at \( u \). If we use \( L = \min_{i \in [1..n]}{\GStart{i}} \) and \( U = \max_{i \in [1..n]}{\GFinish{i}} \), then the original problem instance \( \GActs \) is \( \GActss{L}{U} \).

Each (sub-)problem on \( \GActss{l}{u} \) has the optimal sub-solution property:

  • ​ Take any activity \( \GAct{i} \in \GActss{l}{u} \)

  • ​ Let \( \GSols{l}{\GStart{i}} \) and \( \GSols{\GFinish{i}}{u} \) be any maximal compatible subsets for the sets \( \GActss{l}{\GStart{i}} \) of activities in \( \GActss{l}{u} \) ending before \( \GAct{i} \) and \( \GActss{\GFinish{i}}{u} \) of activities in \( \GActss{l}{u} \) starting after \( \GAct{i} \).

  • ​ Then \( \GSolsi{l}{u}{i} = \GSols{l}{\GStart{i}} \cup \Set{\GAct{i}} \cup \GSols{\GFinish{i}}{u} \) is a maximal compatible subset for \( \GActss{l}{u} \) that contains \( \GAct{i} \).

Thus a maximal compatible subset for \( \GActss{l}{u} \) is $$\GSols{l}{u} = \max_{\GAct{j} \in \GActss{l}{u}} \GSolsi{l}{u}{j}.$$ The solution for the original problem is \( \GSols{L}{U} \). We now have a recursive definition that can be turned into a dynamic programming algorithm. Note that the number of sub-problems \( \GActss{l}{u} \) one has to consider is at most \( (n+1) \times (n+1) \) because (i) the start index \( l \) of each \( \GActss{l}{u} \) is either \( L \) or an ending time of an activity, and (ii) the ending index \( u \) of each \( \GActss{l}{u} \) is either \( U \) or a starting time of an activity. Thus the dynamic programming approach can be made to work in polynomial time. On the contrary, the straighforward approach of

  • ​ enumerating all the subsets of \( \GActs \),

  • ​ checking the compatibility of each, and

  • ​ remembering a largest compatible one

would require at least exponential time as there are exponetially many subsets.

In Scala, we can implement the dynamic programming algorithm as follows. Note that the implementation is not very well optimized.

def activitySelectorDynprog(acts: Set[(Int,Int)]) = {
  require(acts.forall(a => a._1 >= 0 && a._1 < a._2))
  val mem = collection.mutable.Map[Set[(Int,Int)],Set[(Int,Int)]]()
  def inner(current: Set[(Int,Int)]): Set[(Int,Int)] = mem.getOrElseUpdate(current, {
    var best = Set[(Int,Int)]()
    for(act <- current) {
      val earlier = current.filter(a => a._2 <= act._1)
      val later = current.filter(a => act._2 <= a._1)
      val earlierResult = inner(earlier)
      val laterResult = inner(later)
      if(earlierResult.size + 1 + laterResult.size > best.size)
	best = (earlierResult ++ laterResult) + act
    }
    best
  })
  inner(acts)
}

Running our example with it, we get:

scala> val acts = Set((1,4),(3,5),(0,6),(5,7),(3,9),(5,9),(6,10),(8,11),(8,12),(2,14),(12,16))
acts: ....Set[(Int, Int)] = Set((3,9), (6,10), (5,9), (8,11), (3,5), (1,4), (12,16), (5,7), (2,14), (8,12), (0,6))

scala> activitySelectorDynprog(acts).toVector.sortBy(_._1)
res0: ....Vector[(Int, Int)] = Vector((3,5), (5,7), (8,11), (12,16))

The greedy approach

The activity selection problem has a property that we can use to obtain an even faster and simpler algorithm. Namely, we do not have to consider all the activities in a subset but can

  • ​ just select one that finishes as early as possible, and

  • ​ then, consider the sub-problem obtained by removing that activity and all the activities that are not compatible with it.

This optimization is based on the observation that if we have a maximal compatible subset that does not contain an activity \( \GAct{i} \) that finishes as early as possible, we can substitute one that is in the set with \( \GAct{i} \) and obtain a compatible subset of the same maximal size. This is formalized in the following theorem.

Theorem

Let \( \GSol \subseteq \GActs \) be a maximal subset of compatible activities, \( \GAct{i} \) an activity with the earliest finish time among those in \( \GActs \), and \( \GAct{j} \) be the activity with the earliest finish time among those in \( \GSol \). Then \( (\GSol \setminus \Set{\GAct{j}}) \cup \Set{\GAct{i}} \) is also a maximal subset of compatible activities.

Proof

If \( i = j \), we are done.

Otherwise, it must be that \( \GFinish{i} \le \GFinish{j} \). As \( \GAct{j} \) is the activity with the earliest finish time among those in \( \GSol \), \( \GSol \) does not contain any \( \GAct{k} \), \( k \neq j \), with \( \GStart{k} \le \GFinish{i} \). Therefore, it is safe to substitute \( \GAct{j} \) with \( \GAct{i} \) in \( \GSol \) and obtain a set of compatible activities that has the same maximal size.

Example

Consider again the activities \( \GActs = \Set{\GAct{1},…,\GAct{11}} \), where $$\begin{array}{c|ccccccccccc} i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline \GStart{i} & 1 & 3 & 0 & 5 & 3 & 5 & 6 & 8 & 8 & 2 & 12 \\ \GFinish{i} & 4 & 5 & 6 & 7 & 9 & 9 & 10 & 11 & 12 & 14 & 16 \end{array}$$ One maximal compatible subset is \( \Set{\GAct{2},\GAct{4},\GAct{9},\GAct{11}} \). It does not contain the activity \( \GAct{1} \) with the earliest finish time among those in \( \GActs \). If we substitute the activity \( \GAct{2} \) having the earliest finish time among those in \( \GActs \) with the activity \( \GAct{1} \), the result is the maximal compatible subset \( \Set{\GAct{1},\GAct{4},\GAct{9},\GAct{11}} \).

Observe that \( \Set{\GAct{4},\GAct{9},\GAct{11}} \) is a maximal compatible subset of the “reduced” sub-problem \( \Set{\GAct{4},\GAct{6},\GAct{7},\GAct{8},\GAct{9},\GAct{11}} \) obtained by removing \( \GAct{1} \) and all the activities that are not compatible with it.

_images/dynprog-activities-2.png

In Scala, we can implement the greedy algorithm as follows:

def activitySelector(acts: Set[(Int,Int)]) = {
  require(acts.forall(a => a._1 >= 0 && a._1 < a._2))
  val sorted = acts.toVector.sortBy(_._2)
  val sol = collection.mutable.ArrayBuffer[(Int,Int)]()
  var time = Int.MinValue
  for(act <- sorted) {
    if(act._1 >= time) {
      sol += act
      time = act._2
    }
  }
  sol.toSet
}

Running our example with it, we get:

scala> val acts = Set((1,4),(3,5),(0,6),(5,7),(3,9),(5,9),(6,10),(8,11),(8,12),(2,14),(12,16))
acts: ....Set[(Int, Int)] = Set((3,9), (6,10), (5,9), (8,11), (3,5), (1,4), (12,16), (5,7), (2,14), (8,12), (0,6))

scala> activitySelector(acts).toVector.sortBy(_._1)
res0: ....Vector[(Int, Int)] = Vector((1,4), (5,7), (8,11), (12,16))