Resizable arrays¶

Resizable arrays, also called dynamic arrays, dynamic tables, array lists and so on, are extensively used in many applications when the amount of data to be stored in the array is not known in advance or can change dynamically during the execution of the program. For instance, the Scala code below reads all the words in a text file into a resizable array (Scala ArrayBuffer) so that they can be later processed in memory in the same order they appear in the file.

import scala.io.Source

val words = collection.mutable.ArrayBuffer[String]()
val bufferedSource = Source.fromFile(fileName)
for(line <- bufferedSource.getLines)
  for(word <- line.trim.split("\\r+"))
    words += word
bufferedSource.close

Due to their importance in many applications, resizable arrays are implemented in many standard libraries:

ArrayBuffer in Scala
vector in the C++ standard library (NOTE: vector in C++ is very different to Vector in Scala!)
ArrayList in Java
list in Python

Their main benefit over arrays is that they allow amortized constant-time insertion and removal at the end. The benefit over mutable linked lists, that also allow constant time insertion at the end, is that constant-time random access to the elements is supported and much less memory allocations are performed. In general, memory allocations are not free but the runtime system must keep track of the free and allocated memory areas when objects are created and deleted (by garbage collection or by explicit deallocation). The plot below shows the performance of C++ vector (resizable array) and list (mutable linked list) when $ n $ elements are inserted at the end. These tests were run in a computer with Intel i5-2400 CPU running at 3.1GHz.

The idea in resizable arrays is as follows: allocate an array $ a $ with some extra space and maintain

the capacity $ c $, ie. the size, of the array $ a $ (in Java/Scala, this is already included in the low-level array but in C++, for instance, raw arrays do not carry any additional information), and
the size $ s $ of the resizable array, ie. the number of elements actually used in the underlying array $ a $.

When an element is inserted at the end of the resizable array, we do the following:

If $ s < c $, we simply write the element in the index $ s $ of the array $ a $ and increase $ s $ by one.
If $ s = c $, we
1. allocate a new array of $ \alpha c $ elements, where $ \alpha > 1 $ is an expansion factor constant (usually 1.5 or 2),
2. copy the $ s $ elements from the old array to the beginning of the new array, and
3. set the capacity to $ \alpha c $, insert the new element at index $ s $ in the new array and increase $ s $ by one.

Removing the last element is easy: the index $ s $ is decreased by one.

Example: Inserting an element into a full resizable array

Consider the resizable array of integers shown below and suppose that we wish to insert the element 21 at the end of it.

As the array was full, the insertion triggers expansion and results in the resizable array shown below.

The old array will be deallocated by the runtime system (such as Java virtual machine) in garbage-collected languages. In non-garbage collecting languages such as C++, the old array must be explicitly deallocated by the insertion operation code.

Example

The code below shows how appending elements in ArrayBuffer is implemented in the Scala version 2.11.8. The code is taken from the source code of ArrayBuffer and the source code of ResizableArray.

class ArrayBuffer[A](override protected val initialSize: Int)
  extends AbstractBuffer[A]
     with Buffer[A]
     with GenericTraversableTemplate[A, ArrayBuffer]
     with BufferLike[A, ArrayBuffer[A]]
     with IndexedSeqOptimized[A, ArrayBuffer[A]]
     with Builder[A, ArrayBuffer[A]]
     with ResizableArray[A]
     with CustomParallelizable[A, ParArray[A]]
     with Serializable
{
  // LINES REMOVED

  /** Appends a single element to this buffer and returns
   *  the identity of the buffer. It takes constant amortized time.
   *
   *  @param elem  the element to append.
   *  @return      the updated buffer.
   */
  def +=(elem: A): this.type = {
    ensureSize(size0 + 1)
    array(size0) = elem.asInstanceOf[AnyRef]
    size0 += 1
    this
  }

  // LINES REMOVED
}

trait ResizableArray[A] extends IndexedSeq[A]
                           with GenericTraversableTemplate[A, ResizableArray]
                           with IndexedSeqOptimized[A, ResizableArray[A]]
{
  override def companion: GenericCompanion[ResizableArray] = ResizableArray

  protected def initialSize: Int = 16
  protected var array: Array[AnyRef] = new Array[AnyRef](math.max(initialSize, 1))
  protected var size0: Int = 0

  // LINES REMOVED

  /** Ensure that the internal array has at least `n` cells. */
  protected def ensureSize(n: Int) {
    // Use a Long to prevent overflows
    val arrayLength: Long = array.length
    if (n > arrayLength) {
      var newSize: Long = arrayLength * 2
      while (n > newSize)
        newSize = newSize * 2
      // Clamp newSize to Int.MaxValue
      if (newSize > Int.MaxValue) newSize = Int.MaxValue

      val newArray: Array[AnyRef] = new Array(newSize.toInt)
      scala.compat.Platform.arraycopy(array, 0, newArray, 0, size0)
      array = newArray
    }
  }

  // LINES REMOVED

}

Resizing the underlying array multiplies its capacity with the constant $ \alpha > 1 $ and thus the new array sizes grow exponentially in the number of resize operations. However, in this way the resize operations also become more and more infrequent when the size of the array grows. And multiplying the size with a constant does not usually use too much extra memory either: if we only insert elements, then the array is always at least $ \frac{1}{\alpha} $-full. For instance, when $ \alpha = 2 $, we only “waste” at most the same amount of memory that we actually use for storing the elements.

Why do we multiply the size with a constant and not just add some non-neglible fixed number, say 1024, elements when resizing? The reason is that multiplying the size allows us to have amortized constant-time insertion of elements. That is, if we insert $ n $ elements in a resizable array, then

some insertions, ie. those on which the resizing occurs, are quite costly as memory allocation and copying must be done,
but the cumulative cost amortized to each insertion is constant.

Lets study the number of accesses done in each insertion (this is in linear correspondence to the running time). Suppose that (i) $ \alpha = 2 $, (ii) we have inserted $ n $ elements in an empty array, and (iii) that array resizing occurs when we insert the $ n+1 $th element. The number of extra array accesses when copying elements from an old array to the current new one in this and all the previous resizings is captured by the recurrence $$T(d) = 0 \text{ and } T(n) = n + T(n/2),$$ where $ d $ is the initial size of the array. Thus $$T(n) \le n + T(n/2) \le n + n/2 + n/4 + n/8 … \le 2n$$ Adding the 1 normal array access per insertion, we get the amortized number of array accesses per insertion: $ \frac{2n+n}{n} = 3 $.

Generally, for an arbitrary expansion factor $ \alpha > 1 $, the extra cost is at most $ n \sum_{i=0}^\infty \frac{1}{\alpha^i} = n \frac{1}{1-\frac{1}{\alpha}} = n \frac{\alpha}{\alpha-1} = \Oh{n} $ and thus the amortized cost per insertion is a constant as well. As an example, when $ \alpha=1.1 $, the amortized cost per insertion is at most $ 11+1=12 $ array accesses.

Example

For the expansion factor of 2, we can graphically illustrate the array accesses done when inserting $ 9 $ elements in a resizable array that initially has the capacity of $ 1 $ as follows:

The blue dots are regular accesses when writing the new element, red ones are copy accesses from the previous expansions and the pink ones are the ones of the latest expansion. If we collapse the red towers right and the pink tower left, we get

and see that each insert has at most 3 dots “amortized” to them.

Should one contract the resizable array when it only contains few elements compared to its capacity? Yes, we can do this and have amortized constant-time insert and remove operations. But the contraction triggering load factor, ie. the fraction $ \frac{s}{c} $ of entries actually used, should less than the multiplicative inverse of the expansion factor. That is, if we double the array capacity at some point and then immediately remove an element, we should not halve the array capacity immediately. If the expansion factor is 2.0 and contraction trigger is 0.25, we get amortized constant-time insert and remove operations, see Section 17.4 in Introduction to Algorithms, 3rd ed. (online via Aalto lib).