Round 5: Hashing
A hash function is a function that takes some data as input and produces a fixed-size hash value for the data. Depending on the application area, the hash values can also be called “hash codes” or “digests”.

In general, hash functions are used in
hash tables for implementing set and map (dictionary) data structures,
cryptography,
file integrity verification in Git and elsewhere,
and so on.
In this round, we will focus on the first area and show how hash tables can be used to implement set and map data structures. The main difference to the set and map data structures implemented with balanced binary search trees (recall Section Round 4: Binary Search Trees) is that the order of the keys is not preserved in hash table based implementations. In many applications this does not matter because sets and maps with only
insert
,
search
, and
remove
operations are enough,
and one does not need efficient implementations
for min
and successor
, for instance.
With hash tables we can perform the insert
, search
, and remove
operations
in
In the C++11 standard library, unordered_set and unordered_map are usually implemented with hash tables.
In Java, java.util.HashSet and java.util.HashMap are implemented with hash tables.
In Scala, the mutable HashSet and HashMap classes use hash tables.
To get a feeling how the performance of hashing based sets, or “hash sets”, relates to “tree sets” based on balanced binary search trees, the figure below shows the running times of inserting up to two million random 32-bit integers into tree and hash sets. As we can see, hash sets perform better when we do not need to maintain the order between the keys.

As a second performance comparison between tree and hash sets, the plot below shows the running times of performing up to one million successful queries in a set consisting of roughly 91 thousand Finnish words (from kotus sanalista). Again hash sets perform better than tree sets.

Contents:
Material in the book Introduction to Algorithms (Aalto access):
Sections 11.1–11.4
Similar material elsewhere:
Section 3.4 in Algorithms, 4th ed. (quadratic probing etc are not covered)
Some other external links: