„Hash tables” are one type of data structure, they are good at storing data in sets made up of „keys” and „values”. As an example let’s imagine the data in the next table storead as an array:
We prepare few boxes for the array and stored data in them. Next let’s imagine looking up Stefan’s age – we don’t know what number Stefan’s box is in the array, thus, we need to search in order from the beginning. This operation is called a „linear search”.
1. The key’s stored in box 0 is Dan, which isn’t Stefan.
2. Box 1 isn’t Stefan either.
3. The key stored in box 2 matched Stefan.
By extracting the corresponding value, we learn that Stefan’s age is 25. In this way, the linear search operation caries a cost proportional the size of the data, it takes time to search through data when it’s stored in array – hash table solves this problem.
We will use data structure hash table and will add a new value to in it. Suppose we want to add Grigore’s age.
1. First step we will calculate a hash value for the key using the hash function ( a more detailed explanation of the „hash function” can be found elsewhere in this blog )
2. The result obtained in this case is 37
3. The has value is divided by the number of boxes in the hash table, 16, to find the remainder, the operation that fins the remainder of a division is called the „mod” operation.
4. The mod operation resulted in a value of 5. The data of „Grigore” is stored in box 5 of the array to match the result.
The previous operation will be repeated to store the other data. When storing the data of „Dan1”, we find the key’s hash value, and perform a mod operation on it using the number of boxes in the array, 16, the result was 11.
When storing data of Andreas, we find the key’s hash value, and perform a mod operation on it using the number of boxes in the array, 16. The result was 2, but when we try to store the data of „Andreas” in the box 2 of the array, the data of „Stefan” was already stored there. When something like this happens, boxes are connected to the existing data as list. There are several types of hash table structures, but the method that uses lists is called the „chain method”.
When storing the data of „Emma” we find the key’s hash value, and perform mod operation on it using the number of boxes in array, 16. The result was 2, because of the „Stefan” and „Andreas” data is in box 2 of the array, the data „Emma” will be connected as a list.
All data is finished being stored, and the hash table is complete. Let’s imagine looking up Grigore’s age.
In order to find out what box of the array Grigore is in, we find the key’s hash value, and perform a mod operation on it using the number of boxes in array, 16. The result was 5. The key for the data stored in box 5 of the array matched „Grigore”, by extracting its corresponding value, we learn that Grigore’s age is 26.
Let’s see what happens when looking up Emma’s age.
In order to find out what box of the array Emma is in, we find the key’s hash value, and perform mod operation on it using the number of boxes as in the array, 16. The result was 2. The key for data stored in box 2 of the array was „Stefan”, no „Emma” – so linear search is performed on the list starting with the data „Stefan”:
Data with the key „Andreas” was found which is not „Emma” – so linear search will continue:
On next iteration of linear search, data with the key „Emma” was found, by extracting its corresponding value, we learn that Emma’s age is 27.
As we can see, by using the hash function, hash tables are able to quickly access data within an array. When hash values overlap, lists are used, making it possible to flexibly handle an uncertain amount of data. When the size of an array used by hash table is too small, the overlaps increase, making linear searches more prevalent. Conversely, if the array size is too large, there will be many boxes with no data stored in them, wasting memory, so discretion is needed.
Hash tables, with their flexible data storage and fast lookup, are used in the associative arrays of programming languages.