Hash Table

In this tutorial, you will learn what a hash table is. Additionally, you will discover working examples of hash table operations in C, C++, Java, and Python.

In this article, you will learn-

1 What is a hash table?
2 1. Direct Address Table
3 2. Hash Table
4 Advantages of the hash table over direct address table:
5 Collision resolution by chaining
- 5.1 Pseudocode for operations
6 Python, Java, C and C++ Implementation
7 Good Hash Functions
- 7.1 Division Method
  - 7.1.1 Multiplication Method
  - 7.1.2 Universal Hashing
8 Open Addressing
- 8.1 Linear Probing
9 Quadratic Probing
10 Double hashing
11 Hash Table Applications

What is a hash table?

A hash table is a type of data structure that stores key-values Paris. The key is shipped off a hash work that performs the arithmetic procedure on it. The outcome (generally called the hash value or hash) is the index of the key-value pair in the hash table.

A Hash table is a data structure that represents data as key-value pairs. Each key is mapped to a value in the hash table. The keys are used for indexing the values/data. A comparable approach is applied by an associative array.

Data is represented in a key value pair with the assistance of keys as demonstrated in the figure below. Each data is associated with a key. The key is an integer that highlight the data.

1. Direct Address Table

A direct address table is used when the measure of space used by the table isn’t an issue for the program. Here, we assume that

the keys are little integers
the number of keys isn’t excessively huge, and
no two data have a similar key

A pool of integers is taken called universe U = {0, 1, … ., n-1}.

Each slot of a direct address table T[0…n-1] contains a pointer to the element that corresponds to the data.

The index of the array T is the key itself and the content of T is a pointer to the set [key, element]. If there is no element for a key then, it is left as NULL.

Sometimes, the key itself is the data.

Pseudocode for operations

directAddressSearch(T, k)
  return T[k]
directAddressInsert(T, x)
  T[x.key] = x
directAddressDelete(T, x)
  T[x.key] = NIL

Limitations of a Direct Address Table

The value of the key ought to be little.
The number of keys should be little enough so it doesn’t cross the size limit of an array.

2. Hash Table

In a hash table, the keys are processed to create another index that maps to the necessary elements. This process is called hashing.

Let h(x) be a hash function and k be a key.

h(k) is calculated and it is used as an index for the element.

Limitations of a Hash Table

On the off chance that a similar index is created by the hash work for various keys at that point, conflict arises. The present circumstance is called a collision.

To keep away from this, a reasonable hash work is picked. In any case, it is difficult to create all extraordinary keys on the grounds that |U|>m. Subsequently, a decent hash function may not prevent the collisions totally anyway it can decrease the number of collisions.

Notwithstanding, we have different strategies to resolve collision.

Advantages of the hash table over direct address table:

The fundamental issues with the direct address table are the size of the array and the potentially enormous value of a key. The hash work decreases the scope of the index and hence the size of the array is likewise diminished.

For instance, If k = 9845648451321, at that point h(k) = 11 (by using some hash function). This helps in saving the memory wasted while providing the index of 9845648451321 to the array

Collision resolution by chaining

In this strategy, if a hash work produces a similar index for various elements, these elements are stored in a similar index by using a doubly linked list.

On the off chance that j is the slot for different elements, it contains a pointer to the head of the list of elements. In the event that no elements is available, j contains NIL.

Pseudocode for operations

chainedHashSearch(T, k)
  return T[h(k)]
chainedHashInsert(T, x)
  T[h(x.key)] = x //insert at the head
chainedHashDelete(T, x)
  T[h(x.key)] = NIL

Python, Java, C and C++ Implementation

Python

# Python program to demonstrate working of HashTable 

hashTable = [[],] * 10

def checkPrime(n):
    if n == 1 or n == 0:
        return 0

    for i in range(2, n//2):
        if n % i == 0:
            return 0

    return 1


def getPrime(n):
    if n % 2 == 0:
        n = n + 1

    while not checkPrime(n):
        n += 2

    return n


def hashFunction(key):
    capacity = getPrime(10)
    return key % capacity


def insertData(key, data):
    index = hashFunction(key)
    hashTable[index] = [key, data]

def removeData(key):
    index = hashFunction(key)
    hashTable[index] = 0

insertData(123, "apple")
insertData(432, "mango")
insertData(213, "banana")
insertData(654, "guava")

print(hashTable)

removeData(123)

print(hashTable)

Java

// Java program to demonstrate working of HashTable 

import java.util.*; 

class HashTable { 
  public static void main(String args[]) 
  {
    Hashtable<Integer, Integer> 
      ht = new Hashtable<Integer, Integer>(); 
  
    ht.put(123, 432); 
    ht.put(12, 2345);
    ht.put(15, 5643); 
    ht.put(3, 321);

    ht.remove(12);

    System.out.println(ht); 
  } 
}

// Implementing hash table in C

#include <stdio.h>
#include <stdlib.h>

struct set
{
  int key;
  int data;
};
struct set *array;
int capacity = 10;
int size = 0;

int hashFunction(int key)
{
  return (key % capacity);
}
int checkPrime(int n)
{
  int i;
  if (n == 1 || n == 0)
  {
    return 0;
  }
  for (i = 2; i < n / 2; i++)
  {
    if (n % i == 0)
    {
      return 0;
    }
  }
  return 1;
}
int getPrime(int n)
{
  if (n % 2 == 0)
  {
    n++;
  }
  while (!checkPrime(n))
  {
    n += 2;
  }
  return n;
}
void init_array()
{
  capacity = getPrime(capacity);
  array = (struct set *)malloc(capacity * sizeof(struct set));
  for (int i = 0; i < capacity; i++)
  {
    array[i].key = 0;
    array[i].data = 0;
  }
}

void insert(int key, int data)
{
  int index = hashFunction(key);
  if (array[index].data == 0)
  {
    array[index].key = key;
    array[index].data = data;
    size++;
    printf("\n Key (%d) has been inserted \n", key);
  }
  else if (array[index].key == key)
  {
    array[index].data = data;
  }
  else
  {
    printf("\n Collision occured  \n");
  }
}

void remove_element(int key)
{
  int index = hashFunction(key);
  if (array[index].data == 0)
  {
    printf("\n This key does not exist \n");
  }
  else
  {
    array[index].key = 0;
    array[index].data = 0;
    size--;
    printf("\n Key (%d) has been removed \n", key);
  }
}
void display()
{
  int i;
  for (i = 0; i < capacity; i++)
  {
    if (array[i].data == 0)
    {
      printf("\n array[%d]: / ", i);
    }
    else
    {
      printf("\n key: %d array[%d]: %d \t", array[i].key, i, array[i].data);
    }
  }
}

int size_of_hashtable()
{
  return size;
}

int main()
{
  int choice, key, data, n;
  int c = 0;
  init_array();

  do
  {
    printf("1.Insert item in the Hash Table"
         "\n2.Remove item from the Hash Table"
         "\n3.Check the size of Hash Table"
         "\n4.Display a Hash Table"
         "\n\n Please enter your choice: ");

    scanf("%d", &choice);
    switch (choice)
    {
    case 1:

      printf("Enter key -:\t");
      scanf("%d", &key);
      printf("Enter data -:\t");
      scanf("%d", &data);
      insert(key, data);

      break;

    case 2:

      printf("Enter the key to delete-:");
      scanf("%d", &key);
      remove_element(key);

      break;

    case 3:

      n = size_of_hashtable();
      printf("Size of Hash Table is-:%d\n", n);

      break;

    case 4:

      display();

      break;

    default:

      printf("Invalid Input\n");
    }

    printf("\nDo you want to continue (press 1 for yes): ");
    scanf("%d", &c);

  } while (c == 1);
}

C++

// Implementing hash table in C++

#include <iostream>
#include <list>
using namespace std;

class HashTable
{
  int capacity;
  list<int> *table;

public:
  HashTable(int V);
  void insertItem(int key, int data);
  void deleteItem(int key);

  int checkPrime(int n)
  {
    int i;
    if (n == 1 || n == 0)
    {
      return 0;
    }
    for (i = 2; i < n / 2; i++)
    {
      if (n % i == 0)
      {
        return 0;
      }
    }
    return 1;
  }
  int getPrime(int n)
  {
    if (n % 2 == 0)
    {
      n++;
    }
    while (!checkPrime(n))
    {
      n += 2;
    }
    return n;
  }

  int hashFunction(int key)
  {
    return (key % capacity);
  }
  void displayHash();
};
HashTable::HashTable(int c)
{
  int size = getPrime(c);
  this->capacity = size;
  table = new list<int>[capacity];
}
void HashTable::insertItem(int key, int data)
{
  int index = hashFunction(key);
  table[index].push_back(data);
}

void HashTable::deleteItem(int key)
{
  int index = hashFunction(key);

  list<int>::iterator i;
  for (i = table[index].begin();
     i != table[index].end(); i++)
  {
    if (*i == key)
      break;
  }

  if (i != table[index].end())
    table[index].erase(i);
}

void HashTable::displayHash()
{
  for (int i = 0; i < capacity; i++)
  {
    cout << "table[" << i << "]";
    for (auto x : table[i])
      cout << " --> " << x;
    cout << endl;
  }
}

int main()
{
  int key[] = {231, 321, 212, 321, 433, 262};
  int data[] = {123, 432, 523, 43, 423, 111};
  int size = sizeof(key) / sizeof(key[0]);

  HashTable h(size);

  for (int i = 0; i < n; i++)
    h.insertItem(key[i], data[i]);

  h.deleteItem(12);
  h.displayHash();
}

Good Hash Functions

A good hash work has the accompanying attributes.

It ought not to create keys that are too enormous and the bucket space is little. Space is wasted.
The keys produced ought to be neither very close nor too far in range.
The collision should be minimized however much as possible.

Some of the methods used for hashing are:

Division Method

If k is a key and m is the size of the hash table, the hash function h() is calculated as:

h(k) = k mod m

For instance, If the size of a hash table is 10 and k = 112 then h(k) = 112 mod 10 = 2. The value of m should not be the powers of 2. This is on the grounds that the powers of 2 in the binary format are 10, 100, 1000, … . At the point when we discover k mod m, we will consistently get the lower request p-bits.

if m = 22, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 01
if m = 23, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 001
if m = 24, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 0001
if m = 2p, then h(k) = p lower bits of m

Multiplication Method

h(k) = ⌊m(kA mod 1)⌋
where,

kA mod 1 gives the fractional part kA,
⌊ ⌋ gives the floor value
A is any constant. The value of A lies between 0 and 1. But, an optimal choice will be ≈ (√5-1)/2 suggested by Knuth.

Universal Hashing

In Universal hashing, the hash function is chosen at random independent of keys.

Open Addressing

Different values can be stored in a single slot in a typical hash table.

By using open addressing, each slot is either loaded up with a single key or left NIL. All the elements are stored in the hash table itself.

Not at all like chaining, numerous elements can’t be found a way into a similar opening.

Open addressing is fundamentally a collision settling method. A portion of the techniques used by open addressing are:

Linear Probing

In linear probing, collision is resolved by checking the next slot.
h(k, i) = (h′(k) + i) mod m
where,

i = {0, 1, ….}
h'(k) is a new hash function

In the event that an impact happens at h(k, 0), at that point h(k, 1) is checked. Thusly, the value of I is increased straightly.

The issue with linear testing is that a group of nearby spaces is filled. While inserting another element, the whole cluster should be navigated. This adds to the time needed to perform the procedure on the hash table.

Quadratic Probing

In quadratic probing, the spacing between the slots is increased (greater than one) by using the following relation.
h(k, i) = (h′(k) + c1i + c2i2) mod m
where,

c1 and c2 are positive auxiliary constants,
i = {0, 1, ….}

Double hashing

On the off chance that a collision happens subsequent to applying a hash function h(k), at that point another hash function is calculated for finding the next slot.

h(k, I) = (h1(k) + ih2(k)) mod m

Hash Table Applications

Hash tables are implemented where

constant time lookup and insertion is required
cryptographic applications
indexing data is required

Thanks for reading! We hope you found this tutorial helpful and we would love to hear your feedback in the Comments section below. And show us what you’ve learned by sharing your photos and creative projects with us.