In this tutorial, you will learn what the rabin-karp algorithm is. Additionally, you will discover working instances of the rabin-karp algorithm in C, C++, Java, and Python.

Rabin-Karp algorithm is an algorithm used for looking/matching patterns in the content using a hash work. Not at all like the Naive string matching algorithm, it doesn’t go through each character in the underlying stage rather it filters the characters that don’t match and afterward plays out the comparison.

A hash work is a device to map a larger input value to a smaller output value. This output value is known as the hash value.


How Rabin-Karp Algorithm Works?

A sequence of characters is taken and checked for the chance of the presence of the necessary string. Assuming the chance is discovered, character matching is performed.

  • Calculate hash value of pattern
  • Calculate hash value of first M characters of text
  • Compare both hash values
  • If they are unequal, calculate hash value for next M characters of text and compare again.
  • If they are equal, perform a brute force comparison.
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
	if (hash_p == hash_t) 
		brute force comparison of pattern and selected section of text
	hash_t= hash value of next section of text, one character over
while (end of text or brute force comparison == true)

Let us to understand the algorithm with the accompanying steps:

  1. Let the text be:

And the string to be searched in the above text be:

2. Let us assign a numerical value(v)/weight for the characters we will use in the issue. Here, we have taken the initial ten alphabets just (for example A to J).

3. m be the length of the example and n be the length of the content. Here, m = 10 and n = 3.

Let d alone the quantity of characters in the input set. Here, we have taken input set {A, B, C, …, J}. Thus, d = 10. You can assume any sauitable value for d.

4. Let us calculate the hash value of the example.

hash value for pattern(p) = Σ(v * dm-1) mod 13 
                      = ((3 * 102) + (4 * 101) + (4 * 100)) mod 13 
                      = 344 mod 13 
                      = 6

In the calculation above, pick a prime number (here, 13) so that we can play out every one of the calculations with single-precision arithmetic.

The reason for calculating the modulus is given below.

5. Calculate the hash value for the text-window of size m.

For the first window ABC,
hash value for text(t) = Σ(v * dn-1) mod 13 
                  = ((1 * 102) + (2 * 101) + (3 * 100)) mod 13 
                  = 123 mod 13  
                  = 6

6. Look at the hash value of the pattern with the hash value of the content. On the off chance that they match, character-matching is performed.

In the above examples, the hash value of the first window (for example t) matches with p along these lines, go for character matching among ABC and CDD. Since they don’t match so, go for the next window.

7. We calculate the hash value of the next window by subtracting the initial term and adding the next term as demonstrated underneath.

t = ((1 * 102) + ((2 * 101) + (3 * 100)) * 10 + (3 * 100)) mod 13 
  = 233 mod 13  
  = 12

In order to optimize this process, we make use of the previous hash value in the following way.

t = ((d * (t - v[character to be removed] * h) + v[character to be added] ) mod 13  
  = ((10 * (6 - 1 * 9) + 3 )mod 13  
  = 12
Where, h = dm-1 = 103-1 = 100.
  1. For BCC, t = 12 (≠6). Therefore, go for the next window.
    After a few searches, we will get the match for the window CDA in the text.

Algorithm

n = t.length
m = p.length
h = dm-1 mod q
p = 0
t0 = 0
for i = 1 to m
    p = (dp + p[i]) mod q
    t0 = (dt0 + t[i]) mod q
for s = 0 to n - m
    if p = ts
        if p[1.....m] = t[s + 1..... s + m]
            print "pattern found at position" s
    If s < n-m
        ts + 1 = (d (ts - t[s + 1]h) + t[s + m + 1]) mod q

Python, Java, and C/C++ Examples

Python

# Rabin-Karp algorithm in python


d = 10

def search(pattern, text, q):
    m = len(pattern)
    n = len(text)
    p = 0
    t = 0
    h = 1
    i = 0
    j = 0

    for i in range(m-1):
        h = (h*d) % q

    # Calculate hash value for pattern and text
    for i in range(m):
        p = (d*p + ord(pattern[i])) % q
        t = (d*t + ord(text[i])) % q

    # Find the match
    for i in range(n-m+1):
        if p == t:
            for j in range(m):
                if text[i+j] != pattern[j]:
                    break

            j += 1
            if j == m:
                print("Pattern is found at position: " + str(i+1))

        if i < n-m:
            t = (d*(t-ord(text[i])*h) + ord(text[i+m])) % q

            if t < 0:
                t = t+q


text = "ABCCDDAEFG"
pattern = "CDD"
q = 13
search(pattern, text, q)

Java

// Rabin-Karp algorithm in Java

public class RabinKarp {
  public final static int d = 10;

  static void search(String pattern, String txt, int q) {
    int m = pattern.length();
    int n = txt.length();
    int i, j;
    int p = 0;
    int t = 0;
    int h = 1;

    for (i = 0; i < m - 1; i++)
      h = (h * d) % q;

    // Calculate hash value for pattern and text
    for (i = 0; i < m; i++) {
      p = (d * p + pattern.charAt(i)) % q;
      t = (d * t + txt.charAt(i)) % q;
    }

    // Find the match
    for (i = 0; i <= n - m; i++) {
      if (p == t) {
        for (j = 0; j < m; j++) {
          if (txt.charAt(i + j) != pattern.charAt(j))
            break;
        }

        if (j == m)
          System.out.println("Pattern is found at position: " + (i + 1));
      }

      if (i < n - m) {
        t = (d * (t - txt.charAt(i) * h) + txt.charAt(i + m)) % q;
        if (t < 0)
          t = (t + q);
      }
    }
  }

  public static void main(String[] args) {
    String txt = "ABCCDDAEFG";
    String pattern = "CDD";
    int q = 13;
    search(pattern, txt, q);
  }
}

C

// Rabin-Karp algorithm in C

#include <stdio.h>
#include <string.h>

#define d 10

void rabinKarp(char pattern[], char text[], int q) {
  int m = strlen(pattern);
  int n = strlen(text);
  int i, j;
  int p = 0;
  int t = 0;
  int h = 1;

  for (i = 0; i < m - 1; i++)
    h = (h * d) % q;

  // Calculate hash value for pattern and text
  for (i = 0; i < m; i++) {
    p = (d * p + pattern[i]) % q;
    t = (d * t + text[i]) % q;
  }

  // Find the match
  for (i = 0; i <= n - m; i++) {
    if (p == t) {
      for (j = 0; j < m; j++) {
        if (text[i + j] != pattern[j])
          break;
      }

      if (j == m)
        printf("Pattern is found at position:  %d \n", i + 1);
    }

    if (i < n - m) {
      t = (d * (t - text[i] * h) + text[i + m]) % q;

      if (t < 0)
        t = (t + q);
    }
  }
}

int main() {
  char text[] = "ABCCDDAEFG";
  char pattern[] = "CDD";
  int q = 13;
  rabinKarp(pattern, text, q);
}

C++

// Rabin-Karp algorithm in C++

#include <string.h>

#include <iostream>
using namespace std;

#define d 10

void rabinKarp(char pattern[], char text[], int q) {
  int m = strlen(pattern);
  int n = strlen(text);
  int i, j;
  int p = 0;
  int t = 0;
  int h = 1;

  for (i = 0; i < m - 1; i++)
    h = (h * d) % q;

  // Calculate hash value for pattern and text
  for (i = 0; i < m; i++) {
    p = (d * p + pattern[i]) % q;
    t = (d * t + text[i]) % q;
  }

  // Find the match
  for (i = 0; i <= n - m; i++) {
    if (p == t) {
      for (j = 0; j < m; j++) {
        if (text[i + j] != pattern[j])
          break;
      }

      if (j == m)
        cout << "Pattern is found at position: " << i + 1 << endl;
    }

    if (i < n - m) {
      t = (d * (t - text[i] * h) + text[i + m]) % q;

      if (t < 0)
        t = (t + q);
    }
  }
}

int main() {
  char text[] = "ABCCDDAEFG";
  char pattern[] = "CDD";
  int q = 13;
  rabinKarp(pattern, text, q);
}

Limitations of Rabin-Karp Algorithm

Spurious Hit

At the point when the hash value of the example matches with the hash value of a window of the content however the window isn’t the actual example then it is known as a spurious hit.

Spurioushit builds the time complexity of the algorithm. In order to minimize spurious hit, we use modulus. It greatly reduces the spurious hit.


Rabin-Karp Algorithm Complexity

The normal case and best case complexity of the Rabin-Karp algorithm is O(m + n) and the worst-case complexity is O(mn).

The worst-case complexity happens when spurious hits happen a number for every one of the windows.


Rabin-Karp Algorithm Applications

  • For pattern matching
  • For searching string in a bigger text

Thanks for reading! We hope you found this tutorial helpful and we would love to hear your feedback in the Comments section below. And show us what you’ve learned by sharing your photos and creative projects with us.