String based Algorithms added #654

Merged
merged 17 commits into from
Oct 14, 2023
110 changes: 110 additions & 0 deletions Algorithms/Stringbasedalgorithms/KMP/readme.md
@@ -0,0 +1,110 @@
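The Knuth-Morris-Pratt (KMP) algorithm is a pattern-matching algorithm that efficiently finds all occurrences of a given pattern within a longer text.
It uses a precomputed array called the Longest Prefix Suffix (LPS) array to avoid re-comparing text characters that are already known to match, so the search runs in O(N + M) time for a text of length N and a pattern of length M.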
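To illustrate what "avoiding unnecessary comparisons" means, the sketch below counts character comparisons for a brute-force scan and for KMP on a repetitive input. `naive_search` and `kmp_search_counted` are hypothetical helpers written only for this illustration. The KMP version uses the same LPS table and fallback rule as `calculate_lps` and `kmp_search` in this readme, rewritten as a loop so every comparison is counted.

```python
def naive_search(text, pattern):
    """Return (match indices, character comparisons) for a brute-force scan."""
    matches, comparisons = [], 0
    for start in range(len(text) - len(pattern) + 1):
        k = 0
        while k < len(pattern):
            comparisons += 1
            if text[start + k] != pattern[k]:
                break
            k += 1
        if k == len(pattern):
            matches.append(start)
    return matches, comparisons


def kmp_search_counted(text, pattern):
    """Return (match indices, character comparisons) for KMP (non-empty pattern assumed)."""
    # Build the LPS table for the pattern.
    lps = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j

    matches, comparisons, j = [], 0, 0
    for i, ch in enumerate(text):
        # Each pass compares one text character with one pattern character.
        while True:
            comparisons += 1
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break
            j = lps[j - 1]  # fall back in the pattern; the text index never moves back
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]
    return matches, comparisons


text = "A" * 20 + "B"
pattern = "AAAAAB"
print(naive_search(text, pattern))        # ([15], 96)
print(kmp_search_counted(text, pattern))  # ([15], 36)
```

The brute-force scan re-reads the same run of A's for every starting position, while KMP reads each text character at most a couple of times here.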




Complete Code:
```python
def calculate_lps(pattern):
    """
    Calculate the Longest Prefix Suffix (LPS) array for the given pattern.

    Args:
        pattern (str): The pattern string.

    Returns:
        list: LPS array for the pattern.
    """
    length = len(pattern)
    lps = [0] * length  # Initialize the LPS array with zeros.
    j = 0  # Length of the previous longest prefix suffix.

    for i in range(1, length):
        # Fall back through shorter prefix suffixes until pattern[i] can extend one.
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]  # Move j to the previous prefix suffix.

        if pattern[i] == pattern[j]:
            j += 1  # Extend the current prefix suffix by one character.

        lps[i] = j  # Length of the longest proper prefix suffix of pattern[:i + 1].

    return lps
```
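To make the definition of the LPS array concrete, here is a brute-force version that computes the same values directly from the definition by trying every prefix length. `lps_by_definition` is a hypothetical helper; it is quadratic (or worse) and only meant for checking `calculate_lps` on small inputs.

```python
def lps_by_definition(pattern):
    """Reference LPS array computed straight from the definition (slow).

    For each prefix pattern[:i + 1], find the longest proper prefix of it
    that is also a suffix of it.
    """
    result = []
    for i in range(len(pattern)):
        prefix = pattern[:i + 1]
        best = 0
        for k in range(1, len(prefix)):  # k < len(prefix) keeps the prefix proper
            if prefix[:k] == prefix[-k:]:
                best = k
        result.append(best)
    return result


print(lps_by_definition("ABAB"))         # [0, 0, 1, 2]
print(lps_by_definition("AABAACAABAA"))  # [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
```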




```python
def kmp_search(text, pattern):
    """
    Search for occurrences of the pattern in the given text using the Knuth-Morris-Pratt algorithm.

    Args:
        text (str): The text to search in.
        pattern (str): The pattern to search for.

    Returns:
        list: List of indices where the pattern is found in the text.
    """
    if not text or not pattern:
        return []

    n = len(text)
    m = len(pattern)
    lps = calculate_lps(pattern)  # Calculate the LPS array for the pattern.
    results = []
    j = 0  # Number of pattern characters currently matched.

    for i in range(n):
        # On a mismatch, fall back to the longest pattern prefix that is still matched.
        while j > 0 and text[i] != pattern[j]:
            j = lps[j - 1]  # Move j to the previous prefix suffix.

        if text[i] == pattern[j]:
            j += 1  # Increment j if there is a match for the current character.

        if j == m:
            # Pattern found at index i - m + 1
            results.append(i - m + 1)
            j = lps[j - 1]  # Keep the overlap so that overlapping matches are found.

    return results
```
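A common alternative way to write the same search, shown here as a sketch, runs the LPS recurrence (often called the prefix function) over `pattern + separator + text`: wherever the value reaches M, a full match ends. `prefix_function`, `find_all_via_prefix_function` and the `"\x00"` separator are assumptions for this illustration; the separator must not occur in the text or the pattern.

```python
def prefix_function(s):
    """Same recurrence as calculate_lps: pi[i] is the LPS length of s[:i + 1]."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi


def find_all_via_prefix_function(text, pattern, sep="\x00"):
    """Find all (possibly overlapping) matches; assumes sep is absent from both strings."""
    if not text or not pattern:
        return []
    if sep in text or sep in pattern:
        raise ValueError("separator must not occur in text or pattern")
    m = len(pattern)
    pi = prefix_function(pattern + sep + text)
    # Text index t sits at position p = t + m + 1 in the combined string, so a
    # match ending at p starts at text index p - 2 * m.
    return [p - 2 * m for p in range(m + 1, len(pi)) if pi[p] == m]


print(find_all_via_prefix_function("ABABABCABABABCABABABC", "ABAB"))  # [0, 2, 7, 9, 14, 16]
```

The separator caps every value at M, so the prefix function never "matches across" the boundary between pattern and text.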


```python
# Example usage
text = "ABABABCABABABCABABABC"
pattern = "ABAB"

print("Text:", text)
print("Pattern:", pattern)
print("Occurrences:", kmp_search(text, pattern))  # Occurrences: [0, 2, 7, 9, 14, 16]
```



Let's walk through what this code does with an example.
Suppose we search for the pattern ABAB in the text ABABABCABABABCABABABC.
Notation: M is the length of the pattern (here 4) and N is the length of the text (here 21).
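Step 1: Calculate the LPS (Longest Prefix Suffix) array.
For the pattern "ABAB":
LPS: [0, 0, 1, 2]
lps[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. For example, lps[3] = 2 because "AB" is both a prefix and a suffix of "ABAB", while lps[1] = 0 because "A" and "B" differ.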
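The sketch below traces how `calculate_lps` arrives at [0, 0, 1, 2] for "ABAB" by recording the value of j after each position. `trace_lps` is a hypothetical helper that repeats the same loop as `calculate_lps` and only adds the recording.

```python
def trace_lps(pattern):
    """Build the LPS array and record (i, pattern[i], lps[i]) for each step."""
    lps = [0] * len(pattern)
    j = 0
    steps = []
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j
        steps.append((i, pattern[i], j))
    return lps, steps


lps, steps = trace_lps("ABAB")
for i, ch, j in steps:
    print(f"i={i} ({ch}): lps[{i}] = {j}")
print(lps)
# i=1 (B): lps[1] = 0
# i=2 (A): lps[2] = 1
# i=3 (B): lps[3] = 2
# [0, 0, 1, 2]
```

At i = 1, B does not extend the empty match, so lps[1] stays 0; at i = 2 and i = 3, A and B each extend the previous prefix suffix by one.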

Step 2: KMP search.
Scan the text from left to right with index i, keeping a pointer j into the pattern. j is the number of pattern characters matched so far, so pattern[j] is the next character to compare.

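At text index 0:
Compare pattern[0] (A) with text[0] (A). Match, so j becomes 1.

At text index 1:
Compare pattern[1] (B) with text[1] (B). Match, so j becomes 2.

At text index 2:
Compare pattern[2] (A) with text[2] (A). Match, so j becomes 3.

At text index 3:
Compare pattern[3] (B) with text[3] (B). Match, so j becomes 4 = M.
The pattern is fully matched, ending at text index 3. Add the starting index 3 - M + 1 = 0 to the result.

Instead of restarting at j = 0, set j = lps[M - 1] = lps[3] = 2, because the last two matched characters "AB" are also the first two characters of the pattern.
At text index 4, compare pattern[2] (A) with text[4] (A) and continue; this is how the overlapping occurrence starting at index 2 is found.
Continue in the same way to find the remaining occurrences. On a mismatch, j falls back through the LPS array (j = lps[j - 1]) while the text index never moves backward.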
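To follow Step 2 over the whole text, the sketch below repeats the `kmp_search` loop and records, for every match, its starting index and the value j falls back to afterwards. `trace_kmp` is a hypothetical helper for illustration only.

```python
def trace_kmp(text, pattern):
    """Return (start index, j after fallback) for every full match (non-empty pattern assumed)."""
    # LPS array, built the same way as calculate_lps.
    lps = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j

    events = []
    j = 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            # Reuse the longest proper suffix of the match that is also a prefix.
            j = lps[j - 1]
            events.append((i - len(pattern) + 1, j))
    return events


for start, j in trace_kmp("ABABABCABABABCABABABC", "ABAB"):
    print(f"match at {start}; resume with j = {j}")
```

Every match here resumes with j = 2, and each C in the text sends j back to 0 before the next block of ABAB's.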

Result:

The pattern "ABAB" is found at indices 0, 2, 7, 9, 14, and 16 in the text. Occurrences may overlap; for example, the matches at 0 and 2 share the characters "AB".
53 changes: 53 additions & 0 deletions Algorithms/Stringbasedalgorithms/RabinKarp/readme.md
@@ -0,0 +1,53 @@
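The following code snippet implements the Rabin-Karp algorithm in Python.
Rabin-Karp is a hashing-based pattern-matching algorithm: it compares the hash of the pattern with the hash of each length-M window of the text, and compares characters only when the two hashes are equal.
Different strings can share a hash value, so a hash match may be a spurious hit. A good choice of base and prime makes such collisions rare but cannot rule them out, which is why every hash match is confirmed with a direct string comparison.
The idea becomes clearer once you understand a rolling hash: the hash of each window is computed from the previous window's hash in constant time instead of being recomputed from scratch.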
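The rolling hash can be checked in isolation. This sketch uses the same base (256) and prime (101) as the `RabinKarp` class; `direct_hash` and `rolling_hashes` are hypothetical helpers. It computes every window's hash with the O(1) update and confirms that each one equals the hash computed from scratch.

```python
BASE = 256   # same constants as the RabinKarp class
PRIME = 101


def direct_hash(s):
    """Polynomial hash of s computed from scratch, reduced mod PRIME."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % PRIME
    return h


def rolling_hashes(text, m):
    """Hash every length-m window of text, updating in O(1) per slide."""
    if m == 0 or m > len(text):
        return []
    high = pow(BASE, m - 1, PRIME)  # weight of the window's leading character
    h = direct_hash(text[:m])
    hashes = [h]
    for i in range(len(text) - m):
        # Drop text[i], shift the rest by one position, append text[i + m].
        h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % PRIME
        hashes.append(h)
    return hashes


text = "ABABABCABABABCABABABC"
m = 4
assert rolling_hashes(text, m) == [direct_hash(text[i:i + m]) for i in range(len(text) - m + 1)]
```

Python's `%` returns a non-negative result for a positive modulus, so the subtraction inside the update cannot leave a negative hash.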
Complete code:

```python
class RabinKarp:
    def __init__(self, text, pattern):
        self.text = text
        self.pattern = pattern
        self.text_length = len(text)
        self.pattern_length = len(pattern)
        self.prime = 101  # A prime number to use for hashing
        self.base = 256  # Number of possible characters in the input

    def calculate_hash(self, string, length):
        # Calculate the hash value of the first `length` characters of string
        hash_value = 0
        for i in range(length):
            hash_value = (hash_value * self.base + ord(string[i])) % self.prime
        return hash_value

    def rabin_karp_search(self):
        # Nothing to search for if the pattern is empty or longer than the text
        if self.pattern_length == 0 or self.pattern_length > self.text_length:
            return []

        results = []
        pattern_hash = self.calculate_hash(self.pattern, self.pattern_length)
        text_hash = self.calculate_hash(self.text, self.pattern_length)
        # Weight of the window's leading character, base^(M-1) mod prime, computed once
        high_order = pow(self.base, self.pattern_length - 1, self.prime)

        for i in range(self.text_length - self.pattern_length + 1):
            # Equal hashes may be a spurious hit, so confirm with a direct comparison
            if text_hash == pattern_hash and self.text[i:i + self.pattern_length] == self.pattern:
                results.append(i)

            if i < self.text_length - self.pattern_length:
                # Update the rolling hash for the next window:
                # remove text[i], shift by one position, add text[i + pattern_length].
                # Python's % keeps the result in [0, prime), so it is never negative.
                text_hash = (self.base * (text_hash - ord(self.text[i]) * high_order)
                             + ord(self.text[i + self.pattern_length])) % self.prime

        return results
```


```python
# Example usage
text = "ABABABCABABABCABABABC"
pattern = "ABAB"

rk = RabinKarp(text, pattern)
print("Text:", text)
print("Pattern:", pattern)
print("Pattern found at indices:", rk.rabin_karp_search())  # [0, 2, 7, 9, 14, 16]
```
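With a prime as small as 101, spurious hits are easy to produce. The sketch below uses the same base and prime as `RabinKarp` and a hypothetical helper `window_hash`: the different strings "Az" and "BD" share a hash, so a text containing "Az" produces a hash hit for the pattern "BD", and only the direct comparison rejects it.

```python
BASE, PRIME = 256, 101  # same constants as the RabinKarp class


def window_hash(s):
    """Polynomial hash of s, reduced mod PRIME."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % PRIME
    return h


# 65*256 + 122 = 16762 and 66*256 + 68 = 16964 both leave remainder 97 mod 101.
print(window_hash("Az"), window_hash("BD"))  # 97 97

text, pattern = "xxAzxx", "BD"
m = len(pattern)
pattern_hash = window_hash(pattern)
hits = [i for i in range(len(text) - m + 1) if window_hash(text[i:i + m]) == pattern_hash]
matches = [i for i in hits if text[i:i + m] == pattern]
print(hits, matches)  # [2] []
```

A larger prime makes such collisions less frequent, but the final string comparison is what guarantees correct results.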
5 changes: 5 additions & 0 deletions Algorithms/Stringbasedalgorithms/readme.md
@@ -0,0 +1,5 @@
This folder contains commonly used string algorithms that search for a pattern in a text faster than a naive character-by-character comparison.
The algorithms are listed below. New algorithms can be added to this directory and mentioned in this readme file.
### [Knuth-Morris-Pratt](KMP/readme.md)
### [RabinKarp](RabinKarp/readme.md)
2 changes: 2 additions & 0 deletions README.md
@@ -241,6 +241,8 @@ Sorting is the process of arranging a list of items in a particular order. For e
### [Searching](Algorithms/Searching/readme.md)
Searching is an algorithm for finding a certain target element inside a container. Searching Algorithms are designed to check for an element or retrieve an element from any data structure where it is stored.

### [StringBasedAlgorithms](Algorithms/Stringbasedalgorithms/readme.md)
Strings are one of the most widely used data structures in programming. This folder contains a few commonly used string algorithms, such as Knuth-Morris-Pratt and Rabin-Karp, that find patterns in text faster than a naive search.

### [Graph Search](Algorithms/Graph/readme.md)
Graph search is the process of searching through a graph to find a particular node. A graph is a data structure that consists of a finite (and possibly mutable) set of vertices or nodes or points, together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges, arcs, or lines for an undirected graph, and as arrows, directed edges, directed arcs, or directed lines for a directed graph. The vertices may be part of the graph structure or may be external entities represented by integer indices or references. Graphs are one of the most useful data structures for many real-world applications. Graphs are used to model pairwise relations between objects. For example, the airline route network is a graph in which the cities are the vertices, and the flight routes are the edges. Graphs are also used to represent networks. The Internet can be modeled as a graph in which the computers are the vertices, and the links between computers are the edges. Graphs are also used on social networks like LinkedIn and Facebook. Graphs are used to represent many real-world applications: computer networks, circuit design, and aeronautical scheduling to name just a few.