Edit Distance Algorithm
The Edit Distance Algorithm, also known as the Levenshtein distance, is a widely-used computational method for measuring the similarity between two strings or sequences. Given two strings, the algorithm calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other. This algorithm has numerous applications in various fields, including natural language processing, DNA sequence alignment, spell checking, and data entry validation.
The algorithm employs dynamic programming techniques to create a matrix of dimensions (m+1) x (n+1), where m and n are the lengths of the two input strings. The cells in the matrix represent the edit distance between the substrings formed by taking the first i characters from the first string and the first j characters from the second string. By iteratively filling the matrix row by row, the algorithm computes the edit distance for each cell by considering the minimum of three possible operations: insertion, deletion, or substitution. The final edit distance between the two strings is found in the bottom-right cell of the matrix. This value can be normalized by dividing it by the length of the longest string, resulting in a similarity score between 0 (completely dissimilar) and 1 (identical strings).
/**
* Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
*
* You have the following 3 operations permitted on a word:
*
* a) Insert a character
* b) Delete a character
* c) Replace a character
*
*/
#include <iostream>
#include <string>
using std::string;
int min( int a, int b, int c ) {
int minab = ( a < b ) ? a : b;
return ( minab < c ) ? minab:c;
}
int minEditDistance(string word1, string word2) {
int len1 = word1.size();
int len2 = word2.size();
int minCost;
int ** table = new int*[len1+1];
for ( int i = 0;i <= len1; ++i ) {
table[i] = new int[len2+1];
}
for (int i = 0; i <= len1; ++i) {
for (int j = 0; j <= len2; ++j ) {
if ( i == 0 ) {
table[i][j] = j;
}
else if ( j == 0 ) {
table[i][j] = i;
} else if ( word1[i-1] == word2[j-1]) {
table[i][j] = table[i-1][j-1];
} else {
table[i][j] = 1+ min(table[i-1][j-1], table[i][j-1], table[i-1][j]);
}
}
}
minCost = table[len1][len2];
for ( int i = 0; i <= len1; ++i ) {
delete table[i];
}
delete[] table;
return minCost;
}
int main() {
std::string word1, word2;
std::cout << "Enter word 1 : " ;
std::cin >> word1;
std::cout << "Enter word 2 : " ;
std::cin >> word2;
std::cout << "Min edit distance : " << minEditDistance(word1, word2) << std::endl;
return 0;
}