Introduction
Levenshtein distance (some times also called Edit distance) is an algorithm developed by Vladimir Levenshtein in the year 1965. It can be defined as the smallest number of transformations needed to changed one string to another. These transformations can be either substitution, insertion or deletion. So, in other words, if we want to convert one string to another we would do some operations. The minimum number of operations is the Levenshtein distance we’re looking for.
Levenshtein distance has a lot of applications including plagiarism detection, cryptanalysis and our dear everyday use with spelling correction.
Example
To understand it in a better way assume we have 2 strings ‘a’ and ‘b’. Now I would try to solve this in a brute force way and then get the minimum distance.
Option 1: Substitute ‘a’ with ‘b’ , therefore the distance will be 1
‘a’ => ‘b’
Option 2: Delete ‘a’ then insert ‘b’, therefore the distance will be 2
‘a’ => ” => ‘b’
It goes without saying that Option 1 is the distance we are looking for. For trivial examples like the one I just gave above there is no need for a special algorithm, but when the inputs look like this,
shsakljhfjsahfdkljshdfkjsajhsdakjfhsasjddshfkjhjsdhfkjhfksjhfjshjfhsk
uiweyriuwqerbcvnbzxgwygrsdjndsfhidfhncvmncvkljadhiahdfkljd
we can’t really do anything without the algorithm.
The algorithm
Levenshtein distance is calculated using Dynammic programming algorithm which is concerned with solving smaller (constrained) versions of the problem till we find the best solution of our problem. We will follow a simple example to show how the algorithm is used. Some forms of the algorithm give different weights for different transformations, but we will assume that all transformations have the same weight and that is equal to 1 for the sake of simplicity.
Example
Assume we have the 2 strings ‘low’ and ‘twin’. We want to find the minimum distance between them. To do this we construct a table that looks like this.
| t | w | i | n | ||
| 0 | 1 | 2 | 3 | 4 | |
| l | 1 | ||||
| o | 2 | ||||
| w | 3 |
This is your starting point. Each cell of this table is a small constrained version of the problem that we need to find its solution. Now lets start with our smallest problem, and that’s the comparison of the first 2 letters from both words. In our example ‘l’ and ‘t’ are not equal. So a transformation is needed, in other words we have the value one (if equal the value will be zero since there are no transformations needed). Now wait before you place that one in the cell. This comparison depends on comparisons from cells adjacent to it, as we are trying find the least possible transformations for the whole string. So we calculate the following,
The cell on the upper left (diagonally):
0 + transformation error(in this case one) = 0 + 1 = 1
The cell on the left(horizontal):
1 + 1 = 2
The cell directly above(vertical):
1 + 1 = 2
and since one is less than two we can safely say that this cell will have a value of one.
| t | ||
| 0 | 1 | |
| l | 1 | 1 |
We will continue to do the same with the next cell. The same as before we will consider the three cells adjacent to it.
The cell on the upper left (diagonally):
1 + 1 = 2
The cell on the left(horizontal):
1 + 1 = 2
The cell directly above(vertical):
2 + 1 = 3
From the above values the minimum transformations are 2.
| t | w | ||
| 0 | 1 | 2 | |
| l | 1 | 1 | 2 |
We will continue to do this until the table is complete,
| t | w | i | n | ||
| 0 | 1 | 2 | 3 | 4 | |
| l | 1 | 1 | 2 | 3 | 4 |
| o | 2 | 2 | 2 | 3 | 4 |
| w | 3 | 3 | 2 | 3 | 4 |
that last cell is the minimum number of transformations needed to transform ‘low’ to ‘twin’.
You can check the algorithm’s implementation in several programming languages here.
Thanks for reading and comments are really welcome.

