Short snippet to calculate the edit distance (or Hamming Distance) between two strings in Python
The Hamming distance is just the number of differing bits within the string, so given two single char strings:
The edit distance is useful in basic cryptanalysis as for some cipher types (basically mechanisms where the key is repeated, i.e. ECB) it can help you discern the length of the key used - out of multiple tries at the keysize, the smallest (normalised) edit distance probably best indicates the key used.
import binascii
def char_to_bits(text):
''' Convert a character into a string of bits
For example the letter 'a' would be returned as '01100001'
'''
bits = bin(int(binascii.hexlify(text), 16))[2:]
return bits.zfill(8 * ((len(bits) + 7) // 8))
def computeHammingDistance(str1,str2):
''' Calculate the number of bits which differ between two strings (also known as the edit distance)
'''
distance = 0
for i in range(len(str1)):
# Convert each char to a bit string
c1 = char_to_bits(str1[i])
c2 = char_to_bits(str2[i])
# Compare each of the bits and mark whether they differ
for b in range(8):
if c1[b] != c2[b]:
distance = distance + 1
return distance
>>> print computeHammingDistance('a','b')
2
>>> print computeHammingDistance('foo','fpo')
5
>>> print computeHammingDistance('foo','bar')
8