Calculate Edit Distance between two strings



Published: 2017-08-03 23:18:06 +0000
Categories: Python,

Language

Python

Description

Short snippet to calculate the edit distance (or Hamming Distance) between two strings in Python

The Hamming distance is just the number of differing bits within the string, so given two single char strings:

  • a: 01100001
  • b: 01100010
The Hamming distance would be 2 as the penultimate and last bit differ.

The edit distance is useful in basic cryptanalysis as for some cipher types (basically mechanisms where the key is repeated, i.e. ECB) it can help you discern the length of the key used - out of multiple tries at the keysize, the smallest (normalised) edit distance probably best indicates the key used.

Snippet

import binascii

def char_to_bits(text):
    ''' Convert a character into a string of bits

    For example the letter 'a' would be returned as '01100001'

    '''
    bits = bin(int(binascii.hexlify(text), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def computeHammingDistance(str1,str2):
    ''' Calculate the number of bits which differ between two strings (also known as the edit distance)
    '''
    distance = 0

    for i in range(len(str1)):

        # Convert each char to a bit string
        c1 = char_to_bits(str1[i])
        c2 = char_to_bits(str2[i])

        # Compare each of the bits and mark whether they differ
        for b in range(8):
            if c1[b] != c2[b]:
                distance = distance + 1

    return distance

Usage Example

>>> print computeHammingDistance('a','b')
2

>>> print computeHammingDistance('foo','fpo')
5

>>> print computeHammingDistance('foo','bar')
8

Requires

License

BSD-3-Clause

Keywords

hamming, edit, distance, calculate, compute, analyse,

Latest Posts

Remotely backing up PFsense Configuration (BASH)
FFMPEG: Converting RMVB to X264 MP4 (BASH)
Recursively print table (print_r equivalent) (LUA)
Bulk Delete Comments from (Self-Hosted) JIRA Issues (Misc)
Add a static entry to the ARP table (BASH)
SSL Cipher Hex codes to Human Readable Names (Misc)
Convert Ascii to Binary (BASH)
Intercepting Outbound DNS Queries (BASH)
Handle Google Verification files within NGinx Configuration (NGinx)
Getting WhatsApp Rich Snippet Previews Working (Misc)

Copyright © 2018 Ben Tasker | Sitemap | Privacy Policy
Available at snippets.bentasker.co.uk and snippets.6zdgh5a5e6zpchdz.onion