Lowercase URL Encoding with Urllib in Python3



Published: 2020-09-09 16:38:17 +0000
Categories: Python3,

Language

Python3

Description

Python's urllib module is pretty flexible, and contains a bunch of useful tools - including the ability to urlencode strings via urllib.parse.quote()

Unfortunately, it doesn't allow you to specify what case should be used in URL encoding, and defaults to upper-case. Many other libraries/languages use lower case.

This becomes problematic where you're url encoding an authentication token (or similar) - when the other side mints a token for comparison, the two will not match in a like for like comparison (and if the other end urldecodes your input, they're doing more processing on untrusted input that should be necessary - similarly passing through lower will affect the entire token and massively increase the chance of collisions).

This little wrapper function allows you to do a lowercase url-encode

Snippet

import re
import urllib.parse

def quote_url_lower(url,safe='/'):

    s=urllib.parse.quote(url,safe)
    pattern = "%[A-Z,0-9][A-Z,0-9]"

    # Convert the result to a set so each entry is unique  
    result = set(re.findall(pattern,s))

    for r in result:
        s = s.replace(r,r.lower())

    return s

Usage Example

print(quote_url_lower("abcdef/ghi/123+="))
abcdef/ghi/123%2b%3d

print(quote_url_lower("abcdef/ghi/123+=",safe=''))
abcdef%2fghi%2f123%2b%3d

Keywords

urlencode, urllib, python3,

Latest Posts


Copyright © 2022 Ben Tasker | Sitemap | Privacy Policy
Available at snippets.bentasker.co.uk, http://phecoopwm6x7azx26ctuqcp6673bbqkrqfeoiz2wwk36sady5tqbdpqd.onion and http://snippets.bentasker.i2p
hit counter