[ACCEPTED]-Parse an HTTP request Authorization header with Python-http-headers
A little regex:
import re
reg = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
>>> dict(reg.findall(headers))
{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}
You can also use urllib2, as [CherryPy][1] does.
Here is the snippet:
input= """
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2  # Python 2; in Python 3 these helpers live in urllib.request
field, sep, value = input.partition("Authorization: Digest ")
if value:
    items = urllib2.parse_http_list(value)
    opts = urllib2.parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print opts
It outputs:
{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': 'testrealm@host.com', 'response': '6629fae49393a05397450978507c4ef1'}
[1]: https://web.archive.org/web/20130118133623/http://www.google.com:80/codesearch/p?hl=en#OQvO9n2mc04/CherryPy-3.0.1/cherrypy/lib/httpauth.py&q=Authorization Digest http 1 lang:python
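For readers on Python 3, where urllib2 no longer exists, the same two helpers are available from urllib.request. The sketch below is an adaptation rather than part of the original answer; the function name parse_digest_header is hypothetical, and it assumes the header arrives as a single string beginning with "Authorization: Digest ".

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_digest_header(header):
    """Return the Digest parameters as a dict, or None if the prefix is absent.

    parse_digest_header is a hypothetical helper name used for illustration.
    """
    field, sep, value = header.partition("Authorization: Digest ")
    if not sep:
        return None
    # parse_http_list splits on commas but leaves commas inside quotes alone
    items = parse_http_list(value)
    # parse_keqv_list turns ['key="value"', ...] into {'key': 'value', ...}
    opts = parse_keqv_list(items)
    opts["protocol"] = "Digest"
    return opts


example = (
    'Authorization: Digest qop="chap", realm="testrealm@host.com", '
    'username="Foob,ear"'
)
print(parse_digest_header(example))
```

Because the splitting is quote-aware, a comma inside a quoted value such as "Foob,ear" does not break the parse.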
Here's my pyparsing attempt:
text = """Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41" """
from pyparsing import *
AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)
valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict
print(authentry.parseString(text).dump())
which prints:
['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'],
['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear
I'm not familiar with the RFC, but I hope this gets you rolling.
An older question but one I found very helpful.
(edit after recent upvote) I've created a package that builds on this answer (link to tests to see how to use the class in the package).
pip install authparser
I needed a parser to handle any properly formed Authorization header, as defined by RFC7235 (raise your hand if you enjoy reading ABNF).
Authorization = credentials
BWS = <BWS, see [RFC7230], Section 3.2.3>
OWS = <OWS, see [RFC7230], Section 3.2.3>
Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge ] )
Proxy-Authorization = credentials
WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge ] )
auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token
challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]
quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>
token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
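To make the token68 rule in the grammar above concrete, here is a minimal standard-library sketch that checks a string against it with a regular expression. The names TOKEN68 and is_token68 are hypothetical and are not part of the authparser package; the character set is transcribed directly from the ABNF.

```python
import re

# token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
TOKEN68 = re.compile(r"[A-Za-z0-9\-._~+/]+=*")


def is_token68(text):
    """Return True if the whole string matches the token68 production."""
    return TOKEN68.fullmatch(text) is not None


print(is_token68("Zm9vOmJhcg=="))   # a Base64 credential, as used by Basic
print(is_token68('qop="chap"'))     # an auth-param, not a token68
```

The padding characters are only allowed at the end, which is why a string such as abc=def is rejected.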
Starting with PaulMcG's answer, I came up with this:
import pyparsing as pp
tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas
token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
scheme = token('scheme')
header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
credentials = scheme + (token68('token') ^ params('params'))
auth_parser = header + pp.Suppress(':') + credentials
This allows for parsing any Authorization header:
parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))
which outputs:
Authenticating with Basic scheme, token: Zm9vOmJhcg==
Bringing it all together into an Authenticator class:
import pyparsing as pp
from base64 import b64decode
import re

class DecodeError(Exception):
    """Raised when a Basic token cannot be decoded into username:password."""

class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas
        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
        scheme = token('scheme')
        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
        credentials = scheme + (token68('token') ^ params('params'))
        # the moment of truth...
        self.auth_parser = auth_header + pp.Suppress(':') + credentials

    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']
            print('Authenticating using {0} scheme'.format(scheme))
            try:
                # map every non-alphanumeric token character to '_' so the
                # scheme can be used in a method name ('-' is escaped so it
                # is not read as a character range)
                safe_scheme = re.sub(r"[!#$%&'*+\-.^_`|~]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')
        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)
        return authenticated

    # The following methods are fake, of course. They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise DecodeError
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- Signature is {0}'.format(params['Signature']))
        return True
To test this class:
tests = [
    'Authorization: Digest qop="chap", realm="testrealm@example.com", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]
authenticator = Authenticator()
for test in tests:
    authenticator.authenticate(test)
    print()
Which outputs:
Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41
Authenticating using Bearer scheme
- token is cn389ncoiwuencr
Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar
Authenticating using AWS4-HMAC-SHA256 scheme
- Signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024
Authenticating using CrazyCustom scheme
This is a valid Authorization header, but we do not handle this scheme yet.
In future, if we wish to handle CrazyCustom, we'll just add
def auth_handle_crazycustom(params):
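The extension mechanism described here is plain getattr dispatch on a sanitized scheme name. The following stripped-down sketch isolates that idea without pyparsing; the class SchemeDispatcher, its handle method, and the body of auth_handle_crazycustom are all hypothetical stand-ins for illustration, not the package's code.

```python
import re


class SchemeDispatcher:
    """Hypothetical illustration of dispatching on the auth scheme name."""

    def handle(self, scheme, details):
        # Lower-case the scheme and replace anything that cannot appear
        # in a Python identifier, e.g. 'AWS4-HMAC-SHA256' -> 'aws4_hmac_sha256'
        safe_scheme = re.sub(r"[^a-z0-9]", "_", scheme.lower())
        handler = getattr(self, "auth_handle_" + safe_scheme, None)
        if handler is None:
            # Valid scheme name, but no handler has been written for it yet
            return False
        return handler(details)

    @staticmethod
    def auth_handle_crazycustom(params):
        # Placeholder check; a real handler would verify the credentials
        return params.get("foo") == "bar"


dispatcher = SchemeDispatcher()
print(dispatcher.handle("CrazyCustom", {"foo": "bar", "fizz": "buzz"}))
print(dispatcher.handle("AWS4-HMAC-SHA256", {}))
```

Passing a default of None to getattr avoids catching AttributeError, so an AttributeError raised inside a handler is not mistaken for a missing handler.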
If those components will always be there, then a regex will do the trick:
test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''
import re
re_auth = re.compile(r"""
Authorization:\s*(?P<protocol>[^ ]+)\s+
qop="(?P<qop>[^"]+)",\s+
realm="(?P<realm>[^"]+)",\s+
username="(?P<username>[^"]+)",\s+
response="(?P<response>[^"]+)",\s+
cnonce="(?P<cnonce>[^"]+)"
""", re.VERBOSE)
m = re_auth.match(test)
print(m.groupdict())
produces:
{ 'username': 'Foobear',
'protocol': 'Digest',
'qop': 'chap',
'cnonce': '5ccc069c403ebaf9f0171e9517f40e41',
'realm': 'testrealm@host.com',
'response': '6629fae49393a05397450978507c4ef1'
}
I would recommend finding a proper library for parsing HTTP headers; unfortunately, I can't recall any. :(
In the meantime, check the snippet below (it should mostly work):
input= """
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foob,ear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
field, sep, value = input.partition(":")
if field.endswith('Authorization'):
    protocol, sep, opts_str = value.strip().partition(" ")
    opts = {}
    for opt in opts_str.split(",\n"):
        # split on the first '=' only, so values containing '=' survive
        key, value = opt.strip().split('=', 1)
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value
    opts['protocol'] = protocol
    print(opts)
Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.
It appears that getting pyparsing on Google App Engine is easy: How do I get PyParsing set up on the Google App Engine?
So I'd go with that, and then implement the full HTTP authentication/authorization header support from RFC 2617.
The HTTP digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the Authorization: Digest part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)
Nadia's regex only matches alphanumeric characters for the value of a parameter. That means it fails to parse at least two fields, namely the uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if the value is "auth-int" due to the non-alphanumeric '-'.
This modified regex allows the URI (or any other value) to contain anything but ' ' (space), '"' (quote), or ',' (comma). That's probably more permissive than it needs to be, but shouldn't cause any problems with correctly formed HTTP requests.
reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')
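To see the difference between the two patterns, the sketch below runs both against a header that contains a uri and qop=auth-int. The header string is a made-up example in the style of RFC 2617, and the variable names are only for illustration.

```python
import re

# Original pattern: values limited to \w characters
strict = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# Modified pattern: values may contain anything except quote, space, comma
relaxed = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

# Hypothetical sample header for demonstration purposes
header = (
    'Authorization: Digest username="Mufasa", realm="testrealm@host.com", '
    'uri="/dir/index.html", qop=auth-int, nc=00000001'
)

strict_result = dict(strict.findall(header))
relaxed_result = dict(relaxed.findall(header))

print(strict_result.get("uri"), strict_result.get("qop"))
print(relaxed_result.get("uri"), relaxed_result.get("qop"))
```

With the original pattern the uri is lost and qop is truncated at the hyphen; the modified pattern keeps both intact. Note that neither pattern copes with a space or comma inside a quoted value.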
Bonus tip: From there, it's fairly straightforward to convert the example code in RFC 2617 to Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".
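As a rough illustration of that conversion, here is a sketch of the RFC 2617 response calculation for qop=auth with the default MD5 algorithm. It uses hashlib, which replaces the old md5 module on current Python versions; that substitution, and the function names md5_hex and digest_response, are assumptions of this sketch rather than part of the answer above.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 of a string (the RFC's H() followed by CvtHex)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value for qop=auth, algorithm MD5."""
    ha1 = md5_hex("{0}:{1}:{2}".format(username, realm, password))
    ha2 = md5_hex("{0}:{1}".format(method, uri))
    return md5_hex(":".join([ha1, nonce, nc, cnonce, qop, ha2]))


# Values taken from the worked example in RFC 2617, section 3.5
print(digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
))
```

A server would compare this value with the response parameter parsed from the Authorization header. The auth-int variant also hashes the request body into HA2 and is not covered here.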
If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on the newlines into an array called authentication_array
and use regexps:
import re
pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
i = 0
parsed_dict = {}
for line in authentication_array:
    pattern = "(" + pattern_array[i] + ")" + "=(\".*\")"  # build a matching pattern
    match = re.search(re.compile(pattern), line)  # make the match
    if match:
        parsed_dict[match.group(1)] = match.group(2)  # value keeps its surrounding quotes
    i += 1
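A self-contained variant of this line-by-line approach might look like the sketch below. It assumes the parameters arrive one per line in a fixed order; the sample string header_body is invented for the demonstration, zip replaces the manual counter, and the quotes are moved outside the capture group so the stored values come out unquoted.

```python
import re

# Hypothetical input: one parameter per line, always in the same order
header_body = '''qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

authentication_array = header_body.splitlines()
pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']

parsed_dict = {}
# Pair each expected name with the line it should appear on
for name, line in zip(pattern_array, authentication_array):
    match = re.search(re.escape(name) + r'="(.*)"', line)
    if match:
        parsed_dict[name] = match.group(1)

print(parsed_dict)
```

If a line is missing or the order changes, the pairing silently goes wrong, which is the brittleness this approach trades for simplicity.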