Regex: How to match sequence of key-value pairs at end of string
The negative zero-width lookahead is (?!pattern).
It's mentioned part-way down the re module documentation page.
(?!...)
Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
So you could use it to match any number of words after a key, while refusing to match the next key, using something like (?!\S+:)\S+.
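To see the lookahead in isolation, here is a minimal sketch that tests single words against (?!\S+:)\S+. The helper name is_value_word and the sample words are made up for illustration. Note that a word with a colon anywhere in it (such as a time like 12:30) is also rejected, because \S+: can match its first part.

```python
import re

# The value-word pattern from the answer: a run of non-space characters,
# allowed only if that run does not contain a colon (i.e. does not look like a key).
value_word = re.compile(r'(?!\S+:)\S+')

def is_value_word(word):
    """Hypothetical helper: True if `word` would be accepted as a value word."""
    return value_word.match(word) is not None

print(is_value_word('val1-words'))  # an ordinary word is accepted
print(is_value_word('key2:'))       # a key is rejected
print(is_value_word('12:30'))       # a word with an inner colon is rejected too
```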
And the complete code would look like this:
import re

regex = re.compile(r'''
    \S+:                # a key (any word followed by a colon)
    (?:
        \s              # then a single whitespace character in between
        (?!\S+:)\S+     # then a value (any word that is not a key)
    )+                  # match multiple value words if present
''', re.VERBOSE)
matches = regex.findall(my_str)
Which gives:
['key1: val1-words', 'key2: val2-words', 'key3: val3-words']
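For a self-contained run, here is a sketch with a made-up input string (the string from the original question is not reproduced here, so my_str below is only an assumption about its general shape). It also shows that free text before the first key is skipped, since none of those words ends in a colon.

```python
import re

regex = re.compile(r'''
    \S+:                # a key (any word followed by a colon)
    (?:
        \s              # a single whitespace character
        (?!\S+:)\S+     # a value word (not a key)
    )+                  # one or more value words
''', re.VERBOSE)

# Hypothetical input: some leading free text, then key/value pairs.
my_str = "intro text key1: one two key2: three"

matches = regex.findall(my_str)
print(matches)  # ['key1: one two', 'key2: three']
```

Because the only group in the pattern is non-capturing, findall returns the full text of each match rather than a tuple of groups.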
If you print the key/values using:
for match in matches:
    print(match)
It will print:
key1: val1-words
key2: val2-words
key3: val3-words
Or, using your updated example, it would print:
Thème: O sombres héros
Contraintes: sous titrés
Author: nicoalabdou
Tags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise
Posted: 06 June 2009
Rating: 1.3
Votes: 3
You could turn the key/value pairs into a dictionary using something like this:
pairs = dict([match.split(':', 1) for match in matches])
which would make it easier to look up only the keys (and values) you want.
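One thing to watch: splitting on ':' keeps the whitespace that followed the colon, so each value starts with a space. A minimal sketch that strips it, assuming matches looks like the placeholder output shown earlier (the literal list here is illustrative data, and the explicit loop is just one possible variant):

```python
# Placeholder matches, mirroring the example output of regex.findall().
matches = ['key1: val1-words', 'key2: val2-words', 'key3: val3-words']

pairs = {}
for match in matches:
    # Split on the first colon only, so later colons would stay in the value.
    key, value = match.split(':', 1)
    pairs[key.strip()] = value.strip()

print(pairs['key2'])  # val2-words
```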
More info:
- Python re module documentation
- Python Regular Expression HOWTO
- Perl Regular Expression Reference "perlreref"