Weird characters in string (Python)
You have UTF-8 encoded data. You could decode the data:
with open(filename) as f:
    for line in f:
        print line.decode('utf8')
or use io.open() to have Python decode the contents for you as you read:
import io
with io.open(filename, encoding='utf8') as f:
    for line in f:
        print line
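As a rough, self-contained sketch of the io.open() approach: the snippet below writes three UTF-8 encoded sample words to a temporary file and reads them back as decoded text. The helper name read_utf8_lines and the temporary file are illustrative assumptions, not part of the original answer. It is written with print() as a function so it should also run on Python 3.

```python
import io
import os
import tempfile


def read_utf8_lines(path):
    """Return the lines of a UTF-8 file as decoded text, without newlines.

    Hypothetical helper for illustration only.
    """
    with io.open(path, encoding='utf8') as f:
        return [line.rstrip(u'\n') for line in f]


# Raw UTF-8 bytes for three sample words, one per line.
raw = b'tamb\xc3\xa9m\nf\xc3\xbcr\ncari\xc3\xb1o\n'

# Write the bytes to a throwaway file so the example is runnable as-is.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

lines = read_utf8_lines(sample_path)
os.remove(sample_path)

# Lengths are counted in characters, not bytes, once the text is decoded.
print([len(line) for line in lines])
```

Each two-byte sequence such as \xc3\xa9 becomes a single character after decoding, which is why the lengths are shorter than the byte counts in the file.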
Your data, decoded:
>>> print 'tamb\xc3\xa9m'.decode('utf8')
também
>>> print 'f\xc3\xbcr'.decode('utf8')
für
>>> print 'cari\xc3\xb1o'.decode('utf8')
cariño
You appear to have printed string representations (the output of the repr() function), which produces string literal syntax suitable for pasting back into your Python interpreter. \xhh hex codes are used for characters outside of the printable ASCII range. Python containers such as list or dict also use repr() to show their contents when printed.
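To make the repr() point concrete, here is a small sketch. It assumes Python 3, where a bytes object plays the role of the Python 2 byte string, so the representations carry a b prefix that Python 2 would not show. The variable names are illustrative.

```python
# Python 3 sketch: a bytes object stands in for a Python 2 byte string.
raw = b'cari\xc3\xb1o'

# repr() yields literal syntax, using \xhh escapes for non-ASCII bytes.
as_repr = repr(raw)

# Containers build their string form from the repr() of each item.
in_list = str([raw])

# Decoding turns the seven UTF-8 bytes into six characters of text.
decoded = raw.decode('utf8')

print(as_repr)
print(in_list)
print(len(raw), len(decoded))
```

The escapes appear in both the bare representation and the list output, while the decoded value is ordinary text that prints as readable characters.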
You may want to read up on Unicode and how it interacts with Python. See:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Pragmatic Unicode by Ned Batchelder