Pragmatic Unicode, or, How do I stop the pain?
At some point the following started to happen in my small httplib2-based script:
Traceback:
  ...
  File "/usr/lib/python2.7/httplib.py", line 996, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 816, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 245: ordinal not in range(128)
I know that earlier I would have started sprinkling .encode() and .decode() calls around at random to make it run, but now I am much better at understanding the reason for the failure after watching an awesome talk by Ned Batchelder titled “Pragmatic Unicode, or, How do I stop the pain?”
This time it took me mere seconds to find the reason. In the traceback above, UnicodeDecodeError was raised because msg was already a unicode object, while message_body was a str. That happened because the URL supplied to the request method was unicode, so the request line built from it, and therefore msg, became unicode as well. When Python 2.7 concatenated unicode and str, it implicitly converted message_body to a unicode string using the default ascii encoding, but the body was full of bytes outside the ASCII range. Converting the URL to str fixed the issue, as URLs are not good candidates to be passed around decoded.
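The coercion is easy to reproduce. Python 3 no longer mixes text and bytes implicitly (str + bytes raises TypeError), so the sketch below makes Python 2's hidden ascii decode explicit in a hypothetical helper, py2_concat. The URL and body are made-up examples, not the ones from my script.

```python
# Hypothetical helper: mimics Python 2's `msg += message_body`, where a
# unicode msg forced an implicit message_body.decode("ascii").
def py2_concat(msg, message_body):
    if isinstance(msg, str) and isinstance(message_body, bytes):
        # Python 2 silently used the default ascii codec at this point.
        return msg + message_body.decode("ascii")
    return msg + message_body


# Made-up request: an ASCII-only URL that is nevertheless text (unicode),
# and a UTF-8 encoded body containing Cyrillic characters.
url = "/api/notes"
body = '{"text": "привет"}'.encode("utf-8")

# Failing case: the request line is text because the URL was text.
error = None
try:
    py2_concat("POST " + url + " HTTP/1.1\r\n\r\n", body)
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Fix: turn the URL into bytes (Python 2's str) before building the request,
# so both operands are bytes and nothing gets decoded.
request_line = b"POST " + url.encode("ascii") + b" HTTP/1.1\r\n\r\n"
message = py2_concat(request_line, body)
print(message)
```

Under Python 2.7 itself, the equivalent fix is calling str(url) (or url.encode("ascii")) before handing the URL to the request method; this works as long as the URL is already ASCII, e.g. percent-encoded.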