Commit 89a85683 authored by Alex Vandiver

Note that HTTP output still incorrectly relies on is_utf8

Currently, any string which has the "UTF-8" flag is encoded as UTF-8
before being sent to the browser.  This requires that any output which
is binary, or has already been encoded to bytes, _not_ have the flag
accidentally set.

It also requires that all output character strings have the "UTF-8" flag
enabled; while necessary for codepoints > 255, it is not strictly
required for codepoints between 128 and 255.  As RT now consistently
uses Encode::decode() to produce character strings, which sets the
"UTF-8" flag even for characters in that range, this is likely safe.

The most correct fix would be to explicitly flag output that needs to be
encoded.  However, doing so in a backwards compatible manner is
extremely difficult; as is_utf8 is unlikely to be incorrect in this
context, the small potential gain in correctness is deemed not worth
the cost of requiring all external modules to flag their binary (or
character) output as such.
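
The flag behaviour the message relies on can be checked with a short script. This is only an illustrative sketch, not code from RT: the variable names are made up, and it assumes a stock perl with the core Encode module. It decodes the UTF-8 bytes for U+00E9 and compares the result with the same character built by chr(), which perl stores without the flag.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input (an assumption, not from RT): "é" as UTF-8 bytes.
my $bytes = "\xC3\xA9";

# Decoding yields a one-character string with the UTF-8 flag set,
# even though U+00E9 is below 256.
my $decoded = Encode::decode("UTF-8", $bytes);

# The same character built natively is stored as one byte, without the flag.
my $native = chr(0xE9);

printf "decoded: length=%d flagged=%s\n",
    length($decoded), utf8::is_utf8($decoded) ? "yes" : "no";
printf "native:  length=%d flagged=%s\n",
    length($native), utf8::is_utf8($native) ? "yes" : "no";

# String comparison works on characters, so the two are equal
# despite the different internal representation.
printf "equal as characters: %s\n", $decoded eq $native ? "yes" : "no";
```

The two strings compare equal but differ in the flag, which is why a test based on is_utf8 is safe only as long as character strings are consistently produced by Encode::decode().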
parent d91b4169
@@ -388,7 +388,11 @@ sub _psgi_response_cb {
                     $cleanup->();
                     return '';
                 }
-                return utf8::is_utf8($_[0]) ? Encode::encode( "UTF-8", $_[0]) : $_[0];
+                # XXX: Ideally, responses should flag if they need
+                # to be encoded, rather than relying on the UTF-8
+                # flag
+                return Encode::encode("UTF-8",$_[0]) if utf8::is_utf8($_[0]);
+                return $_[0];
             };
         });
 }
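
To see what the rewritten return statements do with different kinds of output, the logic can be pulled into a standalone helper. This is a minimal sketch: encode_if_flagged and the three sample strings are hypothetical names introduced here for illustration, and only the two return lines are taken from the diff.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of RT) wrapping the two return
# statements from the diff: encode only when the UTF-8 flag is set.
sub encode_if_flagged {
    return Encode::encode("UTF-8", $_[0]) if utf8::is_utf8($_[0]);
    return $_[0];
}

# Sample inputs, all made up for this example.
my $chars  = Encode::decode("UTF-8", "caf\xC3\xA9");  # flagged character string
my $binary = "\x89PNG\r\n";                            # raw bytes, unflagged
my $latin1 = "caf" . chr(0xE9);                        # characters, but unflagged

my $out_chars  = encode_if_flagged($chars);
my $out_binary = encode_if_flagged($binary);
my $out_latin1 = encode_if_flagged($latin1);

# Flagged text is encoded to UTF-8 bytes; binary data passes through.
printf "chars:  %d bytes\n", length($out_chars);
printf "binary: %s\n", $out_binary eq $binary ? "unchanged" : "changed";

# The unflagged character string also passes through untouched, so its
# "é" is sent as the single byte 0xE9 rather than as UTF-8.
printf "latin1: %d bytes\n", length($out_latin1);
```

The third case is the one the commit message accepts as a known limitation: a character string in the 128 to 255 range that never went through Encode::decode() is indistinguishable from binary data under this test.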