Commit 17702cde authored by Alex Vandiver

Verify that MIME::Entity bodies are bytes, and remove _utf8_off call

Use the newly-added RT::Util::assert_bytes function to verify that the
body is indeed bytes, and not characters.

We also remove the _utf8_off call -- because, contrary to what the
comment implies, the presence or absence of the "UTF8" flag does _not_
determine if a string is "encoded as octets and not as characters"; it
merely states that the string is capable of holding codepoints > 255.
If it happens to not contain any, the _utf8_off does nothing.  If it
does, it effectively encodes all codepoints > 127 in UTF-8.
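
The effect described above can be reproduced in a few lines. This is a minimal sketch, not code from RT: the sample strings and variable names are made up for illustration, and it only uses `Encode::_utf8_off`, `length`, and `unpack`.

```perl
use strict;
use warnings;
use Encode ();

# Two characters; U+263A cannot fit in a byte, so Perl stores the string
# in its internal UTF-8 form with the UTF8 flag on.
my $chars = "\xE9\x{263A}";
my $length_before = length $chars;          # 2 characters

# Clearing the flag exposes the internal bytes: every codepoint > 127
# now appears as its multi-byte UTF-8 sequence.
Encode::_utf8_off($chars);
my $length_after = length $chars;           # 5 bytes
my $hex_after    = unpack "H*", $chars;     # c3a9e298ba

# A Latin-1 byte string whose flag is already off is left untouched.
my $bytes = "caf\xE9";
my $hex_before = unpack "H*", $bytes;
Encode::_utf8_off($bytes);
my $hex_same = unpack "H*", $bytes;         # still 636166e9

print "$length_before chars -> $length_after bytes ($hex_after)\n";
print "byte string unchanged: ", ( $hex_before eq $hex_same ? "yes" : "no" ), "\n";
```

In neither case does the call convert anything from one charset to another; it only changes how the existing internal buffer is interpreted.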

Given the premise that the string contains bytes in some (probably
non-UTF-8) encoding, re-encoding some bytes of it as UTF-8 cannot
possibly produce valid output.  The flaw in this situation cannot be
fixed by a simple _utf8_off, but instead must be fixed by ensuring that
the body always contains bytes, not wide characters -- as it now does,
thanks to the prior commits.  The call to RT::Util::assert_bytes serves
as an additional safeguard against backsliding on that assumption.
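
To illustrate what "contains bytes, not wide characters" means in practice, here is a rough sketch of such a check. The helper `holds_only_bytes` is hypothetical and is not the implementation of RT::Util::assert_bytes; it assumes only the core `utf8::downgrade` and `utf8::upgrade` functions.

```perl
use strict;
use warnings;

# Hypothetical helper, not RT's implementation: true when every codepoint
# in the string fits in one byte, whatever the state of the UTF8 flag.
sub holds_only_bytes {
    my ($string) = @_;    # work on a copy so the caller's string is untouched
    # The second argument (FAIL_OK) makes downgrade return false instead
    # of dying when the string contains a codepoint above 255.
    return utf8::downgrade( $string, 1 ) ? 1 : 0;
}

# A Latin-1 byte string that happens to carry the UTF8 flag still passes:
# the flag describes the storage format, not the content.
my $latin1 = "caf\xE9";
utf8::upgrade($latin1);
my $flagged_bytes_ok = holds_only_bytes($latin1);

# A string with a genuinely wide character fails.
my $wide_ok = holds_only_bytes("\x{263A}");

print "flagged Latin-1: $flagged_bytes_ok, wide character: $wide_ok\n";
```

A check along these lines looks at the string's content, which is why it can catch wide characters that merely flipping the flag would silently turn into UTF-8 bytes.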
parent a21eb81c
@@ -291,13 +291,12 @@ sub SetMIMEEntityToEncoding {
     if ( $body && ($enc ne $charset || $enc =~ /^utf-?8(?:-strict)?$/i) ) {
         my $string = $body->as_string or return;
+        RT::Util::assert_bytes($string);
         $RT::Logger->debug( "Converting '$charset' to '$enc' for "
               . $head->mime_type . " - "
               . ( Encode::decode("UTF-8",$head->get('subject')) || 'Subjectless message' ) );
-        # NOTE:: see the comments at the end of the sub.
-        Encode::_utf8_off($string);
         my $orig_string = $string;
         ( my $success, $string ) = EncodeFromToWithCroak( $orig_string, $charset => $enc );
         if ( !$success ) {
@@ -328,19 +327,6 @@ sub SetMIMEEntityToEncoding {
     }
 }
-# NOTES: Why Encode::_utf8_off before Encode::from_to
-#
-# All the strings in RT are utf-8 now. Quotes from Encode POD:
-#
-# [$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])
-# ... The data in $octets must be encoded as octets and not as
-# characters in Perl's internal format. ...
-#
-# Not turning off the UTF-8 flag in the string will prevent the string
-# from conversion.
 =head2 DecodeMIMEWordsToUTF8 $raw
 An utility method which mimics MIME::Words::decode_mimewords, but only
......