lib/RT/Record.pm · 18c810d0822b5f433a9c7a070d90635092304171 · best-practical / rt

Respect the database Content-Type header in decoding textual parts · 18c810d0

Alex Vandiver authored May 12, 2014

The definition of "textual" data has changed over time.  Specifically,
7365c08a caused text/html messages to begin being stored as utf-8 in the
database; prior to that, the claimed "charset" and bytes in the body
were left unmolested during insertion.

text/html attachments inserted into the database prior to 7365c08a,
however, are now expected to be utf-8 when being extracted from the
database.  This causes PERLQQ'd garbage to be displayed for the non-UTF8
content stored in the database.  This type of error is likely to also
re-occur in the future whenever the definition of "textual" data
(i.e. data we transcode on insertion) changes.

Respect the Content-Type header when decoding data from the database, or
guess its value from the body; this mirrors the logic in
RT::I18N::SetMIMEEntityToEncoding, which is what is done for
currently-detected-as-textual parts on insert.  In cases like text/html
prior to 7365c08a, the Content-Type header was not altered during
database insertion -- and at worst, the claimed character set is
incorrect and decoding will result in PERLQQ'd garbage.  This is no
worse than if said message had been detected, received, converted, and
stored in the database as text.
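
As a rough illustration of the decoding approach described above, the
sketch below decodes stored bytes using the charset claimed in the
stored Content-Type header, with Encode's FB_PERLQQ fallback.  The
helper name decode_stored_body and the sample headers are hypothetical
and are not taken from RT; when no usable charset is present, this
sketch simply assumes utf-8, whereas the change described here guesses
the value from the body.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not RT's actual code): decode bytes read from the
# database according to the charset claimed in the Content-Type header.
sub decode_stored_body {
    my ($content_type, $bytes) = @_;

    # Extract the charset parameter, if any, from the header value.
    my ($charset) = ($content_type // '') =~ /charset\s*=\s*"?([^\s";]+)/i;

    # Assumption for this sketch: fall back to utf-8 when the charset is
    # missing or unknown to Encode, rather than guessing from the body.
    $charset = 'utf-8' unless $charset && Encode::find_encoding($charset);

    # FB_PERLQQ turns undecodable bytes into \xHH escapes instead of dying.
    return Encode::decode($charset, $bytes, Encode::FB_PERLQQ);
}

# Header and body agree: the Latin-1 byte decodes to a single character.
my $latin1 = decode_stored_body('text/html; charset="iso-8859-1"',
                                "caf\xE9 au lait");

# Header claims utf-8 but the body is Latin-1: the bad byte is escaped.
my $bogus = decode_stored_body('text/html; charset="utf-8"',
                               "caf\xE9 au lait");
```

The second call shows the worst case mentioned above: a wrong claimed
charset yields escaped bytes in the output rather than a fatal error.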
