    Respect the database Content-Type header in decoding textual parts · 18c810d0
    Alex Vandiver authored
    The definition of "textual" data has changed over time.  Specifically,
    7365c08a caused text/html messages to begin being stored as utf-8 in the
    database; prior to that, the claimed "charset" and bytes in the body
    were left unmolested during insertion.
    
    text/html attachments inserted into the database prior to 7365c08a,
    however, are now expected to be utf-8 when being extracted from the
    database.  This causes PERLQQ'd garbage to be displayed for the non-UTF8
    content stored in the database.  This type of error is likely to also
    re-occur in the future whenever the definition of "textual" data
    (i.e. data we transcode on insertion) changes.
    
    Respect the Content-Type header when decoding data from the database, or
    guess its value from the body; this mirrors the logic in
    RT::I18N::SetMIMEEntityToEncoding, which is what is done for
    currently-detected-as-textual parts on insert.  In cases like text/html
    prior to 7365c08a, the Content-Type header was not altered during
    database insertion -- and at worst, the claimed character set is
    incorrect and decoding will result in PERLQQ'd garbage.  This is no
    worse than if said message had been detected, received, converted, and
    stored in the database as text.
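    
    The decode-on-read logic described above can be sketched roughly as
    follows.  This is an illustrative Python sketch, not RT's Perl code:
    the function names are hypothetical, guess_charset is a deliberately
    simple stand-in for a real charset guesser, and Python's
    "backslashreplace" error handler is used as an approximation of
    Perl's PERLQQ fallback.

```python
import codecs
from email.message import Message


def claimed_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def guess_charset(body):
    """Hypothetical stand-in for a real guesser: UTF-8 if it decodes cleanly."""
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


def decode_stored_part(content_type, body):
    """Decode bytes read from storage using the stored Content-Type header."""
    charset = claimed_charset(content_type)
    if charset is not None:
        try:
            codecs.lookup(charset)  # reject charsets Python does not know
        except LookupError:
            charset = None
    if charset is None:
        charset = guess_charset(body)  # no usable claim: guess from the body
    # A wrong claim yields escaped garbage (like PERLQQ), never an exception.
    return body.decode(charset, errors="backslashreplace")
```

    In this sketch a part stored before the change, still in its original
    charset, decodes correctly because the stored header is trusted; a
    part whose header lies about its charset comes out as escaped bytes
    rather than failing outright.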