1. 03 Sep, 2014 18 commits
    • Add RT::Util::assert_bytes checks to _EncodeLOB and _DecodeLOB · 9cc181ba
      Alex Vandiver authored
      929b4231 specifically documented _EncodeLOB and _DecodeLOB to take
      bytes; enforce that by checking the arguments they are passed using
    • Always decode data in %ARGS as UTF-8 in DecodeArgs · 3ac9388d
      Alex Vandiver authored
      There is no need to check is_utf8 on the arguments passed to DecodeArgs,
      as it is the first and only line of decoding of arguments over HTTP.
    • Move comment from PreprocessTimeUpdates to DecodeArgs, where it belongs · f67c72a2
      Alex Vandiver authored
      c95221e4 moved this comment from html/autohandler, but associated it
      with the code it was above rather than the code above it.  Move it to
      where it belongs, and update it slightly.
    • _utf8_on in EncodeToMIME is needless and incorrect; remove it · 2be0797a
      Alex Vandiver authored
      66930fd8 switched from an explicit _utf8_off to an explicit _utf8_on, in
      an attempt to switch from splitting on bytes to splitting on characters.
      However, the "UTF8" flag does not magically determine if a string is
      bytes or characters.  Instead, only consistency in calling convention
      can do so.  All callsites of RT::Interface::Email::EncodeToMIME and
      RT::Action::SendEmail::MIMEEncodeString now pass character strings; all
      that _utf8_on can do is incorrectly "decode" those strings as UTF-8 if
      they happen to not have the "UTF8" flag set.
    • Tests: WWW::Mechanize correctly returns characters now · b2db8fc6
      Alex Vandiver authored
      While there may have been bugs surrounding encodings in LWP or
      WWW::Mechanize previously, the ->content method correctly returns
      characters and not bytes for all modern versions.  Remove the explicit
    • Dashboard: decode bytes in query parameters into characters · fb58e26e
      Alex Vandiver authored
      The dashboard mailer code replicates many of the original steps of Mason
      parameter parsing -- but omitted the important step of decoding the
      bytes once they had been un-URI-encoded.
    • Remove remaining cases of "use utf8" · 7548587b
      Alex Vandiver authored
      All remaining cases of "use utf8" lie in the testsuite.  As "use utf8"
      changes the semantics of dealing with Unicode strings, remove it to
      allow programmers to always assume that literals are interpreted as
      bytestrings, not characters.  To do otherwise means that one must always
      ask if "use utf8" is in scope before performing operations on any
      literals; instead, simply make the encodings and decodings explicit.
      Note that wide characters may appear in editors, and that the encoding
      of the characters on _disk_ will always be UTF-8.  The removal of "use
      utf8" merely means that perl will generate a two-byte string from "é",
      and not a one-character string.
    • Remove "use utf8" from RT::I18N::fr, making NBSP explicit · ed0458d7
      Alex Vandiver authored
      "use utf8" causes the sourcecode (including all strings) to be
      interpreted by perl as characters encoded in UTF-8, not bytes.  In
      lib/RT/I18N/fr.pm, this was being used to substitute codepoint 160
      (NO-BREAK SPACE, U+00A0) for commas.  The fact that the space character
      was not 0x20, but rather 0xA0, was mostly hidden by use of "use utf8".
      Remove the "use utf8" and make the replacement character clear.
    • Standardize on the stricter Encode::encode("UTF-8", ...) everywhere · 1d18663b
      Alex Vandiver authored
      This is not only for code consistency, but also for consistency of
      output.  Encode::encode_utf8(...) is equivalent to
      Encode::encode("utf8",...) which is the non-"strict" form of UTF-8.
      Strict UTF-8 encoding differs in that (from `perldoc Encode`):
          ...its range is much narrower (0 ..  0x10_FFFF to cover only 21 bits
          instead of 32 or 64 bits) and some sequences are not allowed, like
          those used in surrogate pairs, the 31 non-character code points
          0xFDD0 .. 0xFDEF, the last two code points in any plane (0xXX_FFFE
          and 0xXX_FFFF), all non-shortest encodings, etc.
      RT deals with interchange with databases, email, and other systems.  In
      dealing with encodings, it should ensure that it does not produce byte
      sequences that are invalid according to official Unicode standards.
    • Verify that MIME::Entity headers are bytes, and remove _utf8_off call · ba110857
      Alex Vandiver authored
      See the prior commit for reasoning, which applies just as much to the
      header as the body.
    • Verify that MIME::Entity bodies are bytes, and remove _utf8_off call · 17702cde
      Alex Vandiver authored
      Use the newly-added RT::Util::assert_bytes function to verify that the
      body is indeed bytes, and not characters.
      We also remove the _utf8_off call -- because, contrary to what the
      comment implies, the presence or absence of the "UTF8" flag does _not_
      determine if a string is "encoded as octets and not as characters"; it
      merely states that the string is capable of holding codepoints > 255.
      If it happens to not contain any, the _utf8_off does nothing.  If it
      does, it effectively encodes all codepoints > 127 in UTF-8.
      Given the premise that the string contains bytes in some (probably
      non-UTF-8) encoding, re-encoding some bytes of it as UTF-8 cannot
      possibly produce valid output.  The flaw in this situation cannot be
      fixed by a simple _utf8_off, but instead must be fixed by ensuring that
      the body always contains bytes, not wide characters -- as it now does,
      thanks to the prior commits.  The call to RT::Util::assert_bytes serves
      as an additional safeguard against backsliding on that assumption.
    • Add a utility method to check that an input is bytes · a21eb81c
      Alex Vandiver authored
      Note that it is impossible to verify that an input is characters; here,
      we can only validate if it _could_ be bytes.
      First, any string with the "UTF8" flag off cannot contain codepoints
      above 255, and as such is safe.  Additionally, if the "UTF8" flag is on,
      having no codepoints above 127 means the bytes are unambiguous.  Having
      codepoints above 255 is guaranteedly a sign that the input is not a byte
      This leaves only the case of a string with the "UTF8" flag on, and
      codepoints above 127 but below 255.  The "UTF8" flag is a sign that they
      were _likely_ touched by character data at some point.  In such cases we
      warn, suggesting that the bytes have the "UTF8" flag disabled by means
      of utf8::downgrade, if they are indeed bytes.
    • Make RT::Action::SendEmail->SetHeader take characters, not bytes · 12c2671c
      Alex Vandiver authored
      This helper method is used in a number of places in
      RT::Action::SendEmail, often without remembering that it should be
      passed bytes, not characters.  Change it to always take characters, and
      modify the two callsites which (correctly) passed it bytes to no longer
      do so.
    • Ensure all MIME::Entity headers are UTF-8 encoded bytes · 41d084f1
      Alex Vandiver authored
      Placing wide characters into MIME::Entity objects can lead to
      double-encoding, as discovered most recently in d469cacc.  Explicitly
      decode all headers as UTF-8 when retrieving them with ->get(), and
      encode them as UTF-8 before updating them with ->set() or ->replace().
      This also applies to headers passed to ->build().  The only exceptions
      to this are fixed strings in the source (which, in the absence of "use
      utf8", are always bytes).
      While the majority of these headers will never have wide characters in
      them, always decoding and encoding ensures the proper discipline to
      guarantee that strings with the "UTF8" flag do not get placed in a
      header, which can cause double-encoding.
    • Ensure all MIME::Entity bodies are UTF-8 encoded bytes · 6d9bd63c
      Alex Vandiver authored
      Placing wide characters into MIME::Entity objects can lead to
      double-encoding.  Always treat them as byte stores, encoding as UTF-8
      and noting their character set.
      In the case of Approvals/index.html, there was no need for an explicit
      MIME::Entity object; ->Correspond creates one as needed from a "Content"
    • The alluded-to deficiency is not a concern in perl ≥ 5.8.3 · 18ef9b24
      Alex Vandiver authored
      The comment and code were added in RT 2.1.38, which only required perl
      5.6.1; the perl version was increased to 5.8.3 to cover a large number
      of encoding bugs, such as the one this comment was likely alluding to.
    • Always log bytes, not characters · a275a7fa
      Alex Vandiver authored
      Ensure that we always send UTF-8 encoded bytes to loggers, and not wide
      characters.  This is correctly done via an explicit call to
      Encode::encode, and not via checks of utf8::is_utf8 (which may be false
      for character strings with codepoints > 127 but < 256), and not via
      _utf8_off (which would fail similarly for such characters).
    • Modernize and condense t/mail/sendmail.t and t/mail/sendmail-plaintext.t · 15dde68b
      Alex Vandiver authored
      t/data/emails/text-html-in-russian was removed because the original
      purpose of the test was removed in 46fd04d9, after 90f9c190 stopped
      attaching text/html incoming mail to autoreplies.
  2. 02 Sep, 2014 4 commits
  3. 28 Aug, 2014 1 commit
    • Allow modification of which types of links are shown · 191f789c
      Kevin Falcone authored
      There's a callback to add data at the end of the Links portlet, but
      nothing to allow you to change what's displayed.  This provides a callback
      that would let you remove (for example) the Parents from the listing, so
      that you can list them in a separate portlet later, with a custom
  4. 27 Aug, 2014 4 commits
  5. 25 Aug, 2014 2 commits
  6. 19 Aug, 2014 2 commits
  7. 18 Aug, 2014 6 commits
  8. 15 Aug, 2014 3 commits
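
Several of the 03 Sep commits (41d084f1 and 6d9bd63c in particular) refer to "double-encoding" when character strings end up in a byte store such as a MIME::Entity. The sketch below is a Python analogy, not RT code: Python has no "UTF8" flag, so the mistaken step is modelled as treating UTF-8 bytes as Latin-1 characters, which is roughly what happens in Perl when bytes are treated as characters and encoded again. The helper names `decode_header` and `encode_header` are hypothetical and only illustrate the "decode on read, encode on write" discipline the commit messages describe.

```python
# Illustrative analogy only; names here are assumptions, not RT's API.

text = "café"                          # a character string
encoded_once = text.encode("utf-8")    # correct: UTF-8 bytes for the wire

# The bug: the bytes are mistaken for characters (one character per byte,
# i.e. Latin-1) and encoded a second time.
mistaken_chars = encoded_once.decode("latin-1")
encoded_twice = mistaken_chars.encode("utf-8")


def decode_header(raw: bytes) -> str:
    """Hypothetical helper: bytes read from a header become characters."""
    return raw.decode("utf-8")


def encode_header(value: str) -> bytes:
    """Hypothetical helper: characters become bytes before being stored."""
    return value.encode("utf-8")


# Decoding on the way out and encoding on the way in round-trips cleanly.
round_tripped = encode_header(decode_header(encoded_once))

print(encoded_once)                    # b'caf\xc3\xa9'
print(encoded_twice)                   # b'caf\xc3\x83\xc2\xa9'
print(encoded_twice.decode("utf-8"))   # cafÃ©
print(round_tripped == encoded_once)   # True
```

Keeping exactly one decode at every read and one encode at every write is what makes the boundary explicit; the double-encoded output appears only when a step is skipped or repeated.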