lib/RT/Util.pm · a21eb81ce17c0835988831aa44a4fb0315bf7b73 · best-practical / rt

Add a utility method to check that an input is bytes · a21eb81c

Alex Vandiver authored Aug 08, 2014

Note that it is impossible to verify that an input is characters; here,
we can only validate if it _could_ be bytes.

First, any string with the "UTF8" flag off cannot contain codepoints
above 255, and as such is safe.  Additionally, if the "UTF8" flag is on,
having no codepoints above 127 means the bytes are unambiguous.  Having
codepoints above 255 is a guaranteed sign that the input is not a byte
string.

This leaves only the case of a string with the "UTF8" flag on, and
codepoints above 127 but not above 255.  The "UTF8" flag is a sign that
the string was _likely_ touched by character data at some point.  In such
cases we warn, suggesting that the "UTF8" flag be disabled by means of
utf8::downgrade, if the contents are indeed bytes.
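
The cases described above can be sketched as a small classifier. This is
only an illustration of the reasoning, not the code added by this commit:
the name classify_bytes and its three return values are hypothetical, and
the actual method in lib/RT/Util.pm may be structured differently (for
example, it may log a warning rather than return a value).

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the method from this commit.
# Returns 'bytes' when the input can safely be treated as bytes,
# 'not_bytes' when it cannot be a byte string, and 'ambiguous' when
# the UTF8 flag is on and codepoints 128..255 are present.
sub classify_bytes {
    my ($string) = @_;

    # UTF8 flag off: no codepoint can exceed 255, so this is safe.
    return 'bytes' unless utf8::is_utf8($string);

    # UTF8 flag on, but nothing above 127: the bytes are unambiguous.
    return 'bytes' unless $string =~ /[^\x00-\x7F]/;

    # Any codepoint above 255 cannot be represented as a single byte.
    return 'not_bytes' if $string =~ /[^\x00-\xFF]/;

    # UTF8 flag on, highest codepoint between 128 and 255: cannot tell.
    return 'ambiguous';
}

my $data = "caf\xE9";                  # UTF8 flag off
print classify_bytes($data), "\n";     # bytes

utf8::upgrade($data);                  # same codepoints, UTF8 flag now on
print classify_bytes($data), "\n";     # ambiguous

utf8::downgrade($data);                # flag cleared again
print classify_bytes($data), "\n";     # bytes

print classify_bytes("\x{263A}"), "\n";  # not_bytes
```

The middle two calls show why utf8::downgrade is the suggested remedy: it
does not change the codepoints in the string, only the internal
representation, which moves the string out of the ambiguous case.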
