Commit a21eb81c authored by Alex Vandiver

Add a utility method to check that an input is bytes

Note that it is impossible to verify that an input is characters; here,
we can only validate whether it _could_ be bytes.
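A minimal sketch of why no check can tell the two apart: the same codepoints can carry the "UTF8" flag or not, and Perl still considers the strings equal. Here `utf8::upgrade` is used only to simulate a string that has been touched by character data; the variable names are illustrative.

```perl
use strict;
use warnings;

# Four codepoints: "c", "a", "f", 0xE9. As bytes, 0xE9 is Latin-1 "é";
# as characters, it is U+00E9 "é". Nothing in the value says which.
my $plain    = "caf\xE9";
my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);    # same codepoints, UTF8 flag now on

printf "plain:    is_utf8=%d length=%d\n", utf8::is_utf8($plain)    ? 1 : 0, length $plain;
printf "upgraded: is_utf8=%d length=%d\n", utf8::is_utf8($upgraded) ? 1 : 0, length $upgraded;
print "equal as strings\n" if $plain eq $upgraded;
```

The flag records only how Perl stores the string internally, so it is a hint about history, not proof of intent.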

First, any string with the "UTF8" flag off cannot contain codepoints
above 255, and as such is safe.  Additionally, if the "UTF8" flag is on,
having no codepoints above 127 means the bytes are unambiguous.  Any
codepoint above 255, by contrast, is a guaranteed sign that the input
is not a byte string.
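The cases above can be sketched as a classifier that returns a label instead of logging. `classify_bytes` is a hypothetical helper for illustration, not part of RT; note that it checks *every* codepoint for values above 255, not just the first non-ASCII one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RT): reports which case a string
# falls into, without logging anything.
sub classify_bytes {
    my ($string) = @_;
    # Flag off: every codepoint is 0-255, so the string is safe as bytes.
    return 'bytes' unless utf8::is_utf8($string);
    # Flag on but pure ASCII: bytes and characters coincide.
    return 'bytes' unless $string =~ /[^\x00-\x7F]/;
    # Any codepoint above 255 cannot be a byte.
    return 'characters' if $string =~ /[^\x00-\xFF]/;
    # Flag on with codepoints 128-255: could be either.
    return 'ambiguous';
}

my $ascii = "hello";
utf8::upgrade($ascii);
my $latin = "caf\xE9";
utf8::upgrade($latin);

print classify_bytes("caf\xE9"), "\n";     # bytes
print classify_bytes($ascii), "\n";        # bytes
print classify_bytes("\x{263A}"), "\n";    # characters
print classify_bytes($latin), "\n";        # ambiguous
```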

This leaves only the case of a string with the "UTF8" flag on and
codepoints between 128 and 255 inclusive.  The "UTF8" flag is a sign
that such a string was _likely_ touched by character data at some
point.  In such cases we warn, suggesting that if the input is indeed
bytes, the caller clear its "UTF8" flag by means of utf8::downgrade.
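A minimal sketch of the suggested remedy, using Perl's built-in `utf8::downgrade`: it clears the "UTF8" flag without changing any codepoint, and with its optional second argument (FAIL_OK) set to true it returns false instead of dying when a codepoint above 255 makes byte representation impossible. The variable names are illustrative.

```perl
use strict;
use warnings;

my $data = "caf\xE9";
utf8::upgrade($data);    # simulate contact with character data
print utf8::is_utf8($data) ? "flag on\n" : "flag off\n";    # flag on

# Clear the flag; the codepoints are unchanged. Returns true on success.
utf8::downgrade($data) or die "downgrade failed";
print utf8::is_utf8($data) ? "flag on\n" : "flag off\n";    # flag off
print length($data), "\n";                                  # 4

# With FAIL_OK true, a string holding a codepoint above 255 makes
# downgrade return false rather than die.
my $wide = "\x{263A}";
my $ok   = utf8::downgrade($wide, 1);
print $ok ? "downgraded\n" : "not bytes\n";                 # not bytes
```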
parent 12c2671c
@@ -133,6 +133,23 @@ sub mime_recommended_filename {
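sub assert_bytes {
    my $string = shift;
    # With the UTF8 flag off, no codepoint can exceed 255: safe as bytes.
    return unless utf8::is_utf8($string);
    # Pure ASCII is unambiguous even with the UTF8 flag on.
    return unless $string =~ /[^\x00-\x7F]/;

    my $msg;
    if ($string =~ /[^\x00-\xFF]/) {
        # Any codepoint above 255 cannot be a byte.
        $msg = "Expecting a byte string, but was passed characters";
    } else {
        # UTF8 flag on with codepoints 128-255: possibly characters.
        $msg = "Expecting a byte string, but was possibly passed characters;"
             . " if the string is actually bytes, please use utf8::downgrade";
    }
    $RT::Logger->warn($msg, Carp::longmess());
}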
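A usage sketch, runnable outside RT. `StubLogger` is a hypothetical stand-in for RT's logger that only records the first argument of each `warn` call; `assert_bytes` is reproduced so the block is self-contained, checking every codepoint for values above 255.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical stand-in for $RT::Logger: records messages instead of logging.
package StubLogger;
sub new  { my ($class) = @_; return bless { messages => [] }, $class }
sub warn { my ($self, @parts) = @_; push @{ $self->{messages} }, $parts[0] }

package main;

$RT::Logger = StubLogger->new;

sub assert_bytes {
    my $string = shift;
    return unless utf8::is_utf8($string);          # flag off: safe
    return unless $string =~ /[^\x00-\x7F]/;       # ASCII: unambiguous
    my $msg;
    if ($string =~ /[^\x00-\xFF]/) {
        $msg = "Expecting a byte string, but was passed characters";
    } else {
        $msg = "Expecting a byte string, but was possibly passed characters;"
             . " if the string is actually bytes, please use utf8::downgrade";
    }
    $RT::Logger->warn($msg, Carp::longmess());
}

assert_bytes("caf\xE9");         # flag off: silent
my $ambiguous = "caf\xE9";
utf8::upgrade($ambiguous);
assert_bytes($ambiguous);        # warns: possibly characters
assert_bytes("\x{263A}");        # warns: characters

my @messages = @{ $RT::Logger->{messages} };
print scalar(@messages), "\n";   # 2
print "$_\n" for @messages;
```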