How can I determine if a string contains non-printable characters/is likely binary data?
This is for unit testing/debugging -- it doesn't need to be exact.
This will have to do.
function isBinary($str) {
    // True if any byte falls outside printable ASCII, tab, CR, or LF.
    return preg_match('~[^\x20-\x7E\t\r\n]~', $str) > 0;
}
After a few attempts using the ctype_ functions and various workarounds, like removing whitespace characters and checking for an empty result, I decided I was going in the wrong direction. The following approach uses mb_detect_encoding (with the strict flag!) and considers a string "binary" if its encoding cannot be detected.
So far I haven't found a non-binary string that returns true, and the binary strings that return false only do so when the binary data happens to consist entirely of printable characters.
/**
 * Determine whether the given value is a binary string by checking to see if it has detectable character encoding.
 *
 * @param string $value
 *
 * @return bool
 */
function isBinary($value): bool
{
    return false === mb_detect_encoding((string)$value, null, true);
}
To search for non-printable characters, you can use ctype_print (http://php.net/manual/en/function.ctype-print.php).
From Symfony database debug tool:
if (!preg_match('//u', $params[$index])) // the string is binary
This detects whether a string contains bytes that are not valid UTF-8.
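The check works because the /u modifier makes PCRE treat the subject as UTF-8, and preg_match returns false (rather than 0 or 1) when the subject is malformed. A minimal sketch wrapping it in a function follows; the name isValidUtf8 is made up for illustration and is not part of Symfony.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
function isValidUtf8(string $s): bool
{
    // The empty pattern always matches, so a valid subject yields 1.
    // A malformed subject makes preg_match() return false instead.
    return preg_match('//u', $s) === 1;
}
```

Plain ASCII and the empty string both count as valid UTF-8 here, so a string made only of control characters such as "\x01\x02" is still reported as text by this test.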
A hacky solution (which I have seen quite often) would be to search for NUL (\0) characters.
if (strpos($string, "\0") === false) echo "not binary";
A more sophisticated approach would be to check whether the string contains valid Unicode, for example valid UTF-8.
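Both ideas can be combined. The sketch below assumes the mbstring extension is loaded and that legitimate text is expected to be UTF-8; the function name looksBinary is hypothetical.

```php
<?php
// Hypothetical helper: a rough "is this probably binary?" test.
function looksBinary(string $s): bool
{
    // A NUL byte almost never appears in ordinary text.
    if (strpos($s, "\0") !== false) {
        return true;
    }
    // mb_check_encoding() returns false for byte sequences
    // that are not well-formed in the given encoding.
    return !mb_check_encoding($s, 'UTF-8');
}
```

This is still a heuristic: binary data that happens to be valid UTF-8 and contains no NUL byte is reported as text.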
I would use a simple ctype_print. It works for me:
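function is_binary(string $string): bool
{
    // ctype_print() is true only when every character is printable ASCII (space included),
    // so tabs, newlines, and the empty string are all reported as binary here.
    return !ctype_print($string);
}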
My assumption is that what the OP wants to do is the following:
$hex = hex2bin("0588196d706c65206865782064617461");
// how to determine if $hex is a BINARY string or a CHARACTER string?
Yeah, this is not possible. Let’s look at WHY:
$string = "1234";
In hexadecimal, the bytes of this string are 31 32 33 34. Guess what you get when you do the following?
hex2bin('31323334') == '1234'
You get true. But wait, you may be saying, I specified the BINARY and it should be the BINARY 0x31 0x32 0x33 0x34! Yeah, but PHP doesn’t know the difference. YOU know the difference, but how is PHP going to figure it out?
If the idea is to test for non-printable because reasons, that’s quite different. But no amount of Regex voodoo will allow the code to magically know that YOU want to think of this as a string of binary.
Try a regular-expression replace: replace every character matching the POSIX class [[:print:]] with "". If the result is "", the string contains only printable characters; otherwise it contains non-printable characters as well.
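A minimal sketch of that idea, using preg_replace and the [[:print:]] class; the function name hasNonPrintable is made up for illustration. Without the /u modifier the class covers printable ASCII only, so tabs and newlines count as non-printable.

```php
<?php
// Hypothetical helper: true if anything other than printable ASCII is present.
function hasNonPrintable(string $s): bool
{
    // Strip every printable character; whatever remains is non-printable.
    return preg_replace('/[[:print:]]/', '', $s) !== '';
}
```

Unlike ctype_print, this treats the empty string as containing no non-printable characters.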