6

I'm trying to split a binary string into an array of runs of repeated characters.

For example, the string 10001101 split with this function would give:

    $arr[0] = '1';
    $arr[1] = '000';
    $arr[2] = '11';
    $arr[3] = '0';
    $arr[4] = '1';

(I tried to make myself clear, but if you still don't understand, my question is the same as this one but for PHP, not Python)

Community
R__
  • Try using https://github.com/CHH/itertools/blob/master/lib/itertools.php; it's the same tool you referenced, ported from Python to PHP. – sodhancha Oct 18 '15 at 10:46

4 Answers

5

You can use preg_split like so:

Example:

$in = "10001101";
$out = preg_split('/(.)(?!\1|$)\K/', $in);

print_r($out);

Output:

Array
(
    [0] => 1
    [1] => 000
    [2] => 11
    [3] => 0
    [4] => 1
)

The regex:

  • (.) - match a single character and capture it
  • (?!\1|$) - a negative lookahead: succeed only if what follows is neither the character we just captured nor the end of the string.
  • \K - keeps the text matched so far out of the overall regex match, making this match zero-width.

Note: this does not work in PHP versions prior to 5.6.13 as there was a bug involving bump-along behavior with \K.


An alternative regex that works in earlier versions as well is:

$out = preg_split('/(?<=(.))(?!\1|$)/', $in);

This uses a lookbehind rather than \K in order to make the match zero-width.
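To see where the zero-width split points fall, here is a minimal sketch that runs the lookbehind pattern above with PREG_SPLIT_OFFSET_CAPTURE, which makes preg_split return each piece together with its byte offset. The variable names are only illustrative, and the array destructuring in foreach assumes PHP 7.1 or later.

```php
<?php
// Illustrative input; any string can be used here.
$in = '10001101';

// Each element of $pieces is a [piece, byte offset] pair.
$pieces = preg_split('/(?<=(.))(?!\1|$)/', $in, -1, PREG_SPLIT_OFFSET_CAPTURE);

// Print the starting offset of every run next to the run itself.
foreach ($pieces as [$run, $offset]) {
    echo $offset, ': ', $run, PHP_EOL;
}
```

Each offset after the first is a position where the preceding character differs from the following one, which is exactly where the pattern matches.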

user3942918
  • Aww darn, I was just a few seconds too slow. Here's mine: `$arg="10001101"; preg_match_all("@(\w)\\1*@", $arg, $matches); print_r($matches[0]);` – Sean Johnson Oct 18 '15 at 10:54
  • @paul: Something is wrong; look at the last element: `Array ( [0] => 1 [1] => 000 [2] => 11 [3] => 01 )` – Subin Thomas Oct 18 '15 at 10:57
  • @SeanJohnson Works perfectly, thanks. The answer itself did not work, though. – R__ Oct 18 '15 at 10:57
2
<?php
$s = '10001101';
preg_match_all('/((.)\2*)/',$s,$m);
print_r($m[0]);
/*
Array
(
    [0] => 1
    [1] => 000
    [2] => 11
    [3] => 0
    [4] => 1
)
*/
?>

This matches runs of one or more repeated characters. The regex stores the repeated character in the second capture group ((.), stored as $m[2]), while the first capture group contains the entire run (((.)\2*), stored as $m[1]); $m[0] holds the full matches, which here are identical to $m[1]. With preg_match_all, it does this globally over the entire string. This can be applied to any string, e.g. 'aabbccddee'. If you want to limit it to just 0 and 1, use [01] instead of . in the second capture group.

Keep in mind that there may be no matches (for example, for an empty string), in which case $m[0] is an empty array, so first check that the result is non-empty, i.e. !empty($m[0]), before you use it.
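As a rough illustration of what preg_match_all leaves in $m with its default flags, the sketch below inspects each slot, tries the [01] variant on a made-up input containing a stray character, and shows the no-match case. The names $runs, $chars, $binary, $none and $hasRuns are only for this example.

```php
<?php
$s = '10001101';
preg_match_all('/((.)\2*)/', $s, $m);

// With the default PREG_PATTERN_ORDER:
//   $m[0] = full matches, $m[1] = group 1 (the whole run), $m[2] = group 2 (the repeated character)
$runs  = $m[0];
$chars = $m[2];
print_r($runs);
print_r($chars);

// Restricting the repeated character to binary digits skips anything else.
preg_match_all('/(([01])\2*)/', '1001x00', $binary);
print_r($binary[0]);

// With no matches, $none[0] is still set but is an empty array,
// so emptiness is the more useful thing to test.
preg_match_all('/((.)\2*)/', '', $none);
$hasRuns = !empty($none[0]);
var_dump($hasRuns);
```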

zamnuts
0

I'm thinking something like this. The code is not tested, I wrote it directly in the comment, so it might have some errors; you can adjust it.

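$str = '10001101';            // input string
$chunks = array();
$index = 0;
$chunks[$index] = $str[0];    // start the first run (assumes a non-empty string)
// Walk the remaining characters, including the last one.
for ($i = 1; $i < strlen($str); $i++) {
  if ($str[$i] == $str[$i-1]) {
    $chunks[$index] .= $str[$i];   // same as the previous character: extend the current run
  } else {
    $index++;                      // different character: start a new run
    $chunks[$index] = $str[$i];
  }
}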
sticksu
0

I wouldn't bother looking for the end-of-string in the pattern.

Most succinctly, capture the first occurring character, allow zero or more repetitions of it, then reset the start of the match with \K so that no characters are consumed by the split.

Code: (Demo)

var_export(
    preg_split('~(.)\1*\K~', '10001101', 0, PREG_SPLIT_NO_EMPTY)
);

Output:

array (
  0 => '1',
  1 => '000',
  2 => '11',
  3 => '0',
  4 => '1',
)
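A small sketch of why the PREG_SPLIT_NO_EMPTY flag is there: the last run ends at the end of the string, so the final zero-width match sits at the very end and, as far as I can tell, preg_split would otherwise be expected to return an empty trailing element. The variable names are only illustrative, and the comparison is printed rather than assumed.

```php
<?php
$in = '10001101';
$pattern = '~(.)\1*\K~';

// Same call as above, with empty pieces filtered out.
$withFlag = preg_split($pattern, $in, -1, PREG_SPLIT_NO_EMPTY);

// Same pattern without the flag, for comparison.
$withoutFlag = preg_split($pattern, $in);

var_export($withFlag);
echo PHP_EOL;

// Report how many extra (empty) elements the unflagged call returned.
echo count($withoutFlag) - count($withFlag), ' extra element(s) without the flag', PHP_EOL;
```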

If you don't care for regular expressions, here is a way of iterating through each character, comparing it to the previous one and conditionally concatenating repeated characters to a reference variable.

Code: (Demo) ...same result as first snippet

$array = [];
$lastChar = null;
foreach (str_split('10001101') as $char) {
    if ($char !== $lastChar) {
        unset($ref);
        $array[] = &$ref;
        $ref = $char;
        $lastChar = $char;
    } else {
        $ref .= $char;
    }
}
var_export($array);
mickmackusa
  • @R__ I see that you've been online since I posted my answer. Is there any chance of you accepting my answer so that researchers can more easily find a refined solution? – mickmackusa Jul 22 '21 at 00:46