
I have been trying to get this regex right all morning and have hit a wall. In the following string I want to match every forward slash that follows `.com/<first_word>`, except for any `/` after the URL.

```
$string = "http://example.com/foo/12/jacket Input/Output";
    match------------------------^--^
```

The length of the words between slashes should not matter.

Regex `(?<=.com\/\w)(\/)` results:

```
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
    matches--------------------^
```

Regex `(?<=\/\w)(\/)` results:

```
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
    matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
    matches--------------------^-^--------------^
```
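To reproduce the results above, here is a small sketch (assuming PHP 7.0+; `slashOffsets` is a hypothetical helper, and the patterns are copied from the question with `~` as the delimiter so the `\/` escapes can be dropped) that lists the byte offset of every slash each attempt matches. The first pattern only fires when the first path segment is a single character, and the second fires on any slash preceded by `/` plus one word character, including the one in `Input/O/utput`.

```php
<?php
// Hypothetical helper: byte offsets of every match of $pattern in $subject.
function slashOffsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matchedText, offset]; keep only the offset.
    return array_column($m[0], 1);
}

// The two lookbehind attempts from the question, with ~ as the delimiter.
$attempts = [
    '~(?<=.com/\w)(/)~' => [
        "http://example.com/foo/12/jacket Input/Output",
        "http://example.com/f/12/jacket Input/Output",
    ],
    '~(?<=/\w)(/)~' => [
        "http://example.com/foo/20/jacket Input/O/utput",
        "http://example.com/f/2/jacket Input/O/utput",
    ],
];

foreach ($attempts as $pattern => $subjects) {
    foreach ($subjects as $subject) {
        printf("%-20s %s => [%s]\n", $pattern, $subject,
            implode(', ', slashOffsets($pattern, $subject)));
    }
}
```

Running it should print `[]` and `[20]` for the first pattern, and `[40]` and `[20, 22, 37]` for the second, which lines up with the `^` markers above.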

Because a lookbehind is a zero-length assertion that cannot contain a variable-length quantifier such as `+` (PCRE does not allow unbounded quantifiers inside a lookbehind), I am wondering whether I have gone down the wrong path and should try another regex approach.
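A quick sketch of that limitation (assuming PHP 7+ with its bundled PCRE): an unbounded quantifier inside a lookbehind is rejected when the pattern is compiled, while `\K`, which drops everything matched so far from the reported match, accepts a variable-length prefix. On its own, though, `\K` only reaches the first slash after `.com/<first_word>`, because every new match has to find `.com/` again; continuing from the end of the previous match needs an anchor such as `\G`.

```php
<?php
$subject = "http://example.com/foo/12/jacket Input/Output";

// Unbounded lookbehind: PCRE refuses to compile it, so preg_match_all()
// returns false ("@" silences the compilation warning).
$lookbehindResult = @preg_match_all('~(?<=\.com/\w+)/~', $subject, $m);
var_dump($lookbehindResult);

// \K variant: the prefix is matched normally, then dropped from the match.
preg_match_all('~\.com/\w+\K/~', $subject, $m, PREG_OFFSET_CAPTURE);
$kOffsets = array_column($m[0], 1);
print_r($kOffsets); // only the slash after "foo" (offset 22)
```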

Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?

NOTE: tagged with PHP because the regex should work in any of the preg_* functions.

Jay Blanchard
  • The `preg_match` function returns only one match. You say you need to match every occurrence after some pattern, so you should use `preg_match_all`. – Wiktor Stribiżew Feb 11 '16 at 17:58
  • I still have the impression this is an XY problem. What are you trying to achieve? Why match those slashes? You could `parse_url` the URL and then do whatever you please, e.g. `explode` it. – Wiktor Stribiżew Feb 11 '16 at 18:29
  • No, it isn't an XY problem, @WiktorStribiżew, as the regex should work in *any* of the `preg_*` functions. – Jay Blanchard Feb 11 '16 at 18:42

3 Answers


Use `\K` here along with `\G`, then grab the groups.

```
^.*?\.com\/\w+\K|\G(\/)\w+\K
```

See demo.

https://regex101.com/r/aT3kG2/6

```php
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";

preg_match_all($re, $str, $matches);
```

Replace:

```php
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";

$result = preg_replace($re, $subst, $str);
```
vks
  • For whatever reason this is not working in the context of `preg_match()` – Jay Blanchard Feb 11 '16 at 17:52
  • OK, `preg_match_all()` works, but other `preg_*` functions such as `preg_replace()` fail. – Jay Blanchard Feb 11 '16 at 18:03
  • @JayBlanchard the only problem with replace is that one extra replacement will happen at the end... I guess that will have to be dealt with separately. – vks Feb 11 '16 at 18:06
  • I got this as the return `http://example.com/foo|/12/jacket Input/Output` using your code. Note that none of the slashes have been removed, but a pipe has been added. Even in your substitution example the slashes are left in place. I *really* appreciate your help with this; it seems that we're both beating our heads against the wall. One of the +1's is mine. – Jay Blanchard Feb 11 '16 at 18:08
  • @JayBlanchard we can do it like this; you just need to remove the extra `|` afterwards. https://regex101.com/r/aT3kG2/8 – vks Feb 11 '16 at 18:16
  • Can you also copy those regexes into your answer? They may not stay on regex101 forever, which would make your answers useless to others. – RiggsFolly Feb 11 '16 at 18:37
  • That last example you provided added spaces and an extra pipe, @vks: `http://example.com/foo||12|jacket|wow` – Jay Blanchard Feb 11 '16 at 18:46

If you want to use `preg_replace`, then this regex should work:

```php
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
```

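This replaces every `/` that comes after the first `/` following the leading `.com` with `|`. Because `[^/\h]*` cannot cross horizontal whitespace, the `\G` chain stops at the space, so `Input/Output` is left untouched.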

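The negative lookbehind `(?<!^)` is needed because `\G` also matches at the very start of the string; without it, a string that does not start with a `.com` URL, such as `/foo/bar/baz/abcd`, would have its slashes replaced too.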
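To see both points in action, a small sketch (assuming PHP 7+; `$noGuard` is a hypothetical variant with `(?<!^)` removed, written only for comparison) runs the pattern on the question's string and on `/foo/bar/baz/abcd`. Without the guard, `\G` matches at offset 0, the `.com` requirement is bypassed, and every slash in the second string is replaced.

```php
<?php
$re      = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
// Hypothetical variant without the (?<!^) guard, for comparison only.
$noGuard = '~(?:^.*?\.com/|\G)[^/\h]*\K/~';

$url   = "http://example.com/foo/12/jacket Input/Output";
$noCom = "/foo/bar/baz/abcd";

// The \G chain stops at the space, so "Input/Output" is left alone.
echo preg_replace($re, '|', $url), PHP_EOL;        // http://example.com/foo|12|jacket Input/Output

// Without (?<!^), \G matches at offset 0 and the chain starts there.
echo preg_replace($noGuard, '|', $noCom), PHP_EOL; // |foo|bar|baz|abcd

// With the guard there is no valid starting point, so nothing is replaced.
echo preg_replace($re, '|', $noCom), PHP_EOL;      // /foo/bar/baz/abcd
```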

RegEx Demo

anubhava

Another idea based on `\G` and `\K`:

```php
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
```

See demo at regex101
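A rough comparison sketch (assuming PHP 7+; `$hyphen` is a made-up input, not from the question): on the question's string this pattern gives the same result as the `[^/\h]*`-based one, but because each segment must consist of word characters (`\w*+`, possessive so it cannot backtrack), the chain stops at a segment such as `12-b`.

```php
<?php
// Pattern from this answer: segments made of word characters only.
$wordSegments = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
// Pattern from the preg_replace answer: any run of non-slash, non-space chars.
$anySegment   = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';

$sample = "http://example.com/foo/12/jacket Input/Output";
// Hypothetical input with a non-word character inside a path segment.
$hyphen = "http://example.com/foo/12-b/jacket";

echo preg_replace($wordSegments, '|', $sample), PHP_EOL; // http://example.com/foo|12|jacket Input/Output
echo preg_replace($wordSegments, '|', $hyphen), PHP_EOL; // http://example.com/foo|12-b/jacket
echo preg_replace($anySegment, '|', $hyphen), PHP_EOL;   // http://example.com/foo|12-b|jacket
```

Which behaviour is preferable depends on what the path segments may contain; the question only says their length should not matter.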

bobble bubble