
I have the following string:

huile contains rgbgrbrb9gr && huile contains fcecec

I use this regex to capture one condition block:

(.+) (contains) (.+)

It works with a single block such as "huile contains rgbgrbrb9gr", but if I add another condition with the && or || operator, the operator is captured too. What I expect to capture is the two blocks, excluding the && and || operators.

Does anyone have an idea how to achieve this?

Yanis600
  • What is the actual output you want here? – Tim Biegeleisen Dec 24 '21 at 10:59
  • First output: huile contains rgbgrbrb9gr Second output: huile contains fcecec – Yanis600 Dec 24 '21 at 11:00
  • **Side note**: regex is only suitable for matching lexical tokens. If you have a context-free grammar that you need to parse, you need to look for tools such as yacc or bison. – DannyNiu Dec 24 '21 at 11:10
  • Additionally you might want to indicate which dialect of regex you're using. JavaScript RegExp? Perl-compatible? POSIX? – DannyNiu Dec 24 '21 at 11:52
  • I'm working with Qt, so I'm using QRegularExpression. – Yanis600 Dec 24 '21 at 11:55
  • @Yanis600 Do you want to prevent matching `&&` and `||` or do you not want to match a `&` char and a `|` char? Can you add to your question an example what should match and what should not match? Do you want 3 capture groups in the result, or only matches? – The fourth bird Dec 24 '21 at 12:33
  • I have the following filter string as an example: 'oil & blah'blah' contains 'oil&blah'blah' && 'oil & blah'blah' contains 'oil blah's'. What I want to catch is the pattern 'string' contains 'substring'. The ' && ' or ' || ' must be excluded, and the pattern mentioned above has three matches. The & and | characters must only be captured when they are part of the string to search for or the string to search in. – Yanis600 Dec 24 '21 at 12:42
  • @Yanis600 Perhaps like this? https://regex101.com/r/taFZUP/1 – The fourth bird Dec 24 '21 at 13:17
  • Wonderful, thank you so much :) I assume that regexes are really hard to build. – Yanis600 Dec 24 '21 at 13:32

4 Answers


Regex quantifiers such as + are greedy by default: they match as much of the input as they can, so (.+) runs straight across the && operator.

You need to exclude & and | from your input, like this:

([^&|]+) (contains) ([^&|]+)
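As a minimal sketch of how this pattern behaves, the snippet below runs it over the string from the question. It uses Python's `re` module as a stand-in for QRegularExpression (an assumption made for the sake of a runnable example; the pattern itself is unchanged). The variable names are illustrative only.

```python
import re

# Pattern from the answer: neither operand may contain & or |.
PATTERN = re.compile(r"([^&|]+) (contains) ([^&|]+)")

text = "huile contains rgbgrbrb9gr && huile contains fcecec"

# The character classes also accept spaces, so an operand can pick up
# the blank next to the operator; strip() removes it.
conditions = [
    tuple(part.strip() for part in match.groups())
    for match in PATTERN.finditer(text)
]

for condition in conditions:
    print(condition)
```

Note that the operands come back with the whitespace that sat next to the operator, which is why the sketch strips each group.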

If you instead want to exclude only the double-character && and ||, I suggest splitting your string on those delimiters first and then matching each piece with the regex, as complex parsing is really beyond the realm of regex (that is a job for grammars).
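The split-then-match idea could look roughly like this. It is a sketch in Python's `re` module rather than QRegularExpression, and `SEPARATOR`, `CONDITION` and `parse_conditions` are hypothetical names introduced only for this example. It assumes that && and || never occur inside an operand.

```python
import re

# Hypothetical helper names; only the two patterns matter.
SEPARATOR = re.compile(r"\s*(?:&&|\|\|)\s*")    # the && / || operators
CONDITION = re.compile(r"(.+) (contains) (.+)")  # regex from the question


def parse_conditions(text):
    """Split on && / || first, then match each block on its own."""
    results = []
    for block in SEPARATOR.split(text):
        match = CONDITION.fullmatch(block)
        if match:
            results.append(match.groups())
    return results


parsed = parse_conditions(
    "'oil & other' contains 'oil &' && huile contains fcecec"
)
for groups in parsed:
    print(groups)
```

Because the split only looks for doubled characters, a single & or | inside an operand survives untouched.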

But a regex solution is nonetheless possible.

The rough idea is that you want to match a string with:

  1. an optional prefix containing no & or |
  2. a single & or | followed by a non-empty string
  3. step 2 repeated one or more times.

The subpattern would be something like this:

(([^&|]+)?([&|][^&|]+)+)

Additionally, you'll want something like egrep's -x option to match the entire string; otherwise an empty string may turn up.
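To see what the operand subpattern accepts, here is a small sketch using Python's `re.fullmatch`, which plays the role of the whole-string option mentioned above. The helper `is_operand` is a hypothetical name. Because step 3 asks for at least one repetition, the subpattern as written only accepts operands that contain at least one single & or |.

```python
import re

# Operand subpattern from the answer, unchanged.
OPERAND = re.compile(r"(([^&|]+)?([&|][^&|]+)+)")


def is_operand(text):
    """True if the whole text matches the operand subpattern."""
    return OPERAND.fullmatch(text) is not None


samples = ["oil & other", "a | b & c", "oil && other", "oil"]
checks = {sample: is_operand(sample) for sample in samples}

for sample, accepted in checks.items():
    print(f"{sample!r}: {accepted}")
```

A doubled && is rejected because each & or | must be followed by at least one other character, and a plain word such as `oil` is rejected because it contains no operator character at all.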

The full regex would look something like this (the capture groups are renumbered):

(([^&|]+)?([&|][^&|]+)+) (contains) (([^&|]+)?([&|][^&|]+)+)
DannyNiu
  • It works, but if I need to search for the & or | character, I get a wrong capture. What I need to capture is, for example: 'oil & other' contains 'oil &' – Yanis600 Dec 24 '21 at 11:13

After reading the comments on the post, the desired result became clearer.

This one could work too:

(?<=^|(?:&&|\|\|) )(.+?) (contains) (.+?)(?= (?:&&|\|\|)|$)

https://regex101.com/r/YDFpN9/2
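For readers who want to try the idea outside regex101, here is a hedged sketch in Python. Python's built-in `re` module only accepts fixed-width lookbehinds, so the single lookbehind is split into `^` plus two fixed-width lookbehinds. That makes this an adaptation of the pattern above, not the exact expression, and the variable names are illustrative.

```python
import re

# Adapted pattern: a condition starts at the beginning of the string or
# right after "&& " / "|| ", and ends before " &&", " ||" or the end.
PATTERN = re.compile(
    r"(?:^|(?<=&& )|(?<=\|\| ))"
    r"(.+?) (contains) (.+?)"
    r"(?= (?:&&|\|\|)|$)"
)

text = (
    "'oil & blah'blah' contains 'oil&blah'blah' "
    "&& 'oil & blah'blah' contains 'oil blah's'"
)

matches = [match.groups() for match in PATTERN.finditer(text)]
for groups in matches:
    print(groups)
```

The lazy `.+?` groups stop at the first position where the lookahead succeeds, so a single & inside an operand does not end the match.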

Jean Will

What about using:

(.*?) contains (.*?)\s*(?:([|&])\3|$)\s*

See the online demo

  • (.*?) - 1st Capture group to catch whatever comes before 'contains' (lazy).
  • contains - Literally ' contains ', with leading and trailing space char.
  • (.*?) - 2nd Capture group to catch whatever comes after 'contains' (lazy).
  • \s* - 0+ whitespace chars.
  • (?:([|&])\3|$) - A non-capture group with an alternation inside:
    • ([|&])\3 - Either a doubled pipe symbol or a doubled ampersand (3rd capture group followed by a backreference to it);
    • $ - Or the end-string anchor.
  • \s* - 0+ whitespace chars.

Your substrings will be captured in the 1st and 2nd capture groups. And if you really want to capture 'contains' too, then it's an easy fix inside the pattern.
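A short sketch of this pattern in use, written with Python's `re` module as a stand-in for QRegularExpression (an assumption for the sake of a runnable example). The third group only holds the operator character, so the sketch reads groups 1 and 2 explicitly.

```python
import re

# Pattern from the answer, unchanged.
PATTERN = re.compile(r"(.*?) contains (.*?)\s*(?:([|&])\3|$)\s*")

text = "'oil & other' contains 'oil &' && huile contains fcecec"

# Group 3 is only the operator character, so keep groups 1 and 2.
pairs = [
    (match.group(1), match.group(2))
    for match in PATTERN.finditer(text)
]

for haystack, needle in pairs:
    print(haystack, "->", needle)
```

The backreference `\3` is what lets a single & pass through: the second operand only ends where the same operator character appears twice in a row.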

JvdV

If you want 3 capture groups, you could first match what you don't want, and then capture what you want to keep in groups, using a tempered greedy token so that a match does not cross &&, || or the word contains.

\|{2,}|&{2,}|((?:(?!&&|\|\||\bcontains\b).)*) (contains) ((?:(?!&&|\|\||\bcontains\b).)*)

The pattern matches:

  • \|{2,}|&{2,} Match either 2 or more pipe chars or ampersands (what you don't want to keep)
  • | Or
  • ( Capture group 1
    • (?:(?!&&|\|\||\bcontains\b).)* Match any char except a newline if what is directly to the right is not && || or contains
  • ) Close group 1
  • (contains) Match the word contains in group 2 between spaces
  • ( Capture group 3
    • (?:(?!&&|\|\||\bcontains\b).)* Same approach as above
  • ) Close group 3

Regex demo
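The pattern above can be tried with the following sketch, again using Python's `re` module in place of QRegularExpression (an assumption). Matches of the first two alternatives are the operators themselves and leave the groups unset, so they are skipped. The tempered token also consumes the blank next to an operator, hence the `strip()`.

```python
import re

# Pattern from the answer, unchanged.
TOKEN = r"(?:(?!&&|\|\||\bcontains\b).)*"
PATTERN = re.compile(
    r"\|{2,}|&{2,}|(" + TOKEN + r") (contains) (" + TOKEN + r")"
)

text = "huile contains rgbgrbrb9gr && huile contains fcecec"

conditions = []
for match in PATTERN.finditer(text):
    if match.group(2) is None:
        continue  # this match was an operator (&& or ||), not a condition
    conditions.append(tuple(part.strip() for part in match.groups()))

for condition in conditions:
    print(condition)
```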

The fourth bird