
I would like to verify the syntax of an input field with a regex. The field should accept text like the following examples:

Something=Item1,Item2,Item3
someOtherThing=Some_Item

There has to be a word, an = sign, and a comma-separated list of words. The list must contain at least one entry, so abc= should be invalid, but abc=123 is valid.

I am using a framework which allows a regular expression (Java) to mark the input field as valid or invalid. How can I express this rule in a regex?

With the aid of https://stackoverflow.com/a/65244969/7821336, I am able to validate the comma-separated list. But as soon as I prepend the key and the = sign, the regex no longer works:

(\w+)=((?:\w+)+),?   // does not work!
mre
  • Are empty words accepted on the right side (`abc=def,,,123`)? Why not: one word; `=`; one word; any number of (`,` followed by one word) – user16320675 May 30 '22 at 14:04
  • I have to process the input text anyway with more complex logic in the backend, so I don't care whether there are empty values or not. I prefer a cleaner, easier to read (and easier to understand) regex, so feel free to allow or deny empty values, whichever is easier. – mre May 30 '22 at 14:09
  • Like `^(\w+)=(\w+)(,\w+)*$` ? – mre May 30 '22 at 14:12
  • I don't see why you need all the capturing groups, since you only want to "validate", but despite that it is (almost) exactly what I meant. If you need to capture, I would use two capturing groups, one before and one after the `=` (but that depends on what further processing is required). If empty values are allowed: `\w+=[\w,]+` – user16320675 May 30 '22 at 14:14
  • You are right. I had a different approach in mind and wanted to use the same regex in the backend. But while asking my question, I realized that it is much easier to cut the string at the `=` sign, then at the `,` signs in the backend. So there is really no need for capture groups any more. – mre May 30 '22 at 14:18
  • Must the first part be only letters, or can numbers be used too? I.e., is `123=abc` valid? – Bohemian May 30 '22 at 15:29
  • First part must be `\w`. So `123=abc` is valid. – mre May 31 '22 at 14:10
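
A minimal sketch of the approach discussed in these comments, assuming the framework applies the pattern to the whole field (as `String.matches` does): validate with the strict pattern from the comments, or with the lenient `\w+=[\w,]+` if empty values are acceptable, then cut the string at the first `=` and at the commas in the backend. The class and method names (`FieldParser`, `isValidStrict`, `isValidLenient`, `parseKey`, `parseValues`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FieldParser {

    // Strict: a word, '=', then one or more words separated by single commas.
    // The ^ and $ are redundant with matches() but harmless.
    private static final Pattern STRICT = Pattern.compile("^(\\w+)=(\\w+)(,\\w+)*$");

    // Lenient: also accepts empty entries such as "abc=def,,,123".
    private static final Pattern LENIENT = Pattern.compile("\\w+=[\\w,]+");

    public static boolean isValidStrict(String input) {
        return STRICT.matcher(input).matches();
    }

    public static boolean isValidLenient(String input) {
        return LENIENT.matcher(input).matches();
    }

    // Backend processing (hypothetical helpers): cut at the first '=', then at the commas.
    // Assumes the input has already passed one of the checks above.
    public static String parseKey(String input) {
        return input.substring(0, input.indexOf('='));
    }

    public static List<String> parseValues(String input) {
        String values = input.substring(input.indexOf('=') + 1);
        return Arrays.asList(values.split(","));
    }

    public static void main(String[] args) {
        String[] inputs = {"Something=Item1,Item2,Item3", "someOtherThing=Some_Item",
                "abc=", "abc=123", "123=abc", "abc=def,,,123"};
        for (String s : inputs) {
            System.out.println(s + " strict=" + isValidStrict(s) + " lenient=" + isValidLenient(s));
        }
        String sample = "Something=Item1,Item2,Item3";
        System.out.println(parseKey(sample) + " -> " + parseValues(sample));
    }
}
```

With this split, the regex stays simple and capture-free for validation, and the backend does the structural work with plain `indexOf` and `split`.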

3 Answers


I used the following code. Note that it does not use a regex at all:

import java.util.Arrays;
import java.util.List;

public class MyClass {

    public static void main(String[] args) {
        String something1 = "Something=Item1,Item2,Item3";
        String something2 = "Something=";
        String something3 = "Something";
        String something4 = "=Item1,Item2,Item3";

        System.out.println(isValid(something1)); // true
        System.out.println(isValid(something2)); // false
        System.out.println(isValid(something3)); // false
        System.out.println(isValid(something4)); // false
    }

    public static boolean isValid(String string) {
        // Split into at most two parts: the text before the first '=' and everything after it.
        // With a limit of 2, a trailing empty part is kept, so "Something=" gives ["Something", ""].
        String[] partsOfString = string.split("=", 2);

        // There must be an '=' sign, so we need exactly two parts
        if (partsOfString.length < 2) return false;

        // The text before the '=' sign must not be empty
        if (partsOfString[0].trim().isEmpty()) return false;

        // The text after the '=' sign must not be empty
        if (partsOfString[1].trim().isEmpty()) return false;

        // Split the right-hand side into the individual items ("abc=,," yields no items)
        String[] items = partsOfString[1].split(",");
        if (items.length == 0) return false;

        // Now we have the items as a list, and you can do whatever you want with it
        List<String> itemsList = Arrays.asList(items);

        return true;
    }
}

Running it prints true for the first input and false for the other three. These are the checks the code performs:

  1. It checks that the string contains an = sign.
  2. It checks that the text before the = sign is not empty.
  3. It checks that there is at least one item after the = sign.
  4. It also gives you the items as a list.
Sambhav. K
  • but the question clearly stated: "*I am using a framework which allows a regular expression (Java) to mark the input field as valid or invalid. How can I express this rule in a regex?*" – user16320675 May 30 '22 at 16:43
  • I know. But it might work well for someone looking for a solution without regex. – Sambhav. K May 30 '22 at 16:53
  • @user16320675 Also, people who don't understand regex well, like me, might look for a solution without one, even though it is longer. – Sambhav. K May 30 '22 at 17:00

You are not repeating the comma inside the group; that is why it does not work with multiple comma-separated values.
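
To illustrate the point, here is a small comparison, assuming the whole input must match (as with `String.matches`). In the question's pattern the optional comma sits outside the repeated group, so the list can never continue after the first comma; putting `,\w+` into a repeated group fixes this. The class name `CommaDemo` and the corrected pattern are illustrative, not taken from this answer.

```java
public class CommaDemo {

    // The pattern from the question: the comma is outside the repeated group
    public static final String ORIGINAL = "(\\w+)=((?:\\w+)+),?";

    // The comma moved into a repeated group: one value, then any number of ",value"
    public static final String FIXED = "(\\w+)=(\\w+(?:,\\w+)*)";

    public static void main(String[] args) {
        String[] inputs = {"Something=Item1,Item2,Item3", "someOtherThing=Some_Item", "abc="};
        for (String s : inputs) {
            // String.matches requires the whole string to match the pattern
            System.out.println(s + " original=" + s.matches(ORIGINAL) + " fixed=" + s.matches(FIXED));
        }
    }
}
```

For the first input the question's pattern reports false and the fixed one reports true; both accept the single-value input and both reject `abc=`.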

If you want to get separate matches for the key and the values, you can use the \G anchor.

(?:^(\w+)=|\G(?!^))(\w+)(?:,|$)

Explanation

  • (?: Non capture group
    • ^(\w+)= Assert start of string and capture 1+ word chars in group 1
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start
  • ) Close non capture group
  • (\w+) Capture group 2, match 1+ word characters
  • (?:,|$) Match either , or assert end of string


For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String regex = "(?:^(\\w+)=|\\G(?!^))(\\w+)(?:,|$)";
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        String[] strings = {"Something=Item1,Item2,Item3", "someOtherThing=Some_Item", "Something="};

        for (String s : strings) {
            Matcher matcher = pattern.matcher(s);

            while (matcher.find()) {
                String gr1 = matcher.group(1); // the key, only set on the first match
                String gr2 = matcher.group(2); // one value per match

                if (gr1 != null) {
                    System.out.println("Group 1: " + gr1);
                }
                if (gr2 != null) {
                    System.out.println("Group 2: " + gr2);
                }
            }
        }
    }
}

Output

Group 1: Something
Group 2: Item1
Group 2: Item2
Group 2: Item3
Group 1: someOtherThing
Group 2: Some_Item
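
Because the pattern is compiled with `Pattern.MULTILINE`, `^` and `$` also match at line boundaries, so the same loop can walk through several `key=values` lines in one string. The sketch below collects the groups into a map, assuming each group 1 starts a new entry and every following group 2 value belongs to it; the class and method names (`KeyValuesParser`, `parse`) are hypothetical. Like the loop above, it extracts values rather than validating the whole input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValuesParser {

    private static final Pattern PATTERN =
            Pattern.compile("(?:^(\\w+)=|\\G(?!^))(\\w+)(?:,|$)", Pattern.MULTILINE);

    public static Map<String, List<String>> parse(String text) {
        // LinkedHashMap keeps the keys in input order
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher matcher = PATTERN.matcher(text);
        String currentKey = null;
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                // Group 1 is only set on the first match of a line: start a new entry
                currentKey = matcher.group(1);
                result.put(currentKey, new ArrayList<>());
            }
            // Group 2 holds one value per match
            result.get(currentKey).add(matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Something=Item1,Item2,Item3\nsomeOtherThing=Some_Item";
        System.out.println(parse(text));
        // prints {Something=[Item1, Item2, Item3], someOtherThing=[Some_Item]}
    }
}
```

The first successful match on any line has to come through the `^(\w+)=` branch, because `\G(?!^)` cannot match at the start of a line, so `currentKey` is always set before a value is added.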
The fourth bird

Try this regex:

\w+=\w+(,\w+)*

which is used like this in Java:

if (input.matches("\\w+=\\w+(,\\w+)*")) {
    // input is OK
}

If the first part should not have numbers, use this instead:

[a-zA-Z_]+=\w+(,\w+)*

Or if just the first character should not be a number (i.e. the key should look like a typical Java variable name), use this:

[a-zA-Z_]\w*=\w+(,\w+)*
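
Since `String.matches` only succeeds when the entire string matches, none of these patterns need `^` or `$` anchors. A quick check of the three variants might look like this; the class name `PatternVariants` and the sample inputs `a1=b` and `abc=` are illustrative, and the others come from the question and comments.

```java
public class PatternVariants {

    // Any word characters in the key, including digits
    public static final String ANY_KEY = "\\w+=\\w+(,\\w+)*";
    // Key made of letters and underscores only
    public static final String NO_DIGITS_IN_KEY = "[a-zA-Z_]+=\\w+(,\\w+)*";
    // Key must not start with a digit
    public static final String IDENTIFIER_KEY = "[a-zA-Z_]\\w*=\\w+(,\\w+)*";

    public static void main(String[] args) {
        String[] inputs = {"Something=Item1,Item2,Item3", "abc=", "abc=123", "123=abc", "a1=b"};
        for (String s : inputs) {
            System.out.println(s
                    + " any=" + s.matches(ANY_KEY)
                    + " noDigits=" + s.matches(NO_DIGITS_IN_KEY)
                    + " identifier=" + s.matches(IDENTIFIER_KEY));
        }
    }
}
```

`123=abc` is accepted only by the first pattern, which matches the requirement stated in the comments that the key is `\w`; `a1=b` shows the difference between the second and third patterns.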
Bohemian