I'm struggling to find the right regex for parsing a string that contains key/value pairs. The string should be split on spaces, except where a space sits inside double quotes.

Example string:

2013-10-26    15:16:38:011+0200 name="twitter-message" from_user="MyUser" in_reply_to="null" start_time="Sat Oct 26 15:16:21 CEST 2013" event_id="394090123278974976" text="Some text" retweet_count="1393"

The desired output is:

2013-10-26
15:16:38:011+0200
name="twitter-message"
from_user="MyUser" 
in_reply_to="null" 
start_time="Sat Oct 26 15:16:21 CEST 2013" 
event_id="394090123278974976" 
text="Some text" 
retweet_count="1393"

I found an answer that gets me close to the desired result, "Regex for splitting a string using space when not surrounded by single or double quotes". It uses this regex:

Matcher m = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'").matcher(str);
while (m.find())
    list.add(m.group());

This gives a list of:

2013-10-26
15:16:38:011+0200
name=
"twitter-message"
from_user=
"MyUser"
in_reply_to=
"null"
start_time=
"Sat Oct 26 15:16:21 CEST 2013"
event_id=
"394090123278974976"
text=
"Some text"
retweet_count=
"1393"

It also breaks each pair after the = sign, so something is still missing to get the desired output.

Preben

3 Answers

Try: Matcher m = Pattern.compile("(?:[^\\s\"']|\"[^\"]*\"|'[^']*')+").matcher(str);

Your original regex can be read as "match either a run of characters that are neither whitespace nor quotes, or a quoted string". This one reads "match a run of pieces, each of which is either such a character or a quoted string", so name= and the quoted value that follows it stay in one token.
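
To make the difference concrete, here is a minimal, self-contained sketch of the grouped regex at work. The class name QuotedTokenizer and the method name tokenize are made up for illustration, and the sample line is a shortened version of the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class QuotedTokenizer {
    // One token is a run of pieces. Each piece is either a single character
    // that is neither whitespace nor a quote, or a complete quoted string.
    private static final Pattern TOKEN =
            Pattern.compile("(?:[^\\s\"']|\"[^\"]*\"|'[^']*')+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Shortened sample line based on the question.
        String line = "2013-10-26    15:16:38:011+0200 name=\"twitter-message\" "
                + "start_time=\"Sat Oct 26 15:16:21 CEST 2013\" text=\"Some text\"";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

Because the repetition wraps the whole alternation, a token only ends at whitespace outside quotes. An unmatched quote is never consumed by any branch, so it would end the token at that point.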

pobrelkey

You could try this:

[^\\s=]+(=\"[^\"]+\")?
  • [^\\s=]+ matches one or more characters that are neither whitespace nor =, so for start_time="Sat Oct 26 15:16:21 CEST 2013" it matches the start_time part.
  • (=\"[^\"]+\")? is optional and matches the ="zzz" part, where zzz is one or more characters other than ".

Example

Matcher m = Pattern.compile("[^\\s=]+(=\"[^\"]+\")?").matcher(str);
while (m.find())
    System.out.println(m.group());

Output:

2013-10-26
15:16:38:011+0200
name="twitter-message"
from_user="MyUser"
in_reply_to="null"
start_time="Sat Oct 26 15:16:21 CEST 2013"
event_id="394090123278974976"
text="Some text"
retweet_count="1393"
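
Since the goal is to parse key/value pairs, the same idea can be taken one step further with capturing groups. The following is only a sketch under a few assumptions: the class name KeyValueParser and the method name parsePairs are hypothetical, bare tokens such as the date and time are skipped, and the value part uses * instead of + so that an empty value like text="" is also accepted (a deliberate change from the regex above).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class KeyValueParser {
    // Group 1: the key (no whitespace, no '=').
    // Group 2: the quoted value without its quotes; null for bare tokens.
    private static final Pattern PAIR =
            Pattern.compile("([^\\s=]+)(?:=\"([^\"]*)\")?");

    static Map<String, String> parsePairs(String input) {
        // LinkedHashMap keeps the pairs in the order they appear in the input.
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                pairs.put(m.group(1), m.group(2));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Shortened sample line based on the question.
        String line = "2013-10-26    15:16:38:011+0200 name=\"twitter-message\" "
                + "text=\"Some text\" retweet_count=\"1393\"";
        parsePairs(line).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

With the original + quantifier, a pair such as text="" would not match as a whole; the key and the empty quotes would come back as two separate tokens.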
Pshemo

This should work for you:

// if your string is str

// split on runs of spaces that are followed by an even number of quotes
String[] arr = str.split(" +(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");
for (String s : arr)
    System.out.println(s);

OUTPUT:

2013-10-26
15:16:38:011+0200
name="twitter-message"
from_user="MyUser"
in_reply_to="null"
start_time="Sat Oct 26 15:16:21 CEST 2013"
event_id="394090123278974976"
text="Some text"
retweet_count="1393"
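
For reference, a self-contained version of the split approach might look like the sketch below. The class name QuoteAwareSplit and the method name split are made up for illustration, and the inner group is written as non-capturing, which does not change what the regex matches.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
class QuoteAwareSplit {
    // Split on a run of spaces only when the rest of the input contains an
    // even number of double quotes, which means the spaces are outside quotes.
    private static final String OUTSIDE_QUOTES =
            " +(?=(?:(?:[^\"]*\"){2})*[^\"]*$)";

    static List<String> split(String input) {
        return Arrays.asList(input.split(OUTSIDE_QUOTES));
    }

    public static void main(String[] args) {
        // Shortened sample line based on the question.
        String line = "2013-10-26    15:16:38:011+0200 name=\"twitter-message\" "
                + "start_time=\"Sat Oct 26 15:16:21 CEST 2013\"";
        for (String part : split(line)) {
            System.out.println(part);
        }
    }
}
```

Two caveats, assuming the input may be less tidy than the sample: the lookahead scans to the end of the string for every candidate run of spaces, which can get slow on very long lines, and the even-quote count only works when every quote in the input is balanced.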
anubhava