
I'm trying to split the text below on spaces, except where a token is enclosed in quotation marks. The unexpected result is that it is also split at the . character, which I do not want.

string txt = "PROGRAM \"My ETABS\" VERSION \"9.7.4\" MERGETOL 0.1";

string[] split = Regex.Matches(txt, "(\\w+|\".*?\")")
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .Select(o => o.Replace("\"", ""))
                      .ToArray();

What I get:

PROGRAM  
My ETABS
VERSION 
9.7.4
MERGETOL
0
1

What I need:

PROGRAM  
My ETABS
VERSION 
9.7.4
MERGETOL
0.1
Vahid

1 Answer


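Swap the two alternatives and use \S in place of \w, and it should
work: (".*?"|\S+)

The original pattern fails because \w does not match the . character,
so 0.1 is matched as two separate tokens. \S matches any non-whitespace
character, which is why the quoted alternative has to come first:
otherwise \S+ would consume the opening quote.

To do it without capturing the quotes, use "(.*?)"|(\S+), where only
one of the two groups will contain data for any given match. For this
you need to find the next match repeatedly until done. For each match,
concatenate the two groups.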
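
As a rough illustration of the two-group variant, here is a minimal sketch in C#. The class name QuotedSplitter and the method name Split are made up for this example and are not from the question. It relies on .NET returning an empty string for the Value of a group that did not participate in the match, so concatenating both groups yields whichever one matched.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are assumptions.
public static class QuotedSplitter
{
    // Group 1: text between double quotes (quotes themselves not captured).
    // Group 2: any other run of non-whitespace characters.
    // The quoted alternative comes first so \S+ never swallows an opening quote.
    private static readonly Regex Token = new Regex("\"(.*?)\"|(\\S+)");

    public static string[] Split(string text)
    {
        return Token.Matches(text)
                    .Cast<Match>()
                    // Only one group succeeds per match; the other has Value "".
                    .Select(m => m.Groups[1].Value + m.Groups[2].Value)
                    .ToArray();
    }
}
```

Calling QuotedSplitter.Split on the string from the question should give the six tokens PROGRAM, My ETABS, VERSION, 9.7.4, MERGETOL and 0.1, with no Replace call needed to strip the quotes.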

  • Since `"` is a single-character, wouldn't `("[^"]*"|\S+)` be better? – Regular Jo Feb 26 '17 at 20:37
  • @cfqueryparam - Better? I don't know. They are different. I could benchmark the two if you think that would make a difference. –  Feb 26 '17 at 20:52
  • Completed iterations: 300 / 300 (x 1000). Matches found per iteration: 6. Regex1: `("[^"]*"|\S+)`, elapsed time: 2.99 s, 2988.36 ms, 2988362 µs. Regex2: `(".*?"|\S+)`, elapsed time: 2.87 s, 2873.48 ms, 2873477 µs. –  Feb 26 '17 at 20:58
  • As the saying goes, `touché`. I thought it would be better because it avoids the lazy quantifier. Anyway, with either variant, this is the better answer imo. – Regular Jo Feb 26 '17 at 21:23