0

I don't have experience with regex in Java, but I think this can be solved with a regex, and more simply than my attempts below. I have text containing double-pipe (||) separators. The text can look like:
1)aaa||bbb||ccc,
2)aaa||||ccc,
3)||bbb||ccc,
4)|| ||cccc etc.
I want to extract the text after the first || (bbb) and the text after the second || (ccc). I tried:

Pattern p = Pattern.compile("||", Pattern.DOTALL);
String types[] = p.split(stringToParse);

but this does not work when the string doesn't have 3 parts.

My second idea is:

Pattern p = Pattern.compile("||", Pattern.DOTALL);
Matcher m = p.matcher(strToParse);
while (m.find()) {
    System.out.println(m.group() + " " + m.start() + " " + m.end());
}

Then I know where each || occurs and can extract the parts with substring. Is there an easier and simpler way to solve this problem?

tostao
  • 7
    DON'T! Use [HTML parser](http://stackoverflow.com/questions/2168610/which-html-parser-is-best) – Maroun Oct 31 '13 at 11:57
  • 1
    Would using `split("<br>")` on the `String` object containing the input work? This gives you an array with the tokens of text before, between, and after the tag. The downside is that, if your input is an HTML page, you end up with one massive token at the beginning and one at the end.
    – Birb Oct 31 '13 at 12:01
  • 1
    What Maroun Maroun says, and [this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?lq=1). – Mena Oct 31 '13 at 12:03
  • I changed the symbol from <br> to || because the text is not HTML.
    – tostao Oct 31 '13 at 12:59

3 Answers

0

As people said above, don't use regex to parse HTML. For this simple case, though:

// Match a run of word characters enclosed by two <br> tags.
Pattern p = Pattern.compile("<br>(\\w*)<br>");
Matcher m = p.matcher(c); // c is the input string
while (m.find()) {
    System.out.println(m.group(1)); // the text between the tags, without <br>
}
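
Since the question now uses || rather than <br>, the same Matcher idea could be adapted as in the sketch below. This is only an illustration, not the answerer's code: the class name `PipeFieldsDemo` and the method `extractFields` are made up for the example, and it assumes the input contains at least two || separators (otherwise the method returns null).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeFieldsDemo {
    // Group 1: text between the first and second "||".
    // Group 2: everything after the second "||".
    // Each '|' is escaped because it is the regex alternation operator.
    // DOTALL (as in the question) lets '.' match line terminators too.
    private static final Pattern FIELDS =
            Pattern.compile("\\|\\|(.*?)\\|\\|(.*)", Pattern.DOTALL);

    /** Returns {second field, third field}, or null if there are fewer than two "||". */
    static String[] extractFields(String input) {
        Matcher m = FIELDS.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = { "aaa||bbb||ccc", "aaa||||ccc", "||bbb||ccc", "|| ||cccc" };
        for (String s : samples) {
            String[] fields = extractFields(s);
            System.out.println("[" + fields[0] + "] [" + fields[1] + "]");
        }
    }
}
```

The lazy `(.*?)` stops at the first following ||, so an empty middle field (as in aaa||||ccc) comes back as an empty string rather than being skipped.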
Sumit Singh
0

This:

String[] data = { 
        "aaa||bbb||ccc", 
        "aaa||||ccc", 
        "||bbb||ccc", 
        "|| ||cccc" 
};
for (String string : data) {
    String[] split = string.split(Pattern.quote("||"));
    System.out.println("0:"+split[0] + ", 1:" + split[1] + " 2:" + split[2]);
}

gives:

0:aaa, 1:bbb 2:ccc
0:aaa, 1: 2:ccc
0:, 1:bbb 2:ccc
0:, 1:  2:cccc

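Note the escaping of the delimiter with Pattern.quote(): | is a special regex character (alternation), so an unquoted "||" is a pattern that matches the empty string rather than the literal two pipes.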
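
One case the sample data above does not cover is input whose last fields are empty, such as aaa||||. By default String.split drops trailing empty strings, so the array can come back with fewer than 3 elements; passing a negative limit keeps them. A minimal sketch of that behaviour follows (the class name `SplitLimitDemo` and the helper `splitKeepingEmpties` are hypothetical names chosen for the example):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Quote the delimiter so "||" is treated as literal text, not as a regex.
    private static final String DELIMITER = Pattern.quote("||");

    /** Splits on the literal "||" and keeps trailing empty fields (negative limit). */
    static String[] splitKeepingEmpties(String input) {
        return input.split(DELIMITER, -1);
    }

    public static void main(String[] args) {
        String input = "aaa||||";
        // Default split removes trailing empty strings: prints [aaa]
        System.out.println(Arrays.toString(input.split(DELIMITER)));
        // Negative limit keeps them: prints [aaa, , ]
        System.out.println(Arrays.toString(splitKeepingEmpties(input)));
    }
}
```

With the negative limit, an input containing two || separators always yields exactly three elements, so indexing split[1] and split[2] does not throw.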

rzymek
0

You've misread the docs for split. This splits the string between, using stringToParse as the delimiter:

String types[] = between.split(stringToParse);

You probably want to split the string stringToParse on the sentinel between:

String types[] = stringToParse.split(between);

For example:

String s = "a:b:c";
String[] letters = s.split(":"); // ["a", "b", "c"]
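
To make the receiver/argument order concrete, here is a small sketch that runs the split both ways. The class `SplitOrderDemo` and the helper `splitOn` are invented for the illustration; only String.split itself is the real API.

```java
import java.util.Arrays;

class SplitOrderDemo {
    /** Splits text wherever delimiterRegex matches; the receiver is the text to parse. */
    static String[] splitOn(String text, String delimiterRegex) {
        return text.split(delimiterRegex);
    }

    public static void main(String[] args) {
        // Intended order: the text is the receiver, the delimiter is the argument.
        System.out.println(Arrays.toString(splitOn("a:b:c", ":"))); // [a, b, c]
        // Swapped order: "a:b:c" never occurs inside ":", so nothing is split.
        System.out.println(Arrays.toString(splitOn(":", "a:b:c"))); // [:]
    }
}
```

When the delimiter never matches, split returns a one-element array holding the original string, which is why the swapped call appears to do nothing.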
cs_alumnus