
I need to extract only the data between the "CHAR" tags from the following string:

Input:

<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>

Expected output: Number 7015:188188

I am looking for something efficient.

Any recommendations?

Thanks

angus

5 Answers


It is good practice to avoid parsing XML/HTML with regex. Instead, you can use a proper XML parser. I like to use jsoup, so here is an example of how it can be done with that library:

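import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

String data = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";

// Parse as XML (not HTML), then join the text of all CHAR elements with spaces
Document doc = Jsoup.parse(data, "", Parser.xmlParser());
String charText = doc.select("CHAR").text();

System.out.println(charText);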

Output: Number 7015:188188
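If you prefer to avoid a third-party dependency, the JDK's built-in DOM parser (javax.xml.parsers) can do the same job. This is a minimal sketch, assuming the input is well-formed XML; the class name DomCharExtractor and its extract method are hypothetical names used only for illustration:

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts CHAR values using only the JDK's DOM parser
class DomCharExtractor {

    // Parses the input as XML and joins the text of every CHAR element with spaces
    static String extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // getElementsByTagName is case-sensitive and returns elements in document order
        NodeList chars = doc.getElementsByTagName("CHAR");
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < chars.getLength(); i++) {
            joined.add(chars.item(i).getTextContent());
        }
        return joined.toString();
    }

    public static void main(String[] args) throws Exception {
        String data = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(extract(data)); // prints: Number 7015:188188
    }
}
```

Note that a malformed input makes parse throw an exception instead of silently returning partial data.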

Pshemo

I think you meant to capture the content between the tags rather than split the string.

It's well known that you should NOT use a regex to parse XHTML, since you can get weird content. That said, if you still want a regex, you can use one like this:

<CHAR>(.*?)<\/CHAR>


In Java, you can use it like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
Pattern pattern = Pattern.compile("<CHAR>(.*?)<\\/CHAR>");
Matcher matcher = pattern.matcher(line);

// Append each captured value followed by a space
String result = "";
while (matcher.find()) {
    result += matcher.group(1) + " ";
}
System.out.println(result.trim()); //Prints: Number 7015:188188

Update: as Pshemo pointed out in his comment:

/ is not a special character in the Java regex engine, so you don't have to escape it.

So, you can use:

Pattern pattern = Pattern.compile("<CHAR>(.*?)</CHAR>");
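On Java 9 or later, Matcher.results() returns a stream of the matches, which lets you join the captured values without a trailing space and reuse a precompiled Pattern. A minimal sketch; RegexCharExtractor is a hypothetical name, and like the regex above it assumes CHAR tags have no attributes and are not nested:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: regex-based extraction of CHAR values
class RegexCharExtractor {

    // Compiled once; the reluctant (.*?) stops at the first </CHAR> after each <CHAR>
    private static final Pattern CHAR_TAG = Pattern.compile("<CHAR>(.*?)</CHAR>");

    static String extract(String line) {
        return CHAR_TAG.matcher(line)
                .results()                  // Java 9+: Stream<MatchResult>
                .map(m -> m.group(1))       // keep only the text between the tags
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String line = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(extract(line)); // prints: Number 7015:188188
    }
}
```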

By the way, I really like Pshemo's answer; it's a nice approach that solves this without applying a regex to XHTML.

Federico Piazza

If you know the tag value is always a run of digits, optionally followed by a colon and more digits, and that it is the only <CHAR> tag with such a numeric value, you can use this regex:

 (?<=<CHAR>)\d+(?::\d+)?(?=<\/CHAR>)

Java string:

 String pattern = "(?<=<CHAR>)\\d+(?::\\d+)?(?=</CHAR>)";

Sample code:

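import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
// The lookbehind/lookahead require the CHAR tags around the value without including them in the match
Pattern ptrn = Pattern.compile("(?<=<CHAR>)\\d+(?::\\d+)?(?=</CHAR>)");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(0));
}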

Output:

7015:188188
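If you'd rather not rely on lookbehind, the same constraint can be written with a capturing group: the tags are consumed by the match and the numeric value is read from group 1. A minimal sketch; NumericCharExtractor and firstNumeric are hypothetical names, and like the code above it returns only the first matching value:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: capturing-group version of the numeric CHAR pattern
class NumericCharExtractor {

    // Digits, optionally followed by ':' and more digits, directly between CHAR tags
    private static final Pattern NUMERIC_CHAR = Pattern.compile("<CHAR>(\\d+(?::\\d+)?)</CHAR>");

    // Returns the first numeric CHAR value, or null if there is none
    static String firstNumeric(String str) {
        Matcher matcher = NUMERIC_CHAR.matcher(str);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(firstNumeric(str)); // prints: 7015:188188
    }
}
```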
Wiktor Stribiżew
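String inputString = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
String s = inputString;
String result = "";
// Take the text between the first <CHAR> and the following </CHAR>,
// then drop everything up to and including that </CHAR> and repeat
while (s.indexOf("<CHAR>") != -1)
{
    int start = s.indexOf("<CHAR>") + "<CHAR>".length();
    int end = s.indexOf("</CHAR>", start);
    result += s.substring(start, end) + " ";
    s = s.substring(end + "</CHAR>".length());
}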
//result is now the desired output
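Since the question asks for something efficient, here is a variant of the same indexOf approach that passes a start index to indexOf instead of repeatedly cutting the remaining string with substring, joins values without a trailing space, and stops cleanly if a closing tag is missing. A sketch; IndexOfCharExtractor is a hypothetical name:

```java
import java.util.StringJoiner;

// Hypothetical helper: scans the input once using indexOf(String, int)
class IndexOfCharExtractor {

    private static final String OPEN = "<CHAR>";
    private static final String CLOSE = "</CHAR>";

    static String extract(String input) {
        StringJoiner joined = new StringJoiner(" ");
        int pos = 0;
        while (true) {
            int open = input.indexOf(OPEN, pos);
            if (open == -1) {
                break; // no more opening tags
            }
            int start = open + OPEN.length();
            int end = input.indexOf(CLOSE, start);
            if (end == -1) {
                break; // unclosed tag: stop instead of throwing
            }
            joined.add(input.substring(start, end)); // only the value is copied
            pos = end + CLOSE.length();               // continue after this </CHAR>
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String input = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(extract(input)); // prints: Number 7015:188188
    }
}
```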
Andy Brunner

The regex for that is: <CHAR>(.*?)</CHAR> (the value is in group 1)

However, it is better to use an XML parser for that.
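If the input can be large, a streaming XML parser avoids building a whole document tree in memory. A minimal sketch using the JDK's StAX API (javax.xml.stream), assuming well-formed XML and that CHAR elements contain only text; StaxCharExtractor is a hypothetical name:

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pull-parses the XML and collects CHAR element text
class StaxCharExtractor {

    static String extract(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringJoiner joined = new StringJoiner(" ");
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "CHAR".equals(reader.getLocalName())) {
                    // Reads text up to the matching end tag; throws if CHAR has child elements
                    joined.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return joined.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(extract(xml)); // prints: Number 7015:188188
    }
}
```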

shepard23