I've got the string <strong>Foo</strong>. I want to remove HTML tags from this string together with their content. In this example the expression must return "" (an empty string). How should I do this?

Richard Sitze
Tony

2 Answers

If the HTML you're trying to remove doesn't contain any nested HTML tags, here's a simple regex-based solution. You can assign the tag name to tag for convenience, and the regex adjusts accordingly.

String tag = "strong";
String str = "This is <strong>Foo</strong>Bar.";

String regex = "<\\s*" + tag + "[^>]*>[^<]*</\\s*" + tag + "\\s*>";

System.out.println(str.replaceAll(regex, "")); // This is Bar.

The regex accommodates extra tag attributes such as <strong class="bold">, and it tolerates slightly ill-formatted HTML such as unnecessary whitespace or newlines here and there, but it could break if the tag's content contains nested tags.
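
As a minimal sketch, the same idea can be wrapped in a reusable helper. The class name TagStripper and the method stripTag are hypothetical names chosen for illustration. Compared with the snippet above, this version makes three assumptions of its own: the tag name is escaped with Pattern.quote, matching is case-insensitive, and a \b word boundary keeps "strong" from also matching "stronger". It still assumes there are no nested tags.

```java
import java.util.regex.Pattern;

public class TagStripper {

    // Hypothetical helper: removes every <tag ...>text</tag> pair from input.
    // Assumes the element content contains no nested tags (no '<' inside).
    static String stripTag(String input, String tag) {
        // Escape the tag name so it is always treated as literal text.
        String name = Pattern.quote(tag);
        Pattern pattern = Pattern.compile(
                "<\\s*" + name + "\\b[^>]*>"    // opening tag, optional attributes
                + "[^<]*"                       // content without nested tags
                + "</\\s*" + name + "\\s*>",    // closing tag
                Pattern.CASE_INSENSITIVE);
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The string from the question becomes empty.
        System.out.println(stripTag("<strong>Foo</strong>", "strong").isEmpty()); // true
        // Attributes, mixed case and stray whitespace are tolerated.
        System.out.println(
                stripTag("This is <STRONG class=\"bold\">Foo</strong >Bar.", "strong")); // This is Bar.
    }
}
```

For HTML with nested or malformed tags, a regex like this is likely to fall short, and a real HTML parser would be the safer choice.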

Ravi K Thapliyal

Since you say you don't have nested tags, you can try the regex "<([^>]+)>.*?</\\1>":

String data = "bar<strong>foo</strong>yyy<strong>zzz</strong>";
System.out.println(data.replaceAll("<([^>]+)>.*?</\\1>", ""));

Output:

baryyy
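
One thing to watch with this pattern: the group ([^>]+) captures everything between < and >, attributes included, so <strong class="b">foo</strong> is left untouched because the closing tag would have to repeat the attributes. Also, . does not match line breaks by default. The following is a sketch of one possible variant, not part of the original answer: it captures only the tag name and enables Pattern.DOTALL. The class name AnyTagStripper and the method stripAllTags are hypothetical, and the no-nested-tags assumption still applies.

```java
import java.util.regex.Pattern;

public class AnyTagStripper {

    // Capture only the tag name in group 1, allow attributes after it,
    // and let '.' span line breaks via DOTALL.
    private static final Pattern ELEMENT = Pattern.compile(
            "<(\\w+)\\b[^>]*>.*?</\\1\\s*>", Pattern.DOTALL);

    // Hypothetical helper: removes every element together with its content.
    // Assumes elements are not nested inside one another.
    static String stripAllTags(String input) {
        return ELEMENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String data = "bar<strong class=\"b\">foo\nbaz</strong>yyy<em>zzz</em>";
        System.out.println(stripAllTags(data)); // baryyy
    }
}
```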
Pshemo