I want to find and replace values like this:

<TAG>heading<foo></foo></TAG><foo>juergen</foo>

Goal:

<TAG>heading</TAG><foo>juergen</foo>

I want to remove only the <foo> tags that appear between <TAG> and </TAG>.

Here is my attempt:

replaceAll("</?foo\\b[^>]*>", "");
Dave Newton
user1181110
  • Nice attempt. What's going wrong? – jlordo May 27 '13 at 22:54
  • All foo tags are deleted, but I only need to delete the ones between <TAG> and </TAG>. – user1181110 May 27 '13 at 22:55
  • Use an XML parser for that problem. Regex is not the right tool for that job. – jlordo May 27 '13 at 22:57
  • You aren't trying to [parse HTML with regex](http://stackoverflow.com/a/1732454/712765), are you? – Old Pro May 27 '13 at 22:58
  • [Please stop generically whining about things that very often have a perfectly valid use case](http://stackoverflow.com/a/1733489/1729885). Parsing HTML with a regex is sometimes a good idea and sometimes not; stop trying to pass it off as evil by definition. – Niels Keurentjes May 27 '13 at 23:36

3 Answers

Assuming that the <foo> element is empty, you can use:

<([^/][^>]*)></\1>

This searches for an opening tag with an adjacent closing tag of the same name.

You could augment it to allow for whitespace in the middle with:

<([^/][^>]*)>\s*</\1>
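
In Java the backslashes have to be doubled inside the string literal. Here is a minimal sketch, assuming the input is the sample string from the question; the class and method names are made up for illustration. Keep in mind that this pattern removes any empty tag pair anywhere in the input, not just <foo> and not just inside <TAG>, so it only fits if that is acceptable for your data.

```java
class EmptyTagRemover {
    // Hypothetical helper: removes every element pair that is empty or
    // contains only whitespace, e.g. <foo></foo> or <foo> </foo>.
    static String removeEmptyPairs(String input) {
        // \\1 is a backreference to the tag name captured by the first group.
        return input.replaceAll("<([^/][^>]*)>\\s*</\\1>", "");
    }

    public static void main(String[] args) {
        String input = "<TAG>heading<foo></foo></TAG><foo>juergen</foo>";
        // prints: <TAG>heading</TAG><foo>juergen</foo>
        System.out.println(removeEmptyPairs(input));
    }
}
```

The second <foo> survives only because it has content ("juergen"), not because it sits outside <TAG>.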
Dancrumb

Possible duplicate of "RegEx match open tags except XHTML self-contained tags".

Otherwise, here is a regex. Don't ask me to explain it, I barely understand it myself. (This is in JavaScript; some corrections may be needed for Java.)

var txt = "<TAG>a<foo>b</foo>c</TAG>d<foo>e</foo>f<TAG>g<foo>h</foo>i</TAG>j<TAG>k</TAG>";
var res = txt.replace(/(<TAG>.*?)<foo>.*?<\/foo>(.*?<\/TAG>)/gm,"$1$2");
//                     (   $1   )               (    $2    )
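
For Java, a rough port might look like the sketch below (class and method names are hypothetical). `singleRegex` is a direct translation of the JavaScript call. It drops the whole <foo> element including its text, handles at most one <foo> per <TAG> block, and the lazy `.*?` can run past a </TAG> when a block contains no <foo>. `stripFooTagsInsideTag` is a two-pass variant, not the approach above: it finds each <TAG>...</TAG> block first and then applies the pattern from the question inside that block only. It assumes <TAG> blocks are not nested and carry no attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScopedFooRemover {
    // One <TAG>...</TAG> block; DOTALL lets '.' also match line breaks.
    private static final Pattern TAG_BLOCK =
            Pattern.compile("<TAG>.*?</TAG>", Pattern.DOTALL);

    // Direct port of the JavaScript replace: removes one <foo>...</foo>
    // element (tags and content) per match.
    static String singleRegex(String txt) {
        return txt.replaceAll("(<TAG>.*?)<foo>.*?</foo>(.*?</TAG>)", "$1$2");
    }

    // Two-pass variant: removes only the foo tags (their content stays),
    // and only inside <TAG>...</TAG> blocks.
    static String stripFooTagsInsideTag(String txt) {
        Matcher m = TAG_BLOCK.matcher(txt);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = m.group().replaceAll("</?foo\\b[^>]*>", "");
            // quoteReplacement keeps '$' and '\' in the data from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String txt = "<TAG>a<foo>b</foo>c</TAG>d<foo>e</foo>f"
                + "<TAG>g<foo>h</foo>i</TAG>j<TAG>k</TAG>";
        // prints: <TAG>ac</TAG>d<foo>e</foo>f<TAG>gi</TAG>j<TAG>k</TAG>
        System.out.println(singleRegex(txt));
        // prints: <TAG>abc</TAG>d<foo>e</foo>f<TAG>ghi</TAG>j<TAG>k</TAG>
        System.out.println(stripFooTagsInsideTag(txt));
    }
}
```

On the sample from the question both methods give `<TAG>heading</TAG><foo>juergen</foo>`, since the <foo> there is empty.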
Isaac
String result = searchText.replaceAll("(<f.*><.*o>)(?=<)", "");
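
A quick way to check this against the sample from the question is sketched below (the class and method names are hypothetical). The lookahead `(?=<)` only requires that another tag follows the match, so the pattern is not actually limited to the inside of <TAG>: an empty <foo></foo> that is directly followed by any `<` gets removed too.

```java
class LookaheadAttempt {
    // Wraps the replaceAll call from this answer so it can be tried on
    // different inputs.
    static String apply(String searchText) {
        return searchText.replaceAll("(<f.*><.*o>)(?=<)", "");
    }

    public static void main(String[] args) {
        String searchText = "<TAG>heading<foo></foo></TAG><foo>juergen</foo>";
        // prints: <TAG>heading</TAG><foo>juergen</foo>
        System.out.println(apply(searchText));
    }
}
```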