

I need help using a JavaScript regex to remove everything from an HTML document string except the <body></body> element and the whole string inside the body tag.

I tried this, but it doesn't work:

var str = "<html><head><title></title></head><body>my content</body></html>"
str.replace(/[^\<body\>(.+)\<\\body\>]+/g,'');

I only need the body content. Another option would be to use DOMParser:

var oParser = new DOMParser(); // the constructor takes no arguments
var oDOM = oParser.parseFromString(str, "text/xml");

But this throws an error when parsing the document string I load via Ajax.
Thanks in advance for any suggestions!
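A likely reason the regex attempt above fails: square brackets create a character class, so [^\<body\>(.+)\<\\body\>]+ means "one or more characters that are not any of < b o d y > ( . + ) \", not "everything except the body element". The sketch below (plain JavaScript, runnable in Node or a browser console) shows what that replace actually produces on the sample string.

```javascript
var str = "<html><head><title></title></head><body>my content</body></html>";

// Inside [^...] every character is treated individually, so this pattern
// deletes every run of characters that are NOT one of: < b o d y > ( . + ) \
var result = str.replace(/[^\<body\>(.+)\<\\body\>]+/g, '');

console.log(result); // <><d><><><d><body>yo<body><>
```

Only the individual letters b, o, d, y and the angle brackets survive, which is why the output looks scrambled rather than trimmed.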

joseluisq

3 Answers

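var str = "<html><head><title></title></head><body>my content</body></html>";

// [\s\S] matches any character, including newlines; *? keeps the match lazy.
// With the g flag, match() returns an array of every matched string.
var result = str.match(/<(body)>[\s\S]*?<\/\1>/gi);
// result[0] === "<body>my content</body>"

// Alternatively, the s (dotAll) flag (ES2018+) lets . match newlines too:
// var result = str.match(/<(body)>.*?<\/\1>/gis);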
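If you want only what is inside the tag, a non-global match with a capture group returns it directly. The sketch below assumes the body tag may carry attributes (hence <body\b[^>]*>) and that the document has a single body element; getBodyInner is a hypothetical helper name. Like any regex approach it is not a real HTML parser: a "</body>" inside a script or comment would confuse it.

```javascript
// Hypothetical helper: returns the text between <body ...> and </body>,
// or null when no body element is found.
function getBodyInner(html) {
  // No g flag, so match() returns capture groups; [\s\S] spans newlines.
  var m = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i);
  return m ? m[1] : null;
}

var doc = '<html><head><title></title></head>\n<body class="main">\nmy content\n</body></html>';
console.log(JSON.stringify(getBodyInner(doc))); // "\nmy content\n"
console.log(getBodyInner("<p>no body here</p>")); // null
```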


Tim.Tang

You could try this code:

> var str = "<html><head><title></title></head><body>my content</body></html>"
undefined
> str.replace(/.*?(<body>.*?<\/body>).*/g, '$1');
'<body>my content</body>'
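One caveat with this replace: . does not match line breaks, and replace() returns the input unchanged when the pattern finds no match, so on a document that spans several lines the call silently does nothing. A sketch of the failure and of the same idea with [\s\S] in place of . (the multiline sample document is made up for illustration):

```javascript
var multi = "<html>\n<head><title></title></head>\n<body>\nmy content\n</body>\n</html>";

// <body> and </body> sit on different lines, so .*? cannot bridge them:
// no match is found and the string comes back unchanged.
var unchanged = multi.replace(/.*?(<body>.*?<\/body>).*/g, '$1');
console.log(unchanged === multi); // true

// [\s\S] matches any character, including \n, so the whole document is
// replaced by the captured body element.
var body = multi.replace(/[\s\S]*?(<body>[\s\S]*?<\/body>)[\s\S]*/, '$1');
console.log(JSON.stringify(body)); // "<body>\nmy content\n</body>"
```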


Avinash Raj

You can't (or at least shouldn't) do this with replace; try match instead:

var str = "<html><head><title></title></head><body>my content</body></html>";
var m = str.match(/<body>.*<\/body>/);
console.log(m[0]); //=> "<body>my content</body>"

If you have a multiline string, change the . (which does not match \n) to [\S\s] (non-whitespace OR whitespace, i.e. any character) or something similar.
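A sketch of that multiline variant, keeping the answer's greedy pattern. One addition is an assumption rather than part of the answer: match() returns null when nothing matches, so the result is checked before m[0] is read, and the empty-string fallback is an arbitrary choice for illustration.

```javascript
var page = "<html>\n<body>\nmy content\n</body>\n</html>";

// [\S\s] (non-whitespace OR whitespace) matches any character, including \n.
var m = page.match(/<body>[\S\s]*<\/body>/);

// match() returns null when there is no match, so guard before indexing;
// falling back to "" is just this sketch's choice.
var bodyHtml = m ? m[0] : "";
console.log(JSON.stringify(bodyHtml)); // "<body>\nmy content\n</body>"
```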

tckmn