HTML:

<div class="someclass">
    <h3>First</h3> 
    <strong>Second</strong> 
    <hr>
    Third
    <br>
    Fourth
    <br>
    <em></em>
    ...
</div>

From the div above, I want to get all child text nodes that come after the hr ("Third", "Fourth", ...; there may be more).

If I do

document.querySelectorAll('div.someclass>hr~*')

I get NodeList [ br, br, em, ... ], which contains no text nodes.

With the following:

document.querySelector('div.someclass').textContent

I get the text of all nodes as a single string.

I can get each text node individually:

var third = document.querySelector('div.someclass').childNodes[6].textContent
var fourth = document.querySelector('div.someclass').childNodes[8].textContent

So I tried slicing:

document.querySelector('div.someclass').childNodes[5:]  // SyntaxError

and slice()

document.querySelector('div.someclass').childNodes.slice(5)  // TypeError
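
Both attempts fail for reasons unrelated to the page structure: `[5:]` is Python slice syntax, which JavaScript does not have, and `childNodes` is a NodeList, an array-like object that has no `slice` method. A minimal sketch of one workaround is to copy the NodeList into a real array first. `sliceNodes` is a hypothetical helper name, and the stand-in object only mimics a NodeList's shape so the snippet runs outside a browser:

```javascript
// Hypothetical helper: copy any array-like (such as a NodeList) into a real
// Array, then slice it from the given index.
function sliceNodes(arrayLike, start) {
  return Array.from(arrayLike).slice(start);
}

// Stand-in for a NodeList: indexed properties plus length, but no slice().
const fakeNodeList = { 0: 'h3', 1: 'strong', 2: 'hr', 3: 'Third', length: 4 };
const tail = sliceNodes(fakeNodeList, 2);

// In a browser, the equivalent call would be:
// sliceNodes(document.querySelector('div.someclass').childNodes, 5);
```

The result still contains element nodes (hr, br, em) alongside the text nodes, so it would need a further filter on `nodeType` to keep only text.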

Is there any way to get all child text nodes that follow the hr node?

UPDATE

I forgot to mention that this question is about web scraping, not web development: I cannot change the HTML source code.

BoltClock
Andersson

1 Answer

You can read the div's innerHTML, split it on '<hr>' to keep only the markup after the hr, and write that markup into a separate div. You can then iterate over that div's child nodes to get your content:

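// Serialized markup of the target div (an attribute-less hr serializes as "<hr>").
var content = document.querySelector('.someclass').innerHTML;
// Keep only the markup that follows the first <hr>.
content = content.split('<hr>')[1];

// Parse that markup inside the hidden helper div.
document.querySelector('.hide').innerHTML = content;

// Log the text of every child node, text nodes and elements alike.
var nodes = document.querySelector('.hide').childNodes;
for (var i = 0; i < nodes.length; i++) {
  console.log(nodes[i].textContent);
}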

CSS:

.hide {
  display: none;
}

HTML:

<div class="someclass">
  <h3>First</h3>
  <strong>Second</strong>
  <hr> Third
  <br> Fourth
  <br>
  <em></em> ...
</div>
<div class="hide"></div>
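
Since the question says the page source cannot be changed, an approach that only reads the existing DOM, without a helper div, may fit better. The following is a minimal sketch, assuming the hr is a direct child of the div: start at the hr and follow `nextSibling`, keeping only non-empty text nodes. `textAfter` is a hypothetical name, and the literal 3 is the value of `Node.TEXT_NODE`.

```javascript
// nodeType value of a text node (Node.TEXT_NODE in browsers).
const TEXT_NODE = 3;

// Hypothetical helper: collect the trimmed text of every non-empty text node
// that follows `start` among its siblings.
function textAfter(start) {
  const texts = [];
  for (let node = start.nextSibling; node; node = node.nextSibling) {
    if (node.nodeType === TEXT_NODE) {
      const text = node.textContent.trim();
      if (text) texts.push(text);
    }
  }
  return texts;
}

// In a browser:
// textAfter(document.querySelector('div.someclass > hr'));
```

For the markup in the question this should yield "Third", "Fourth" and the literal "..." placeholder, while skipping the br and em elements and the whitespace-only text nodes between them.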
Temani Afif