I am trying to extract a list of DIVs whose class is "child", and associate each "child" with the nearest DIV of class "header" that occurs before it.

For example:

<div class=header>HEADER A</div>
<div class=child>CHILD A.1</div>
<div class=child>CHILD A.2</div>
<div class=child>CHILD A.3</div>
<div class=header>HEADER B</div>
<div class=child>CHILD B.1</div>
<div class=child>CHILD B.2</div>
<div class=child>CHILD B.3</div>

I expect output like the following:

HEADER A --> CHILD A.1
HEADER A --> CHILD A.2
HEADER A --> CHILD A.3
HEADER B --> CHILD B.1
HEADER B --> CHILD B.2
HEADER B --> CHILD B.3
Phrogz
iwan
  • Note that if you convert your HTML to use semantic headers (e.g. `<h1>`) I have [an answer here](http://stackoverflow.com/questions/7827562/use-xpath-to-group-siblings-from-an-html-xml-document/7829248#7829248) that automatically groups headers and following siblings into sections. – Phrogz Oct 26 '11 at 13:34
  • …and you can easily switch to them via: `doc.css('div.header').each{ |head| head.name = 'h1' }` ;) – Phrogz Oct 27 '11 at 03:14

2 Answers

Just store the previous header element:

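# Assumes `xml` is an already-parsed Nokogiri document
header = ""
xml.xpath("//div").each do |node|
  if node['class'] =~ /header/
    # Remember the most recent header seen in document order
    header = node.text
  else
    # Every other div is paired with that header
    puts "#{header} --> #{node.text}"
  end
end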
Tatu Lahtela
  • Thanks Tatu, I believe your suggestion works for the simplified case in the question description. Cheers – iwan Oct 25 '11 at 12:36

A more 'xpathy' version:

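# Assumes `doc` is an already-parsed Nokogiri document
doc.xpath('//div[@class="child"]').each do |node|
  # [1] on the reverse axis picks the nearest preceding header sibling
  header = node.at('./preceding-sibling::div[@class="header"][1]')
  next unless header  # skip children that have no header before them
  puts "#{header.text} --> #{node.text}"
end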
pguardiario