I am trying to get the text between two tags.

<b> foo</b>bar<br/> => bar

I tried using '<b>asdasd</b>qwe<br/>'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/) and it gives me the proper result.

But when I try this:

'<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/) { |ele|
puts ele
}

It matches from the first <b> tag to the last <br/> tag and returns everything in between as a single match. I was expecting an array of matches.

Andrew Grimm
Gaurav Shah
    Related question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm Nov 25 '11 at 06:56

2 Answers

Instead of using a regex on HTML, use Nokogiri:

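require 'nokogiri'

# str is assumed to hold the HTML snippet from the question
Nokogiri::HTML.fragment(str).css('b').each do |b|
    puts b.next.text  # text of the node immediately following each <b>
end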
pguardiario

Change (.*) to (.*?) to make it non-greedy, so that it stops at the first <br/> it can reach instead of the last one:

/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/

Test:

[2] pry(main)> '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/) { |ele|
[2] pry(main)*   puts ele
[2] pry(main)* }  
op1
op2
op3
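
Since the question asks for an array of matches rather than printed output, here is a minimal sketch of the same comparison run as a script. The variable names (html, greedy, lazy) are only illustrative and not from the original posts. Without a block, scan returns one inner array per match when the pattern has a capture group, so flatten is used here to get a flat list of strings.

```ruby
html = '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'

# Greedy: (.*) runs on to the last <br/>, so the whole string yields one match.
greedy = html.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/).flatten

# Non-greedy: (.*?) stops at the first <br/> after each </b>.
lazy = html.scan(/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/).flatten

p greedy.length  # prints 1
p lazy           # prints ["op1", "op2", "op3"]
```

If the text between the tags can contain newlines, note that . does not match them by default in Ruby; the /m flag would be needed in that case.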
Dogbert