Method to parse HTML document in Ruby?

Question

Like the DOMDocument class in PHP, is there any class in Ruby (i.e. in core Ruby) to parse an HTML document and get the values of its node elements?

score 48 · Accepted Answer · edited Sep 10 '12 at 05:40

There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.

Meta-answer: for common needs like this one, I'd recommend checking the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers.

edited Sep 10 '12 at 05:40

the Tin Man

answered Mar 31 '10 at 17:16

Marc-André Lafortune

score 9 · Answer 2 · answered Mar 31 '10 at 17:04

You should check out hpricot. It's exceedingly good. It's not 'core' ruby, but it's a commonly used gem.

answered Mar 31 '10 at 17:04

Peter

Hpricot sadly is no more. Nokogiri is now the preferred solution. – superluminary Oct 14 '13 at 11:27

score 5 · Answer 3 · answered Aug 06 '15 at 14:04

You can also try Oga by Yorick Peterse.

It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it at https://github.com/YorickPeterse/oga

answered Aug 06 '15 at 14:04

microspino

dineshsprabu · Answer 4 · 2017-02-11T08:02:20.873

Ruby Cheerio is a jQuery-style HTML parser for Ruby: a much simplified take on Nokogiri, aimed at crawlers. It is the Ruby version of cheerio, a very popular Node.js package.

Follow the link for a simple crawler example.

gem install ruby-cheerio

require 'ruby-cheerio'

jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")

jQuery.find('h1').each do |head_one|
    p head_one.text
end

# getting attribute values like jQuery.
p jQuery.find('h1.one')[0].prop('h1','class')

# function chaining similar to jQuery.
p jQuery.find('body').find('h1').first.text

edited Feb 11 '17 at 08:02

answered Feb 08 '17 at 16:42

dineshsprabu

Very good approach! Nice recommendation! Thanks @dineshsprabu. – Fernando Kosh Apr 18 '17 at 19:22
Thanks Fernando Kosh – dineshsprabu Apr 19 '17 at 07:29
