2

Is there a web crawler library available for PHP or Ruby? Ideally one that can crawl depth-first or breadth-first, and that handles links correctly even when they are relative (e.g. href="../relative_path.html") or when a base URL is used.

nonopolarity

5 Answers

5

http://phpcrawl.cuab.de/

ist_lion
3

Check this page out for a Ruby library: Ruby Mechanize

I'd like to mention that you would still be responsible for the way in which your crawler traverses sites.
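To illustrate what "being responsible for the traversal" might look like, here is a minimal sketch using only Ruby's standard library. It does not use Mechanize: the `PAGES` hash is a made-up in-memory stand-in for fetched pages, and `resolve` and `crawl` are hypothetical helper names. The part that maps to the question is `URI.join`, which resolves hrefs such as "../relative_path.html" against the page URL (or against a base URL, if you pass that instead), plus a frontier that works as a queue for breadth-first order or as a stack for a depth-first-style order.

```ruby
require "uri"
require "set"

# Hypothetical in-memory "site": page URL => raw href values found on that page.
# A real crawler would fetch and parse each page instead.
PAGES = {
  "http://example.com/a/index.html" => ["../b/page.html", "other.html"],
  "http://example.com/b/page.html"  => ["/a/index.html", "http://example.com/c.html"],
  "http://example.com/a/other.html" => [],
  "http://example.com/c.html"       => ["a/other.html"]
}.freeze

# Resolve a raw href against the URL of the page it appeared on.
# If the page declares a base URL, pass that as base_url instead.
def resolve(base_url, href)
  URI.join(base_url, href).to_s
end

# Visit every reachable page once and return the visit order.
# :breadth_first treats the frontier as a FIFO queue;
# any other strategy treats it as a LIFO stack (depth-first style).
def crawl(start_url, pages, strategy: :breadth_first)
  frontier = [start_url]
  seen = Set.new([start_url])
  order = []

  until frontier.empty?
    url = strategy == :breadth_first ? frontier.shift : frontier.pop
    order << url

    pages.fetch(url, []).each do |href|
      absolute = resolve(url, href)
      next if seen.include?(absolute) # skip pages already queued or visited

      seen << absolute
      frontier << absolute
    end
  end

  order
end
```

Calling `crawl("http://example.com/a/index.html", PAGES)` walks the sample site breadth-first, and passing `strategy: :depth_first` switches the frontier to a stack. In a real crawler you would replace the hash lookup with an HTTP fetch plus link extraction (for example via Mechanize), and probably restrict the crawl to one host.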

AlbertoPL
0

If you'd like to learn the basics of web crawling and search, you can start by looking at "luna engine".

Happyday
0

If you need to scrape web pages that use JavaScript, you can use Capybara with a driver that spins up a real browser, such as Poltergeist. It's usually used with a testing framework for acceptance testing, but it can also be used outside a testing framework.

Kris
0

You can go for Webrat or Watir in Ruby; both are much easier than Mechanize.

fenec