Is there a web crawler library available for PHP or Ruby? I'm looking for a library that can crawl depth first or breadth first, and that handles links correctly even when they are relative (e.g. href="../relative_path.html") or when a base URL is used.
5 Answers
3
For Ruby, check out the Mechanize library.
Note that you are still responsible for how your crawler traverses a site, for example whether it goes depth first or breadth first.
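To illustrate the traversal part, here is a minimal sketch using only Ruby's standard library. It is not Mechanize's API: `extract_links`, `crawl`, and the `fetch:` callable are hypothetical names made up for this example, and the regex-based link extraction is a shortcut (a real HTML parser would be more robust). Relative links such as `../relative_path.html` are resolved with `URI.join`, and a `<base href>` tag is honored if the page has one.

```ruby
require 'uri'
require 'set'

# Resolve one href against a base URI; returns nil for unparsable
# or non-HTTP(S) targets (mailto:, javascript:, ...).
def resolve_link(base_uri, href)
  uri = URI.join(base_uri, href)
  uri.fragment = nil                 # treat page.html#top and page.html as the same page
  uri.is_a?(URI::HTTP) ? uri.to_s : nil
rescue URI::Error
  nil
end

# Extract absolute link targets from an HTML string (hypothetical helper).
# Uses <base href="..."> when present, otherwise the page's own URL.
def extract_links(html, page_url)
  base = html[/<base[^>]+href=["']([^"']+)["']/i, 1]
  base_uri = base ? URI.join(page_url, base) : URI(page_url)
  hrefs = html.scan(/<a\b[^>]*\bhref=["']([^"']+)["']/i).flatten
  hrefs.map { |href| resolve_link(base_uri, href) }.compact
end

# Crawl from start_url and return the URLs in the order they were visited.
# fetch: any callable that takes a URL and returns the HTML string (or nil).
# strategy: :breadth_first treats the frontier as a queue,
#           :depth_first treats it as a stack.
def crawl(start_url, fetch:, strategy: :breadth_first, max_pages: 50)
  frontier = [start_url]
  seen = Set[start_url]
  visited = []
  until frontier.empty? || visited.size >= max_pages
    url = strategy == :breadth_first ? frontier.shift : frontier.pop
    html = fetch.call(url)
    next if html.nil?
    visited << url
    extract_links(html, url).each do |link|
      frontier << link if seen.add?(link)   # add? returns nil for already-seen URLs
    end
  end
  visited
end
```

The only difference between the two strategies is which end of the frontier the next URL is taken from. In practice `fetch` would wrap whatever HTTP client you use, and you would probably also want to restrict the crawl to one host and respect robots.txt.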
AlbertoPL
0
If you'd like to learn the basics of web crawlers and search, you can start by looking at "luna engine".
Happyday
0
If you need to scrape web pages that use JavaScript, you can use Capybara with a driver that spins up a real browser, such as Poltergeist. It's usually used with a testing framework for acceptance testing, but it can also be used on its own.
Kris