
I'm looking for a good open source web crawler, and I found these:

DataparkSearch, GNU Wget, GRUB, Heritrix, ht://Dig, HTTrack, ICDL, mnoGoSearch, Nutch, Open Search Server, PHP-Crawler, tkWWW Robot, Scrapy, Seeks, YaCy.

But I cannot decide which is best for finding products and prices.

Does anyone have experience with web crawlers and could help me?

Even just a tip on where I can read something helpful would help a lot.

If this question was asked on the wrong Stack Exchange site, please tell me which one is correct.

Heberfa
  • wget is not a crawler – it is software that can fetch web pages: "GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc." – elssar Mar 14 '13 at 13:52
  • If the listed ones do not suit your needs, why not learn how to build one in Python? Also, @elssar is right: wget is software for retrieving pages. If you read the wget manual, you can make wget act like a crawler. – DᴀʀᴛʜVᴀᴅᴇʀ Mar 14 '13 at 14:58
  • What feature do you need that these don't have? For example, do you need it to be directed only to the "product pages" of a website so that it doesn't crawl the entire site? Do you need a crawler that can extract the product and price data from the pages into a more machine-readable format? – Stephen Ostermiller Mar 14 '13 at 15:03
  • It depends what you would like to do with it. Scrape? In what language do you want to develop? Or are you looking for a search engine complete with crawler, like Sphider? – David K. Mar 14 '13 at 15:04
  • elssar: wget has a recursive option that does turn it into a crawler. – Stephen Ostermiller Mar 15 '13 at 02:25
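As the comments suggest, a crawler at its core just fetches a page, follows its links, and extracts the data you care about (here, prices). A minimal sketch in Python, using an in-memory "site" (a dict of URL to HTML) instead of real HTTP so it runs standalone — the pages, URLs, and dollar-sign price format are all hypothetical; in practice you would swap `fetch` for real network requests or use a framework like Scrapy:

```python
import re
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Matches prices like $5 or $19.99 (a simplistic, hypothetical format).
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")

# Hypothetical pages standing in for a real store.
SITE = {
    "/": '<a href="/p1">Widget</a> <a href="/p2">Gadget</a>',
    "/p1": "Widget costs $19.99 today",
    "/p2": "Gadget is $5",
}

def crawl(start, fetch, max_pages=10):
    """Breadth-first crawl: fetch a page, record prices, follow links."""
    seen, queue, prices = set(), [start], {}
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        prices[url] = PRICE_RE.findall(html)
        parser = LinkParser()
        parser.feed(html)
        queue.extend(parser.links)
    return prices

if __name__ == "__main__":
    print(crawl("/", SITE.get))
    # → {'/': [], '/p1': ['$19.99'], '/p2': ['$5']}
```

The `seen` set and `max_pages` cap are the two safeguards any real crawler needs so it neither loops on cyclic links nor crawls an entire site when you only want the product pages.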

0 Answers