
I'm trying to scrape 1,265 HTML files at once to get the names and descriptions of items I have on a website.
I have permission from the wholesaler to copy their data, but I don't want to spend days just collecting descriptions. Is there a way to scrape the data when it is in the following format?

    <h1 class="CWproductName">ADINA BLACK TV UNIT</h1>

and

    <div id="CWproductInfo">


 <br />Adina Black TV Unit<br> Oak Finish<br>800W x 500D x 560H<br><br />
                <p class="CWcontShop">

What I want to do is copy the information between

  <div id="CWproductInfo"> and <p class="CWcontShop">

so that I am left with

    <h1 class="CWproductName">ADINA BLACK TV UNIT</h1>

  <br />Adina Black TV Unit<br> Oak Finish<br>800W x 500D x 560H<br><br />

but from multiple pages at once. It would be even better if the results could be put into a spreadsheet.

Shekhar
  • 5,069
Jamie
  • 1
  • You'll need to write a parser script, but it seems trivial if you are decent with Bash/PowerShell, Python, or whatever. – Frank Thomas Jul 26 '14 at 02:02
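
The parser script suggested in the comment could look roughly like the Python sketch below. It is only a sketch under a few assumptions that are not in the question: the 1,265 pages are already saved locally as `*.html` files in one folder, every page uses exactly the `CWproductName` / `CWproductInfo` / `CWcontShop` markup shown above, and the files are UTF-8. The names `extract`, `scrape_folder`, and the `file`/`name`/`description` columns are made up for illustration. It uses regular expressions from the standard library, which is fragile on messy HTML; a real HTML parser would be more robust if the pages vary.

```python
import csv
import html
import re
import sys
from pathlib import Path

# Patterns for the markup shown in the question (assumed identical on every page).
NAME_RE = re.compile(r'<h1[^>]*class="CWproductName"[^>]*>(.*?)</h1>', re.S | re.I)
INFO_RE = re.compile(
    r'<div[^>]*id="CWproductInfo"[^>]*>(.*?)<p[^>]*class="CWcontShop"', re.S | re.I
)
BR_RE = re.compile(r'<br\s*/?>', re.I)   # matches <br>, <br/> and <br />
TAG_RE = re.compile(r'<[^>]+>')          # any remaining tag


def clean(fragment):
    """Strip tags, decode entities such as &amp;, and trim whitespace."""
    return html.unescape(TAG_RE.sub('', fragment)).strip()


def extract(page):
    """Return (name, description) for one page; empty strings if not found."""
    name_match = NAME_RE.search(page)
    info_match = INFO_RE.search(page)
    name = clean(name_match.group(1)) if name_match else ''
    description = ''
    if info_match:
        # Each <br>-separated piece becomes one part of the description.
        parts = [clean(p) for p in BR_RE.split(info_match.group(1))]
        description = ' | '.join(p for p in parts if p)
    return name, description


def scrape_folder(folder, out_csv):
    """Write one CSV row per *.html file in folder; return the number of rows."""
    rows = 0
    with open(out_csv, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['file', 'name', 'description'])
        for path in sorted(Path(folder).glob('*.html')):
            text = path.read_text(encoding='utf-8', errors='replace')
            writer.writerow([path.name, *extract(text)])
            rows += 1
    return rows


if __name__ == '__main__' and len(sys.argv) == 3:
    # Hypothetical usage: python scrape.py <folder-with-html> <output.csv>
    print(scrape_folder(sys.argv[1], sys.argv[2]), 'pages written')
```

The resulting CSV file opens directly in Excel or any other spreadsheet. Pages where either pattern is missing get empty cells, which makes them easy to spot and check by hand.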

1 Answer


I would try the Power Query Add-In for this - it can loop over website pages and extract data from them, as long as the pages and their URLs are consistent.

Here's an example:

http://kzhendev.wordpress.com/2014/04/14/scraping-the-web-with-power-query/

Mike Honey
  • 2,562