
I am a novice using C# to scrape sites. I understand how to find hrefs and how to handle really simple tables.

Now I want to parse the following HTML and pick out just the text of the first cell (i.e. 'Office Manager') and the href:

<tr>
  <td>Office Manager</td>
  <td>Office & Admin</td>
  <td>Cambridge</td>
  <td class="btn-wrapper desktop-btn"><a href="http://www.itoworld.com/office-manager/" class="std-btn">Find out more</a></td>
</tr>
<tr class="mobile-btn">
  <td colspan="3" class="btn-wrapper"><a href="http://www.itoworld.com/office-manager/" class="std-btn">Find out more</a></td>
</tr>

Also, can anyone recommend a site where I can learn my way around nodes, tds and trs?

Asked by Peter, edited by GSerg.

2 Answers


You can use the CsQuery library (available on NuGet) to parse HTML with jQuery-style selectors:

using CsQuery;
// Parse the HTML, then read the href of the first "Find out more" link
var page = new CQ(html);
var firstManagerHref = page.Find("a.std-btn:first").Attr("href");
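If you would rather not take a dependency, the snippet in the question happens to be well-formed XML apart from the bare `&` in "Office & Admin", so the built-in LINQ to XML (`System.Xml.Linq`) can walk the `tr`/`td` nodes directly. This is a minimal sketch under that assumption, not CsQuery: `RowScraper` and `FirstJob` are hypothetical names, and the approach will throw on typical real-world HTML (void tags like `<br>`, unquoted attributes, entities such as `&nbsp;`), which is exactly what a real HTML parser handles for you.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RowScraper
{
    // Hypothetical helper. ASSUMPTION: the markup is XML-well-formed except for
    // bare '&' characters; anything else makes XElement.Parse throw XmlException.
    public static (string Title, string Href)? FirstJob(string html)
    {
        // Escape '&' that does not already start an entity (e.g. "Office & Admin").
        string xml = Regex.Replace(
            html, @"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");

        // XML needs a single root element, so wrap the <tr> fragment in one.
        XElement table = XElement.Parse("<table>" + xml + "</table>");

        // Only the first row is read; it holds the title in its first <td>.
        XElement row = table.Elements("tr").FirstOrDefault();
        if (row == null) return null;

        string title = row.Elements("td").First().Value.Trim();
        string href = row.Descendants("a")
                         .Select(a => (string)a.Attribute("href"))
                         .FirstOrDefault();
        return (title, href);
    }
}
```

Called on the HTML from the question, `FirstJob` returns ("Office Manager", "http://www.itoworld.com/office-manager/"); the second, `mobile-btn` row is never looked at because only the first `<tr>` is read.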
Answered by opewix.

If you want to retrieve information from HTML, I'd recommend using a dedicated parsing library such as the HTML Agility Pack:

http://html-agility-pack.net/
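The HTML Agility Pack is usually queried with XPath through `SelectNodes` and `SelectSingleNode`. To show the kind of XPath that picks out the title and link without installing a package, here is a sketch that runs the same style of query over the built-in `System.Xml.XmlDocument`. It again assumes the markup is well-formed apart from bare `&` characters, and `XPathRowScraper`/`AllJobs` are hypothetical names. The row and cell expressions (`//tr[not(@class='mobile-btn')]`, `td[1]`) should carry over to the Agility Pack, which parses messy real-world HTML directly; note that its `SelectNodes` returns null rather than an empty collection when nothing matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

public static class XPathRowScraper
{
    // Hypothetical helper: returns (title, href) for every non-mobile row.
    // ASSUMPTION: the input is XML-well-formed apart from bare '&' characters.
    public static List<(string Title, string Href)> AllJobs(string html)
    {
        // Escape '&' that does not already start an entity (e.g. "Office & Admin").
        string xml = Regex.Replace(
            html, @"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");

        var doc = new XmlDocument();
        doc.LoadXml("<table>" + xml + "</table>"); // single root element required

        var jobs = new List<(string Title, string Href)>();
        // Skip the duplicate row that only exists for the mobile layout.
        foreach (XmlNode tr in doc.SelectNodes("//tr[not(@class='mobile-btn')]"))
        {
            XmlNode titleCell = tr.SelectSingleNode("td[1]");   // first cell in the row
            XmlNode hrefAttr = tr.SelectSingleNode(".//a/@href"); // link target, as an attribute node
            if (titleCell != null && hrefAttr != null)
                jobs.Add((titleCell.InnerText.Trim(), hrefAttr.Value));
        }
        return jobs;
    }
}
```

On the question's snippet this yields a single entry, since the predicate skips the `mobile-btn` row that repeats the same link.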

Answered by FunkyPeanut.