0

Yesterday I got the following results when searching Google for some keywords.

For example:

hate (About 397,000,000 results)

Now I want to write a program that searches Google for some words and stores the result count for each.

How can I do this?

Saleh
  • Use the WebRequest class to query the data, and a regular expression to get the result count from the HTML you receive. – Andrew Savinykh May 12 '11 at 09:23
  • 1
    @zespri: I am sorry, but I feel it is mandatory in this case to supply this link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 ;-) – Fredrik Mörk May 12 '11 at 09:24
  • Is there any better way? – Saleh May 12 '11 at 09:25
  • @LightWing Please, edit your question. Don't know why you gave us 11 examples. Only one is enough. – Oscar Mederos May 12 '11 at 09:26
  • You can look for an HTML parser service or application, but it will cost you more than developing this yourself or hiring a programmer to do it. Moreover, once you start querying Google for your SEO needs, you will be blocked or get random results, as Google likes to do to guys like you. – eugeneK May 12 '11 at 09:28
  • @Fredrik Mörk: as a software developer you should be able to identify when it is appropriate to use a technology and when it is not. I appreciate the humour, but this question is an example of when using a regex to extract a part of HTML is perfectly valid. I have used regex myself for web scraping on numerous occasions and it has always been a success. Regex is not appropriate for *proper parsing* of XML/HTML. It is perfectly good for limited web scraping. – Andrew Savinykh May 12 '11 at 09:32
  • @zespri Did you take a look at my example? I used regex, but for parsing a text, not for parsing HTML. There is something I still don't understand: **Why use Regex for parsing HTML when there are HTML Parsers**? – Oscar Mederos May 12 '11 at 09:39
  • @Oscar Mederos: sometimes a task is so simple that it does not warrant using an external library such as HtmlAgilityPack. Regex, which is part of the .NET Framework, can be more than enough. – Andrew Savinykh May 12 '11 at 10:01
  • @zespri: I agree with you (hence the smiley), and I have myself (more than once) used regex for such tasks. As you say, *" to identify when it is appropriate to use a technology and when it is not"* is very important. – Fredrik Mörk May 12 '11 at 10:41
  • @Fredrik Mörk: yep, I was just worried that we got LightWing totally confused by this exchange =) – Andrew Savinykh May 12 '11 at 10:42

2 Answers

6

You can use the official Google API for this task. You get 100 queries per day for free; additional queries cost money.
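As a rough sketch of what that could look like: assuming the API meant here is Google's Custom Search JSON API, a GET request to https://www.googleapis.com/customsearch/v1 with key, cx and q parameters returns JSON whose searchInformation.totalResults field holds the count as a string. The class and method names below are hypothetical, and the API key and search engine ID are placeholders you would have to supply yourself.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of any Google client library.
public static class CustomSearchSketch
{
    // Builds the request URL. apiKey and engineId are placeholders
    // for your own credentials (assumption: Custom Search JSON API).
    public static string BuildUrl(string apiKey, string engineId, string query)
    {
        return "https://www.googleapis.com/customsearch/v1"
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&cx=" + Uri.EscapeDataString(engineId)
            + "&q=" + Uri.EscapeDataString(query);
    }

    // Reads searchInformation.totalResults from a response body.
    // Returns null when the field is missing or is not a numeric string.
    public static long? ReadTotalResults(string json)
    {
        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("searchInformation", out var info)
            && info.ValueKind == JsonValueKind.Object
            && info.TryGetProperty("totalResults", out var total)
            && total.ValueKind == JsonValueKind.String
            && long.TryParse(total.GetString(), out var count))
        {
            return count;
        }
        return null;
    }
}
```

The response body can be fetched with HttpClient.GetStringAsync(url) and passed to ReadTotalResults. Keeping the URL building and the JSON reading separate from the network call makes both easy to check without spending any of the daily quota.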

Daniel Hilgarth
3

First of all, download HtmlAgilityPack. Once you reference it in your project, you can do:

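using HtmlAgilityPack;
// Load the results page and read the element that holds the result count.
var doc = new HtmlWeb().Load("http://www.google.com/search?q=love");
var div = doc.DocumentNode.SelectSingleNode("//div[@id='resultStats']");
var text = div?.InnerText ?? ""; // SelectSingleNode returns null if nothing matches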

text will contain something like: About 4,350,000,000 results (0.07 seconds)

All you have to do is parse the number now.

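using System.Text.RegularExpressions;
// Capture the digits and thousands separators that follow "About".
var match = Regex.Match(text, @"About ([0-9,]+) ");
var total = match.Success ? match.Groups[1].Value : null; // null if the text had no count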

total will hold the number as a string, thousands separators included (here 4,350,000,000).
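To store the count as a number rather than a string, the separators have to be handled when parsing. A minimal sketch, assuming the text has the "About N results" shape shown above; the ResultCount class and its Parse method are hypothetical names used only for illustration:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for turning the stats text into a number.
public static class ResultCount
{
    // Returns the count found in text such as
    // "About 4,350,000,000 results (0.07 seconds)",
    // or null when no recognizable count is present.
    public static long? Parse(string text)
    {
        // Digits with optional comma separators, followed by "result" or "results".
        var match = Regex.Match(text ?? string.Empty, @"([0-9][0-9,]*) results?");
        if (!match.Success)
        {
            return null;
        }

        // AllowThousands accepts the commas; the invariant culture uses "," as group separator.
        long count;
        if (long.TryParse(match.Groups[1].Value, NumberStyles.AllowThousands,
                          CultureInfo.InvariantCulture, out count))
        {
            return count;
        }
        return null;
    }
}
```

For example, ResultCount.Parse(text) gives 4350000000 for the text above, and null when the page contains no such text, which is the case to expect if the markup changes or the request is blocked.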

Note
If Google provides an API for this purpose, use it.

Also, make sure that scraping Google results isn't prohibited.
This is just an example of how to use an HTML Parser in C#.

Oscar Mederos
  • 2
    I need to mention, that this is against their Terms of Service: http://www.google.com/support/websearch/bin/answer.py?answer=86640 – Daniel Hilgarth May 12 '11 at 09:32
  • it seems really efficient and simple – Saleh May 12 '11 at 09:34
  • @LightWing It's efficient until your network is captcha'd and subsequently altogether blocked from using Google. ;) @Oscar The question isn't about screen-scraping in general, it's about Google predictions. – bzlm May 12 '11 at 09:35
  • Does that mean I can use this solution or not? – Saleh May 12 '11 at 09:35
  • @LightWing Read my note. As @Daniel said, this is against their Terms of Service. – Oscar Mederos May 12 '11 at 09:36
  • **Honestly people, should I delete this answer?** – Oscar Mederos May 12 '11 at 09:37
  • 2
    @Oscar: No! It is a good signpost to show people that things that are technically possible, are still not always the best solutions. – Daniel Hilgarth May 12 '11 at 09:38
  • @DanielHilgarth Okay. The thing is that I received a downvote, and this code is fully working, so it must be because of Google's Terms of Service `:)` – Oscar Mederos May 12 '11 at 09:40
  • @Oscar: The downvote is probably because of the violation of Google's ToS. I still think the answer should stay, because it shows a simple, working example of screen scraping, and its comments highlight the problems with screen-scraping Google. – Daniel Hilgarth May 12 '11 at 09:44
  • Hi everyone, for some reason when I make the request, no resultStats div comes back in the HTML, and worse, no result count appears anywhere in the HTML... any ideas? – Juan Ruiz de Castilla May 01 '21 at 17:39