0

I want to match a web address with a regex that captures both http://www.google.com and www.google.com, i.e. with and without the protocol.

GEOCHET
shabby

4 Answers

3

Well, it depends on exactly what you want to capture (FTP? "/index.htm"?), because a general URI capture based on the RFC standard is very hard, but you could start with:

/^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$/

Complicated, see?
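For anyone who wants to try the pattern above outside a JavaScript-style regex literal, here is a rough Python sketch. It is a port under a few assumptions, not a definitive validator: the redundant escapes are dropped, `[\w\d]` is shortened to `\w`, `re.ASCII` is added so `\w` stays ASCII-only, and the names `URL_RE` and `looks_like_url` are made up for illustration.

```python
import re

# Python port of the pattern above, with redundant escapes dropped
# ([\w\d] is equivalent to \w). re.ASCII restricts \w to [A-Za-z0-9_].
URL_RE = re.compile(
    r"^(https?://)?"                   # optional http:// or https://
    r"([\w-]+\.){2,}"                  # at least two dot-terminated labels, e.g. "www.google."
    r"\w{2,}"                          # final label, e.g. "com"
    r"((/[\w.-]+)*"                    # optional path segments
    r"(/[\w-]+\.\w{3,4}(\?.*)?)?)?$",  # optional file name plus query string
    re.ASCII,
)


def looks_like_url(text: str) -> bool:
    """Hypothetical helper: True if the whole string fits the pattern."""
    return URL_RE.match(text) is not None


for candidate in ("http://www.google.com", "www.google.com", "google.com"):
    print(candidate, looks_like_url(candidate))
```

Note that the `{2,}` quantifier requires at least two dotted labels before the final one, so a bare google.com is rejected while www.google.com is accepted.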

annakata
2

Try RegexLib.

Mitch Wheat
1

Read RFC 3986. It is not as easy as you might think. The job is easier if you only have a small set of URLs to parse.
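As a starting point, Appendix B of RFC 3986 gives a regex that splits a URI reference into its components rather than validating it. The sketch below wraps that expression in Python; the names `URI_SPLIT` and `split_uri` are assumptions for illustration. It also shows why the no-protocol case is awkward: without `//`, the whole string is treated as a path, not a host.

```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text: str) -> dict:
    """Hypothetical helper: break a URI reference into its five components."""
    m = URI_SPLIT.match(text)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://www.google.com"))
print(split_uri("www.google.com"))
```

For www.google.com the scheme and authority come back as `None` and the host name ends up in `path`, so any "with or without protocol" matcher has to decide for itself what counts as a host.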

dirkgently
  • You can get 'good enough' though, so this answer isn't particularly helpful – John Sheehan Mar 09 '09 at 22:16
  • It's about as good an answer as there could be without a full problem specification, to the extent of being a competitor for the top answer. The problem is that few people read the RFCs, and I, having read one and written an IPv6 parser, know how hard the job is. – dirkgently Mar 10 '09 at 06:22
0

Why not

/google\.com/

?

It catches http://www.google.com , www.google.com , and even google.com for free! :-)
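A quick Python sketch of what this pattern does in practice, plus an anchored variant for the case where the whole string should be the address. The anchored pattern and the names `LOOSE`, `ANCHORED`, and `sentence` are assumptions added for illustration, not part of the answer itself.

```python
import re

# The pattern from this answer: matches google.com anywhere in the text.
LOOSE = re.compile(r"google\.com")

# Assumed stricter variant: the entire string must be the address,
# with an optional protocol and an optional "www." prefix.
ANCHORED = re.compile(r"(https?://)?(www\.)?google\.com")

sentence = "I could try searching for this regex on google.com"

for text in ("http://www.google.com", "www.google.com", "google.com", sentence):
    loose_hit = LOOSE.search(text) is not None
    strict_hit = ANCHORED.fullmatch(text) is not None
    print(f"{text!r}: loose={loose_hit}, anchored={strict_hit}")
```

The loose pattern finds a hit in all four strings, including the sentence; the anchored variant accepts only the three bare addresses.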

Igor
  • 1
    It also catches "Well I guess I could try searching for this regex on google.com, nah SO is better than google these days. Hmm, I wonder what's for lunch. Mmmm. Bacon" – annakata Feb 20 '09 at 11:01
  • Which, if you enter it in most browsers will bring you to google :) – MSalters Feb 20 '09 at 13:58
  • SO is meant to be a reference so that when you search google, you end up here instead of another crappy site. So this question is fine. – John Sheehan Mar 09 '09 at 22:17
  • 1
    @John: Please stop making paranoid comments and downvotes. This was a legitimate answer, advising how to match specific domain names (e.g. google.com). – Igor Mar 10 '09 at 08:46