
I want to extract all the website URLs from some HTML code. I have a regex that finds URLs, but it only matches addresses that contain www. What regex do I need to match URLs that do not contain www?

Update: the regex I am using is:

string anchorPattern = 
  @"(?<Protocol>\w+)://(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*'";
Alan Moore
Laziale

2 Answers


Add a positive lookahead, (?=www), right after :// to match only URLs whose host starts with www:

@"(?<Protocol>\w+)://(?=www)(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*"

Or add a negative lookahead, (?!www), in the same place to match only URLs whose host does not start with www:

@"(?<Protocol>\w+)://(?!www)(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*"
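As a rough sketch of how the two lookaheads above could be used, the snippet below builds either pattern and collects the matches with Regex.Matches. The class name UrlFilter, the method Extract, and the wantWww flag are hypothetical names made up for this illustration. It also assumes a slightly tighter lookahead, www\. instead of www, so that a host such as wwwidgets.com is not treated as a www address; use plain www if that distinction does not matter to you.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original question.
public static class UrlFilter
{
    // Part shared by both patterns: domain, optional slash, then path/query characters.
    private const string Tail = @"(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*";

    // Returns every URL in html whose host starts with "www." (wantWww = true)
    // or does not start with "www." (wantWww = false).
    public static List<string> Extract(string html, bool wantWww)
    {
        // Assumption: "www\." rather than "www", so only a real www label counts.
        string lookahead = wantWww ? @"(?=www\.)" : @"(?!www\.)";
        string pattern = @"(?<Protocol>\w+)://" + lookahead + Tail;

        var urls = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Calling UrlFilter.Extract(html, false) would then return only the addresses without a leading www, and UrlFilter.Extract(html, true) only those with it. If you want every URL regardless of www, leave the lookahead out entirely.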
Peter

Use the same regex you have now, but without the part of it that looks like www\.

chaos