How can I match the href and 'a' value in a link?

That is, extract 'www.google.com' and 'test' from the tag below:

<A HREF="www.google.com/test.html" title="test">test</A>

Here is what I am trying: '<A HREF=(.+).html', but it is not matching. Why?
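The question does not show the Scala code that applies the regex, so the following is only a guess at one common cause. In Scala, a `Regex` used as an extractor in a `match` must match the whole input, while `findFirstMatchIn` searches for the pattern anywhere in it. This minimal sketch runs the question's own pattern both ways; the object and method names are made up for the example.

```scala
import scala.util.matching.Regex

object QuestionPatternDemo {
  // The pattern from the question. The unescaped "." matches any
  // character; a literal dot would be written "\\.".
  val attempt: Regex = "<A HREF=(.+).html".r

  val tag = """<A HREF="www.google.com/test.html" title="test">test</A>"""

  // Used as an extractor in `match`, a Regex must match the entire input,
  // and the text after ".html" makes that fail.
  def viaExtractor(s: String): Option[String] = s match {
    case attempt(href) => Some(href)
    case _             => None
  }

  // findFirstMatchIn looks for the pattern anywhere in the input.
  def viaFind(s: String): Option[String] =
    attempt.findFirstMatchIn(s).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(viaExtractor(tag)) // None
    println(viaFind(tag))      // Some("www.google.com/test), opening quote included
  }
}
```

Even when it does match, group 1 still carries the opening quote and the path, so the pattern needs more than this fix. Calling `.unanchored` on the regex is another way to make the extractor search instead of requiring a full match.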

Andy Lester
blue-sky
  • Do NOT use regular expressions for parsing HTML. There are plenty of HTML parsers out there for various languages. Which one are you using? – pemistahl Jan 16 '13 at 21:12
  • In the user's defense, sometimes all you want is a quick-and-dirty regex because you're processing something one-off and you know the tags are always structured in a particular way... But the given regex is not a very good start for the problem at hand. – paddy Jan 16 '13 at 21:16
  • Things never end up as easy as they start off, but a regex for _this exact case_ would be something like [`\(.*\)`](http://refiddle.com/gjv). Use at own peril :) – Joachim Isaksson Jan 16 '13 at 21:37
  • @Peter Stahl I'm using it from Scala. – blue-sky Jan 16 '13 at 21:50
  • @Joachim Isaksson, put your last comment in an answer and I'll accept it. – blue-sky Jan 16 '13 at 22:39
  • @PeterStahl, more often than not you would be right. However, I've used regex successfully many times for quick-and-dirty jobs. This is usually much faster than wiring up an HTML parser, and sometimes it's all that is required. – Andrew Savinykh Jan 17 '13 at 04:46
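For comparison with the regex answers, the parser route suggested in the comments can be sketched from Scala with the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator` plus an `HTMLEditorKit.ParserCallback`), so no extra dependency is needed. This is a minimal sketch, not a robust scraper, and the names `ParserDemo` and `links` are made up for the example.

```scala
import java.io.StringReader
import javax.swing.text.MutableAttributeSet
import javax.swing.text.html.{HTML, HTMLEditorKit}
import javax.swing.text.html.parser.ParserDelegator
import scala.collection.mutable.ListBuffer

object ParserDemo {
  // Collects (href, link text) for every <a> element the JDK parser reports.
  def links(html: String): List[(String, String)] = {
    val found = ListBuffer.empty[(String, String)]
    var currentHref: Option[String] = None // set while inside an <a> element
    val text = new StringBuilder

    val callback = new HTMLEditorKit.ParserCallback {
      override def handleStartTag(t: HTML.Tag, a: MutableAttributeSet, pos: Int): Unit =
        if (t == HTML.Tag.A) {
          // The parser lowercases tag names, so the uppercase <A> maps to HTML.Tag.A.
          currentHref = Option(a.getAttribute(HTML.Attribute.HREF)).map(_.toString)
          text.clear()
        }

      override def handleText(data: Array[Char], pos: Int): Unit =
        if (currentHref.isDefined) text.appendAll(data)

      override def handleEndTag(t: HTML.Tag, pos: Int): Unit =
        if (t == HTML.Tag.A) {
          currentHref.foreach(href => found += ((href, text.toString)))
          currentHref = None
        }
    }

    new ParserDelegator().parse(new StringReader(html), callback, true)
    found.toList
  }

  def main(args: Array[String]): Unit = {
    val tag = """<A HREF="www.google.com/test.html" title="test">test</A>"""
    links(tag).foreach { case (href, linkText) => println(s"$href -> $linkText") }
  }
}
```

It is more code than a one-line regex, which is the trade-off the last comment describes, but it does not depend on attribute order, quoting style, or letter case.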

3 Answers

Try this:

<A.*HREF\s*=\s*(?:"|')([^"']*)(?:"|').*>(.*)<\/A>

Group 1 captures the href value (www.google.com/test.html) and group 2 captures the link text (test).
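Since the asker is working in Scala, here is a minimal sketch of using this answer's regex there. A triple-quoted string avoids extra escaping of the quotes and backslashes. The question asks for 'www.google.com' rather than the full href; taking the part of the href before the first "/" is an assumption made here, and `hostPart` is a hypothetical helper.

```scala
import scala.util.matching.Regex

object AnswerOneDemo {
  // The regex from this answer, unchanged, as a triple-quoted string.
  val link: Regex = """<A.*HREF\s*=\s*(?:"|')([^"']*)(?:"|').*>(.*)<\/A>""".r

  // Returns (href, link text) from the first match, if any.
  def hrefAndText(html: String): Option[(String, String)] =
    link.findFirstMatchIn(html).map(m => (m.group(1), m.group(2)))

  // Hypothetical helper: everything in the href before the first "/".
  def hostPart(href: String): String = href.takeWhile(_ != '/')

  def main(args: Array[String]): Unit = {
    val tag = """<A HREF="www.google.com/test.html" title="test">test</A>"""
    hrefAndText(tag).foreach { case (href, text) =>
      println(hostPart(href)) // www.google.com
      println(text)           // test
    }
  }
}
```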

prageeth
  • Note that it will ONLY work on this one specific tag, which clearly isn't even a real example because the URL is incorrect. – Andy Lester Jan 17 '13 at 04:31

Regular expressions for HTML tend to break when the markup changes, but a regex for this exact case would be:

<A HREF="\(.*\)" .*>\(.*\)</A>
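One caveat for the asker's Scala use: Scala's `Regex` uses `java.util.regex` syntax, where `\(` and `\)` match literal parentheses rather than delimiting groups (that grouping style comes from POSIX basic regular expressions, as in sed and grep). A minimal sketch of the same pattern with plain parentheses as groups, used as a full-match extractor; the object and method names are made up for the example.

```scala
import scala.util.matching.Regex

object AnswerTwoDemo {
  // This answer's pattern with plain parentheses as capture groups.
  val link: Regex = """<A HREF="(.*)" .*>(.*)</A>""".r

  // As an extractor the regex must match the whole string,
  // which it does for the exact tag in the question.
  def hrefAndText(html: String): Option[(String, String)] = html match {
    case link(href, text) => Some((href, text))
    case _                => None
  }

  def main(args: Array[String]): Unit = {
    val tag = """<A HREF="www.google.com/test.html" title="test">test</A>"""
    println(hrefAndText(tag))                          // Some((www.google.com/test.html,test))
    println(hrefAndText("""<a href="x.html">x</a>""")) // None: lowercase tag, no attribute after href
  }
}
```

The second call shows the brittleness mentioned above: a lowercase tag, or one without a space and another attribute after the href, no longer matches.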

Joachim Isaksson

Because the text "html" does not appear in your tag.

paddy