10

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I have an HTML page with

<a class="development" href="[variable content]">X</a>

The [variable content] is different in each place, the rest is the same.
What regexp will catch all of those links? (Although I am not writing it here, I did try...)

Itay Moav -Malimovka

5 Answers

4

Try this regular expression:

<a class="development" href="[^"]*">X</a>
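
As a rough illustration, here is how that pattern might be applied in Python. The capture group around the href value, the sample markup, and the variable names are all assumptions added for this sketch; they are not part of the answer above.

```python
import re

# The pattern from the answer, with a capture group added around the
# href value (the group is an addition for illustration only).
LINK_RE = re.compile(r'<a class="development" href="([^"]*)">X</a>')

# Hypothetical sample page.
html = (
    '<p><a class="development" href="/dev/one">X</a> and '
    '<a class="development" href="/dev/two?x=1">X</a></p>'
    '<a class="other" href="/skip">X</a>'
)

# findall returns the captured href values of every matching link.
hrefs = LINK_RE.findall(html)
print(hrefs)  # ['/dev/one', '/dev/two?x=1']
```

This only matches links written exactly in that form: double quotes, that attribute order, single spaces.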
Gumbo
  • Single-quoted attributes are also valid HTML. And, depending on the source, you can even have invalid HTML, at which point you're screwed. – kch May 04 '09 at 20:02
4

What about the non-greedy version:

<a class="development" href="(.*?)">X</a>
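
A small Python comparison, under assumed sample input, of the greedy form, this non-greedy form, and the negated character class `[^"]*`. The sample strings are made up for the sketch. On well-behaved input the last two agree, but the non-greedy dot is still allowed to run across quotes when the link text is not X.

```python
import re

GREEDY = r'<a class="development" href="(.*)">X</a>'
LAZY = r'<a class="development" href="(.*?)">X</a>'
NEGATED = r'<a class="development" href="([^"]*)">X</a>'

# Hypothetical input: two matching links on one line.
two_links = (
    '<a class="development" href="/a">X</a> '
    '<a class="development" href="/b">X</a>'
)

greedy_hits = re.findall(GREEDY, two_links)
lazy_hits = re.findall(LAZY, two_links)
negated_hits = re.findall(NEGATED, two_links)

print(len(greedy_hits))  # 1: the greedy dot swallows both links as one match
print(lazy_hits)         # ['/a', '/b']
print(negated_hits)      # ['/a', '/b']

# Hypothetical input where the first link's text is Y, not X.
mixed = (
    '<a class="development" href="/a">Y</a> '
    '<a class="development" href="/b">X</a>'
)

lazy_mixed = re.findall(LAZY, mixed)
negated_mixed = re.findall(NEGATED, mixed)

print(lazy_mixed)     # one match whose "href" runs from /a into the second tag
print(negated_mixed)  # ['/b']
```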
vrish88
  • You're doing a capture that likely won't be used. Other than that, I don't see much difference between this and Gumbo's version. – kch May 04 '09 at 20:08
4

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
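
As one possible sketch of the parser approach, here is a version using `html.parser.HTMLParser` from Python's standard library. The class name `DevelopmentLinkCollector` and the sample markup are hypothetical, chosen for this example; any real HTML parser in your own language would do the same job.

```python
from html.parser import HTMLParser


class DevelopmentLinkCollector(HTMLParser):
    """Collects href values of <a> tags whose class list includes 'development'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, hence the "or".
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "development" in classes and href is not None:
            self.hrefs.append(href)


# Hypothetical input: different quoting, attribute order, and line wrapping.
html = """
<a class="development" href="/dev/one">X</a>
<a href='/dev/two' class='development'>X
</a>
<a class="other" href="/skip">X</a>
"""

collector = DevelopmentLinkCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)  # ['/dev/one', '/dev/two']
```

The parser does not care about quote style, attribute order, or where the line breaks fall, which is exactly where the regex answers get fragile.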

Chas. Owens
1

Regex is generally a bad solution for HTML parsing, a topic which gets discussed every time a question like this is asked. For example, the element could wrap onto another line, either as

<a class="development" 
  href="[variable content]">X</a>

or

<a class="development" href="[variable content]">X
</a>
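
To make that concrete, here is a quick Python check, with made-up sample strings, of a fixed-spacing pattern against those two wrapped forms. The `LOOSE` pattern is only an illustration of how much whitespace handling has to be bolted on; it still assumes double quotes and this attribute order.

```python
import re

# Fixed-spacing pattern of the kind suggested in other answers.
STRICT = re.compile(r'<a class="development" href="[^"]*">X</a>')

# Whitespace-tolerant variant (an illustration, not a complete solution).
LOOSE = re.compile(
    r'<a\s+class="development"\s+href="[^"]*"\s*>\s*X\s*</a\s*>'
)

samples = {
    "one_line": '<a class="development" href="/p">X</a>',
    "wrapped_attrs": '<a class="development" \n  href="/p">X</a>',
    "wrapped_text": '<a class="development" href="/p">X\n</a>',
}

strict_results = {name: bool(STRICT.search(s)) for name, s in samples.items()}
loose_results = {name: bool(LOOSE.search(s)) for name, s in samples.items()}

print(strict_results)  # only "one_line" is True
print(loose_results)   # all three are True
```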

What are you trying to achieve?

Using jQuery, you could disable the links with:

$("a.development").on("click", function() { return false; });

or

$("a.development").attr("href", "#");
Gene Gotimer
  • This solution assumes that Itay Moav is using the jQuery library and that it's client-side parsing that he wishes to achieve. – vrish88 May 04 '09 at 17:19
  • @vrish88: Correct. Thus the question "What are you trying to achieve?" and the comment "Using JQuery you could..." – Gene Gotimer May 04 '09 at 18:24
1

Here's a version that'll allow all sorts of evil to be put in the href attribute.

/<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?<\/a>/m

I'm also assuming X is going to be variable, so I added a non-greedy match there to handle it. The /m modifier (Ruby syntax) makes . match line breaks too; in Perl, PHP, and most other flavors the equivalent flag is /s.
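
For anyone not on Ruby, a rough Python equivalent might look like the sketch below. `re.DOTALL` plays the role that /m plays above; the sample markup and variable names are assumptions made for the example.

```python
import re

# Same pattern as above; Python needs no escaping of the slash in </a>.
PATTERN = r"""<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?</a>"""

# With DOTALL, "." also matches newlines.
LINK_RE = re.compile(PATTERN, re.DOTALL)
# Without it, link text that spans lines is missed.
LINK_RE_NO_DOTALL = re.compile(PATTERN)

# Hypothetical input: double-quoted, single-quoted, and unquoted hrefs.
html = (
    '<a class="development" href="/double">X</a>\n'
    "<a class=\"development\" href='/single'>Y</a>\n"
    '<a class="development" href=/bare>multi\nline</a>\n'
)

matches = LINK_RE.findall(html)
matches_no_dotall = LINK_RE_NO_DOTALL.findall(html)

print(len(matches))            # 3
print(len(matches_no_dotall))  # 2
```

The pattern has no capturing groups, so `findall` returns each whole matched element.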

kch