
I want to match only the & characters inside the URL, not XML entities such as &amp; or &lt;.

<a href="/test/test2">Contact Us</a>
<a href="http://www.testassociation.com/test.html?ab=5&cd=5&ab=c" target="_blank">Customer Association</a>&amp;

http://www.testassociation.com/test.html?ab=5&cd=5&ab=c

I want to replace each of those & with &amp; without disturbing the other entities.

Sorry, I can't figure out how to do it.

I tried this:

(&)([a-z][^;]*)

Is there a better way?

mpapec
Susheel Singh

2 Answers

(?!&amp|&lt)&

You can use something like this. You have to list every entity (&amp;, &lt; and so on) that you want to skip; I have listed two.

See demo.

http://regex101.com/r/tA9uG5/1
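As a rough illustration of the list-based idea, here is a minimal Python sketch. The entity list, the `KNOWN_ENTITIES` name and the `escape_bare_ampersands` helper are assumptions made for this example, and the pattern additionally requires the trailing `;`, which is a small tightening of the regex above rather than the answer's exact form.

```python
import re

# Entity names to leave untouched. This set is an assumption; extend as needed.
KNOWN_ENTITIES = ("amp", "lt", "gt", "quot", "apos")

# Match "&" only when it does not start one of the listed entities.
# Requiring ";" after the name is a tightening added for this sketch.
BARE_AMP = re.compile(r"&(?!(?:%s);)" % "|".join(KNOWN_ENTITIES))


def escape_bare_ampersands(text: str) -> str:
    """Replace every "&" that does not start a listed entity with "&amp;"."""
    return BARE_AMP.sub("&amp;", text)


html = ('<a href="http://www.testassociation.com/test.html?ab=5&cd=5&ab=c" '
        'target="_blank">Customer Association</a>&amp;')
print(escape_bare_ampersands(html))
```

Because already-escaped `&amp;` is skipped, running the function twice on the same text should give the same result as running it once.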

Edit

&(?=\w\w=)

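Use this if you don't want to list them all. Note that it only matches an & that is followed by a two-character parameter name and an = sign.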
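To see what the lookahead version does with parameter names of different lengths, here is a small Python sketch. `TWO_CHAR_PARAM` is the pattern from the edit above; `ANY_PARAM` is a hypothetical generalisation (any run of word characters before `=`) that is not part of the original answer, and the `name=c` parameter in the sample URL is made up for the demonstration.

```python
import re

# Pattern from the answer: "&" followed by exactly two word characters and "=".
TWO_CHAR_PARAM = re.compile(r"&(?=\w\w=)")

# Hypothetical generalisation: "&" followed by one or more word characters and "=".
ANY_PARAM = re.compile(r"&(?=\w+=)")

# Sample URL; the "name" parameter is invented to show the difference.
url = "http://www.testassociation.com/test.html?ab=5&cd=5&name=c"

print(TWO_CHAR_PARAM.sub("&amp;", url))  # escapes the "&" before "cd" only
print(ANY_PARAM.sub("&amp;", url))       # escapes the "&" before "cd" and "name"
```

Both patterns leave `&amp;` alone, since `amp` is followed by `;` rather than `=`. Neither matches a bare & that is not followed by a `name=` pair, so this approach only fits query-string style input.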

vks

The only way to be completely accurate is, as @vks says, to include the full list of entities.

You can find this list on Wikipedia: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
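If the replacement happens in Python, one way to avoid typing the list by hand is the standard library's `html.entities.name2codepoint` table, which holds the HTML 4 entity names. The sketch below is only an illustration under that assumption: the `escape_unknown_ampersands` helper is a made-up name, and numeric references such as `&#38;` are not handled.

```python
import re
from html.entities import name2codepoint  # HTML 4 names, e.g. "amp", "lt"

# "&", optionally followed by something shaped like a named entity ("name;").
CANDIDATE = re.compile(r"&(?:(\w+);)?")


def escape_unknown_ampersands(text: str) -> str:
    """Escape every "&" that does not start a known HTML 4 named entity."""

    def fix(match):
        name = match.group(1)
        if name is not None and name in name2codepoint:
            return match.group(0)  # known entity: keep it unchanged
        # Bare "&" or unknown name: escape the "&" and keep whatever followed it.
        return "&amp;" + match.group(0)[1:]

    return CANDIDATE.sub(fix, text)


print(escape_unknown_ampersands("?ab=5&cd=5&ab=c &amp; &lt; &dffa;"))
```

A replacement function is used instead of a single giant alternation so the entity table stays out of the pattern itself.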

If you don't need to be that accurate, and you take &thetasym; (8 characters) as the longest entity name, you can use a negative lookahead:

(?!&\w{1,8};)&


Keep in mind that this also skips anything of the form &dffa;, even if it is not a valid entity.
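The trade-off can be checked with a short Python sketch. The pattern is the one given above; the `ENTITY_SHAPED` name and the sample strings are assumptions chosen for the demonstration.

```python
import re

# Pattern from the answer: "&" unless it is followed by 1 to 8 word characters and ";".
ENTITY_SHAPED = re.compile(r"(?!&\w{1,8};)&")

samples = [
    "?ab=5&cd=5&ab=c",   # plain query string: both "&" get escaped
    "a &amp; b &lt; c",  # real entities: left alone
    "&dffa;",            # not a real entity, but entity-shaped: left alone too
    "&abcdefghi;",       # 9-character name: too long for {1,8}, so it is escaped
]

for sample in samples:
    print(sample, "->", ENTITY_SHAPED.sub("&amp;", sample))
```

The third sample is the false negative described above: the heuristic only looks at the shape of the text, not at whether the name exists.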

Oscar Hermosilla