5

I want to replace certain characters with their respective HTML entities in an HTML response inside a filter. Characters include <, >, &. I can't use replaceAll() as it will replace all characters, even those that are part of HTML tags.

What is the best approach for doing so?

Peter Mortensen
user1448652
  • If a single string has already been formed that contains a mixture of HTML tags and standalone characters such as ` – Damien_The_Unbeliever Jun 11 '12 at 10:23
  • My application boundaries don't allow me to do it earlier :( – user1448652 Jun 11 '12 at 11:10
  • But just think - if it was *possible* to do this reliably with fully formed strings, you wouldn't *need* to do encoding - web browsers would use whatever this magical technique is to distinguish tags from general text. – Damien_The_Unbeliever Jun 11 '12 at 11:16
  • That is what I need to do. So far, I am traversing the HTML character by character and checking for ''. Treating it as a tag (ignoring the attributes), I check it against a predefined tag list. If no match is found, I encode both ''. I don't know whether this is the right approach... – user1448652 Jun 11 '12 at 12:43
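
For illustration, here is a rough sketch of the allow-list idea described in the last comment: anything shaped like a tag whose name is in a known list is passed through, and everything else has <, > and & encoded. The class name AllowListEscaper, the method name and the tag list are all made up for this example. As the earlier comments point out, this is a heuristic and cannot be fully reliable: it double-encodes entities that are already present, and it is confused by attribute values containing > or by text that merely looks like a known tag.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class AllowListEscaper {
    // Assumed allow-list; a real one would be much longer.
    private static final Set<String> KNOWN_TAGS =
            Set.of("p", "b", "i", "br", "div", "span", "a");

    // Matches text shaped like <name>, <name attrs>, <name/> or </name>.
    private static final Pattern TAG_LIKE =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)(?:\\s[^<>]*)?/?>");

    static String escapeOutsideKnownTags(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG_LIKE.matcher(html);
        int pos = 0;
        while (m.find()) {
            // Text between the previous match and this one is always encoded.
            out.append(escape(html.substring(pos, m.start())));
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Known tags pass through untouched; unknown ones are encoded.
            out.append(KNOWN_TAGS.contains(name) ? m.group() : escape(m.group()));
            pos = m.end();
        }
        out.append(escape(html.substring(pos)));
        return out.toString();
    }

    // Encode & first so the ampersands of the other entities are not re-encoded.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // prints: <p>1 &lt; 2 &amp; 3 &gt; 2</p>&lt;foo&gt;
        System.out.println(escapeOutsideKnownTags("<p>1 < 2 & 3 > 2</p><foo>"));
    }
}
```

Escaping the data before it is merged into the markup remains the more robust option whenever that is possible.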

3 Answers

12

From Java you may try Apache Commons Lang (legacy v2) StringEscapeUtils.escapeHtml(). Or with commons-lang3: StringEscapeUtils.escapeHtml4().

Please note that this also converts à to &agrave; and the like.
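
To give a feel for what such an escaper does, here is a minimal hand-rolled sketch using only the JDK. The class BasicHtmlEscaper is hypothetical and is not the Commons Lang implementation. It only handles the five markup-significant characters and, unlike the library call described above, leaves characters such as à untouched. With commons-lang3 on the classpath, the equivalent call would simply be StringEscapeUtils.escapeHtml4(input).

```java
// Hypothetical stand-in for illustration only.
final class BasicHtmlEscaper {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);        // everything else is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; &quot;c&quot;
        System.out.println(escape("a < b && \"c\""));
    }
}
```

Note that this escapes the whole string, so it is meant for text that will be placed into markup, not for a string that already contains tags you want to keep.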

Peter Mortensen
sangupta
  • This is, IMHO, the best solution – Jean-Rémy Revy Jun 12 '12 at 11:39
  • Simple, clean and works just fine in Groovy as well. – The Unknown Dev Aug 13 '14 at 15:35
  • Also worth noting: if you're (already) using a web framework, there's a good chance a similar function is already built into the framework. Spring, for example, has HtmlUtils.htmlEscape(), documented here: http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html – Josh1billion Jul 27 '15 at 21:33
1

If you're using a technology such as JSTL, you can simply print out the value using <c:out value="${myObject.property}"/> and it will be automatically escaped.

The attribute escapeXml is true by default.

escapeXml - Determines whether characters <,>,&,'," in the resulting string should be converted to their corresponding character entity codes. Default value is true.

http://docs.oracle.com/javaee/5/jstl/1.1/docs/tlddocs/

adarshr
0

When developing in the Spring ecosystem, you can use the HtmlUtils.htmlEscape() method.

For the full API docs, visit https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html

Mišo Stankay