In one of my projects, Strings are sent to an old teletext system that cannot display any modern characters (teletext was created in the 1970s). Since the content sent to that system comes from external sources (web parsing, RSS feeds, etc.), there is no control over the incoming data. For years I have been using a long list of all the disallowed characters I have ever encountered. It works well, but I think my solution is ugly and inefficient.

What would be ways to improve my solution to make it more efficient?

public static String removeSpecialCharactersAndHTML(String text) {
    String result = text;

    result = result.replace("&gt;", ">");
    result = result.replace("&lt;", "<");
    result = result.replace("&#38;", "&");
    result = result.replace("&quot;", "\"");
    result = result.replace("&nbsp;", " ");
    result = result.replace("&amp;", "&");

    result = result.replace("]]>", "");
    result = result.replace("‘", "'");
    result = result.replace("’", "'");
    result = result.replace("`", "'");
    result = result.replace("´", "'");
    result = result.replace("“", "\"");

    // .....

    result = result.replace("”", "\"");
    result = result.replace("³", "3");
    result = result.replace("²", "2");

    return result;
}
Stefan1991
  • There are more characters that are special than non special. What characters *can* it handle? – Bohemian Sep 21 '16 at 17:36
  • Paste your code in the question in text form with proper formatting. Do not provide links to external sources such as github. – progyammer Sep 21 '16 at 17:37
  • I think [that](http://stackoverflow.com/a/10574318/1402861) may answer your question ; ) – WrRaThY Sep 21 '16 at 17:40
  • Basically any characters that were around in the 90's (a-z, A-Z, numbers, and some basic special characters like !@#$%^&*()). I should probably keep a list of allowed characters and replace any character not in that list. – Stefan1991 Sep 21 '16 at 17:41

1 Answer

To remove HTML from a string, do not write your own code; use an existing library instead. A library will avoid the many bugs that are in your code.

Replacing certain characters is fine. But as a final step, you must remove every character that the terminal cannot handle. That is, rather than defining the forbidden characters, define the allowed characters.
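
A minimal sketch of that idea, using only the JDK. The class name `TeletextSanitizer`, the method `sanitize`, and above all the whitelist (printable ASCII, 0x20 to 0x7E) are assumptions for illustration; the real set of characters your teletext system accepts may differ and should be substituted. The replacement table holds only the typographic substitutions from the question's code, not the parts elided there. HTML entity decoding is deliberately left out, since that is better done by a library.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class TeletextSanitizer {

    // Substitutions taken from the question's code (typographic characters only).
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u2018', "'");  // left single quotation mark
        REPLACEMENTS.put('\u2019', "'");  // right single quotation mark
        REPLACEMENTS.put('`', "'");       // backtick
        REPLACEMENTS.put('\u00B4', "'");  // acute accent
        REPLACEMENTS.put('\u201C', "\""); // left double quotation mark
        REPLACEMENTS.put('\u201D', "\""); // right double quotation mark
        REPLACEMENTS.put('\u00B3', "3");  // superscript three
        REPLACEMENTS.put('\u00B2', "2");  // superscript two
    }

    // ASSUMPTION: the terminal accepts printable ASCII only. Adjust to the real charset.
    private static boolean isAllowed(char c) {
        return c >= 0x20 && c <= 0x7E;
    }

    public static String sanitize(String text) {
        // NFD splits accented letters into base letter + combining mark,
        // so the whitelist below keeps the base letter and drops the mark.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            String mapped = REPLACEMENTS.get(c);
            if (mapped != null) {
                out.append(mapped);   // known substitution
            } else if (isAllowed(c)) {
                out.append(c);        // whitelisted as-is
            }
            // anything else is silently dropped
        }
        return out.toString();
    }
}
```

This makes a single pass over the input instead of one full `replace` scan per character, and characters you have never encountered are dropped by default rather than slipping through.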

Roland Illig