48

Possible Duplicates:
Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars
Is there a way to get rid of accents and convert a whole string to regular letters?

How can I do this? Thanks for the help.

lacas

3 Answers

147

I think your question is the same as the possible duplicates listed above, and hence the answer is also the same:

String convertedString = 
       Normalizer
           .normalize(input, Normalizer.Form.NFD)
           .replaceAll("[^\\p{ASCII}]", "");

See the Javadoc for java.text.Normalizer.

Example Code:

final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);

Output:

This is a funky String
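
For reference, here is a minimal self-contained sketch of the same approach. The class name AsciiFolding and the method name toAscii are hypothetical, chosen only for illustration. It also shows one thing to watch for: letters that have no canonical decomposition (such as Ł) are not split into a base letter plus a mark, so the ASCII filter removes them entirely.

```java
import java.text.Normalizer;

// Hypothetical helper class; the names are illustrative, not from the answer above.
final class AsciiFolding {

    // Decompose accented letters into base letter + combining marks (NFD),
    // then drop every character that is not plain ASCII.
    static String toAscii(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "é" decomposes to "e" + U+0301, so only the mark is removed.
        System.out.println(toAscii("caf\u00E9"));            // prints "cafe"
        // "Ł" (U+0141) has no decomposition, so it is removed as a whole.
        System.out.println(toAscii("\u0141\u00F3d\u017A"));  // prints "odz"
    }
}
```

If dropping such letters is not acceptable, they would need to be handled separately before the ASCII filter is applied.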

Sean Patrick Floyd
12

You can use java.text.Normalizer to separate base letters and diacritics, then remove the latter via a regexp:

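// Requires: import java.text.Normalizer;
public static String stripDiacriticas(String s) {
    // NFD splits each accented letter into a base letter plus combining marks
    return Normalizer.normalize(s, Normalizer.Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}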
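
A self-contained sketch of the same idea, assuming the method is called often enough that compiling the regex once is worthwhile. The class name DiacriticStripper and the constant COMBINING_MARKS are hypothetical. Unlike an ASCII-only filter, this version leaves non-ASCII characters that carry no combining marks (for example ß) untouched.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class DiacriticStripper {

    // Compiled once and reused; matches runs of characters in the
    // Combining Diacritical Marks block (U+0300 to U+036F).
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String stripDiacritics(String s) {
        // NFD separates base letters from their combining marks.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove only the marks; everything else is kept as is.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("\u00E9\u00F1\u00FC")); // prints "enu"
    }
}
```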
Michael Borgwardt
  • 1
    I used something similar that did the job: Pattern.compile("\\p{InCombiningDiacriticalMarks}+").matcher(nfdNormalizedString).replaceAll(""); – Adrien Be Mar 04 '13 at 15:15
11

First - you shouldn't. These symbols carry special phonetic properties which should not be ignored.

The way to convert them is to create a Map that holds each pair:

Map<Character, Character> map = new HashMap<Character, Character>();
map.put('á', 'a');
map.put('é', 'e');
//etc..

Then loop over the characters in the string, building a new string by calling map.get(currentChar) for each one.
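
The loop described above could look roughly like this. It is only a sketch: the class name AccentMap and the method name replaceAccents are hypothetical, and the map holds just the two pairs shown above. Since map.get returns null for characters that are not in the map, the sketch uses getOrDefault to keep unmapped characters unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
final class AccentMap {

    private static final Map<Character, Character> MAP = new HashMap<>();
    static {
        MAP.put('\u00E1', 'a'); // á
        MAP.put('\u00E9', 'e'); // é
        // etc.
    }

    static String replaceAccents(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char current = input.charAt(i);
            // Fall back to the original character when no mapping exists.
            char mapped = MAP.getOrDefault(current, current);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAccents("caf\u00E9")); // prints "cafe"
    }
}
```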

Bozho