How do I convert Æ and á into regular English characters with Java? What I have is something like this: "Local TV from Paraná". How do I convert it to "Parana"?
Frank
- This question is a duplicate of http://stackoverflow.com/questions/1008802/converting-symbols-accent-letters-to-english-alphabet. Please refer to that question for an answer. – brianpeiris Dec 26 '09 at 18:13
- Æ corresponds to the char with int value 198. – Thorbjørn Ravn Andersen Dec 26 '09 at 21:29
2 Answers
Look at icu4j or the JDK 1.6 Normalizer (java.text.Normalizer):
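import java.text.Normalizer;

// Decompose accented letters (e.g. á -> a + combining acute accent), then strip the combining marks
public String removeAccents(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}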
GAMA
bmargulies
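For anyone who wants to try the Normalizer approach directly, here is a minimal self-contained sketch that wraps the method above in a class (the class name `AccentStripper` is only for illustration). It also shows a limitation relevant to the question: "á" decomposes under NFD into "a" plus a combining acute accent, but "Æ" (U+00C6) has no decomposition at all, so it passes through unchanged.

```java
import java.text.Normalizer;

// Hypothetical wrapper class around the answer's removeAccents method
class AccentStripper {

    // Decompose to NFD, then drop everything in the Combining Diacritical Marks block
    static String removeAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "á" becomes "a" + U+0301; the mark is removed, printing "Local TV from Parana"
        System.out.println(removeAccents("Local TV from Paraná"));
        // U+00C6 has no decomposition mapping, so this prints "Æ" unchanged
        System.out.println(removeAccents("Æ"));
    }
}
```

So this answer covers the "á" part of the question, but "Æ" needs separate handling.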
- You probably meant "Normalizer.normalize(text, Normalizer.Form.NFD)" instead of "Normalizer.decompose(text, false, 0)" – Steve Emmerson Dec 26 '09 at 18:58
- I think I accidentally put in the old sun.* class scheme instead. Thanks for catching it. – bmargulies Dec 26 '09 at 19:42
- Normalizer.Form.NFKD may be better than Normalizer.Form.NFD for his purposes, depending on how he wants to treat ligatures. E.g., NFKD will transform `"ﬁ"` into `"fi"`. – Laurence Gonsalves Dec 26 '09 at 21:34
- http://stackoverflow.com/a/3322174/535203 says `replaceAll("[^\\p{ASCII}]", "");` – Anthony O. Jan 08 '13 at 16:37
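The variants suggested in these comments can be compared side by side. This is a minimal sketch; the class and method names are hypothetical, and it assumes the `[^\\p{ASCII}]` filter is applied after NFD normalization, as in the answer. It shows that NFKD splits the "fi" ligature (U+FB01) while NFD keeps it, and that neither form touches "Æ". The ASCII filter simply deletes "Æ" rather than converting it.

```java
import java.text.Normalizer;

// Hypothetical comparison of the variants mentioned in the comments
class NormalizerVariants {

    // The answer's approach, with the normalization form as a parameter
    static String stripMarks(String text, Normalizer.Form form) {
        return Normalizer.normalize(text, form)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Assumed reading of the linked answer: NFD first, then delete every non-ASCII char
    static String stripNonAscii(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String ligature = "\uFB01nal";  // U+FB01 LATIN SMALL LIGATURE FI, then "nal"
        System.out.println(stripMarks(ligature, Normalizer.Form.NFD));   // ligature kept
        System.out.println(stripMarks(ligature, Normalizer.Form.NFKD));  // prints "final"

        String ae = "\u00C6ble";        // U+00C6 LATIN CAPITAL LETTER AE, then "ble"
        System.out.println(stripMarks(ae, Normalizer.Form.NFKD));        // Æ survives NFKD too
        System.out.println(stripNonAscii(ae));                           // prints "ble": Æ is dropped
    }
}
```

In other words, NFKD helps with compatibility ligatures such as "ﬁ", but for "Æ" the ASCII filter only hides the problem by removing the letter.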
As far as I know, there's no way to do this automatically; you'd have to substitute each character manually using String.replaceAll.
String str = "Paraná";
str = str.replaceAll("á", "a");
// Æ is the A-E ligature, so it maps to "AE" rather than "a"
str = str.replaceAll("Æ", "AE");
Kaleb Brasee
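The two answers can also be combined: use an explicit table for the handful of letters that Normalizer cannot decompose (such as Æ), then let NFD handle ordinary accents. This is a hedged sketch; the helper name `toPlainLatin` and the contents of the mapping table are assumptions for illustration, and which replacement is "right" (for example Ø to O) depends on your data.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper: explicit replacements first, then NFD + mark removal
class PlainLatin {

    // Assumed mapping for letters with no Unicode decomposition; extend as needed
    private static final Map<Character, String> SPECIAL = Map.of(
            '\u00C6', "AE", '\u00E6', "ae",   // Æ æ
            '\u00D8', "O",  '\u00F8', "o",    // Ø ø
            '\u0152', "OE", '\u0153', "oe",   // Œ œ
            '\u00DF', "ss",                   // ß
            '\u0141', "L",  '\u0142', "l");   // Ł ł

    static String toPlainLatin(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            // Use the table entry if there is one, otherwise keep the char as-is
            sb.append(SPECIAL.getOrDefault(c, String.valueOf(c)));
        }
        // Precomposed accented letters such as á decompose under NFD; drop the marks
        return Normalizer.normalize(sb, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(toPlainLatin("Local TV from Paraná")); // Local TV from Parana
        System.out.println(toPlainLatin("Æble"));                 // AEble
    }
}
```

Doing the table lookup before normalizing keeps the table small, since every letter that does decompose (á, é, ñ, ...) is already covered by NFD.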