I have to work with code points above U+FFFF (specifically mathematical script characters) and have not found a simple tutorial on how to do this. I want to be able to (a) create Strings containing such code points and (b) iterate over the characters in them. Since a char cannot hold these code points, my code looks like this:
@Test
public void testSurrogates() {
    // creating a string
    StringBuilder sb = new StringBuilder();
    sb.append("a");
    sb.appendCodePoint(120030); // U+1D4DE, stored as a surrogate pair (two chars)
    sb.append("b");
    String s = sb.toString();
    System.out.println("s> " + s + " " + s.length());
    // iterating over string
    int codePointCount = s.codePointCount(0, s.length());
    Assert.assertEquals(3, codePointCount);
    int charIndex = 0;
    for (int i = 0; i < codePointCount; i++) {
        int codepoint = s.codePointAt(charIndex);
        int charCount = Character.charCount(codepoint);
        System.out.println(codepoint + " " + charCount);
        // advance by 1 or 2 chars, depending on the code point
        charIndex += charCount;
    }
}
I'm not confident that this is fully correct or the cleanest way to do it. I expected a method such as codePointAfter(), but there is only codePointBefore(). Please confirm that this is the right strategy, or suggest an alternative.
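For comparison, here is a slightly more compact form of the same loop. It is only a sketch: the class and method names (CodePointIteration, byOffset, byStream) are made up for illustration, and byStream assumes Java 8 or later, where String.codePoints() is available. The loop advances a char offset until it reaches the end of the string, so no separate codePointCount pass is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
public class CodePointIteration {

    // Walks the string by char offset, advancing 1 or 2 chars per code point.
    static List<Integer> byOffset(String s) {
        List<Integer> result = new ArrayList<>();
        for (int offset = 0; offset < s.length(); ) {
            int codePoint = s.codePointAt(offset);
            result.add(codePoint);
            offset += Character.charCount(codePoint);
        }
        return result;
    }

    // Same result via the stream API (assumes Java 8+).
    static int[] byStream(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = new StringBuilder()
                .append("a")
                .appendCodePoint(0x1D4DE) // same value as 120030
                .append("b")
                .toString();
        System.out.println(byOffset(s));
        System.out.println(Arrays.toString(byStream(s)));
    }
}
```

Both calls in main should print [97, 120030, 98] for the test string used above.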
UPDATE: Thanks for the confirmation @Jon. I struggled with this - here are two mistakes to avoid:
- there is no direct index into the code points (i.e. no s.getCodePoint(i)); you have to iterate through them
- using (char) as a cast will truncate integers above 0xFFFF, and it's not easy to spot
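The two mistakes above can be sketched as follows. The class and helper names (CodePointPitfalls, codePointAtIndex, wrongToString, rightToString) are hypothetical and exist only for this illustration. String.offsetByCodePoints gives something close to indexed access, but as far as I can tell it still has to walk the string internally, so it does not make lookup constant-time.

```java
// Hypothetical class name, used only for this illustration.
public class CodePointPitfalls {

    // Returns the n-th code point (0-based) of s.
    // offsetByCodePoints converts a code point index into a char offset.
    static int codePointAtIndex(String s, int n) {
        int charOffset = s.offsetByCodePoints(0, n);
        return s.codePointAt(charOffset);
    }

    // WRONG for code points above 0xFFFF: the cast keeps only the low 16 bits.
    static String wrongToString(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    // Correct: Character.toChars yields one char or a surrogate pair.
    static String rightToString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = "a" + rightToString(120030) + "b";

        // Code point index 2 is 'b', although it sits at char index 3.
        System.out.println(codePointAtIndex(s, 2));

        // The cast silently produces a different character (0xD4DE).
        System.out.println(Integer.toHexString(wrongToString(120030).codePointAt(0)));
        System.out.println(Integer.toHexString(rightToString(120030).codePointAt(0)));
    }
}
```

With the test string from the question this should print 98, then d4de for the truncated cast, then 1d4de for the correct conversion.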