3

I'm looking at the IsCharAlphaNumeric Windows API function. As it only takes a single TCHAR, it obviously can't make any decisions about surrogate pairs in UTF-16 content. Does that mean that there are no alphanumeric characters that are surrogate pairs?

Puppy

3 Answers

5

Characters outside the BMP can be letters. (Michael Kaplan recently discussed a bug in the classification of the character U+1F48C.) But IsCharAlphaNumeric cannot see characters outside the BMP (for the reasons you noted), so you cannot obtain classification information for them that way.

If you have a surrogate pair, call GetStringType with cchSrc = 2 and check for C1_ALPHA and C1_DIGIT.

Edit: The second half of this answer is incorrect: GetStringType does not support surrogate pairs.
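To illustrate why a single 16-bit code unit is not enough, here is a minimal sketch of how a surrogate pair maps to one code point. It is written in Java rather than against the Windows API, and the class and method names (`SurrogateDecode`, `combine`) are made up for this example. It only shows the UTF-16 arithmetic; it does not classify anything.

```java
class SurrogateDecode {
    // Combine a high and a low surrogate into a single code point,
    // following the UTF-16 encoding rules.
    static int combine(char high, char low) {
        if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // U+1F48C is stored in UTF-16 as the two code units D83D DC8C.
        int cp = combine('\uD83D', '\uDC8C');
        System.out.println(Integer.toHexString(cp)); // prints 1f48c
    }
}
```

A function that receives only one of the two code units has no way to recover the full code point, which is the limitation described above.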

Raymond Chen
0

You can work out for yourself what you are missing by not being able to inspect non-BMP code points: just look at the Unicode plane assignments.

For example, you won't be able to identify Imperial Aramaic characters as alphanumeric. Shame.
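As a rough illustration of the Imperial Aramaic case, the following Java sketch compares a per-code-unit check (similar in spirit to an API that sees one 16-bit unit at a time) with a per-code-point check. The class and helper names are hypothetical, and Java's `Character` classification stands in for the Windows one, so results on Windows may differ.

```java
class AramaicCheck {
    // Inspect one UTF-16 code unit at a time, as a single-character API would.
    static boolean anyAlnumByChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Inspect whole code points; surrogate pairs are combined first.
    static boolean anyAlnumByCodePoint(String s) {
        return s.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        // U+10840 IMPERIAL ARAMAIC LETTER ALEPH, outside the BMP.
        String aleph = new String(Character.toChars(0x10840));
        System.out.println(aleph.length());             // 2 code units
        System.out.println(anyAlnumByChar(aleph));      // false
        System.out.println(anyAlnumByCodePoint(aleph)); // true
    }
}
```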

Kerrek SB
0

Does that mean that there are no alphanumeric characters that are surrogate pairs?

No, there are supplementary code points that are in the letter group.

Comparing a char to a code-point?

For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
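A small sketch of that behavior, using the `char` and `int` overloads of `Character.isLetter`. The class name and the `isLetterPair` helper are invented for illustration, and U+DC00 is just one arbitrary choice of low surrogate.

```java
class IsLetterOverloads {
    // Classify the code point formed by a surrogate pair (int overload).
    static boolean isLetterPair(char high, char low) {
        return Character.isLetter(Character.toCodePoint(high, low));
    }

    public static void main(String[] args) {
        char high = '\uD840';
        char low = '\uDC00';

        // char overload: a lone surrogate is not a letter.
        System.out.println(Character.isLetter(high));        // false
        System.out.println(Character.isHighSurrogate(high)); // true

        // Combined, the pair is U+20000, a CJK ideograph.
        System.out.println(isLetterPair(high, low));         // true
    }
}
```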

Mike Samuel