2

I'm doing a rewrite of this question.

I want to create a string containing a Unicode character, such as "\u03B1", from an integer constant. For example, this string is the Greek letter alpha.

const char *alpha = "\u03B1";

I want to construct the same string with a call to printf, starting from the integer value 0x03B1. For this example it can be done as shown below, but I'm not sure how to get those two numbers from 0x03B1.

printf("%c%c", 206, 177);

This link explains what to do, but I'm not sure how to apply it: http://www.fileformat.info/info/unicode/utf8.htm

For characters equal to or below 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes. The first byte will have the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The second byte will have the top bit set and the second bit clear (i.e. 0x80 to 0xBF).

NOTE: I do not want to create the string "\\u03B1", which contains a literal backslash. That is different from "\u03B1", which is an escaped Unicode character.

Berry Blue

2 Answers

3

It appears that even the most recent C and C++ standards are a bit disappointing in their handling of Unicode.

For those who are confused about the example in the question, like I was:

const char *alpha = "\u03B1";

In C99 and later, this stores a pointer to the string "α" (U+03B1) in alpha. C89 has no \u escape, so there this is not valid.
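To see where the 206 and 177 in the question come from, here is a minimal sketch that prints the bytes stored for the literal. The helper name print_bytes is made up for illustration, and the result assumes the compiler's execution character set is UTF-8 (the default for GCC and Clang); with another character set the stored bytes would differ.

```c
#include <stdio.h>

/* Needs C99 or later for the \u escape. The bytes stored here depend on
   the compiler's execution character set; UTF-8 is assumed. */
static const char *alpha = "\u03B1";

/* Hypothetical helper: print every byte of s as an unsigned decimal value. */
void print_bytes(const char *s)
{
    for (; *s != '\0'; ++s)
        printf("%u ", (unsigned)(unsigned char)*s);
    printf("\n");
}
```

Under that assumption, calling print_bytes(alpha) prints the two values 206 and 177.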

I could not find a way to use the \u syntax with a variable or integer constant, like what the question was requesting. You may be better off using a library to add better Unicode support to your program. I have not used the ICU library, but it sounds promising.

yellowantphil
  • Sorry, please see my edited question above. I want to create an escaped unicode character not a string with a backslash. – Berry Blue Nov 07 '14 at 01:49
1

I figured it out.

The first byte carries the upper 5 bits of the 11-bit Unicode value (mask 0x7c0, binary 11111000000), and the second byte carries the lower 6 bits (mask 0x3f, binary 00000111111).

The first byte is ORed with 0xc0 (binary 11000000) to set its two high bits, and the second byte is ORed with 0x80 (binary 10000000) to set its high bit. This two-byte form covers values from 0x80 to 0x7FF.

int alpha = 0x03B1; // 945
// unsigned char, because 206 and 177 do not fit in a signed char
unsigned char byte1 = 0xc0 | ((alpha & 0x7c0) >> 6); // 206
unsigned char byte2 = 0x80 | (alpha & 0x3f); // 177
printf("%c%c", byte1, byte2);
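The same idea extends to the other UTF-8 lengths. The following is a sketch rather than a definitive implementation: the function name utf8_encode and its signature are my own choices, and it follows the standard UTF-8 bit layout for 1 to 4 bytes, rejecting surrogates and values above 0x10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is not a
   valid Unicode scalar value. The output is not NUL-terminated. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

For example, utf8_encode(0x03B1, buf) returns 2 and stores 206 and 177 in buf, which can then be written with fwrite(buf, 1, 2, stdout).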
Berry Blue