8

How to check if a unicode character is full width?

I use Win32 / MFC

For example, is full width, A is not full width, is full width, F is not full width.

linquize
  • You do realize that the question is nonsense. A string means nothing if you don't know anything about its encoding. Please check out this nice article, which should shed some light on the matter -> http://www.joelonsoftware.com/articles/Unicode.html – Pandrei Dec 18 '13 at 15:36
  • 1
    +1 Your question made me go and learn something today! – Roddy Dec 18 '13 at 16:06
  • @Pandrei I would _not_ recommend that article. While it makes one or two useful points, there are also a couple of errors in it: for starters, the author doesn't seem to understand the difference between _UCS_ and _UTF_. – James Kanze Dec 18 '13 at 16:26
  • What do you mean by Full width? – CloudyMarble Jan 09 '14 at 07:27

2 Answers

8

What you need is the East Asian Width property of the character. You can get it by parsing the EastAsianWidth.txt file from the Unicode Character Database. I could not find a Win32 API that returns this information, but in Python, for example, you can use unicodedata.east_asian_width(unichr).
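
As a rough illustration of the Python route, here is a minimal sketch. It assumes Python 3 (where plain `str` characters replace `unichr`), and the helper name `is_full_width` is made up for this example. Treating the property values `W` (Wide) and `F` (Fullwidth) as "full width" is also an assumption; Ambiguous (`A`) characters are counted as not full width here, which may not suit every application.

```python
import unicodedata


def is_full_width(ch: str) -> bool:
    """Hypothetical helper: True if ch is East Asian Wide (W) or Fullwidth (F)."""
    # Assumption: Ambiguous ("A") characters are treated as not full width.
    return unicodedata.east_asian_width(ch) in ("W", "F")


# U+0041 LATIN CAPITAL LETTER A, U+FF21 FULLWIDTH LATIN CAPITAL LETTER A,
# U+4E2D (a CJK unified ideograph)
for ch in ("A", "\uFF21", "\u4E2D"):
    print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch), is_full_width(ch))
```

The result depends on the Unicode data bundled with the Python build, so very recently added characters may be reported as `N` on older interpreters.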

See Unicode Standard Annex #11 (East Asian Width) for background on the problem and more information.
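
If no library is available, the file-parsing approach could look roughly like the sketch below. It assumes the `range;property # comment` line layout used by EastAsianWidth.txt; the three sample lines are abbreviated stand-ins rather than the real file, and the function names are invented for this example. Code points not listed fall back to `N`, which is a simplification (the real file's header defines other defaults for some unassigned ranges).

```python
import bisect

# Abbreviated sample lines in the style of EastAsianWidth.txt (not the real file).
SAMPLE = """\
# code point or range ; property value # comment
0041..005A;Na  # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
4E00..9FFF;W   # CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FFF
FF21..FF3A;F   # FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
"""


def parse_ranges(lines):
    """Parse lines into a sorted list of (start, end, property) tuples."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, prop = (part.strip() for part in data.split(";"))
        first, _, last = cps.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        ranges.append((start, end, prop))
    ranges.sort()
    return ranges


def lookup_width(ranges, code_point, default="N"):
    """Binary-search the sorted ranges for code_point."""
    # Find the last range whose start is <= code_point.
    i = bisect.bisect_right(ranges, (code_point, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= code_point <= ranges[i][1]:
        return ranges[i][2]
    return default  # assumption: unlisted code points are Neutral


ranges = parse_ranges(SAMPLE.splitlines())
print(lookup_width(ranges, 0xFF21), lookup_width(ranges, ord("A")))
```

With the real file, `parse_ranges(open(path, encoding="utf-8"))` should work the same way, and the resulting tuples could just as well be printed as a static table for a C++ program.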

Igor Skochinsky
  • This is the correct answer. FWIW: the various files from the Unicode consortium have been designed for easy parsing, so it shouldn't be too difficult to machine generate a C++ table from it. (I've done this for a number of other such files.) – James Kanze Dec 18 '13 at 16:30
  • Do any languages other than East Asian ones have full-width characters? – linquize Dec 20 '13 at 02:04
  • For a more complete discussion, see this answer: http://stackoverflow.com/a/9145712/53974 – Blaisorblade May 17 '14 at 14:34
-3

What do you mean by "full width"? The width of a character depends on the font it is being displayed in.

If you mean whether it is a single-byte character or not, it's still not clear. A single-byte character in what encoding? In UTF-8, it will be a single-byte character if (and only if) the code point is less than 128; if you're using UTF-16 (probable, since you're under Windows), just compare the character with 128. For a single-byte encoding such as ISO 8859-1 (another widespread encoding), compare with 256: for anything less than 256, the UTF-16 unit will be numerically identical to the code point in ISO 8859-1 (sometimes known as Latin-1). For the single-byte encoding ASCII (almost never used today, but most of the common encodings are identical to it for the first 128 code points), anything less than 128 is good.
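
The thresholds above can be checked empirically. The following is only a sketch of the "single byte in a given encoding" reading of the question, not of East Asian width; `is_single_byte` is a hypothetical helper, and the encoding names are the ones Python's standard codecs use.

```python
def is_single_byte(ch: str, encoding: str) -> bool:
    """Hypothetical helper: True if ch encodes to exactly one byte in encoding."""
    try:
        return len(ch.encode(encoding)) == 1
    except UnicodeEncodeError:
        # The character does not exist in this encoding at all.
        return False


# U+0041 'A', U+00E9 (e with acute accent), U+FF21 (fullwidth 'A')
for ch in ("A", "\u00E9", "\uFF21"):
    print(
        f"U+{ord(ch):04X}",
        "utf-8:", is_single_byte(ch, "utf-8"),      # same as ord(ch) < 128
        "latin-1:", is_single_byte(ch, "latin-1"),  # same as ord(ch) < 256
        "ascii:", is_single_byte(ch, "ascii"),      # same as ord(ch) < 128
    )
```

Note that in UTF-16 every character takes at least two bytes, so on that side the comparison is made against the value of the 16-bit unit rather than against an encoded length.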

James Kanze
  • @Roddy That makes more sense. I should have looked up his second full-width character in my Unicode encoding. (Of course, it basically means that there isn't a simple answer.) – James Kanze Dec 18 '13 at 16:00