Questions tagged [unicode]

Unicode is the standard for computer representation of plain text. It encompasses the Universal Character Set, intended to unambiguously represent all characters used in human writing systems in any language, Unicode Transformation Formats (UTFs), defining standardized formats for storing and transmitting Unicode text, and standards for processing and manipulating text.

Unicode is the standard for computer representation of plain text. It encompasses:

  • the Universal Character Set (UCS), intended to unambiguously represent all characters used in human writing systems in any language,
  • Unicode Transformation Formats (UTFs), defining standardized formats for storing and transmitting Unicode text, and
  • standards for processing and manipulating Unicode text.

When this description was written, the latest version was 6.0 (data files released in October 2010, core specification published in 2011).

The Universal Character Set

Unicode assigns each character an integer code point (from 0 to 0x10FFFF) in the UCS to act as a unique reference. For example:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
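As a rough illustration of the mapping listed above, the following Python sketch converts characters to their code points and back using the built-in `ord` and `chr`. The helper name `code_point_label` is made up for this example and is not part of any standard API.

```python
# Sketch: characters <-> code points with Python built-ins.
# code_point_label is a hypothetical helper name used only for this example.

def code_point_label(ch: str) -> str:
    """Format one character as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"

# The Greek letters are written as escapes to avoid confusion with
# look-alike Latin letters.
samples = ["A", "B", "C", "\u039B", "\u039C"]

for ch in samples:
    print(code_point_label(ch), ch)

# chr() goes the other way, from code point to character.
lamda = chr(0x039B)

# 0x10FFFF is the largest code point; anything above it is rejected.
try:
    chr(0x110000)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```

The `U+` notation is always hexadecimal, which is why `ord` results are formatted with `X` here rather than printed as decimal integers.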

Unicode Transformation Formats

UTFs describe how to encode code points as sequences of bytes. The most common forms are UTF-8 (which encodes each code point as one, two, three or four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
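The rows of the table above can be reproduced with Python's built-in codecs. This is a minimal sketch: `hex_bytes` and `encode_row` are hypothetical helper names, and U+1F600 is an extra sample (not in the table) chosen to show the four-byte case in both encodings.

```python
# Sketch: encode single characters as UTF-8 and big-endian UTF-16.
# hex_bytes and encode_row are hypothetical helpers for this example.

def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

def encode_row(ch: str) -> tuple[str, str, str]:
    """Return (code point, UTF-8 bytes, UTF-16BE bytes) for one character."""
    return (
        f"U+{ord(ch):04X}",
        hex_bytes(ch.encode("utf-8")),
        # "utf-16-be" fixes the byte order and writes no byte order mark.
        hex_bytes(ch.encode("utf-16-be")),
    )

for ch in ["A", "\u039B", "\U0001F600"]:
    print(*encode_row(ch), sep="    ")
```

Code points above U+FFFF, such as U+1F600, take four bytes in UTF-16 because they are stored as a surrogate pair.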

Specification

The Unicode Consortium also defines standards for sorting and collation algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
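To give a feel for two of these operations, here is a small sketch of normalization and case mapping using Python's standard `unicodedata` module and built-in string methods. It assumes Python 3 and covers only those two topics; the standard library does not implement the Unicode Collation Algorithm, so sorting is left out. The helper names `nfc` and `same_ignoring_case` are invented for this example.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize text to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)

def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison via normalization plus full case folding."""
    return nfc(a).casefold() == nfc(b).casefold()

composed = "\u00E9"     # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# The two strings render the same but differ code point by code point.
print(composed == decomposed)       # False
print(nfc(decomposed) == composed)  # True

# Case mapping is not always one-to-one: ß uppercases to two letters.
print("Straße".upper())                         # STRASSE
print(same_ignoring_case("Straße", "STRASSE"))  # True
```

Normalizing before comparing matters because visually identical strings can arrive in either composed or decomposed form.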

Identifying Characters

709 questions
143
votes
8 answers

Impossible to put a zero after an aleph?

Me and a friend were joking about aleph's. Upon trying to type א0 (switch those 2 chars), they switched themselves! Any sequence of symbols does not stop this effect. Why is this!?? Try to type these with the 0 and א reversed (c&p for א): א0 א - …
Gradyn Wursten
121
votes
7 answers

Is there a unicode character for the Windows key?

I'm trying to communicate over text the Windows keyboard shortcuts. For the ones that use the Windows key, I don't want to type "Windows key +" each time. Is there a unicode character for the Windows key?
Gabriel Fair
35
votes
3 answers

Is there a text-caret Unicode character?

Is there a text caret Unicode glyph? I can’t imagine that such a typical character would not be present (especially considering some of the many much less useful code-points). I have tried searching with every term for it that I can think of (below)…
Synetech
27
votes
4 answers

Unicode character equivalent to the underscore on top

Is there a character equivalent to the underscore " _ " which occupies the upper position? Dash " - " will not do. If it were me, I would call it upper-score. But I do not see this anywhere.
j0h
18
votes
4 answers

What is the '' character?

An e-mail from a colleague contained the character at the end of a sentence, in a context where one might expect punctuation or a smiley. What is this character? It has zero google results and unicodelookup.com doesn't make me wiser either. Does it…
gerrit
17
votes
3 answers

What are the closest Unicode symbols to represent these icons (double arrow, bulb, CLI interface, multiple tabs)?

I am not really familiar with Unicode. I have some icons as .png files and I need to find Unicode symbols for them. Not sure if they exist. Not sure how to do a smart search for this kind of information. These are the icons: There is a German…
Pedro Delfino
14
votes
2 answers

iconv generating UTF-16 with BOM

Inspired by this question, can I use the iconv command to generate UTF-16 output with a BOM and with specified endianness? The iconv command converts text from one encoding to another. For example: echo hello | iconv -f ascii -t utf-16 generates a…
12
votes
2 answers

In Unicode, is there a way to superimpose a character over another?

I am looking for a joining character that works a bit like overset in LaTeX. It also works a bit like the many combining Unicode characters, but I would like it to work for any alphabetic letter. For example, I'd like to put a small capital A over…
Maarten
7
votes
1 answer

Any chance to get a superscript directly on top of subscript in Unicode?

Unicode provides subscripts and superscripts, so I can do this: x² And this: x₅ However, combining these two I get: x²₅ or x₅² which looks bad. Any chance to get the superscript directly on top of the subscript in Unicode? For clarity, this is…
gaazkam
5
votes
1 answer

Does unicode have a ee character like ꜳ (U+A733, latin aa)?

Does unicode have a ee character like ꜳ (U+A733, latin aa)? Either Google is no good with stuff under 3 letters, or it really doesn’t exist.
theL
4
votes
1 answer

How can I draw nice boxes with rounded corners using unicode characters?

I tried using the box drawing characters from Wikipedia, but I couldn't find vertical lines which fit the rounded corner (as you can see below, they are either off, or not connected), and the horizontal lines are too thick. Are there characters…
4
votes
0 answers

Why is λ called "LAMDA" instead of "LAMBDA" in Unicode

Looking at the Unicode Character Database, I noticed that the λ character, which is usually called "lambda" in English, is in fact spelled "lamda" (or GREEK SMALL LETTER LAMDA, to be exact). Even though I understand that the transliteration of λ in…
m4tx
3
votes
1 answer

Unicode backspace key symbol ⌫

Is there a Unicode symbol for the backspace key ⌫, the x inside a left-pointing arrow? I know that Unicode has a "BACKSPACE" control character (U+0008) that it inherited from ASCII, and it has a "SYMBOL FOR BACKSPACE" character "␈" (typically…
Jo Liss
3
votes
1 answer

Create symbol using unicode characters

How would I go about creating the following symbol in Unicode? The closest I've gotten is U+25F0 (WHITE SQUARE WITH UPPER LEFT QUADRANT): Is there any way I could use a combining character to get the result I'm after?
2
votes
1 answer

Are Unicode characters like MUSICAL SYMBOL COMBINING STEM or RECYCLING SYMBOL FOR TYPE-1 PLASTICS used anywhere for their semantic value?

I was browsing the Unicode 6.1 core specification chapter on Symbols, and found that there exist many specialized semantic symbols, such as plastics recycling: The seven numbered logos encoded from U+2673 to U+2679, ♳♴♵♶♷♸♹, are from “The Plastic…
jtbandes