I know the answer seems trivial, but believe me, it is not! Unicode has dedicated characters for Roman numerals. For example, one is not i but ⅰ, which is a different character; or, a better example, two is not ii (a string of two juxtaposed characters) but ⅱ (a single character).

Here are the Roman numerals for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 100, 500, and 1,000 respectively: Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ, Ⅵ, Ⅶ, Ⅷ, Ⅸ, Ⅹ, Ⅺ, Ⅻ, Ⅼ, Ⅽ, Ⅾ, Ⅿ (lowercase: ⅰ, ⅱ, ⅲ, ⅳ, ⅴ, ⅵ, ⅶ, ⅷ, ⅸ, ⅹ, ⅺ, ⅻ, ⅼ, ⅽ, ⅾ, ⅿ). But the question is how to construct the numerals not present in this series (13 is just an example).
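For reference, these characters occupy the code points U+2160 through U+217F. A minimal Python sketch using the standard `unicodedata` module lists them with their official names and numeric values; the helper name `roman_numeral_chars` is only for illustration.

```python
import unicodedata

def roman_numeral_chars(start=0x2160, end=0x217F):
    """Return (char, name, numeric value) for each precomposed Roman numeral."""
    return [
        (chr(cp), unicodedata.name(chr(cp)), unicodedata.numeric(chr(cp)))
        for cp in range(start, end + 1)
    ]

# Print one line per character: code point, glyph, Unicode name, value.
for char, name, value in roman_numeral_chars():
    print(f"U+{ord(char):04X} {char} {name} = {value:g}")
```

The largest consecutive value in either case is 12, so 13 has no single code point of its own.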

One way to write 13 is ⅹⅲ, which juxtaposes ⅹ and ⅲ (13 = 10 + 3); another way is ⅻⅰ, which juxtaposes ⅻ and ⅰ (13 = 12 + 1). If the base of the Roman numeral system is 12, then the latter makes more sense.

Mehdi Abbassi

2 Answers

In most cases, you should write 13 as XIII and not use any of the precomposed numbers, because the precomposed numbers up to 12 in the Unicode standard are intended for a small set of special use cases only. As you can read in the Unicode Standard 6.0, chapter 15.3,

For most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters. However, the uppercase and lowercase variants of the Roman numerals through 12, plus L, C, D, and M, have been encoded for compatibility with East Asian standards. Unlike sequences of Latin letters, these symbols remain upright in vertical layout. Additionally, in certain locales, compact date formats use Roman numerals for the month, but may expect the use of a single character.

For dates, you do not need the number 13 (unless you use a calendar with more than 12 months, in which case you are out of luck anyway). If you want to use the precomposed symbols in vertically laid out Asian text, I suppose there is no “correct way” to do it, and you can do whatever you find more visually pleasing (probably X + III).
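To illustrate composing numerals from plain Latin letters, here is a minimal Python sketch. It assumes the modern subtractive convention (IV, IX, and so on) and a range of 1 to 3999; the function name `to_roman` is hypothetical, not part of any standard library.

```python
def to_roman(n: int) -> str:
    """Convert 1..3999 to a Roman numeral built from plain Latin letters."""
    if not 1 <= n <= 3999:
        raise ValueError("expected an integer from 1 to 3999")
    # Values in descending order, including the subtractive pairs.
    table = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, letters in table:
        count, n = divmod(n, value)  # how many times this value fits
        parts.append(letters * count)
    return "".join(parts)

print(to_roman(13))  # XIII
```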

Sebastian Koppehel
  • Thank you for the answer. One reason to use precomposed characters instead of alphabetic characters is that TTS engines sometimes read the precomposed characters more correctly. – Mehdi Abbassi Apr 02 '23 at 15:42
  • @MehdiAbbassi But in that case I'm afraid you're out of luck too, because the text-to-speech engine would read "ten three" or "twelve one." – Sebastian Koppehel Apr 02 '23 at 15:58
  • The engine could be a bit "smarter" and read "thirteen", but for example when it arrives at XI, maybe it is just the Chinese president in uppercase, or it could be eleven. – Mehdi Abbassi Apr 02 '23 at 16:03
  • @MehdiAbbassi See https://stackoverflow.com/a/28788246/3527940 – jcaron Apr 03 '23 at 22:44

Sebastian Koppehel has already supplied a very good answer (the current version of the Unicode Standard is 15.0.0 and he linked to version 6.0.0, but the specification is unchanged in this respect). However, I would like to add one more detail: all those precomposed characters for Roman numerals have a compatibility decomposition to the usual sequences of plain Latin letters. Without entering into the intricacies of Unicode, this basically means that if you replace the precomposed characters with the corresponding sequences of Latin letters, you end up with a text equivalent to the original, and archival systems and the like are allowed to treat them as the same text, except for two facts: (i) the rendering might be somewhat different, as is the case for Asian typography, and (ii) the precomposed characters have the general category "Letter number" (Nl), while the ordinary characters have the category uppercase or lowercase letter (Lu or Ll), and this might be useful for text analysis and processing; speech synthesis is just one possible application. Canonical decomposition would yield a stronger equivalence, but the very reason to have those precomposed characters is not to have exact equivalents.
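The compatibility decomposition and the category difference described above can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import unicodedata

twelve = "\u216B"  # Ⅻ ROMAN NUMERAL TWELVE

# The <compat> tag marks a compatibility (not canonical) decomposition.
print(unicodedata.decomposition(twelve))               # <compat> 0058 0049 0049

# Compatibility normalization replaces the character with Latin letters...
print(unicodedata.normalize("NFKC", twelve))           # XII
# ...while canonical normalization leaves it untouched.
print(unicodedata.normalize("NFC", twelve) == twelve)  # True

# General category: letter number versus uppercase letter.
print(unicodedata.category(twelve), unicodedata.category("X"))  # Nl Lu

# Both spellings of 13 from the question normalize to the same letters.
ten_three = "\u2179\u2172"   # ⅹⅲ
twelve_one = "\u217B\u2170"  # ⅻⅰ
print(unicodedata.normalize("NFKC", ten_three))        # xiii
print(unicodedata.normalize("NFKC", twelve_one))       # xiii
```

Under compatibility normalization, then, the two candidate spellings of 13 become indistinguishable from each other and from the plain letters.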

Dario