
I found this awesome way to detect emojis using a regex that doesn't use "huge magic ranges" by using a Unicode property escape:

console.log(/\p{Emoji}/u.test('flowers 🌼🌺🌸')) // true
console.log(/\p{Emoji}/u.test('flowers')) // false

But when I shared this in this answer, @Bronzdragon noticed that \p{Emoji} also matches numbers! Why is that? Numbers are not emojis, are they?

console.log(/\p{Emoji}/u.test('flowers 123')) // unexpectedly true

// regex-only workaround by @Bronzdragon
const regex = /(?=\p{Emoji})(?!\p{Number})/u;
console.log(
  regex.test('flowers'), // false, as expected
  regex.test('flowers 123'), // false, as expected
  regex.test('flowers 123 🌼🌺🌸'), // true, as expected
  regex.test('flowers 🌼🌺🌸'), // true, as expected
)

// more readable workaround
const hasEmoji = str => {
  const nbEmojiOrNumber = (str.match(/\p{Emoji}/gu) || []).length;
  const nbNumber = (str.match(/\p{Number}/gu) || []).length;
  return nbEmojiOrNumber > nbNumber;
}
console.log(
  hasEmoji('flowers'), // false, as expected
  hasEmoji('flowers 123'), // false, as expected
  hasEmoji('flowers 123 🌼🌺🌸'), // true, as expected
  hasEmoji('flowers 🌼🌺🌸'), // true, as expected
)
Boann
Nino Filiu
  • Note that the workaround also fails for '123 flowers 🌼' for example - that *should* return true, as it definitely has emoji. – Jon Skeet Oct 16 '20 at 12:38
  • Why not just remove all numbers, then do the check? – Lawrence Cherone Oct 16 '20 at 12:40
  • The question is not how to fix it ([here is a fix](https://stackoverflow.com/a/48148218/3832970)), the question is **why**. Else, let's close it. – Wiktor Stribiżew Oct 16 '20 at 12:43
  • @WiktorStribiżew indeed, I am asking **why**. Also, I don't want to use one of these range-based regexes because they're extremely long, unreadable, magic, and not resilient to the addition of new emojis. – Nino Filiu Oct 16 '20 at 13:18
  • I think the answer is [here](https://github.com/mathiasbynens/emoji-regex/issues/33#issuecomment-373674579) and in the whole thread after that post. *This is not a bug. `#` and `0-9` are `Emoji` characters with a text representation by default, per the Unicode Standard.* – Wiktor Stribiżew Oct 16 '20 at 13:25
  • [This post](https://github.com/mathiasbynens/emoji-regex/issues/33#issuecomment-374176872) goes into more detail, and you can probably use the `/\p{Extended_Pictographic}/u` regex to match emojis, except for some keycap base characters that are still emojis. – Wiktor Stribiżew Oct 16 '20 at 13:35

1 Answer


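According to this post, digits, #, * and some other chars have the Emoji property set to Yes, which means digits are considered valid emoji chars. They are also listed, along with ZWJ and a few more chars, as Emoji_Component, i.e. chars that serve as building blocks of emoji sequences: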

0023          ; Emoji_Component      #  1.1  [1] (#️)       number sign
002A          ; Emoji_Component      #  1.1  [1] (*️)       asterisk
0030..0039    ; Emoji_Component      #  1.1 [10] (0️..9️)    digit zero..digit nine
200D          ; Emoji_Component      #  1.1  [1] (‍)        zero width joiner
20E3          ; Emoji_Component      #  3.0  [1] (⃣)       combining enclosing keycap
FE0F          ; Emoji_Component      #  3.2  [1] ()        VARIATION SELECTOR-16
1F1E6..1F1FF  ; Emoji_Component      #  6.0 [26] (..)    regional indicator symbol letter a..regional indicator symbol letter z
1F3FB..1F3FF  ; Emoji_Component      #  8.0  [5] (..)    light skin tone..dark skin tone
1F9B0..1F9B3  ; Emoji_Component      # 11.0  [4] (..)    red-haired..white-haired
E0020..E007F  ; Emoji_Component      #  3.1 [96] (..)      tag space..cancel tag

For example, 1 is a digit, but it becomes an emoji when combined with U+FE0F and U+20E3 chars: 1️⃣:

console.log("1\uFE0F\u20E3 2\uFE0F\u20E3 3\uFE0F\u20E3 4\uFE0F\u20E3 5\uFE0F\u20E3 6\uFE0F\u20E3 7\uFE0F\u20E3 8\uFE0F\u20E3 9\uFE0F\u20E3 0\uFE0F\u20E3")
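
To check which of these properties a given character carries, here is a minimal sketch that probes single characters against the relevant property escapes. The `probe` helper is a hypothetical name used only for illustration, and the snippet assumes an engine that supports ES2018 Unicode property escapes (recent Node.js or browsers):

```javascript
// Hypothetical helper: report which emoji-related properties one character has.
const probe = ch => ({
  emoji: /^\p{Emoji}$/u.test(ch),
  component: /^\p{Emoji_Component}$/u.test(ch),
  pictographic: /^\p{Extended_Pictographic}$/u.test(ch),
});

console.log(probe('1'));         // digit: Emoji and Emoji_Component, not pictographic
console.log(probe('\u200D'));    // ZWJ: Emoji_Component only
console.log(probe('\u{1F33C}')); // blossom: Emoji and Extended_Pictographic
console.log(probe('a'));         // plain letter: none of the three
```

The digit passes the Emoji test on its own, which is why \p{Emoji} reports a match for a string like 'flowers 123'.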

If you want to avoid matching digits, use the Extended_Pictographic Unicode property:

The Extended_Pictographic characters contain all the Emoji characters except for some Emoji_Components.

So, you may use /\p{Extended_Pictographic}/gu to match most emojis proper, /\p{Extended_Pictographic}/u to test whether a string contains an emoji proper, or /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u to also match the skin tone modifier chars (light to dark skin tone) and the hair component chars (red-haired to white-haired):

const regex_emoji = /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u;
console.log( regex_emoji.test('flowers 123') );     // => false
console.log( regex_emoji.test('flowers 🌼🌺🌸') ); // => true
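
Because \p{Extended_Pictographic} alone skips keycap sequences such as 1️⃣ and flags built from regional indicators, a sketch like the one below could cover those cases too. The names `emojiLike` and `containsEmoji` and the exact alternation are assumptions made for illustration. They are not part of the linked posts, and the pattern does not implement the full Unicode emoji sequence grammar:

```javascript
// Hypothetical pattern: a pictograph, a keycap sequence (base char, optional
// VS-16, then U+20E3), or a pair of regional indicator letters (a flag).
const emojiLike = /\p{Extended_Pictographic}|[#*0-9]\uFE0F?\u20E3|[\u{1F1E6}-\u{1F1FF}]{2}/u;
const containsEmoji = str => emojiLike.test(str);

console.log(containsEmoji('flowers 123'));        // false
console.log(containsEmoji('flowers \u{1F33C}'));  // true
console.log(containsEmoji('step 1\uFE0F\u20E3')); // true (keycap sequence)
console.log(containsEmoji('\u{1F1EB}\u{1F1F7}')); // true (regional indicator pair)
```

Bare digits still do not match here, since a digit only counts when it is followed by the combining enclosing keycap. Note that Extended_Pictographic also covers a few text-like symbols such as © and ®, so this may match more than you expect.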
Wiktor Stribiżew