17

For a long time we used a naive approach to split strings into characters in JS:

someString.split('');

But the popularity of emoji forced us to change this approach: emoji (and other non-BMP characters) are made of two "characters" (UTF-16 code units).

String.fromCodePoint(128514).split(''); // array of 2 characters; can't embed due to StackOverflow limitations
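
To make the problem concrete, here is a minimal sketch (the variable names are only illustrative) showing that split('') cuts the code point into its two UTF-16 surrogate halves:

```javascript
// U+1F602 (decimal 128514) lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const face = String.fromCodePoint(128514);
const halves = face.split('');

console.log(face.length); // 2 (UTF-16 code units, not characters)
console.log(halves.map((half) => half.charCodeAt(0).toString(16))); // [ 'd83d', 'de02' ]
```

Neither half is a valid character on its own, which is why the result cannot be displayed or serialized cleanly.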

So what is the modern, correct, and performant approach to this task?

Ginden
  • I'm curious. Which StackOverflow limitations are you talking about? – Mr Lister Feb 05 '16 at 11:44
  • It seems I couldn't post the question with the result of the `JSON.stringify(String.fromCodePoint(128514).split(''))` expression: it caused a "Malformed URI" error thrown from jQuery, which prevented me from posting the question. – Ginden Feb 05 '16 at 11:48
  • @MrLister: [I have added a Meta post](http://meta.stackexchange.com/questions/274191/cant-post-result-alone-surrogates-because-of-jquery-raising-malformed-uri-bug). – Ginden Feb 05 '16 at 11:56
  • 1
    see https://mathiasbynens.be/notes/javascript-unicode for the big picture – georg Feb 05 '16 at 11:57

4 Answers

14

Using spread in an array literal:

const str = "\u{1F602}\u{1F602}"; // two emoji (U+1F602, the code point from the question), written as escapes
console.log([...str]);

Using for...of:

function split(str) {
  const arr = [];
  // for...of iterates a string by code point, not by UTF-16 code unit
  for (const char of str) {
    arr.push(char);
  }
  return arr;
}

const str = "\u{1F602}\u{1F602}"; // two emoji (U+1F602, the code point from the question), written as escapes
console.log(split(str));
Omkar76
  • 2
    I think it should be noted that this unfortunately still doesn't cover a lot of emojis currently in use. E.g. `[...'‍♀️']` becomes `["", "", "‍", "♀", "️"]`. Which means no e.g. straightforward string reversal or symbol-wise comparison is possible. – AndyO Aug 26 '21 at 10:50
  • 1
    See https://github.com/orling/grapheme-splitter as an example library, mind the open issues regarding zero-width-joiners. Maybe there's a newer library out there. – Manuel Nov 06 '21 at 17:50
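
For the grapheme-cluster problem raised in the comments above, one option in newer runtimes is the built-in Intl.Segmenter. The following is only a minimal sketch, assuming the environment supports Intl.Segmenter (it is absent from older browsers and Node versions). splitGraphemes is a hypothetical helper name, and the sample string is an assumed example written with escapes:

```javascript
// Hypothetical helper: split into user-perceived characters (grapheme clusters).
// Assumes the runtime provides Intl.Segmenter.
function splitGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// "Woman running": U+1F3C3 (runner), U+200D (zero-width joiner),
// U+2640 (female sign), U+FE0F (variation selector).
const runner = '\u{1F3C3}\u200D\u2640\uFE0F';

console.log([...runner].length);            // 4 code points
console.log(splitGraphemes(runner).length); // 1 grapheme cluster
```

Results depend on the Unicode data shipped with the runtime, so very new emoji sequences may still be split on older engines.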
8

The best approach to this task is the native String.prototype[Symbol.iterator], which iterates over code points rather than UTF-16 code units. Consequently, a clean and easy way to split a string into Unicode characters is to call Array.from on it, e.g.:

const string = String.fromCodePoint(128514, 32, 105, 32, 102, 101, 101, 108, 32, 128514, 32, 97, 109, 97, 122, 105, 110, 128514);
Array.from(string);
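
As a small sketch of what this gives you (the variable names are illustrative, not part of any API), comparing the UTF-16 length with the number of code points and using the optional mapping function of Array.from:

```javascript
// Emoji (U+1F602), a space, and the letter "i".
const text = String.fromCodePoint(128514, 32, 105);
const chars = Array.from(text);

console.log(text.length);  // 4 (the emoji takes two UTF-16 code units)
console.log(chars.length); // 3 (one entry per code point)

// Array.from accepts a mapping function as its second argument.
const codePoints = Array.from(text, (ch) => ch.codePointAt(0));
console.log(codePoints); // [ 128514, 32, 105 ]
```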
Ginden
4

ECMAScript 2015 introduced the u flag, which makes regular expressions Unicode-aware.

With u added to your regex, . matches a complete character (code point) instead of one half of a surrogate pair.

const withFlag = `AB\u{1F602}DE`.match(/./ug); // \u{1F602} is a non-BMP character
const withoutFlag = `AB\u{1F602}DE`.match(/./g);

console.log(withFlag, withoutFlag);

There's a little more about it here
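
A minimal sketch of the difference, assuming a sample string with one non-BMP character written as an escape (the names are illustrative). Note that . does not match line terminators unless the s flag (ES2018) is also set, so for multi-line input the pattern needs adjusting:

```javascript
const sample = 'AB\u{1F602}DE';

const byCodePoint = sample.match(/./gu); // the emoji stays in one piece
const byCodeUnit = sample.match(/./g);   // the emoji is cut into two surrogates

console.log(byCodePoint.length); // 5
console.log(byCodeUnit.length);  // 6

// With the s (dotAll) flag, line breaks are included as well.
const multiLine = 'a\n\u{1F602}'.match(/./gsu);
console.log(multiLine.length); // 3
```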

robstarbuck
0

I did something like this where I had to support older browsers and an ES5 minifier; it will probably be useful to others:

    var array;
    if (Array.from && window.Symbol && window.Symbol.iterator) {
        // Use the string's native iterator, which walks code points.
        array = Array.from(input[window.Symbol.iterator]());
    } else {
        // Fallback: splits by UTF-16 code units; use only if that doesn't matter.
        array = input.split('');
    }
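
If the fallback does matter, the surrogate pairs can be joined by hand. This is only a sketch in ES5-style syntax; splitCodePointsES5 is a hypothetical helper name, not part of the original snippet:

```javascript
// Hypothetical ES5-style helper: split a string by code point without iterators.
function splitCodePointsES5(input) {
  var result = [];
  for (var i = 0; i < input.length; i++) {
    var unit = input.charCodeAt(i);
    var next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
    // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    // (0xDC00-0xDFFF) encodes a single code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
      result.push(input.charAt(i) + input.charAt(i + 1));
      i++; // skip the low surrogate that was just consumed
    } else {
      result.push(input.charAt(i));
    }
  }
  return result;
}

var mixed = 'a\uD83D\uDE02b'; // "a", U+1F602, "b"
console.log(splitCodePointsES5(mixed).length); // 3
```

Unpaired surrogates are kept as single entries, which matches what the native string iterator does.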
Ebrahim Byagowi