
I use the function below to export an array to a CSV file in JavaScript, but the Chinese characters become garbled when the file is opened in Microsoft Excel 2013 on Windows 7.

When I open the exported file in Notepad, it displays correctly.

function arrayToCSVConvertor(arrData, reportTitle) {
    var CSV='';
    arrData.forEach(function(infoArray, index){
        var dataString = infoArray.join(",");
        dataString= dataString.split('\n').join(';');
        CSV += dataString+ "\n";
    });

    if (CSV == '') {
        alert("Invalid data");
        return;
    }

    //create a link and click, remove
    var link = document.createElement("a");
    link.id="lnkDwnldLnk";

    //this part will append the anchor tag and remove it after automatic click
    document.body.appendChild(link);

    var csv = CSV;

    var blob = new Blob([csv], { type: 'text/csv;charset=UTF-8' });//Here, I also tried charset=GBK, and it does not work either
    var csvUrl = URL.createObjectURL(blob);

    var filename = reportTitle+'.csv';

    if(navigator.msSaveBlob){//IE 10
        return navigator.msSaveBlob(blob, filename);
    }else{
        $("#lnkDwnldLnk")
            .attr({
                'download': filename,
                'href': csvUrl
            });
        $('#lnkDwnldLnk')[0].click();
        document.body.removeChild(link);
    }
}
JaskeyLam

3 Answers


Problem solved by adding a byte order mark (BOM) at the start of the CSV string:

var csv = "\ufeff"+CSV;
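Why this works: the Blob constructor encodes JavaScript string parts as UTF-8, so the leading U+FEFF is written as the three bytes EF BB BF, the UTF-8 BOM that Excel on Windows uses to recognize the file as UTF-8 instead of falling back to the system code page. A minimal sketch that makes those bytes visible, using TextEncoder (which, like Blob, produces UTF-8); the helper name csvBytesWithBom is hypothetical:

```javascript
// Hypothetical helper: prepend U+FEFF and encode as UTF-8, the same
// encoding the Blob constructor applies to string parts.
function csvBytesWithBom(csv) {
  return new TextEncoder().encode("\ufeff" + csv);
}

const bytes = csvBytesWithBom("名字,年龄\n张三,30\n");
// The first three bytes are ef bb bf, i.e. the UTF-8 BOM
console.log(Array.from(bytes.slice(0, 3), (b) => b.toString(16)));
```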
JaskeyLam

This is my solution:

var blob = new Blob(["\uFEFF"+csv], {
    type: 'text/csv;charset=utf-8'
});
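A slightly fuller sketch of the same approach. The helper names makeCsvBlob and downloadCsv are hypothetical; the charset parameter is utf-8 because the Blob constructor writes string parts as UTF-8. The navigator.msSaveBlob branch mirrors the legacy IE path from the question, and the one-second delay before revoking the object URL is an arbitrary precaution, not a requirement.

```javascript
// Hypothetical helper: wrap CSV text in a Blob, prefixed with U+FEFF.
// String parts of a Blob are encoded as UTF-8, so the BOM is written as EF BB BF.
function makeCsvBlob(csv) {
  return new Blob(["\uFEFF" + csv], { type: "text/csv;charset=utf-8" });
}

// Browser-only: trigger a download of the CSV under the given filename.
function downloadCsv(csv, filename) {
  const blob = makeCsvBlob(csv);
  if (navigator.msSaveBlob) {
    // Legacy IE 10/11 path, as in the question's code
    navigator.msSaveBlob(blob, filename);
    return;
  }
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Release the object URL shortly after the download has been triggered
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

In a page, a call such as downloadCsv("名字,年龄\n张三,30\n", "report.csv") would save a file that Excel on Windows should open with the Chinese text intact.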
Marcello B.
Santy SC

According to RFC 2781, U+FEFF is the byte order mark (BOM) for UTF-16, and in UTF-16 little-endian (UTF-16LE) it is serialized as the bytes 0xFF 0xFE. While adding a BOM may resolve the issue on Windows, the problem persists if the generated CSV file is opened with Excel on macOS.

A solution for writing a multibyte CSV file that works across different OS platforms (Windows, Linux, macOS) applies these three rules:

  1. Separate the fields with a tab character instead of a comma
  2. Encode the content as UTF-16LE
  3. Prefix the content with the UTF-16LE BOM, i.e. U+FEFF written as the bytes 0xFF 0xFE
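The three rules above can be sketched as follows. This assumes each row is an array of strings with no embedded tabs or newlines (no quoting is done), and toUtf16leTsv is a hypothetical name. TextEncoder cannot be used here because it only produces UTF-8, so the UTF-16LE bytes are written by hand.

```javascript
// Hypothetical helper: join fields with tabs and rows with newlines, then
// encode the text as UTF-16LE with a leading BOM (bytes FF FE).
function toUtf16leTsv(rows) {
  const text = rows.map((row) => row.join("\t")).join("\n");
  // JavaScript strings are already sequences of UTF-16 code units,
  // so each code unit becomes two bytes, low byte first.
  const bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xff; // BOM in little-endian byte order
  bytes[1] = 0xfe;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 + i * 2] = unit & 0xff; // low byte
    bytes[3 + i * 2] = unit >> 8; // high byte
  }
  return bytes;
}

const data = toUtf16leTsv([["名字", "年龄"], ["张三", "30"]]);
```

In the browser, the resulting bytes can be passed straight to the Blob constructor, e.g. new Blob([data], { type: "text/csv;charset=utf-16le" }); because the part is binary, no re-encoding to UTF-8 takes place.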

More detailed elaboration, sample code, and use cases can be found in this article.

mikaelfs