32

Is there a recommended cross-platform (Windows and Linux) method for converting strings from UTF-16LE to UTF-8, or should one use a different method for each environment?

I've managed to find a few references to 'iconv', but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8.

Can anybody recommend a cross-platform method? If you know of references or a guide with samples, I would very much appreciate it.

Thanks, Doori Bar

hippietrail
DooriBar

9 Answers

12

Change encoding to UTF-8 with PowerShell:

Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt
Ben Thul
user4657497
6

The open source ICU library is very commonly used.

Hans Passant
6

If you don't want to use ICU,

  1. Windows: WideCharToMultiByte
  2. Linux: iconv (Glibc)
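
To make the two options concrete, here is a minimal sketch of a single function that calls WideCharToMultiByte on Windows and iconv elsewhere. The function name utf16_to_utf8_os is hypothetical, and the sketch assumes a little-endian host, so that char16_t units in memory are UTF-16LE.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <iconv.h>
#endif

// Hypothetical helper: converts UTF-16 code units to UTF-8.
// Assumption: the host is little-endian, so the units in memory are UTF-16LE.
std::string utf16_to_utf8_os(const std::u16string& in) {
    if (in.empty()) return std::string();
#ifdef _WIN32
    static_assert(sizeof(wchar_t) == sizeof(char16_t), "wchar_t is 16 bits on Windows");
    const wchar_t* src = reinterpret_cast<const wchar_t*>(in.data());
    const int srclen = static_cast<int>(in.size());
    // First call: ask how many UTF-8 bytes are needed.
    const int needed = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                           src, srclen, nullptr, 0, nullptr, nullptr);
    if (needed <= 0) throw std::runtime_error("WideCharToMultiByte failed");
    std::string out(static_cast<std::size_t>(needed), '\0');
    // Second call: perform the conversion into the sized buffer.
    WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                        src, srclen, &out[0], needed, nullptr, nullptr);
    return out;
#else
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");  // arguments are (tocode, fromcode)
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");
    // A UTF-16 code unit never needs more than 3 UTF-8 bytes
    // (a surrogate pair is 2 units and produces 4 bytes).
    std::string out(in.size() * 3, '\0');
    char* inbuf = const_cast<char*>(reinterpret_cast<const char*>(in.data()));
    std::size_t inleft = in.size() * sizeof(char16_t);  // iconv counts bytes
    char* outbuf = &out[0];
    std::size_t outleft = out.size();
    const std::size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv failed: invalid or truncated UTF-16 input");
    out.resize(out.size() - outleft);  // trim the unused tail of the buffer
    return out;
#endif
}
```

Calling utf16_to_utf8_os(u"text") returns the UTF-8 bytes as a std::string. On platforms where iconv lives in a separate libiconv rather than in the C library, the program may need to be linked with -liconv.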
Alex B
5
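#include <iconv.h>

wchar_t *src = ...; // or char16_t* on non-Windows platforms, where wchar_t is 32 bits
size_t srclen = ...; // input size in bytes, not characters
char *dst = ...;
size_t dstlen = ...; // output buffer size in bytes
iconv_t conv = iconv_open("UTF-8", "UTF-16LE"); // arguments are (tocode, fromcode)
if (conv != (iconv_t)-1) {
    char *in = (char*)src; // iconv advances these pointers, so pass copies
    char *out = dst;
    iconv(conv, &in, &srclen, &out, &dstlen); // returns (size_t)-1 on error
    iconv_close(conv);
}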
Remy Lebeau
4

I ran into this problem too and solved it by using the Boost.Locale library:

try
{           
    std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
                        (short*)wcontent.c_str(), 
                        (short*)(wcontent.c_str() + wcontent.length()));
    content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
}
catch (const boost::locale::conv::conversion_error& e)
{
    std::cout << "Failed to convert from UTF-8 to " << toEncoding << "!" << std::endl;
    break;
}

The boost::locale::conv::utf_to_utf function converts a buffer encoded as UTF-16LE to UTF-8, and the boost::locale::conv::from_utf function converts a buffer encoded as UTF-8 to an ANSI encoding. Make sure the target encoding is right (here I use Latin-1, ISO-8859-1).

Another reminder: on Linux each character of a std::wstring (wchar_t) is 4 bytes wide, but on Windows it is 2 bytes wide, so you had better not use std::wstring to hold a UTF-16LE buffer.
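
As a minimal sketch of the point about character widths: char16_t and std::u16string are 16 bits wide on both platforms, so they can hold UTF-16 code units portably. The helper names below (utf16le_bytes_to_units, utf16_to_utf8) are hypothetical and hand-written for illustration. They are not part of Boost or of any library mentioned in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: assembles UTF-16 code units from little-endian bytes,
// independent of host byte order and of sizeof(wchar_t).
std::u16string utf16le_bytes_to_units(const unsigned char* bytes, std::size_t size) {
    if (size % 2 != 0) throw std::invalid_argument("odd byte count for UTF-16");
    std::u16string units;
    units.reserve(size / 2);
    for (std::size_t i = 0; i < size; i += 2) {
        units.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}

// Hypothetical helper: converts UTF-16 code units to UTF-8.
// Throws on unpaired surrogates instead of emitting replacement characters.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one next
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {            // 1 byte: ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: code points from surrogate pairs
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

A raw UTF-16LE buffer read from a file can be passed through utf16le_bytes_to_units and then utf16_to_utf8 without the result depending on the platform's wchar_t. For production use, a tested library such as the ones named in the other answers is likely the safer choice.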

Daniel King
4

If you have MSYS2 installed then the iconv package (which is installed by default) lets you use:

 iconv -f utf-16le -t utf-8 <input.txt >output.txt
M.M
2

There's also utfcpp, which is a header-only library.

Kevin Smyth
1

Another portable C option for converting strings between UTF-8, UTF-16, UTF-32 and wchar is the mdz_unicode library.

maxdz
0

Thanks guys, this is how I managed to solve the 'cross' Windows and Linux requirement:

  1. Downloaded and installed MinGW and MSYS
  2. Downloaded the libiconv source package
  3. Compiled libiconv via MSYS.

That's about it.

n0p
DooriBar