
I am trying to set my site's encoding to UTF-8, but it is not working. Environment: Ubuntu Desktop 13.04, Apache 2.2, PHP+FPM. I added an .htaccess file containing: AddDefaultCharset UTF-8

Meta:

file -bi * output:

    inode/directory; charset=binary
    inode/directory; charset=binary
    text/html; charset=utf-8
    inode/directory; charset=binary
    inode/directory; charset=binary
    text/x-php; charset=utf-8
    inode/directory; charset=binary
    text/x-php; charset=us-ascii
    inode/directory; charset=binary
    text/x-php; charset=us-ascii

(similar in subdirectories)

System encoding is also UTF-8.

PHP header: header("Content-Type:text/html;charset=UTF-8");

The problem itself: when I echo accented letters like this: echo "ÁÁÁ\n"; they print correctly. But when I send the same letters through a POST form, they come out as: �

Maybe my browser's encoding setting is messing everything up? I really hope not, because everything is on default. Anyway, the W3C validator says it is a valid HTML5, UTF-8 encoded page.

I hope someone can help me. Thanks in advance.

  • Are you storing this text in a database before it gets corrupted? – Stephen Ostermiller Aug 04 '13 at 23:24
  • Thanks for your answer! Although I use MySQL (also UTF-8), these POST variables are not stored in the database, so there is no interaction with it. I am using gedit, with UTF-8 encoding as far as I can tell, but it may be adding a BOM. I tried to check but could not find anything. Now I am trying Kate. – user2302838 Aug 04 '13 at 23:32

1 Answer

From: http://allseeing-i.com/How-to-setup-your-PHP-site-to-use-UTF8

Unicode is not quite a first class citizen in PHP, so you'll have to do some tweaking to get it to grok UTF-8.

Firstly, you need to ensure that you have MBString enabled in your copy of PHP. If you're on Linux and using a packaged PHP, it may be installed by default. If not, it's probably just a case of adding it with:

$ yum install php-mbstring

...

Assuming you have multi-byte support built-in, now you need to make sure PHP knows that you want to handle text as UTF-8 internally. Add the following to an include that gets parsed before anything else, and you should be good to go:

//setup php for working with Unicode data
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
// mb_http_input() only reports the detected input encoding and cannot set it, so it is omitted here
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');

If you're doing anything with strings other than reading them from a database and outputting them, you'll probably want to read about PHP's multi-byte functions. Basically, many string functions have multi-byte capable alternatives, with the prefix 'mb_'. So, substr() becomes mb_substr().
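As a rough sketch of why this matters (not part of the quoted article): PHP's plain string functions and the $s[0] index syntax count bytes, while the mb_ variants count characters. The example below assumes the file itself is saved as UTF-8 and that mbstring is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8 and the mbstring extension is enabled.
mb_internal_encoding('UTF-8');

$s = "ÚÚŰŰÓÓÍ";

// Byte-oriented: each of these letters occupies 2 bytes in UTF-8.
$byteLength = strlen($s);           // number of bytes
$firstByte  = $s[0];                // only the first byte of "Ú", not valid UTF-8 on its own

// Character-oriented equivalents from mbstring.
$charLength = mb_strlen($s);        // number of characters
$firstChar  = mb_substr($s, 0, 1);  // the whole first character

echo $byteLength, " bytes, ", $charLength, " characters\n";
echo "first byte (hex): ", bin2hex($firstByte), "\n";
echo "first character: ", $firstChar, "\n";
```

Echoing a lone byte such as $s[0] sends an incomplete UTF-8 sequence to the browser, which is typically rendered as a question mark or a replacement character.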

Stephen Ostermiller
  • This is really good advice. Unfortunately, it did not help either. I already had mbstring enabled, but had not included the code above. Now that I have added it, it prints a plain '?' instead of a '?' in a black square. That does not tell me anything. Any ideas? – user2302838 Aug 05 '13 at 15:58
  • I don't know what Kate did (or didn't do), but I installed Notepad++ under Wine. I checked the encoding of every file and it says they are UTF-8 without BOM, so I suppose the files are not the problem. The other thing I realized is that it is not the POST that messes everything up. However, I still don't know what does. The situation:

    WORKING:

    • echo "ÁÁÁÓŰŰÓÓŰ";
    • $a="ÚÚŰŰÓÓÍ"; function p($s){echo $s;} p($a);

    NOT WORKING:

    • $a="ÚÚŰŰÓÓÍ"; function p($s){echo $s[0];} p($a);

    Problem: it prints out question marks.

    What the hell?

    – user2302838 Aug 05 '13 at 18:03