21

I'm on MySQL 5.5.21, and trying to insert the '\xF0\x9F\x98\x8A' smiley face character. But for the life of me, I can't figure out how to do it.

According to various forums which I've been reading, it is possible. But whenever I try it, the data just gets truncated.

mysql> INSERT INTO hour  (  `title`,   `content`,   `guid` ,  `published` , `lang` ,  `type` ,  
       `indegree` ,  `lon` ,  `lat` ,  `state` ,  `country` , `hour`  )   
       VALUES ( "title" ,  "content   content" ,  "guid" ,  1,  1,   
                     "WEBLOG",  1,  1,  1,  "state" ,  "country" ,  1 );
Query OK, 1 row affected, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------+
| Level   | Code | Message                                                                       |
+---------+------+-------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x98\x8A  ...' for column 'content' at row 1 |
| Warning | 1265 | Data truncated for column 'published' at row 1                                |
+---------+------+-------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|           687302 |
+------------------+
1 row in set (0.00 sec)

mysql> select * from hour where id = 687302;
+--------+-------+----------+------+---------------------+
| id     | title | content  | guid | published           |
+--------+-------+----------+------+---------------------+
| 687302 | title | content  | guid | 0000-00-00 00:00:00 |
+--------+-------+----------+------+---------------------+
1 row in set (0.00 sec)

But my table definition is as follows.

CREATE TABLE `hour` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) CHARACTER SET utf8 NOT NULL,
  `content` text CHARACTER SET utf8 NOT NULL,
  `guid` varchar(255) CHARACTER SET utf8 NOT NULL,
  `published` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `lang` tinyint(3) unsigned NOT NULL,
  `type` enum('WEBLOG','MICROBLOG') CHARACTER SET utf8 DEFAULT NULL,
  `indegree` int(4) unsigned NOT NULL,
  `lon` float DEFAULT NULL,
  `lat` float DEFAULT NULL,
  `state` varchar(50) CHARACTER SET utf8 DEFAULT '',
  `country` varchar(50) CHARACTER SET utf8 DEFAULT '',
  `hour` int(2) DEFAULT NULL,
  `gender` enum('MALE','FEMALE') CHARACTER SET utf8 DEFAULT NULL,
  `time_zone` varchar(45) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MEMORY AUTO_INCREMENT=687560 DEFAULT CHARSET=utf8mb4 KEY_BLOCK_SIZE=288 

One can see that I'm using CHARSET=utf8mb4. Surely this corrects issues around the use of multi-byte characters?

Ok, so I didn't notice:

  `content` text CHARACTER SET utf8 NOT NULL,

I've corrected that now, but still get funky results.

CREATE TABLE `hourtmp` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) CHARACTER SET utf8 NOT NULL,
  `content` text NOT NULL,
  `guid` varchar(255) CHARACTER SET utf8 NOT NULL,
  `published` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `lang` tinyint(3) unsigned NOT NULL,
  `type` enum('WEBLOG','MICROBLOG') CHARACTER SET utf8 DEFAULT NULL,
  `indegree` int(4) unsigned NOT NULL,
  `lon` float DEFAULT NULL,
  `lat` float DEFAULT NULL,
  `state` varchar(50) CHARACTER SET utf8 DEFAULT '',
  `country` varchar(50) CHARACTER SET utf8 DEFAULT '',
  `hour` int(2) DEFAULT NULL,
  `gender` enum('MALE','FEMALE') CHARACTER SET utf8 DEFAULT NULL,
  `time_zone` varchar(45) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MEMORY AUTO_INCREMENT=687563 DEFAULT CHARSET=utf8mb4 KEY_BLOCK_SIZE=288 |

 mysql> INSERT INTO hourtmp  (  `title`,   `content`,   `guid` ,  `published` , `lang` ,  `type` ,  `indegree` ,  
 `lon` ,  `lat` ,  `state` ,  `country` , `hour`  )   VALUES ( "title" ,  "content   content" ,  
 "guid" ,  1,  1,   "WEBLOG",  1,  1,  1,  "state" ,  "country" ,  1 );
 Query OK, 1 row affected, 2 warnings (0.00 sec)

 mysql> show warnings;

 | Level   | Code | Message                                                                       |

 | Warning | 1366 | Incorrect string value: '\xF0\x9F\x98\x8A  ...' for column 'content' at row 1 |
 | Warning | 1265 | Data truncated for column 'published' at row 1                                |

 2 rows in set (0.00 sec)

 mysql> select * from hourtmp;
 +--------+-------+-----------------------+
 | id     | title | content               |
 +--------+-------+-----------------------+
 | 687560 | title | content ????  content |
 | 687561 | title | content ????  content |
 +--------+-------+-----------------------+
Bryan Hunt
  • 313
  • 1
  • 2
  • 7
  • I'm obliterating all special characters in the application layer now, so it isn't so much of a problem for me. But, I would like to know if it's possible to somehow get the data in and out of MySQL. – Bryan Hunt Jul 11 '12 at 14:14
  • Not a MySQL guy, but can't you specify uft8 for the TEXT field as well – JNK Jul 11 '12 at 20:26
  • have you run set names utf8mb4; from your client before issuing the insert? – atxdba Jul 12 '12 at 03:46
  • JNK,the text field is using the table default, in this case utf8mb4. – Bryan Hunt Jul 12 '12 at 10:01
  • atxdba. Thanks for the suggestion, still comes out as ?, which presumably means corrupted. Damn those emoticons/decepticons ! ;) – Bryan Hunt Jul 12 '12 at 10:02

3 Answers3

29

I’ve recently written a detailed guide on how to switch from MySQL’s utf8 to utf8mb4. If you follow the steps there, everything should work correctly. Here are direct links to each individual step in the process:

I suspect that your problem can be solved by following step 5. Hope this helps!

Mathias Bynens
  • 2,450
  • 1
  • 21
  • 14
  • 1
    Have left that job so unable to test/verify. However I suspect that the collation-server = utf8mb4_unicode_ci setting is what was missing. Good tutorial! – Bryan Hunt Aug 05 '12 at 09:27
  • Very nice Mathias. A reminder to people, your client connection details matter. I'm using the NPM module mysql from Node, and needed to specify charset: 'utf8mb4' in my createConnection() call, otherwise inserting true UTF8 characters still failed with the Incorrect string value error, even after converting the table and column to utf8mb4 character set and collation. I expect your client config level details in Step 5 would have similar effect. – Neek Aug 28 '18 at 00:44
3

Do the Following things:

  1. Set the database charset to utf8mb4

  2. Set the charset of column to utf8mb4

like below query:

ALTER TABLE `comments` CHANGE `text` `text` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
András Váczi
  • 31,278
  • 13
  • 101
  • 147
Poonam Gupta
  • 131
  • 1
  • Are these steps really sufficient? The accepted answer has many more. – Colin 't Hart Feb 03 '17 at 12:46
  • Depends on what the problem was, if the problem was on the database side this is enough. But it can very well be a client connection problem too. – spydon Jul 16 '18 at 19:26
0

In Node.js

  1. Setup Mysql connection
const mysql = require('mysql');

const con = mysql.createConnection({ host: "localhost", user: "yourusername", password: "yourpassword", charset : 'utf8mb4' // This mandatory to allow Emoji });

  1. Set charaset of column to uft8mb4
ALTER TABLE your_table MODIFY your_column text
 CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Rohit Gupta
  • 1,626
  • 6
  • 17
  • 19