MySQL utf8 vs. utf8mb4 – What’s the difference? (


MySQL has caused so much pain with their decision to make latin1 the default encoding.I’ve seen so many PHP apps that store UTF-8 encoded text in Latin1 fields, and it kind of works, until you correctly configure your client library, or try sorting the data, or check for equality…

The problem is so common, that the popular MySQL client app Sequel Pro even has an option to set the encoding to “UTF8 via Latin1”, which tells the server to output Latin1 but then treats the result as UTF-8.

So once you learned this the hard way, you’ll change the db encoding to utf8, and you’ll be glad to have finally fixed the encoding mess, only to stumble across the fact from the original article that utf8 is not really utf8 in MySQL — oh the horror!

Use utf8mb4. Use utf8mb4_unicode_520_ci. Alter tables usingALTER TABLE database_name.table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci.

Verify that all of your columns are utf8mb4 with:

SELECT table_schema, table_name, character_set_name, collation_name FROM information_schema.columns WHERE table_schema NOT IN (‘mysql’,’information_schema’,’sys’,’performance_schema’) AND character_set_name is not NULL AND character_set_name != ‘utf8mb4’;

You may have to update your client libraries (like Perl-DBD) if you’re still using something like CentOS 6.

If only they had named them utf8mb3 and utf8. Would have been a lot clearer that one was optimized, but not fully utf8.