I had to play a bit with MySQL, saving and restoring databases with
mysqldump
and mysql
. These two tools have one
severe discrepancy, which I would like to comment here because I think
it is an excellent example of what not to do when designing tools to be
used together.
Save
mysqldump(1)
is a
database backup tool, that takes a MySQL database and prints the
necessary SQL statements to restore it to a new server. It has one interesting option:
--default-character-set=charset_name
Use charset_name as the default character set. See Section 10.5, “Character Set Configuration”. If no character set is specified, mysqldump uses utf8.
As you can see, it is quite modern in its default choice: if you do not specify which encoding you want to encode your database content into, it uses UTF-8. This way, it can save any database content, whatever characters it may contain.
A more sensible choice may have been to use the LC_CTYPE
environment variable.
Restore
mysql(1)
is the command-line MySQL client, which can is used to restore such dumps. It has a similar option:
--default-character-set=charset_name
Use charset_name as the default character set for the client and connection.
A common issue that can occur when the operating system uses utf8 or another multi-byte character set is that output from the mysql client is formatted incorrectly, due to the fact that the MySQL client uses the latin1 character set by default. You can usually fix such issues by using this option to force the client to use the system character set instead.
As you can see, this utility is still has an antique design: if you do not specify a character set, it assumes you are giving it data encoded in latin-1.
Discrepancy
You can see the obvious discrepancy here: mysqldump
encodes your database backups in UTF-8 by
default, and mysql
restores them
as if they were encoded in latin-1. This way, when you have to restore
your databases, if you thought you could do it without wondering,
congratulations: voilà , you have just fucked your databases
up! And as an additional present, chances are high that this mistake is
discreet enoughto be detected only after a while, when modifications
have been done to the restored database and the original backups have
been deleted.
So, if you are dealing with MySQL databases, remember that you must take care of the encoding of your backups, because MySQL utilities will not by itself, or rather it will do just what should not be done. And if you are designing similar tools, please think, and if you want an example of what not to do, here is one!
6 comments
tuesday 17 july 2012 à 07:20 Marcus said : #1
tuesday 17 july 2012 à 10:27 Nicolas said : #2
tuesday 17 july 2012 à 13:36 mirabilos said : #3
tuesday 17 july 2012 à 13:54 Tanguy said : #4
wednesday 18 july 2012 à 14:21 mirabilos said : #5
wednesday 18 july 2012 à 14:58 Tanguy said : #6