I’ve been reading about character encoding recently, in particular the various Unicode standards. I’ve been rather pissed off about setting up the wrong collation in MySQL: I just realized that on my other blog, I have posts in utf8_unicode_ci, latin1_general_ci and utf8_general_ci. This is what you get when you migrate a database blindly without knowing what a character set is. I regret not reading up enough. Now I have set everything to utf8_general_ci.
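Mixing character sets during a migration usually shows up as mojibake: bytes written in one encoding get read back in another. A minimal Python sketch, assuming UTF-8 bytes that end up being read as Latin-1. The sample string is made up, and Python's `latin-1` codec only approximates MySQL's `latin1`, which MySQL actually treats as Windows-1252 (the two agree on the bytes used here).

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
original = "café"

# The bytes a UTF-8 client would store: "é" becomes the two bytes C3 A9.
utf8_bytes = original.encode("utf-8")

# Misread those bytes as Latin-1 (standing in for MySQL's latin1 here):
# each byte turns into its own character, so "é" becomes "Ã©".
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# Reversing the wrong step recovers the text, but only if nothing
# else has re-encoded or modified the garbled string in between.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The round trip works here only because Latin-1 maps every byte to some character; once garbled text has been saved and re-encoded again, the repair gets much harder.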
Anyway, something about another encoding, GB2312, caught my attention.
Here’s some trivia: the older Chinese encoding GB2312 cannot encode the name of former Chinese Premier Zhu Rongji, because the character 镕 is not in its repertoire. As a result, his name has often appeared as 朱熔基. Zhu disapproves of this and prefers the correct version, 朱镕基.
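One way to see this for yourself is to ask Python's codecs which characters each encoding can hold. A minimal sketch, assuming CPython's built-in `gb2312` and `gbk` codecs (GBK is the later extension of GB2312); the helper names `can_encode` and `unencodable` are hypothetical, made up for this example.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the codec's repertoire."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def unencodable(text: str, codec: str) -> list[str]:
    """List the individual characters of text that the codec cannot represent."""
    return [ch for ch in text if not can_encode(ch, codec)]


correct = "朱镕基"      # the spelling Zhu prefers
substitute = "朱熔基"   # the common GB2312-friendly substitute

# Check each spelling against the older and the extended encoding.
for name in (correct, substitute):
    for codec in ("gb2312", "gbk"):
        print(name, codec, can_encode(name, codec), unencodable(name, codec))
```

Checking character by character, rather than only catching the exception on the whole string, makes it obvious that the single missing character 镕 is what breaks the name.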