November 9th, 2010


kindle 3 and chinese text

A few weeks ago I started a beginner level Chinese (Mandarin) evening class at the local community college. We're about four weeks in and I'm really enjoying it, and I will definitely do more.

I thought it would be an interesting project to load a Chinese-English dictionary onto my Kindle for reference. I've already played with kindlegen, which takes a collection of HTML files along with some additional metadata and creates a MobiPocket .mobi format file for the Kindle. (I've got several ideas for reference documents that could be loaded onto the Kindle, more on that later.)
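As a rough illustration of the HTML side of this, here is a minimal Python sketch that turns a list of dictionary entries into a single UTF-8 HTML page. The sample entries, the function name `build_dictionary_html`, the output filename, and the page layout are all assumptions made up for the example, not the format my prototype actually uses, and the additional metadata that kindlegen wants is not shown.

```python
from html import escape

# Hypothetical sample entries: (headword, pinyin, English gloss).
ENTRIES = [
    ("他", "tā", "he"),
    ("她", "tā", "she"),
]


def build_dictionary_html(entries, title="Chinese-English Dictionary"):
    """Return an HTML page (as a str) with one paragraph per entry."""
    rows = [
        f"<p><b>{escape(word)}</b> {escape(pinyin)}: {escape(gloss)}</p>"
        for word, pinyin, gloss in entries
    ]
    return (
        "<html><head>"
        # Declare the encoding explicitly so the converter does not guess.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        f"<title>{escape(title)}</title>"
        "</head><body>\n" + "\n".join(rows) + "\n</body></html>\n"
    )


def main():
    # Write the page as UTF-8; the filename is an arbitrary choice.
    with open("dictionary.html", "w", encoding="utf-8") as f:
        f.write(build_dictionary_html(ENTRIES))


if __name__ == "__main__":
    main()
```

The resulting file would then be handed to kindlegen together with its metadata to produce the .mobi file.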

Anyway, I've now got a prototype Chinese dictionary loaded onto the Kindle. However, although many of the characters are displayed correctly, quite a few are just empty boxes (in the form of I ⬜ Unicode). I couldn't find any particular pattern to the missing characters: for example, 他 (he) displayed correctly but 她 (she) did not.
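One quick way to look for a pattern is to print each character's code point, Unicode name, and UTF-8 bytes. The sketch below does that for the two characters above; the helper name `describe` is invented for the example. It only shows that both characters sit in the same Unicode block and have well-defined UTF-8 encodings, so it does not by itself explain why one renders and the other doesn't.

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    return {
        "char": ch,
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
    }


if __name__ == "__main__":
    for ch in "他她":
        info = describe(ch)
        print(info["char"], info["code_point"], info["name"], info["utf8"])
    # 他 U+4ED6 CJK UNIFIED IDEOGRAPH-4ED6 E4 BB 96
    # 她 U+5979 CJK UNIFIED IDEOGRAPH-5979 E5 A5 B9
```

Both characters fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF), so nothing at the encoding level distinguishes the one that displays from the one that doesn't.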

Some research with Google showed that some people had hacked the Kindle and found a way to load different font files onto the device. I didn't try doing that, because I thought there should be a better solution. After all, Chinese text is displayed correctly in the Kindle web browser! Furthermore, if I copy a UTF-8 format file directly to the Kindle for display, it will have the missing character problem. But if I email the file to Amazon, where they automatically reformat it to a .azw book file and deliver it to the Kindle, the characters show up correctly. If I convert to a MobiPocket book file myself: missing characters.

After laboriously paging through dozens of threads on (typical internet "forum" software really is awful), I finally found a solution. It uses an undocumented debug feature of the Kindle. Press <Home> <Search> and enter the following commands:

    ;debugOn <Enter>
    ~setLocale zh-CN <Enter>

That's it, that is all that was required. I have no idea what this actually does or how it changes the Kindle's interpretation of UTF-8 documents (UTF-8 is supposed to be an unambiguous encoding of Unicode). But I now have a very basic Chinese-English dictionary on my Kindle.